
cado-nfs - Re: [cado-nfs] Unexpected Memory Use Change

Subject: Discussion related to cado-nfs

  • From: Emmanuel Thomé <Emmanuel.Thome@inria.fr>
  • To: Ed Hall <ed_ka2fwj@yahoo.com>
  • Cc: "cado-nfs@inria.fr" <cado-nfs@inria.fr>
  • Subject: Re: [cado-nfs] Unexpected Memory Use Change
  • Date: Tue, 6 Apr 2021 17:47:33 +0200

Thanks for your updates. I was able to reproduce your issue.

> I surmise that in the instance where memory runs out, you have many
> "adjustments" of the bucket memory sizes, which end up killing you.

This is indeed a correct guess as to what's going on.

Your issue specifically is tracked at:

https://gitlab.inria.fr/cado-nfs/cado-nfs/-/issues/30014

The proper fix is definitely part of what I've done in the
branch that is tied to this merge request.

https://gitlab.inria.fr/cado-nfs/cado-nfs/-/merge_requests/29

However, it's still a work in progress, and I'm not totally confident in
merging this right now. (Furthermore, as Paul noticed, -v produces tons of
spurious warnings with this version; this should be fixed as well.)

E.



On Tue, Apr 06, 2021 at 02:07:36PM +0200, Emmanuel Thomé wrote:
> Hi,
>
> Thanks for your report.
>
> Yes, there's something fishy going on.
>
> This might be caused by the fix of #30012
> (https://gitlab.inria.fr/cado-nfs/cado-nfs/-/issues/30012)
>
> Could you report on what happens in each of the following situations:
> - commit dc9309189 (before the fix of #30012)
> - current HEAD, but with --adjust-strategy 0
> - commit fef11f7aa (WIP branch that addresses a shortcoming that #30012
> was about, in a sense. To use this profitably, you might have to add
> arguments -lambda0 2.05 -lambda1 -3.05)
>
>
> Also, if you agree to share your polynomial, and/or the output files of
> the las program, that could be useful. I surmise that in the instance
> where memory runs out, you have many "adjustments" of the bucket memory
> sizes, which end up killing you.
>
> E.
>
> On Tue, Apr 06, 2021 at 11:23:00AM +0000, Ed Hall wrote:
> > Dear Team Members,
> >
> > I'm having a memory issue with the latest commit.  I have nearly my whole
> > farm running an earlier commit that uses memory based on client
> > processes, i.e. one process uses ~4GB for the current job regardless of
> > the number of threads.  With the latest commit, the memory use appears to
> > be related to the number of threads.
> >
> > I haven't tried intermediate commits, but here are some samples of the
> > earlier one I was using and then the latest I tried.
> >
> > Expected behavior from before:
> > =============================================================
> > commit 8ab2eea7202ffaf62825057c50f57359be244c29 (HEAD)
> > Merge: cfc03e7e4 7c10b68f0
> > Author: Emmanuel Thomé <emmanuel.thome@inria.fr>
> > Date:   Wed Oct 21 10:29:52 2020 +0200
> >
> > -------client using one process with 2 threads-------
> > ./cado-nfs-client.py --clientid=test --single --bindir=build/$HOSTNAME/
> > --server=http://math99.local:13573
> >
> > INFO:root:Running build/math89/sieve/las -poly download/hcn10+3_930.poly
> > -q0 533124000 -I 15 -q1 533126000 -lim0 536000000 -lim1 536000000 -lpb0
> > 32 -lpb1 32 -mfb0 64 -mfb1 94 -ncurves0 20 -ncurves1 20 -fb1
> > download/hcn10+3_930.roots1.gz -out
> > test.work/hcn10+3_930.533124000-533126000.gz -t 2 -sqside 1
> > -adjust-strategy 2 -stats-stderr
> >
> >     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >   15157 math89    20   0 4148676   3.7g   7292 S 200.3  24.0  16:09.45 las
> > ----------------------------------------------------------------
> > -------client using one process with 6 threads-------
> > ./cado-nfs-client.py --clientid=test --single --bindir=build/$HOSTNAME/
> > --override t 6 --server=http://math99.local:13573
> >
> > INFO:root:Running build/math89/sieve/las -poly download/hcn10+3_930.poly
> > -q0 533366000 -I 15 -q1 533368000 -lim0 536000000 -lim1 536000000 -lpb0
> > 32 -lpb1 32 -mfb0 64 -mfb1 94 -ncurves0 20 -ncurves1 20 -fb1
> > download/hcn10+3_930.roots1.gz -out
> > test.work/hcn10+3_930.533366000-533368000.gz -t 6 -sqside 1
> > -adjust-strategy 2 -stats-stderr
> >
> >     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >   15191 math89    20   0 4123972   3.6g   7200 S 551.0  23.3  16:04.17 las
> >
> > =============================================================
> > Behavior now:
> > =============================================================
> > commit 1d08d4325615139dbe007b43b31e734c4b7dec8d (HEAD -> master,
> > origin/master, origin/HEAD)
> > Merge: 60f1fe069 0b8ff834d
> > Author: Emmanuel Thomé <emmanuel.thome@inria.fr>
> > Date:   Sun Apr 4 19:55:46 2021 +0000
> >
> >     Merge branch 'lean-ci' into 'master'
> >     
> >     simplify ci structure, get rid of intermediary containers.
> >     
> >     See merge request cado-nfs/cado-nfs!30
> >
> > -------client using one process with 2 threads-------
> > ./cado-nfs-client.py --clientid=test --single --bindir=build/$HOSTNAME/
> > --server=http://math99.local:13573
> >
> > INFO:root:Running build/math89/sieve/las -poly download/hcn10+3_930.poly
> > -q0 534054000 -I 15 -q1 534056000 -lim0 536000000 -lim1 536000000 -lpb0
> > 32 -lpb1 32 -mfb0 64 -mfb1 94 -ncurves0 20 -ncurves1 20 -fb1
> > download/hcn10+3_930.roots1.gz -out
> > test.work/hcn10+3_930.534054000-534056000.gz -t 2 -sqside 1
> > -adjust-strategy 2 -stats-stderr
> >
> >     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >   38473 math89    20   0 7884416   7.3g   7432 S 200.3  47.1  16:06.76 las
> >
> > ----------------------------------------------------------------
> > -------client using one process with 6 threads-------
> > ./cado-nfs-client.py --clientid=test --single --bindir=build/$HOSTNAME/
> > --override t 6 --server=http://math99.local:13573
> >
> > INFO:root:Running build/math89/sieve/las -poly download/hcn10+3_930.poly
> > -q0 534556000 -I 15 -q1 534558000 -lim0 536000000 -lim1 536000000 -lpb0
> > 32 -lpb1 32 -mfb0 64 -mfb1 94 -ncurves0 20 -ncurves1 20 -fb1
> > download/hcn10+3_930.roots1.gz -out
> > test.work/hcn10+3_930.534556000-534558000.gz -t 6 -sqside 1
> > -adjust-strategy 2 -stats-stderr
> >
> >     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >   38668 math89    20   0   12.8g  12.5g   7308 S 587.4  80.1  16:05.73 las
> >
> > ==========================================================================
> >
> > If I use the full 12 cores without the additional 12 threads, it brings
> > the machine to a stand-still and I have to ssh a "pkill las" and wait a
> > few minutes for it to be honored.
> >
> > Thanks for all your work.
> >
> > Sincerely,
> > Edwin Hall
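
As a rough check of the figures quoted above, the RES column of the four top snapshots can be compared directly. The following is a minimal sketch and not part of cado-nfs: the helper name parse_top_size and the dictionary layout are illustrative assumptions, and it assumes top's default convention that a bare number is in KiB while the m/g/t suffixes are binary multiples.

```python
def parse_top_size(field):
    """Convert a top(1) memory field to GiB.

    Assumption: a bare number is in KiB (top's default unit) and the
    suffixes m, g, t denote MiB, GiB, TiB.
    """
    units = {"m": 1 / 1024, "g": 1.0, "t": 1024.0}
    suffix = field[-1].lower()
    if suffix in units:
        return float(field[:-1]) * units[suffix]
    return float(field) / 1024**2


# RES values copied from the report, keyed by abbreviated commit and -t value.
res_by_commit = {
    "8ab2eea72": {2: "3.7g", 6: "3.6g"},    # October 2020 commit
    "1d08d4325": {2: "7.3g", 6: "12.5g"},   # April 2021 commit
}

growth = {}
for threads in (2, 6):
    old = parse_top_size(res_by_commit["8ab2eea72"][threads])
    new = parse_top_size(res_by_commit["1d08d4325"][threads])
    growth[threads] = new / old
    print(f"-t {threads}: {old:.1f} GiB -> {new:.1f} GiB (x{growth[threads]:.2f})")
```

On the reported numbers, resident memory roughly doubles at 2 threads and grows by roughly 3.5 times at 6 threads, which is consistent with the observation that memory use now depends on the thread count.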


