
cado-nfs - Re: [cado-nfs] Unexpected Memory Use Change

Subject: Discussion related to cado-nfs

  • From: Emmanuel Thomé <Emmanuel.Thome@inria.fr>
  • To: Ed Hall <ed_ka2fwj@yahoo.com>
  • Cc: "cado-nfs@inria.fr" <cado-nfs@inria.fr>
  • Subject: Re: [cado-nfs] Unexpected Memory Use Change
  • Date: Tue, 6 Apr 2021 14:07:34 +0200

Hi,

Thanks for your report.

Yes, there's something fishy going on.

This might be caused by the fix of #30012
(https://gitlab.inria.fr/cado-nfs/cado-nfs/-/issues/30012)

Could you report on what happens in each of the following situations:
- commit dc9309189 (before the fix of #30012)
- current HEAD, but with --adjust-strategy 0
- commit fef11f7aa (WIP branch that addresses a shortcoming that #30012
was about, in a sense. To use this profitably, you might have to add
arguments -lambda0 2.05 -lambda1 -3.05)
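
To collect comparable numbers for the three situations above, something like the following could be used. This is only a minimal sketch and not part of cado-nfs: the script name peak_rss.py and the function peak_rss_gib are made up for illustration, and it assumes Linux, where ru_maxrss is reported in KiB.

```python
import resource
import subprocess
import sys


def peak_rss_gib(argv):
    """Run argv to completion; return the peak RSS (GiB) over all children waited for so far."""
    subprocess.run(argv, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # Assumption: Linux, where ru_maxrss is in KiB, so divide by 1024**2 to get GiB.
    return usage.ru_maxrss / 1024**2


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: peak_rss.py COMMAND [ARGS...]")
    # Report on stderr so the measured command's stdout stays untouched.
    print(f"peak RSS: {peak_rss_gib(sys.argv[1:]):.2f} GiB", file=sys.stderr)
```

Run it once per build, passing the exact las command line that the client logs on its "INFO:root:Running" line. Because RUSAGE_CHILDREN keeps the maximum over every child already waited for, measuring several configurations within one Python process would hide the smaller runs, hence one invocation per configuration.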


Also, if you agree to share your polynomial, and/or the output files of
the las program, that could be useful. I surmise that in the instance
where memory runs out, you have many "adjustments" of the bucket memory
sizes, which end up killing you.

E.

On Tue, Apr 06, 2021 at 11:23:00AM +0000, Ed Hall wrote:
> Dear Team Members,
>
> I'm having a memory issue with the latest commit.  I have nearly my whole
> farm running an earlier commit that uses memory based on client processes,
> i.e. one process uses ~4GB for the current job regardless of the number of
> threads.  With the latest commit, the memory use appears to be related to
> the number of threads.
>
> I haven't tried intermediate commits, but here are some samples of the
> earlier one I was using and then the latest I tried.
>
> Expected behavior from before:
> =============================================================
> commit 8ab2eea7202ffaf62825057c50f57359be244c29 (HEAD)
> Merge: cfc03e7e4 7c10b68f0
> Author: Emmanuel Thomé <emmanuel.thome@inria.fr>
> Date:   Wed Oct 21 10:29:52 2020 +0200
>
> -------client using one process with 2 threads-------
> ./cado-nfs-client.py --clientid=test --single --bindir=build/$HOSTNAME/
> --server=http://math99.local:13573
>
> INFO:root:Running build/math89/sieve/las -poly download/hcn10+3_930.poly
> -q0 533124000 -I 15 -q1 533126000 -lim0 536000000 -lim1 536000000 -lpb0 32
> -lpb1 32 -mfb0 64 -mfb1 94 -ncurves0 20 -ncurves1 20 -fb1
> download/hcn10+3_930.roots1.gz -out
> test.work/hcn10+3_930.533124000-533126000.gz -t 2 -sqside 1
> -adjust-strategy 2 -stats-stderr
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>   15157 math89    20   0 4148676   3.7g   7292 S 200.3  24.0  16:09.45 las
> ----------------------------------------------------------------
> -------client using one process with 6 threads-------
> ./cado-nfs-client.py --clientid=test --single --bindir=build/$HOSTNAME/
> --override t 6 --server=http://math99.local:13573
>
> INFO:root:Running build/math89/sieve/las -poly download/hcn10+3_930.poly
> -q0 533366000 -I 15 -q1 533368000 -lim0 536000000 -lim1 536000000 -lpb0 32
> -lpb1 32 -mfb0 64 -mfb1 94 -ncurves0 20 -ncurves1 20 -fb1
> download/hcn10+3_930.roots1.gz -out
> test.work/hcn10+3_930.533366000-533368000.gz -t 6 -sqside 1
> -adjust-strategy 2 -stats-stderr
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>   15191 math89    20   0 4123972   3.6g   7200 S 551.0  23.3  16:04.17 las 
>  
>  
> =============================================================
> Behavior now:
> =============================================================
> commit 1d08d4325615139dbe007b43b31e734c4b7dec8d (HEAD -> master,
> origin/master, origin/HEAD)
> Merge: 60f1fe069 0b8ff834d
> Author: Emmanuel Thomé <emmanuel.thome@inria.fr>
> Date:   Sun Apr 4 19:55:46 2021 +0000
>
>     Merge branch 'lean-ci' into 'master'
>     
>     simplify ci structure, get rid of intermediary containers.
>     
>     See merge request cado-nfs/cado-nfs!30
>
> -------client using one process with 2 threads-------
> ./cado-nfs-client.py --clientid=test --single --bindir=build/$HOSTNAME/
> --server=http://math99.local:13573
>
> INFO:root:Running build/math89/sieve/las -poly download/hcn10+3_930.poly
> -q0 534054000 -I 15 -q1 534056000 -lim0 536000000 -lim1 536000000 -lpb0 32
> -lpb1 32 -mfb0 64 -mfb1 94 -ncurves0 20 -ncurves1 20 -fb1
> download/hcn10+3_930.roots1.gz -out
> test.work/hcn10+3_930.534054000-534056000.gz -t 2 -sqside 1
> -adjust-strategy 2 -stats-stderr
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>   38473 math89    20   0 7884416   7.3g   7432 S 200.3  47.1  16:06.76 las  
>  
> ----------------------------------------------------------------
> -------client using one process with 6 threads-------
> ./cado-nfs-client.py --clientid=test --single --bindir=build/$HOSTNAME/
> --override t 6 --server=http://math99.local:13573
>
> INFO:root:Running build/math89/sieve/las -poly download/hcn10+3_930.poly
> -q0 534556000 -I 15 -q1 534558000 -lim0 536000000 -lim1 536000000 -lpb0 32
> -lpb1 32 -mfb0 64 -mfb1 94 -ncurves0 20 -ncurves1 20 -fb1
> download/hcn10+3_930.roots1.gz -out
> test.work/hcn10+3_930.534556000-534558000.gz -t 6 -sqside 1
> -adjust-strategy 2 -stats-stderr
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>   38668 math89    20   0   12.8g  12.5g   7308 S 587.4  80.1  16:05.73 las
>
> ==========================================================================
>
> If I use the full 12 cores without the additional 12 threads, it brings the
> machine to a stand-still and I have to ssh a "pkill las" and wait a few
> minutes for it to be honored.
>
> Thanks for all your work.
>
> Sincerely,
> Edwin Hall
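
For reference, the RES figures from the top output quoted above can be put side by side with a few lines of Python. This is only an illustrative sketch: the helper names top_size_to_gib and growth_factors are hypothetical, and the unit handling assumes top's usual convention that a bare number is in KiB while m, g and t suffixes scale by powers of 1024.

```python
# Multipliers (in KiB) for the suffixes top prints on memory fields.
UNIT_KIB = {"k": 1, "m": 1024, "g": 1024**2, "t": 1024**3}


def top_size_to_gib(field):
    """Convert a top memory field such as '3.7g' or '7292' to GiB (bare numbers are KiB)."""
    field = field.strip().lower()
    if field[-1] in UNIT_KIB:
        kib = float(field[:-1]) * UNIT_KIB[field[-1]]
    else:
        kib = float(field)
    return kib / 1024**2


# threads -> (RES with the October 2020 commit, RES with the April 2021 commit),
# copied from the report above.
reported = {
    2: ("3.7g", "7.3g"),
    6: ("3.6g", "12.5g"),
}


def growth_factors(samples):
    """Return, per thread count, how many times larger the new RES is than the old one."""
    return {
        threads: top_size_to_gib(new) / top_size_to_gib(old)
        for threads, (old, new) in samples.items()
    }


if __name__ == "__main__":
    for threads, factor in growth_factors(reported).items():
        print(f"-t {threads}: RES grew by a factor of {factor:.2f}")
```

With the reported values this gives a factor of about 1.97 for -t 2 and about 3.47 for -t 6, which matches the observation that memory use now depends on the number of threads.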


