Skip to Content.
Sympa Menu

cado-nfs - Re: [Cado-nfs-discuss] Log: Aggregate statistics. Total CPU vs Intermediate CPU time data.

Subject: Discussion related to cado-nfs

List archive

Re: [Cado-nfs-discuss] Log: Aggregate statistics. Total CPU vs Intermediate CPU time data.


Chronological Thread 
  • From: eric.jeancolas@free.fr
  • To: Emmanuel Thomé <Emmanuel.Thome@inria.fr>
  • Cc: cado-nfs-discuss@lists.gforge.inria.fr
  • Subject: Re: [Cado-nfs-discuss] Log: Aggregate statistics. Total CPU vs Intermediate CPU time data.
  • Date: Tue, 22 Sep 2020 11:11:46 +0200 (CEST)
  • Authentication-results: mail3-smtp-sop.national.inria.fr; spf=None smtp.pra=eric.jeancolas@free.fr; spf=None smtp.mailfrom=eric.jeancolas@free.fr; spf=None smtp.helo=postmaster@smtp3-g21.free.fr
  • Ironport-phdr: 9a23:DT31YhSNRMeHPhw5Tt/v0vWCoNpsv+yvbD5Q0YIujvd0So/mwa6zYRGN2/xhgRfzUJnB7Loc0qyK6v+mADBLuM3J8ChbNsAVCVld0YRetjdjKfbNMVf8Iv/uYn5yN+V5f3ghwUuGN1NIEt31fVzYry76xzcTHhLiKVg9fbytScbdgMutyu+95YDYbRlWizqhe7NyKwi9oRnMusUMjoZuN6Q8xgHUrnZHdOha2G1lLk+Xkxrg+8u85pFu/zlRtv4768JMTaD2dLkkQLJFCzgrL3o779DxuxnZSguP6HocUmEInRdNHgPI8hL0UIrvvyXjruZy1zWUMsPwTbAvRDSt9LxrRwPyiCcGLDE27mfagdFtga1BoRKhoxt/w5PIYIyQKfFzcL/Rcc8cSGFcRctaSTBPDZ2gYIsOF+oBPPhXr4/hp1sVsBCyARCgCP7zxjNUg3P727Ax3eY8HgHcxAEuEdIAvmrXotr3KakdXvy6wqbJzTvEdP5WxS7955TUfh07vf2BX7R9etfRx0k1EAPFi02dppL/PzOUyusBqWqb4PR6WuKpl2EnrAVxrSK0xso3hInJnJwaxU7e+SV32oY4I8CzR0Fnb9C+CpRQqz2aOJVsQsMkW2xltiY3x7IGt5O7eCUG1IgqyhzfZvKJbYSF4BztWeaNLTl2mn9reKyyihSy/EW9y+DxSsa63llEoCdYnNfBsnMA2hrO4caJTft9+12u2TeJ1w3L8eFEPV07laneK5E82LIwkIQcsVjbEi/3nkX2lq6WeV8+9uim8eTqZKjtqJyEN4JslA3yL7oiltahDek6KAQDUHaX9f6y2bH74EH0QLFHgucrnqTarJzWP9oXq6C3DgNP3Isu5BCyBCq83tsCh3kINldFdQqHj4f3P1HOJ+j1Dey6g1SwiDdmyenKPrL/DpnQNHTDnq3hcqhj5EJGxgoz18hT55NSCr4fPPL+QlH9uMHGAhMjLQC5wfzrBdpj2o8EV2+DGqqZP7nTsV+M6OIvOe6MZIoNtTnjMfgl5uLhgWU7mVADZ6ap24cYaHe8Hvt8OEWUenTsgs0fHmgUpAY+QvbqhEWYUTFPf3ayQ7485jYjBY26F4jMW4etjKWH3Ce1BZ1af31JClGXEXj0aoWEQOwMZTmWIs9giTwESaOhR5Uu1RGgsw/60bVnIfTO9i0Zs5Ljztl16PfJmRE87zwnR/iahkyISmhyg24MDwM226X+6RhQz1KK17V1h7pkFdtW6ttIVB07PNjS1bopJcr1X1fMZNaCDlSrS9OtBjU4Vfo/2dIDJUhnSIbqtQzKwyf/W+xdrLeMHpFht/uEhyGgdfY48G7P0ewat3djR8JOMWO8gasupQbJCoOPnV/Lz//3J5RZ5zbE8SK49UTLvExcV1chA77AQWhEI0SM9JL3/E7PCbG0W+x+bllxjPWaI64PUeXHyE1cTaaxNs7TbiS/gTXoCA==
  • List-archive: <http://lists.gforge.inria.fr/pipermail/cado-nfs-discuss/>
  • List-id: A discussion list for Cado-NFS <cado-nfs-discuss.lists.gforge.inria.fr>

Hi

I made some further analyses on a log.
* when a full run is done without interruptions, the overall WTC (elapsed
time for entire factorization) is ok. But when I retart after an interrupt,
this counter starts from the restart. That's why I used the Total cpu counter
to have an idea of the cpu usage. In the log I studied, Total Usage / Total
WTC is 3,68. I've got a 4-core computer, so it's not too weird. But it is not
really exact. It's too much...
* in the log, the following is reported : Info:Lattice Sieving: Total time:
666693s. What is this time? From my log, the WTC is 334221,78, close to half
this value. When I looked at cpu usage with the top command, I saw two las
tasks with two threads. Well, two las processes at 198% CPU.
* when entries in the log are talking about WTC, the reported values are ok.
* I have an entry with Info:Linear Algebra: Total cpu/real time for bwc:
90816.1/23293.6. It seems to be the sum of Krylov + Lingen + Mksol. It is
true for WTC. So if there is a sum about total cpu, there should be total cpu
value for each of Krylov and Mksol, shouldn't it? If not, I don't understand
how the total is computed.
* when I see Jason's log: https://stdkmd.net/nrr/cont/7/77377.htm#N135_C183 ,
I see a ratio about total statistics 2.37178e+08/2.53203e+06 = 93.67. It
seems a lot, even for a AMD Ryzen Threadripper 2990WX 32-Core. That's why I
made this log analysis about time values...

I can PM you my log if you wish.

Best regards.

Eric

----- Mail original -----
De: "Emmanuel Thomé" <Emmanuel.Thome@inria.fr>
À: cado-nfs-discuss@lists.gforge.inria.fr
Envoyé: Jeudi 17 Septembre 2020 00:21:42
Objet: Re: [Cado-nfs-discuss] Log: Aggregate statistics. Total CPU vs
Intermediate CPU time data.

Hi,

Thanks for your report !

Our fault, for sure. Information is not very usable, here.

We report timings in two ways.

Timings that are printed as "Total cpu/real time for XXX" are counted by
cado-nfs.py itself, based on the cpu time of its child processes, and on
the clock difference.

In other cases (when we read "Aggregated timings for YYY"), we report what
the individual cado-nfs programs currently
report, based on regexps that we use to extract the timing info from the
data files (these regexps are in ./scripts/cadofactor/cadotask.py). This
may (as in the lattice sieving case) contain data that can be directly
interpreted towards a global cpu time count. In other cases (as in the
case of linear algebra), it's a bit quirky. But anyway for linear algebra
we do have the "cpu/real" time line anyway, so extra stats should only be
understood as "bonus".


There are several potential pitfalls:

- some programs report timing info differently from others.

- our regexps sometime fail because they don't follow the volution of
the program printout. I've just added a flag so that we can track such
divergence cases.

When you get messages such as:

> Warning:XXXXXXXXXXXX: some stats could not be displayed for YYYYYYYYYYY
> (see log file for debug info)

it means that the latter condition happened. Except that in your case,
there is no polynomial selection to work with, so it's pretty normal that
cado-nfs couldn't find any timing to report for polynomial selection.
I've fixed that as well.

You're also seeing an inconsistency in that the timing report for the
linear algebra stage is only giving you the WCT (wall-clock time) of the
krylov and mksol steps, not the aggregated CPU time. This was an
oversight, which I'm also fixing. However, this does count to the total
that the program prints, so it can't be the explanation of your problem.
And by the way, you've correctly omitted these counts from your sum.


> But the addition of intermediate results doesn't give the total.
> 4.72+102.51+785028+57.22+472.92+388.88+328.58+89.74+100411+82.81+680.74
> gives 887647.12, not the 1551370 see as a total.
>
> The total seems ok. So I may miss some intermediate results or I may
> misinterpret some data.


1551370 is ok ? Then yes, that's rather weird. How many threads did you
have per las program ? Would it be possible to share the log file so that
I can have a look ?

Best,

E.


On Sat, Sep 05, 2020 at 01:56:25PM +0200, eric.jeancolas@free.fr wrote:
> Hello
>
> For one number, a c160 with SNFS, I got the following end of sysout log.
>
> Info:Square Root: Factors:
> 626816785777451600758981838272270646850682251997773636331126303903
> 6472832583977168929436763320576275854833857768912955644720878207260226100095188285172984249397
> Warning:Polynomial Selection (size optimized): some stats could not be
> displayed for polyselect1 (see log file for debug info)
> Warning:Polynomial Selection (root optimized): some stats could not be
> displayed for polyselect2 (see log file for debug info)
> Info:Generate Factor Base: Total cpu/real time for makefb: 4.72/2.01359
> Info:Generate Free Relations: Total cpu/real time for freerel: 102.51/26.282
> Info:Lattice Sieving: Aggregate statistics:
> Info:Lattice Sieving: Total number of relations: 28042254
> Info:Lattice Sieving: Average J: 1894.87 for 3517286 special-q, max bucket
> fill -bkmult 1.0,1s:1.081640
> Info:Lattice Sieving: Total time: 785028s
> Info:Filtering - Duplicate Removal, splitting pass: Total cpu/real time for
> dup1: 57.22/119.639
> Info:Filtering - Duplicate Removal, splitting pass: Aggregate statistics:
> Info:Filtering - Duplicate Removal, splitting pass: CPU time for dup1:
> 118.9s
> Info:Filtering - Duplicate Removal, removal pass: Total cpu/real time for
> dup2: 472.92/438.735
> Info:Filtering - Duplicate Removal, removal pass: Aggregate statistics:
> Info:Filtering - Duplicate Removal, removal pass: CPU time for dup2:
> 378.59999999999997s
> Info:Filtering - Singleton removal: Total cpu/real time for purge:
> 388.88/400.862
> Info:Filtering - Merging: Merged matrix has 2351820 rows and total weight
> 404898763 (172.2 entries per row on average)
> Info:Filtering - Merging: Total cpu/real time for merge: 328.58/94.9078
> Info:Filtering - Merging: Total cpu/real time for replay: 89.74/77.1238
> Info:Linear Algebra: Total cpu/real time for bwc: 100411/25707.4
> Info:Linear Algebra: Aggregate statistics:
> Info:Linear Algebra: Krylov: WCT time 16545.96, iteration CPU time 0.21,
> COMM 0.01, cpu-wait 0.0, comm-wait 0.0 (73728 iterations)
> Info:Linear Algebra: Lingen CPU time 469.42, WCT time 136.78
> Info:Linear Algebra: Mksol: WCT time 8786.49, iteration CPU time 0.23, COMM
> 0.01, cpu-wait 0.0, comm-wait 0.0 (36864 iterations)
> Info:Quadratic Characters: Total cpu/real time for characters: 82.81/34.1738
> Info:Square Root: Total cpu/real time for sqrt: 680.74/214.047
> Info:HTTP server: Shutting down HTTP server
> Info:Complete Factorization / Discrete logarithm: Total cpu/elapsed time
> for entire factorization: 1.55137e+06/419762
> Info:root: Cleaning up computation data in /tmp/cado.ej7v99s4
>
> I use a 4-core CPU, so the ratio at the end Total cpu/elapsed time =
> 1.55137e+06/419762 = 3.7 is consistent.
> But the addition of intermediate results doesn't give the total.
> 4.72+102.51+785028+57.22+472.92+388.88+328.58+89.74+100411+82.81+680.74
> gives 887647.12, not the 1551370 see as a total.
>
> The total seems ok. So I may miss some intermediate results or I may
> misinterpret some data.
>
> Could you please help me understand these data?
>
> Best regards.
> _______________________________________________
> Cado-nfs-discuss mailing list
> Cado-nfs-discuss@lists.gforge.inria.fr
> https://lists.gforge.inria.fr/mailman/listinfo/cado-nfs-discuss
_______________________________________________
Cado-nfs-discuss mailing list
Cado-nfs-discuss@lists.gforge.inria.fr
https://lists.gforge.inria.fr/mailman/listinfo/cado-nfs-discuss




Archive powered by MHonArc 2.6.19+.

Top of Page