
cado-nfs - Re: [cado-nfs] Square root fails with exit code -9

Subject: Discussion related to cado-nfs

  • From: Pierrick Gaudry <pierrick.gaudry@loria.fr>
  • To: Phil Pemberton <philpem@philpem.me.uk>
  • Cc: cado-nfs@inria.fr
  • Subject: Re: [cado-nfs] Square root fails with exit code -9
  • Date: Sun, 15 Aug 2021 20:19:13 +0200

It could be that the 16 GB of RAM on your server is not enough for the
sqrt step. Could you try again with just one thread, using this command:

/home/philpem/cado-nfs/build/uncia/sqrt/sqrt \
    -poly /home/philpem/cado-nfs/workdir/c150.poly \
    -prefix /home/philpem/cado-nfs/workdir/c150.dep.gz \
    -dep 0 -t 1 -side0 -side1 -gcd

... or copy the relevant files to another machine with more RAM and try
it there?
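
For context on why memory is the first suspect: assuming the "return code
-9" in the log follows Python's subprocess convention, a negative value -N
means the child was terminated by signal N, and signal 9 is SIGKILL, which
is what the Linux out-of-memory killer sends. The kernel log (for example
via dmesg) usually records such kills. The helper below is a hypothetical
illustration, not part of CADO-NFS:

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess-style return code into a readable description.

    Hypothetical helper for illustration; it is not part of CADO-NFS.
    On POSIX, Python's subprocess module reports -N when the child was
    terminated by signal N.
    """
    if returncode < 0:
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # The value reported in the log above.
    print(describe_returncode(-9))
```

On Linux this reports that -9 corresponds to SIGKILL. That does not prove
an out-of-memory kill, but it is consistent with one.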

That's just a first guess, but it could be as simple as that. Otherwise,
we will probably need more information in order to investigate. But let's
investigate the memory problem first.
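
One possible way to check the memory hypothesis is to run a command under
a small wrapper that reports the peak resident memory of the child
process. This is only a sketch under stated assumptions: the wrapper and
its name are hypothetical, it is not part of CADO-NFS, and it assumes
Linux, where ru_maxrss is reported in kilobytes.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(argv):
    """Run argv, wait for it, and return (returncode, peak RSS in MiB).

    Hypothetical wrapper for illustration; it is not part of CADO-NFS.
    RUSAGE_CHILDREN covers child processes that have terminated and
    been waited for, which is the case once subprocess.run() returns.
    """
    proc = subprocess.run(argv)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # Assumption: Linux, where ru_maxrss is in KiB (macOS reports bytes).
    peak_mib = usage.ru_maxrss / 1024
    return proc.returncode, peak_mib


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 peak_rss.py <command> [args...]
    rc, peak = run_and_report_peak_rss(sys.argv[1:])
    print(f"return code {rc}, peak RSS about {peak:.0f} MiB", file=sys.stderr)
```

Passing the single-threaded sqrt command as the arguments would show how
close the run gets to the 16 GB available. If the process is killed again,
the reported peak is still a lower bound on what it needed.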

Regards,
Pierrick


On Sun, Aug 15, 2021 at 06:17:50PM +0100, Phil Pemberton wrote:
> I've just had a several-day 512-bit compute finish with what appears to be a
> fatal error.
>
> The compute setup was --
> Server - Xeon E3-1270 8-core 3.5GHz, 16GB RAM, 250GB SSD
> Clients -
> Xeon server running client threads - as above
> Intel i5-9400 6-core 2.9GHz, 16GB RAM, 2TB hard drive
> Intel i5-3570K 4-core 3.4GHz, 8GB RAM, 2TB hard drive
> AMD Ryzen 5 3600X 6-core, 16GB RAM, 500GB hard drive
>
>
> OS in use was a mix of Linux Mint 20 or Ubuntu 20.04 LTS.
>
>
> Here are the last lines of the error log:
>
>
> Info:Quadratic Characters: Starting
> Info:Quadratic Characters: Total cpu/real time for characters:
> 175.98/52.0178
> Info:Square Root: Starting
> Info:Square Root: Creating file of (a,b) values
> Warning:HTTP server: 127.0.0.1 code 410, message Distributed computation
> finished
> Warning:HTTP server: 127.0.0.1 code 410, message Distributed computation
> finished
> Warning:Command: Process with PID 1730797 finished with return code -9
> Error:Square Root: Program run on server failed with exit code -9
> Error:Square Root: Command line was:
> /home/philpem/cado-nfs/build/uncia/sqrt/sqrt -poly
> /home/philpem/cado-nfs/workdir/c150.poly -prefix
> /home/philpem/cado-nfs/workdir/c150.dep.gz -dep 0 -t 8 -side0 -side1 -gcd >
> /home/philpem/cado-nfs/workdir/c150.sqrt.stdout.4 2>
> /home/philpem/cado-nfs/workdir/c150.sqrt.stderr.4
> Error:Square Root: Stderr output (last 10 lines only) follow (stored in file
> /home/philpem/cado-nfs/workdir/c150.sqrt.stderr.4):
> Error:Square Root: Alg(5): fragment 1/64 of level 00 done by thread 0
> at wct=267.54s
> Error:Square Root: Alg(4): fragment 1/64 of level 00 done by thread 0
> at wct=267.66s
> Error:Square Root: Alg(2): fragment 25/64 of level 00 done by thread 3
> at wct=267.81s
> Error:Square Root: Alg(2): fragment 17/64 of level 00 done by thread 2
> at wct=268.02s
> Error:Square Root: Alg(2): fragment 1/64 of level 00 done by thread 0
> at wct=268.75s
> Error:Square Root: Alg(1): fragment 34/64 of level 00 done by thread 4
> at wct=299.10s
> Error:Square Root: Alg(1): fragment 18/64 of level 00 done by thread 2
> at wct=299.18s
> Error:Square Root: Alg(1): fragment 10/64 of level 00 done by thread 1
> at wct=300.55s
> Error:Square Root: Alg(1): fragment 26/64 of level 00 done by thread 3
> at wct=300.92s
> Error:Square Root:
> Traceback (most recent call last):
> File "./cado-nfs.py", line 202, in <module>
> factors = factorjob.run()
> File "/home/philpem/cado-nfs/./scripts/cadofactor/cadotask.py", line 6131,
> in run
> last_status = task.run()
> File "/home/philpem/cado-nfs/./scripts/cadofactor/cadotask.py", line 5045,
> in run
> raise Exception("Program failed")
> Exception: Program failed
>
>
> Server command line:
>
> ./cado-nfs.py NUMBER_REDACTED server.whitelist=0.0.0.0/0 --server --workdir
> $PWD/workdir
>
> Client command line:
>
> ./cado-nfs-client.py --basepath $PWD/wdir --server=https://SERVERNAME:36485
> --certsha1=CERTSHA_REDACTED --bindir $(eval `make show` ; echo $build_tree)
> --override -t 8
>
> The "-t" parameter was set to the number of physical cores on the system.
>
>
> Can anyone explain to me --
>
> What has gone wrong here? Have I hit a bug?
>
> Is the failed factorisation recoverable?
>
> What can I do to avoid this in future?
>
>
> CADO-NFS version is Git commit c5b20eac12ea225a325d582923ef058832cda28e,
> same on all machines.
>
> Thanks.
> --
> Phil.
> philpem@philpem.me.uk
> https://www.philpem.me.uk/


