Subject: Discussion related to cado-nfs
- From: Phil Pemberton <philpem@philpem.me.uk>
- To: cado-nfs@inria.fr
- Subject: [cado-nfs] Square root fails with exit code -9
- Date: Sun, 15 Aug 2021 18:17:50 +0100
A 512-bit factorisation that ran for several days has just finished with what appears to be a fatal error.
The compute setup was --
Server - Xeon E3-1270 8-core 3.5GHz, 16GB RAM, 250GB SSD
Clients -
Xeon server running client threads - as above
Intel i5-9400 6-core 2.9GHz, 16GB RAM, 2TB hard drive
Intel i5-3570K 4-core 3.4GHz, 8GB RAM, 2TB hard drive
AMD Ryzen 5 3600X 6-core, 16GB RAM, 500GB hard drive
The machines run a mix of Linux Mint 20 and Ubuntu 20.04 LTS.
Here are the last lines of the error log:
Info:Quadratic Characters: Starting
Info:Quadratic Characters: Total cpu/real time for characters: 175.98/52.0178
Info:Square Root: Starting
Info:Square Root: Creating file of (a,b) values
Warning:HTTP server: 127.0.0.1 code 410, message Distributed computation finished
Warning:HTTP server: 127.0.0.1 code 410, message Distributed computation finished
Warning:Command: Process with PID 1730797 finished with return code -9
Error:Square Root: Program run on server failed with exit code -9
Error:Square Root: Command line was: /home/philpem/cado-nfs/build/uncia/sqrt/sqrt -poly /home/philpem/cado-nfs/workdir/c150.poly -prefix /home/philpem/cado-nfs/workdir/c150.dep.gz -dep 0 -t 8 -side0 -side1 -gcd > /home/philpem/cado-nfs/workdir/c150.sqrt.stdout.4 2> /home/philpem/cado-nfs/workdir/c150.sqrt.stderr.4
Error:Square Root: Stderr output (last 10 lines only) follow (stored in file /home/philpem/cado-nfs/workdir/c150.sqrt.stderr.4):
Error:Square Root: Alg(5): fragment 1/64 of level 00 done by thread 0 at wct=267.54s
Error:Square Root: Alg(4): fragment 1/64 of level 00 done by thread 0 at wct=267.66s
Error:Square Root: Alg(2): fragment 25/64 of level 00 done by thread 3 at wct=267.81s
Error:Square Root: Alg(2): fragment 17/64 of level 00 done by thread 2 at wct=268.02s
Error:Square Root: Alg(2): fragment 1/64 of level 00 done by thread 0 at wct=268.75s
Error:Square Root: Alg(1): fragment 34/64 of level 00 done by thread 4 at wct=299.10s
Error:Square Root: Alg(1): fragment 18/64 of level 00 done by thread 2 at wct=299.18s
Error:Square Root: Alg(1): fragment 10/64 of level 00 done by thread 1 at wct=300.55s
Error:Square Root: Alg(1): fragment 26/64 of level 00 done by thread 3 at wct=300.92s
Error:Square Root:
Traceback (most recent call last):
  File "./cado-nfs.py", line 202, in <module>
    factors = factorjob.run()
  File "/home/philpem/cado-nfs/./scripts/cadofactor/cadotask.py", line 6131, in run
    last_status = task.run()
  File "/home/philpem/cado-nfs/./scripts/cadofactor/cadotask.py", line 5045, in run
    raise Exception("Program failed")
Exception: Program failed
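For reading the "return code -9" in the log above: assuming the value is reported the way Python's subprocess module reports it (which the traceback from cado-nfs.py suggests, but which is an assumption here), a negative return code means the child process was terminated by the signal with that number. The helper below, describe_return_code, is a hypothetical name used only for illustration and is not part of CADO-NFS; it is a minimal sketch of that decoding.

```python
import signal


def describe_return_code(code: int) -> str:
    """Translate a subprocess-style return code into a readable description.

    Hypothetical helper for illustration; not part of CADO-NFS.
    """
    if code < 0:
        # Negative values follow the subprocess convention: -N means
        # "terminated by signal N".
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-code} ({name})"
    return f"exited with status {code}"


print(describe_return_code(-9))
```

On Linux, signal 9 is SIGKILL, which the receiving process cannot catch or ignore. One common sender is the kernel's out-of-memory killer, but the log alone does not show who sent the signal; the kernel log (for example via dmesg) would be the place to check.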
Server command line:
./cado-nfs.py NUMBER_REDACTED server.whitelist=0.0.0.0/0 --server --workdir $PWD/workdir
Client command line:
./cado-nfs-client.py --basepath $PWD/wdir --server=https://SERVERNAME:36485 --certsha1=CERTSHA_REDACTED --bindir $(eval `make show` ; echo $build_tree) --override -t 8
The "-t" parameter was set to the number of physical cores on the system.
Can anyone explain to me --
What has gone wrong here? Have I hit a bug?
Is the failed factorisation recoverable?
What can I do to avoid this in future?
CADO-NFS version is Git commit c5b20eac12ea225a325d582923ef058832cda28e, same on all machines.
Thanks.
--
Phil.
philpem@philpem.me.uk
https://www.philpem.me.uk/