Subject: Discussion related to cado-nfs
- From: Pierrick Gaudry <pierrick.gaudry@loria.fr>
- To: Paul Leyland <paul.leyland@gmail.com>
- Cc: Paul Zimmermann <Paul.Zimmermann@inria.fr>, cado-nfs@inria.fr
- Subject: Re: [cado-nfs] Memory usage at server restart.
- Date: Mon, 20 Mar 2023 08:53:49 +0100
Dear Paul,
I must say that I am not really enthusiastic about modifying the logic
behind the server / client interactions.
What I would do is write a small "watchdog" shell script that would
regularly check what's going on, and, depending on your needs, kill -STOP
the las jobs, or maybe kill all the cado-nfs-client.py (this should
propagate the signal to the las jobs, I think).
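A minimal sketch of such a watchdog follows, written in Python rather than shell. It is only an illustration under stated assumptions: the trigger (available memory read from /proc/meminfo on Linux), the two thresholds, the polling interval, and all function names are hypothetical choices, not part of cado-nfs. It pauses every process whose command name is "las" with SIGSTOP when memory runs low, and resumes them with SIGCONT once memory has recovered.

```python
import os
import signal
import time

# Hypothetical thresholds in kB; tune them for the machine at hand.
LOW_KB = 1_000_000    # pause the sievers below this much available memory
HIGH_KB = 3_000_000   # resume them once this much is available again


def mem_available_kb(meminfo_text):
    """Return the MemAvailable value (kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def decide(available_kb, paused, low_kb=LOW_KB, high_kb=HIGH_KB):
    """Return 'stop', 'cont' or None; two thresholds avoid rapid flapping."""
    if not paused and available_kb < low_kb:
        return "stop"
    if paused and available_kb > high_kb:
        return "cont"
    return None


def pids_named(name):
    """List PIDs whose /proc/<pid>/comm matches name (Linux only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            pass  # the process exited while we were looking at it
    return pids


def signal_all(pids, sig):
    """Send sig to every PID, ignoring processes that are gone or not ours."""
    for pid in pids:
        try:
            os.kill(pid, sig)
        except (ProcessLookupError, PermissionError):
            pass


def watchdog(name="las", interval=10):
    """Poll memory forever, pausing and resuming processes called name."""
    paused = False
    while True:
        with open("/proc/meminfo") as f:
            available = mem_available_kb(f.read())
        action = decide(available, paused)
        if action == "stop":
            signal_all(pids_named(name), signal.SIGSTOP)
            paused = True
        elif action == "cont":
            signal_all(pids_named(name), signal.SIGCONT)
            paused = False
        time.sleep(interval)
```

Calling watchdog() from a small script started alongside the server would run the loop. Available memory is only one possible trigger; checking whether the server process is still alive, or whether the filtering step has started, would fit the same structure.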
Another point worth mentioning in this thread is that nowadays, las
with the '-t auto' option handles memory much better than before,
even with many threads. It also takes a new option to limit its memory
usage. Not sure this will save the situation for you, but maybe...
Regards,
Pierrick
On Sat, Mar 18, 2023 at 06:15:02PM +0000, Paul Leyland wrote:
> Thanks Paul.
>
> That is exactly what I do right now (-STOP that is) but sometimes I do not
> notice that the server has died until long afterwards, which wastes a lot of
> time everywhere, on server and the clients. Hence my question.
>
> After the OOM everything, sievers and server, is killed permanently. After
> restarting the server I have only a small window of time before the las
> processes grab and use so much RAM that they overflow the 2GB swap (standard
> on an Ubuntu system) and so clog up the system RAM. As I said, it is
> possible to work around this with rapid interactive typing but it would be
> nicer, IMO, if it could be fully automatic.
>
>
> Paul (the other one)
>
>
> On 18/03/2023 18:08, Paul Zimmermann wrote:
> > Hi Paul,
> >
> > > A system here has relatively little RAM. It can run six copies of las
> > > with ease. With no sievers running it can equally easily run duplicate
> > > and singleton removal. What it cannot do is run both simultaneously.
> > >
> > > Is it already possible to kill the local sievers (with -KILL perhaps, so
> > > one can guarantee to get their attention) when it is time to evaluate
> > > progress made, and to restart them afterwards if needed? At the moment I
> > > have to do this by hand, usually after the server dies with an OOM error.
> >
> > you can try kill -STOP on the sievers when the duplicate and singleton
> > removal starts (and kill -CONT afterwards).
> >
> > Otherwise with kill -KILL the sievers will die, and after some timeout
> > the server will restart them.
> >
> > Paul Z.
- [cado-nfs] Memory usage at server restart., Paul Leyland, 03/18/2023
- Re: [cado-nfs] Memory usage at server restart., Paul Zimmermann, 03/18/2023
- Re: [cado-nfs] Memory usage at server restart., Paul Leyland, 03/18/2023
- Re: [cado-nfs] Memory usage at server restart., Pierrick Gaudry, 03/20/2023