
cado-nfs - Re: [cado-nfs] Memory usage at server restart.

Subject: Discussion related to cado-nfs




  • From: Pierrick Gaudry <pierrick.gaudry@loria.fr>
  • To: Paul Leyland <paul.leyland@gmail.com>
  • Cc: Paul Zimmermann <Paul.Zimmermann@inria.fr>, cado-nfs@inria.fr
  • Subject: Re: [cado-nfs] Memory usage at server restart.
  • Date: Mon, 20 Mar 2023 08:53:49 +0100

Dear Paul,

I must say that I am not really enthusiastic about modifying the logic
behind the server / client interactions.

What I would do is write a small "watchdog" shell script that would
regularly check what's going on, and, depending on your needs, kill -STOP
the las jobs, or maybe kill all the cado-nfs-client.py processes (this should
propagate the signal to the las jobs, I think).
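Such a watchdog could equally be written in Python rather than shell. What follows is only a minimal sketch under stated assumptions: a Linux host (it reads /proc), siever processes whose command name is `las`, and a policy of pausing them with SIGSTOP when free memory runs low. The thresholds and the helper names (`mem_available_kb`, `find_pids`, `decide`, `watchdog`) are hypothetical and are not part of cado-nfs.

```python
import os
import signal
import time
from typing import List, Optional

# Hypothetical thresholds (assumptions, tune for the machine): pause the
# sievers when MemAvailable drops below LOW_WATER_KB, and resume them only
# once it has climbed back above HIGH_WATER_KB, so they do not flap.
LOW_WATER_KB = 1_000_000   # roughly 1 GB
HIGH_WATER_KB = 2_000_000  # roughly 2 GB
SIEVER_NAME = "las"        # assumed /proc/<pid>/comm of the siever binary


def mem_available_kb(meminfo_text: str) -> int:
    """Return MemAvailable (in kB) parsed from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("no MemAvailable line found")


def find_pids(name: str, proc_root: str = "/proc") -> List[int]:
    """Return the PIDs whose command name (/proc/<pid>/comm) equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited between listdir() and open()
    return pids


def signal_all(name: str, sig: signal.Signals) -> None:
    """Send `sig` to every process called `name`, ignoring ones we cannot signal."""
    for pid in find_pids(name):
        try:
            os.kill(pid, sig)
        except OSError:
            pass  # already gone, or owned by another user


def decide(avail_kb: int, paused: bool) -> Optional[str]:
    """Return "stop", "cont" or None (no change), with hysteresis."""
    if not paused and avail_kb < LOW_WATER_KB:
        return "stop"
    if paused and avail_kb > HIGH_WATER_KB:
        return "cont"
    return None


def watchdog(interval_s: float = 10.0) -> None:
    """Poll memory forever, pausing and resuming every local siever."""
    paused = False
    while True:
        with open("/proc/meminfo") as f:
            action = decide(mem_available_kb(f.read()), paused)
        if action == "cont":
            signal_all(SIEVER_NAME, signal.SIGCONT)
        if action is not None:
            paused = action == "stop"
        if paused:
            # Re-send SIGSTOP each round so that sievers started meanwhile stop too.
            signal_all(SIEVER_NAME, signal.SIGSTOP)
        time.sleep(interval_s)
```

The watchdog must run as the same user as the sievers (or as root). One option is to save it under a name of your choosing, e.g. las_watchdog.py (hypothetical), and start `watchdog()` from it under nohup. The two thresholds give hysteresis, so that the sievers are not stopped and restarted on every poll when memory hovers near a single limit.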

Another point worth mentioning in this thread is that nowadays las
with the '-t auto' option handles memory much better than before,
even with many threads. It also takes a new option to limit its memory
usage. Not sure this will save the situation for you, but maybe...

Regards,
Pierrick


On Sat, Mar 18, 2023 at 06:15:02PM +0000, Paul Leyland wrote:
> Thanks Paul.
>
> That is exactly what I do right now (-STOP that is) but sometimes I do not
> notice that the server has died until long afterwards, which wastes a lot of
> time everywhere, on server and the clients. Hence my question.
>
> After the OOM everything, sievers and server, is killed permanently. After
> restarting the server I have only a small window of time before the las
> processes grab and use so much RAM that they overflow the 2GB swap (standard
> on an Ubuntu system) and so clog up the system RAM. As I said, it is
> possible to work around this with rapid interactive typing but it would be
> nicer, IMO, if it could be fully automatic.
>
>
> Paul (the other one)
>
>
> On 18/03/2023 18:08, Paul Zimmermann wrote:
> > Hi Paul,
> >
> > > A system here has relatively little RAM. It can run six copies of las
> > > with ease. With no sievers running it can equally easily run duplicate
> > > and singleton removal. What it can not do is run both simultaneously.
> > >
> > > Is it already possible to kill the local sievers (with -KILL perhaps, so
> > > one can guarantee to get their attention) when it is time to evaluate
> > > progress made, and to restart them afterwards if needed? At the moment I
> > > have to do this by hand, usually after the server dies with an OOM error.
> > you can try kill -STOP on the sievers when the duplicate and singleton
> > removal starts (and kill -CONT afterwards).
> >
> > Otherwise with kill -KILL the sievers will die, and after some timeout
> > the server will restart them.
> >
> > Paul Z.
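Paul Z.'s kill -STOP / kill -CONT advice can also be automated, so that nobody has to catch the filtering step by hand. A minimal sketch, assuming Linux and assuming that the duplicate and singleton removal steps run as local processes named dup1, dup2 and purge (check this with ps on your own installation). `pids_by_name`, `next_signal` and `pause_during_filtering` are hypothetical helpers, not cado-nfs tools.

```python
import os
import signal
import time
from typing import Iterable, List, Optional

# Assumption: duplicate and singleton removal show up as processes named
# dup1, dup2 and purge (the cado-nfs filter binaries); verify with `ps`.
FILTER_NAMES = frozenset({"dup1", "dup2", "purge"})
SIEVER_NAMES = frozenset({"las"})  # assumed command name of the local sievers


def pids_by_name(names: Iterable[str], proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose /proc/<pid>/comm is one of `names` (Linux only)."""
    wanted = set(names)
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() in wanted:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited while we were scanning
    return pids


def next_signal(filter_running: bool, paused: bool) -> Optional[signal.Signals]:
    """SIGSTOP when filtering starts, SIGCONT when it ends, otherwise None."""
    if filter_running and not paused:
        return signal.SIGSTOP
    if not filter_running and paused:
        return signal.SIGCONT
    return None


def pause_during_filtering(interval_s: float = 5.0) -> None:
    """Poll forever: stop the sievers while filtering runs, resume them after."""
    paused = False
    while True:
        sig = next_signal(bool(pids_by_name(FILTER_NAMES)), paused)
        if sig is not None:
            for pid in pids_by_name(SIEVER_NAMES):
                try:
                    os.kill(pid, sig)
                except OSError:
                    pass  # already exited, or not ours to signal
            paused = sig == signal.SIGSTOP
        time.sleep(interval_s)
```

A stopped las still holds its memory (the kernel can only page it out to swap), so with just 2GB of swap this may not be enough. In that case, killing the sievers as Paul Z. describes, and letting the server reissue their work after its timeout, is the more robust choice.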



