
starpu-devel - Re: [Starpu-devel] StarPU will cope by trying to purge

Subject: Developers list for StarPU

List archives

Re: [Starpu-devel] StarPU will cope by trying to purge


  • From: Kadir Akbudak <kadir.akbudak@kaust.edu.sa>
  • To: starpu-devel@lists.gforge.inria.fr, Hatem Ltaief <hatem.ltaief@kaust.edu.sa>
  • Subject: Re: [Starpu-devel] StarPU will cope by trying to purge
  • Date: Thu, 22 Jun 2017 10:12:36 +0300
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Dear Samuel,
Thanks for your fast response.

My current application is in-memory and should stay in-memory.

I also used 4 nodes (4 nodes × 128 GiB/node = 512 GiB) and I am still
getting the same error for a matrix of size 166464-by-166464. When I run
on 8 nodes (1024 GiB), the error disappears.
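For reference, a quick back-of-the-envelope check of the footprint, assuming the matrix is stored once in double precision (8 bytes per entry — an assumption consistent with the ~206 GB figure quoted later in this thread):

```shell
# Back-of-the-envelope footprint of a dense 166464 x 166464 matrix,
# assuming double precision (8 bytes per entry).
M=166464
GIB=$((M * M * 8 / 1024 / 1024 / 1024))
echo "matrix footprint: ~${GIB} GiB"                          # ~206 GiB
echo "per-node share on 4 nodes: $((GIB / 4)) GiB of 128 GiB" # ~51 GiB
```

Each node's share of the matrix alone is only about 51 GiB out of 128 GiB, which suggests the 4-node failure comes from MPI reception buffers and temporary allocations on top of the matrix, not from the matrix itself.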

On 4 nodes, when I use the following environment variables
export STARPU_DISK_SWAP=/tmp
export STARPU_DISK_SWAP_BACKEND=unistd_o_direct
I get many instances of the following error, and my program stalls forever:
[starpu][_starpu_mktemp] Could not create temporary file in directory
'/tmp/448/455', mkostemp failed with error 'Invalid argument'

The reason may be that the compute nodes of the Cray XC40 system that
I use do not have /tmp. In fact, the compute nodes do not have local
disks at all.
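If that is the cause, one workaround I could try is pointing the swap directory at a filesystem the compute nodes can actually write to. A sketch (the path is purely illustrative — $SCRATCH, falling back to $HOME here, stands in for the site's real parallel-filesystem scratch area):

```shell
# Sketch: point StarPU's OOC swap at a filesystem the compute nodes can
# actually write to, since the XC40 nodes have no local /tmp.
# "${SCRATCH:-$HOME}" is an illustrative stand-in for the site's real
# scratch/Lustre area.
SWAPDIR="${SCRATCH:-$HOME}/starpu-swap"
mkdir -p "$SWAPDIR"
export STARPU_DISK_SWAP="$SWAPDIR"
# O_DIRECT needs filesystem support; if the scratch filesystem rejects it,
# the plain "unistd" backend is the fallback to try.
export STARPU_DISK_SWAP_BACKEND=unistd_o_direct
```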


Again on 4 nodes, when I use export STARPU_LIMIT_CPU_MEM=0, I get the
following message only once, and then my program stalls forever:
[starpu][_starpu_memory_reclaim_generic] Not enough memory left on
node RAM 0. Your application data set seems too huge to fit on the
device, StarPU will cope by trying to purge 32735 MiB out. This
message will not be printed again for further purges


I will investigate throttling task submission with the
STARPU_LIMIT_MAX_SUBMITTED_TASKS environment variable.
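Something like the following sketch, where the thresholds are illustrative placeholders rather than tuned values (STARPU_LIMIT_MIN_SUBMITTED_TASKS sets the low-water mark at which submission resumes):

```shell
# Sketch: cap the number of in-flight submitted tasks so the submitting
# thread blocks (and memory stops growing) instead of submitting the whole
# task graph up front. Values are illustrative, not tuned.
export STARPU_LIMIT_MAX_SUBMITTED_TASKS=10000   # block submission above this
export STARPU_LIMIT_MIN_SUBMITTED_TASKS=9000    # resume submission below this
```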

Regarding the shared-memory system with 256 GiB, I only ran the Unix top
program, so I might have missed spikes in memory consumption. What is
the best way to measure the real memory consumption of StarPU?
I tried to use starpu_data_display_memory_stats(), but it prints only
"Memory stats :" and nothing else. I used the following configure
command:

CFLAGS=-fPIC CXXFLAGS=-fPIC CC=cc CXX=CC FC=ftn ./configure
--prefix=/akbudak/codes/starpu-1.2.1/install --disable-cuda
--disable-opencl --with-mpicc=/opt/cray/craype/2.4.1/bin/cc
--disable-shared --disable-build-doc --disable-export-dynamic
--disable-mpi-check --enable-memory-stats
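As an OS-level cross-check, I could also read the kernel's own record of peak memory use. A sketch, assuming a Linux /proc filesystem ($$, this shell's own PID, stands in for the StarPU process's PID):

```shell
# OS-level cross-check: VmHWM in /proc/<pid>/status is the kernel's record
# of the peak resident set size, so it catches spikes that sampling with
# "top" can miss. "$$" (this shell) stands in for the StarPU process PID.
pid=$$
grep -E 'VmHWM|VmRSS' "/proc/$pid/status"   # peak and current RSS, in KiB
```

Unlike top, VmHWM is a high-water mark maintained by the kernel, so it cannot miss a transient spike between samples.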


Thanks,
Kadir


On Tue, Jun 20, 2017 at 3:09 PM, Samuel Thibault
<samuel.thibault@inria.fr> wrote:
> Hello,
>
> Kadir Akbudak, on mar. 20 juin 2017 11:28:52 +0300, wrote:
>> [starpu][starpu_memchunk_tidy] Low memory left on node RAM 0 (6545MiB
>> over 130940MiB).
>
> So it's too tight.
>
>> Can I disable the Out-Of-Core (OOC) support of StarPU so that StarPU
>> does not try to purge memory?
>
> Well, purging memory is the cure, not the disease :)
>
> OOC is not enabled by default actually, so enabling it would actually
> fix the issue by letting StarPU push the data that doesn't fit in memory
> onto disk.
>
> export STARPU_DISK_SWAP=/tmp
> export STARPU_DISK_SWAP_BACKEND=unistd_o_direct
>
> Another way is to let StarPU just allocate all the memory it needs:
> export STARPU_LIMIT_CPU_MEM=0
> it will then not care that it doesn't fit, and it's the Operating System
> which will start swapping data out.
>
>> I have 31 threads on a single node. I use 2 nodes. There is 1 MPI
>> process on each node.
>
> Beware that in the MPI case, on each node there is memory needed for the
> reception of the data produced by the other node.
>
>> The following larger experiment (B) generates the above-mentioned warning:
>> M=N=166464 and MB=NB=1156. Total memory for matrices~=206GB,
>
> So that's only a few dozen GB for the MPI receptions, that may not be
> enough. One thing you can do is try to throttle task submission with the
> STARPU_LIMIT_MAX_SUBMITTED_TASKS environment variable. See our research
> paper on the memory consumption issue on
>
> https://hal.inria.fr/hal-01284004
>
> We plan to make this automatic according to memory consumption, but
> that's still on its way.
>
>> I ran Experiment (B) on a system with 256GB RAM. It worked without any
>> warning. On this system, I also ran for 1 MPI process and 2 MPI
>> processes. The amount of memory required for each process scales as
>> expected. That is 206 GB is required for 1 MPI process and 103GB is
>> required for each of 2 MPI processes.
>
> How did you measure it?
>
> There might be spikes that you didn't catch, and that StarPU wants to
> avoid. In the two-process case, I guess you didn't tell StarPU it has
> to use only half of the memory, so each instance thought it could afford
> 256GB, and was thus happy to use it, and the OS just had to swap data out
> so that it could actually fit.
>
> Samuel





Archives managed by MHonArc 2.6.19+.
