
starpu-devel - Re: [Starpu-devel] StarPU code hangs when using a lot of GPU memory

Subject: Developers list for StarPU

  • From: Mirko Myllykoski <mirkom@cs.umu.se>
  • To: samuel.thibault@inria.fr
  • Cc: Starpu Devel <starpu-devel@lists.gforge.inria.fr>
  • Subject: Re: [Starpu-devel] StarPU code hangs when using a lot of GPU memory
  • Date: Fri, 01 Mar 2019 18:40:02 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hi,

On 2019-03-01 18:18, Samuel Thibault wrote:

> It actually is: nosubdataref and nodataref mean that all of the 5815 are
> considered busy, and that's the reason why they can't be evicted.
>
> Could it be that you have a *lot* of tasks which are ready? It might
> just be that they all almost have their data on the GPU, but none of
> them has all it needs, and thus none of them can be executed. It is a
> fairly extreme situation that we did envision, but didn't put management
> in StarPU yet. That's however precisely the kind of work I'll be having
> a look at in the coming months :)

There are definitely a lot of tasks ready to be scheduled, and these tasks would end up updating almost the entire matrix in small sections. The entire matrix cannot fit into the GPU memory, so the situation you are describing is likely to happen with my code.
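
To make the scenario concrete, here is a toy model in C of how I understand it. This is not StarPU code and does not reflect how StarPU actually allocates or evicts memory; the names runnable_tasks, NTASKS and BLOCKS_PER_TASK are made up for illustration, and the round-robin fetch order is an assumption. Blocks are fetched for all ready tasks in turn, and a fetched block is treated as busy (never evicted) because a ready task still references it.

```c
#include <stdbool.h>

#define NTASKS 4          /* ready tasks competing for GPU memory (illustrative) */
#define BLOCKS_PER_TASK 3 /* data blocks each task needs before it can run */

/*
 * Toy model, not StarPU's allocator: blocks are fetched round-robin across
 * all ready tasks, and a fetched block is never evicted because a ready
 * task still references it. Returns how many tasks end up with all of
 * their blocks resident once no further block can be fetched.
 */
static int runnable_tasks(int capacity)
{
    int loaded[NTASKS] = {0}; /* blocks currently resident for each task */
    int used = 0;             /* total blocks resident in GPU memory */
    bool progress = true;

    /* Keep fetching one block per task per round until memory is full
     * or every task already has everything it needs. */
    while (progress) {
        progress = false;
        for (int t = 0; t < NTASKS && used < capacity; t++) {
            if (loaded[t] < BLOCKS_PER_TASK) {
                loaded[t]++;
                used++;
                progress = true;
            }
        }
    }

    /* Count the tasks whose data is completely resident. */
    int runnable = 0;
    for (int t = 0; t < NTASKS; t++) {
        if (loaded[t] == BLOCKS_PER_TASK)
            runnable++;
    }
    return runnable;
}
```

With these constants, runnable_tasks(8) returns 0: every task holds two of its three blocks, memory is full, and nothing can be evicted, which is the stall. With room for all 12 blocks, runnable_tasks(12) returns 4.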

Is StarPU smart enough to move tasks from the CUDA worker to the CPU workers? The scheduling context should have a large pool of CPU workers at the point where the program stalls.

- Mirko



