
starpu-devel - Re: [Starpu-devel] automatic RAM allocation and CUDA worker issue

Subject: Developers list for StarPU

  • From: Kevin Juilly <kevin.juilly@eolen.com>
  • To: <starpu-devel@lists.gforge.inria.fr>
  • Subject: Re: [Starpu-devel] automatic RAM allocation and CUDA worker issue
  • Date: Thu, 14 Dec 2017 16:14:00 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hello,

I tested the starpu-1.2 branch with our test case, and the problem does not occur anymore.

Regards,
Kevin Juilly

On 30 November 2017 at 11:33, Olivier Aumage wrote:
Hello,

Thanks for the bug report. Revisions r22589 for the "starpu-1.2/" branch and r22590 for
the "trunk/" branch should fix the problem.

Best regards,
--
Olivier

On 13 Nov. 2017 at 14:28, Kevin Juilly <kevin.juilly@eolen.com> wrote:

Hello,

On a node with a GPU, StarPU 1.2 aborts on an assertion (see assert.log) when
a program asks for more memory than the GPU has, the data are registered with
-1 as home_node, and all workers except the GPU worker are disabled. This
happens even if the memory needed by any single task is well under the size of
the GPU memory.

The problem doesn't seem to occur when STARPU_PREFETCH=0.
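
For anyone who wants to apply this workaround from inside a test program
rather than from the shell, here is a minimal sketch. The helper name
disable_starpu_prefetch is hypothetical, and the sketch assumes that StarPU
reads STARPU_PREFETCH during starpu_init(), so the helper would have to be
called before it.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sets STARPU_PREFETCH=0 for the current process.
 * Assumption: it runs before starpu_init(), so that StarPU sees the value. */
static int disable_starpu_prefetch(void)
{
    /* Third argument 1: overwrite any value inherited from the shell. */
    if (setenv("STARPU_PREFETCH", "0", 1) != 0)
        return -1;

    /* Sanity check: the variable is now visible to this process. */
    const char *value = getenv("STARPU_PREFETCH");
    assert(value != NULL && strcmp(value, "0") == 0);
    return 0;
}
```

Setting the variable in the shell before launching the program has the same
effect and needs no code change.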

A reproducer is attached. This code allocates a large number of square matrices
and starts tasks on them. The tasks themselves do no work and only have a
cuda_func; their only purpose is to force memory management to occur on the
GPU.

The program takes two optional parameters:
- the size of the matrices
- the number of matrices
When no arguments are given, it tries to allocate enough matrices to use 3
times the size of the GPU memory. The assertion is not triggered when too few
matrices are allocated, even if their total size exceeds the size of the GPU
memory.
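
To make the default concrete, here is a small sketch of the arithmetic
behind "3 times the size of the GPU memory". It is not taken from
alloc_assert.c: the helper name default_matrix_count, the rounding up, and
the idea of passing the element size as a parameter are all assumptions made
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of n x n matrices needed so that their total
 * size reaches 3 times the GPU memory size. All sizes are in bytes. */
static uint64_t default_matrix_count(uint64_t gpu_mem_bytes,
                                     uint64_t n,
                                     uint64_t elem_size)
{
    assert(n > 0 && elem_size > 0);

    uint64_t matrix_bytes = n * n * elem_size;  /* one square matrix */
    uint64_t target_bytes = 3 * gpu_mem_bytes;  /* 3x oversubscription */

    /* Round up so that the total is at least the target. */
    return (target_bytes + matrix_bytes - 1) / matrix_bytes;
}
```

For example, with 4 GiB of GPU memory and 1024 x 1024 matrices of 4-byte
elements, each matrix takes 4 MiB and the helper returns 3072. In a real
program the GPU memory size could be queried from the CUDA runtime, for
instance with cudaMemGetInfo.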

This reproduces the memory behaviour of the test case that triggered the bug.

Also attached are a config.log extract and the list of StarPU environment
variables used.


As a note, the same case produced incorrect behaviour with StarPU 1.1. In
that case the worker seemed to be stuck in an infinite loop inside
_starpu_fetch_task_input (calling a function that tries to free memory). I
haven't been able to reproduce it recently and have no idea why.

Regards,
Kevin Juilly
AS+ groupe Eolen
Attachments: assert.log, alloc_assert.c, extract_config.log, env_repro.txt
_______________________________________________
Starpu-devel mailing list
Starpu-devel@lists.gforge.inria.fr
https://lists.gforge.inria.fr/mailman/listinfo/starpu-devel



