starpu-devel - Re: [Starpu-devel] Assert : Number of copy requests left is not zero

  • From: Xavier Lacoste <xl64100@gmail.com>
  • To: Samuel Thibault <samuel.thibault@ens-lyon.org>
  • Cc: starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] Assert : Number of copy requests left is not zero
  • Date: Tue, 4 Nov 2014 11:18:01 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>


On 4 Nov 2014, at 11:11, Samuel Thibault <samuel.thibault@ens-lyon.org> wrote:

> Xavier Lacoste wrote on Tue 04 Nov 2014 10:49:52 +0100:
>>>> [starpu][_starpu_mpi_early_data_check_termination][assert failure]
>>>> Number of copy requests left is not zero
>>>>
>>>> Do you have any idea what could cause this assert?
>>>
>>> I have added a note in the message: did you forget to post a receive
>>> corresponding to a send?
>>
>> Hmm I'll have a look at it.
>> Could it be that I flush a piece of data earlier than I should?
>
> Ah, I'm realizing: you are not using starpu_mpi_send, starpu_mpi_recv
> and the like explicitly, and always rely on the communications implicitly
> generated by starpu_mpi_task_insert?
Yes, indeed.
>
> Normally, if all MPI nodes are doing exactly the same submission loop,
> flushing data earlier is not a problem: the node needing it later will
> receive it again, and the node owning it will know that the other node
> needs it later, and will thus send it again.
I'm not doing the same submission loop on all MPI nodes.
On each node, I'm inserting only the tasks that use local data (as I tried
to explain in the algorithms in my previous mail).
So a mistake on my side is possible here.
> Perhaps there is a
> bug in StarPU-MPI, but we already test this scenario, so I'd rather
> first make sure that the application is really running the same
> submission loop (perhaps you have made mistakes while pruning the
> submission, and thus the node owning the data doesn't know it has to
> send it again, or conversely).
Yes, I'll check my submission loops.
>
> Samuel
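
To make the pruning issue discussed above concrete, here is a minimal toy model in plain C. It does not use StarPU at all: the data distribution, the task list, the pruning rules and every function name are hypothetical, and it ignores the data cache that real StarPU-MPI keeps. It only assumes that, for each submitted task, the node owning the data being read posts a send and the node owning the data being written posts a receive, and then counts transfers for which only one side was posted.

```c
#include <stdlib.h>

#define NODES 2
#define NDATA 4

/* Toy task: reads one piece of data and writes another. */
struct task { int reads; int writes; };

/* Hypothetical distribution: owner[d] is the rank that owns data d. */
static const int owner[NDATA] = {0, 1, 0, 1};

/* Hypothetical task list, known identically to every node. */
static const struct task tasks[] = {
    {0, 1},  /* reads data 0 (rank 0), writes data 1 (rank 1): transfer 0 -> 1 */
    {1, 2},  /* transfer 1 -> 0 */
    {2, 2},  /* purely local to rank 0, no transfer */
    {2, 3},  /* transfer 0 -> 1 */
};
#define NTASKS ((int)(sizeof(tasks) / sizeof(tasks[0])))

/* A pruning rule decides whether `rank` submits task `t` at all. */
typedef int (*prune_rule)(int rank, struct task t);

/* No pruning: every node runs the same submission loop. */
static int submit_everything(int rank, struct task t)
{
    (void)rank; (void)t;
    return 1;
}

/* Consistent pruning: submit every task that touches local data. */
static int submit_if_involved(int rank, struct task t)
{
    return owner[t.reads] == rank || owner[t.writes] == rank;
}

/* Inconsistent pruning: the writing node skips tasks whose input is remote. */
static int submit_if_owns_input(int rank, struct task t)
{
    return owner[t.reads] == rank;
}

/* Count sends without a matching receive, plus receives without a send. */
static int unmatched_transfers(prune_rule submits)
{
    int sends[NODES][NODES] = {{0}};
    int recvs[NODES][NODES] = {{0}};

    for (int rank = 0; rank < NODES; rank++) {
        for (int i = 0; i < NTASKS; i++) {
            int src = owner[tasks[i].reads];
            int dst = owner[tasks[i].writes];
            if (src == dst || !submits(rank, tasks[i]))
                continue;               /* local task, or pruned on this node */
            if (rank == src) sends[src][dst]++;
            if (rank == dst) recvs[src][dst]++;
        }
    }

    int unmatched = 0;
    for (int s = 0; s < NODES; s++)
        for (int d = 0; d < NODES; d++)
            unmatched += abs(sends[s][d] - recvs[s][d]);
    return unmatched;
}
```

In this model, unmatched_transfers(submit_everything) and unmatched_transfers(submit_if_involved) both return 0, while unmatched_transfers(submit_if_owns_input) returns 3: three sends are posted that no node ever receives, which is the kind of mismatch the assert message hints at.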

Attachment: signature.asc
Description: Message signed with OpenPGP using GPGMail



