
starpu-devel - Re: [Starpu-devel] StarPU on multiple communicator

  • From: Xavier Lacoste <xavier.lacoste@inria.fr>
  • To: Nathalie Furmento <nathalie.furmento@labri.fr>
  • Cc: Mathieu Faverge <Mathieu.Faverge@inria.fr>, starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] StarPU on multiple communicator
  • Date: Mon, 2 Feb 2015 15:41:10 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

The "more complicated" test worked too.
In Jorek, I just noticed that before the WATCHDOG message I also get an MPI_ERR_TRUNCATE in an MPI_Test.

XL.

On 2 Feb 2015, at 15:13, Xavier Lacoste <xavier.lacoste@inria.fr> wrote:

Hello,

Thanks, it worked with my simple example in PaStiX, factorizing two simple matrices (Laplacian with n=1000) on two communicators.
But inside Jorek, I still trigger the watchdog after waiting more than 10.0 seconds (my tasks should be shorter than that).
I can get a stack trace with DDT, but I don't know which information in it is relevant.
Maybe I shouldn't use the watchdog? I'll try the simple example with more complex matrices to try to reproduce it.

Cheers,

XL.


On 30 Jan 2015, at 13:51, Nathalie Furmento <nathalie.furmento@labri.fr> wrote:

The functionality should now be fully supported in the branch 1.1.

There is a small test in mpi/tests/comm.c

Please let me know if it works for you.

Cheers,

Nathalie

On 27/01/2015 17:07, Xavier Lacoste wrote:
Sorry, I didn't see your answer.

Yes, this is what I want to do.

Can I see your code, to try to understand what is wrong in mine?

Cheers,

XL


On 26 Jan 2015, at 11:59, Nathalie Furmento <nathalie.furmento@labri.fr> wrote:

I also tested calling starpu_mpi_insert_task. Again, it works fine with StarPU 1.1 but fails with the trunk.

The data is owned by the node 0 of each sub communicator. The call

  starpu_mpi_insert_task(newcomm, &mycodelet,
                 STARPU_RW, data,
                 STARPU_EXECUTE_ON_NODE, 1,
                 0);

results in the execution of the codelet on the processes having rank 1 in each sub-communicator, which is, if I understood correctly, what you would like to have in your application.

Cheers,

Nathalie

On 26/01/2015 11:35, Nathalie Furmento wrote:
Hello,

I tried a simple application which splits a group of 4 MPI processes into 2 communicators,
and a communication is made between rank 0 and rank 1 of each new sub-communicator.

This works fine with the branch 1.1 of StarPU. However, it fails with the trunk version of
StarPU. Its new communication engine assumes MPI_COMM_WORLD to be the communicator for all
communications. I am going to have a look at how it can be fixed. I will keep you informed.

Cheers,

Nathalie

On Jan 23, 16:40, Xavier Lacoste wrote:
Hello,

I am using PaStiX in Jorek (a tokamak simulation code).

Jorek uses sub-communicators to factorize different matrices on different communicators (i.e. P0 and P1 factorize mat1 while P2 and P3 factorize mat2). The two matrices have exactly the same shape, so their block columns (my data unit in StarPU) have the same shape and are identified with the same tags, but these data are in fact different as they belong to different matrices (mat1 and mat2).

As far as I understand, this cannot work in StarPU today (I saw Nathalie, who agrees with me on that ;) ).

P0 and P2 (resp. P1 and P3) will register their data with exactly the same TAG. Thus we have two different pieces of data with the same TAG, which could be a problem.

Another issue is that the rank of the data is set using the sub-communicator rank, which may not be what StarPU expects.

All this results in StarPU hanging in a deadlock, certainly waiting for communications...

I hope I am not too confusing in my explanations.

I'll try a hack in my code to fix that: use the global rank when registering data, retrieve the number of communicators (fortunately, my communicators all have the same size), and add an offset to the TAGS.
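
[Editor's sketch, not from the original message] The workaround described above could look roughly like the following. This is only an illustration under stated assumptions: all sub-communicators have the same size, each one is made of contiguous MPI_COMM_WORLD ranks, and each matrix uses tags in the range [0, tags_per_comm). The helper names (subcomm_index, world_rank_of, offset_tag) and the tags_per_comm parameter are hypothetical; they are not part of StarPU or PaStiX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the sub-communicator that a world rank
 * belongs to, ASSUMING equally sized sub-communicators built from
 * contiguous world ranks (e.g. {P0,P1} and {P2,P3}). */
static int subcomm_index(int world_rank, int subcomm_size)
{
    assert(subcomm_size > 0 && world_rank >= 0);
    return world_rank / subcomm_size;
}

/* Hypothetical helper: global (world) rank of a process given its rank
 * inside its sub-communicator, under the same layout assumption. */
static int world_rank_of(int sub_rank, int comm_index, int subcomm_size)
{
    assert(sub_rank >= 0 && sub_rank < subcomm_size && comm_index >= 0);
    return comm_index * subcomm_size + sub_rank;
}

/* Hypothetical helper: shift a per-matrix tag into a range private to one
 * sub-communicator, so that the same block column of mat1 and mat2 no
 * longer shares a tag. tags_per_comm is an assumed upper bound on the
 * number of tags used by one matrix. */
static int64_t offset_tag(int64_t local_tag, int comm_index, int64_t tags_per_comm)
{
    assert(local_tag >= 0 && local_tag < tags_per_comm && comm_index >= 0);
    return local_tag + (int64_t)comm_index * tags_per_comm;
}
```

With 4 processes and sub-communicators of size 2, block column 5 would keep tag 5 on {P0,P1} and get tag 5 + tags_per_comm on {P2,P3}, and the process with sub-rank 1 in the second communicator would be registered with world rank 3.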

I think supporting communicators in StarPU would be a great feature.

Regards,

XL.


----------------------------------------
Xavier Lacoste
xavier.lacoste@inria.fr
INRIA Bordeaux Sud-Ouest
200, avenue de la Vieille Tour
33405 Talence Cedex
Tel: +33 (0)5 24 57 40 69








_______________________________________________
Starpu-devel mailing list
Starpu-devel@lists.gforge.inria.fr
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/starpu-devel