[Starpu-devel] Problem with starpu_mpi_wait_for_all and starpu_mpi_barrier functions
- From: Mirko Myllykoski <mirkom@cs.umu.se>
- To: Starpu Devel <starpu-devel@lists.gforge.inria.fr>
- Subject: [Starpu-devel] Problem with starpu_mpi_wait_for_all and starpu_mpi_barrier functions
- Date: Wed, 31 Oct 2018 19:46:24 +0100
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Hi,
I have been experiencing problems with the starpu_mpi_wait_for_all and starpu_mpi_barrier functions. In some situations these functions cause my code to freeze. I have attached a test program that reproduces the problem. The test program inserts a set of tasks:
    for (int i = 0; i < TASKS; i++) {
        starpu_mpi_task_insert(
            MPI_COMM_WORLD, &codelet,
            STARPU_EXECUTE_ON_NODE, i % world_size,
            STARPU_RW, handles[i % HANDLES], 0);
    }
And then calls the starpu_mpi_wait_for_all function:
    starpu_mpi_wait_for_all(MPI_COMM_WORLD);
The test program freezes at that point.
However, if I first access the local data handles, then everything works just fine:
    for (int i = 0; i < HANDLES; i++) {
        int owner = i % world_size;
        if (owner == my_rank) {
            starpu_data_acquire(handles[i], STARPU_R);
            starpu_data_release(handles[i]);
        }
    }

    printf("Rank %d: Wait for all\n", my_rank);
    starpu_mpi_wait_for_all(MPI_COMM_WORLD);
I have also experienced a similar problem with the starpu_mpi_barrier function, but that one seems to be much harder to reproduce.
This problem occurs with StarPU 1.2.6 and with the latest StarPU 1.3 nightly tarball. I am using OpenMPI. I tested the program on two machines, one with a quad-core CPU without hyperthreading and one with a six-core CPU with hyperthreading. I tried 4 and 6 MPI ranks. I set STARPU_NCPU to 1, but the problem also occurs when StarPU is initialized with default settings.
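For reference, the placement produced by the insertion loop can be tabulated without StarPU at all. The following is only a rough sketch of the index arithmetic used in the test program (task i runs on rank i % world_size and uses handle i % HANDLES, which is owned by rank (i % HANDLES) % world_size). The helper names handle_owner, task_rank, count_remote_tasks and print_mapping are hypothetical and are not part of StarPU or of the attached program. It does not explain the freeze; it only shows which tasks are asked to run on a rank that does not own their handle.

```c
#include <stdio.h>

#define HANDLES 10
#define TASKS 20

/* Hypothetical helper: rank that owns handle h in the test program. */
int handle_owner(int h, int world_size)
{
    return h % world_size;
}

/* Hypothetical helper: rank passed to STARPU_EXECUTE_ON_NODE for task i. */
int task_rank(int i, int world_size)
{
    return i % world_size;
}

/* Count the tasks whose executing rank differs from the handle owner. */
int count_remote_tasks(int world_size)
{
    int remote = 0;
    for (int i = 0; i < TASKS; i++) {
        int h = i % HANDLES;
        if (task_rank(i, world_size) != handle_owner(h, world_size))
            remote++;
    }
    return remote;
}

/* Print one line per task, followed by a summary line. */
void print_mapping(int world_size)
{
    for (int i = 0; i < TASKS; i++) {
        int h = i % HANDLES;
        printf("task %2d: handle %d, owner rank %d, executes on rank %d\n",
               i, h, handle_owner(h, world_size), task_rank(i, world_size));
    }
    printf("%d of %d tasks run on a rank that does not own their handle\n",
           count_remote_tasks(world_size), TASKS);
}
```

Calling print_mapping(4) or print_mapping(6) from a main function shows that, for both rank counts I tried, tasks 0 to 9 run on the owner of their handle while tasks 10 to 19 do not, so the second half of the tasks presumably requires data to be moved between ranks.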
Best Regards,
Mirko Myllykoski
#include <stdio.h>
#include <starpu.h>
#include <starpu_mpi.h>

#define HANDLES 10
#define TASKS 20

starpu_data_handle_t handles[HANDLES];

// dummy codelet
static void kernel(void *buffers[], void *cl_args)
{
}

static struct starpu_codelet codelet = {
    .cpu_funcs = { kernel },
    .nbuffers = 1,
    .modes = { STARPU_RW }
};

int main(int argc, char **argv)
{
    int thread_support;
    MPI_Init_thread(
        &argc, &argv, MPI_THREAD_MULTIPLE, &thread_support);

    starpu_init(NULL);
    starpu_mpi_init(&argc, &argv, 0);

    int world_size = starpu_mpi_world_size();
    int my_rank = starpu_mpi_world_rank();

    printf("Rank %d: Initializing data handles\n", my_rank);
    for (int i = 0; i < HANDLES; i++) {
        starpu_matrix_data_register(
            &handles[i], -1, 0, 128, 128, 128, sizeof(double));

        // register tag and owner
        int owner = i % world_size;
        starpu_mpi_data_register(handles[i], i, owner);

        // owner "initializes" the data handles
        if (owner == my_rank) {
            starpu_data_acquire(handles[i], STARPU_W);
            starpu_data_release(handles[i]);
        }
    }

    printf("Rank %d: Inserting tasks\n", my_rank);
    for (int i = 0; i < TASKS; i++) {
        starpu_mpi_task_insert(
            MPI_COMM_WORLD, &codelet,
            STARPU_EXECUTE_ON_NODE, i % world_size,
            STARPU_RW, handles[i % HANDLES], 0);
    }

    //printf("Rank %d: Accessing data handles\n", my_rank);
    //for (int i = 0; i < HANDLES; i++) {
    //    int owner = i % world_size;
    //    if (owner == my_rank) {
    //        starpu_data_acquire(handles[i], STARPU_R);
    //        starpu_data_release(handles[i]);
    //    }
    //}

    printf("Rank %d: Wait for all\n", my_rank);
    starpu_mpi_wait_for_all(MPI_COMM_WORLD);

    //printf("Rank %d: Barrier\n", my_rank);
    //starpu_mpi_barrier(MPI_COMM_WORLD);

    printf("Rank %d: Unregistering data handles\n", my_rank);
    for (int i = 0; i < HANDLES; i++)
        starpu_data_unregister(handles[i]);

    starpu_mpi_shutdown();
    starpu_shutdown();
    MPI_Finalize();

    printf("Rank %d: Ready\n", my_rank);
    return 0;
}