starpu-devel - [Starpu-devel] Problem with starpu_mpi_wait_for_all and starpu_mpi_barrier functions

List subject: Developers list for StarPU

  • From: Mirko Myllykoski <mirkom@cs.umu.se>
  • To: Starpu Devel <starpu-devel@lists.gforge.inria.fr>
  • Subject: [Starpu-devel] Problem with starpu_mpi_wait_for_all and starpu_mpi_barrier functions
  • Date: Wed, 31 Oct 2018 19:46:24 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hi,

I have been experiencing problems with the starpu_mpi_wait_for_all and starpu_mpi_barrier functions. In some situations these functions cause my code to freeze. I have attached to this email a test program that reproduces the problem. The test program inserts a set of tasks:

    for (int i = 0; i < TASKS; i++) {
        starpu_mpi_task_insert(
            MPI_COMM_WORLD, &codelet,
            STARPU_EXECUTE_ON_NODE, i % world_size,
            STARPU_RW, handles[i % HANDLES], 0);
    }

It then calls the starpu_mpi_wait_for_all function:

    starpu_mpi_wait_for_all(MPI_COMM_WORLD);

The test program freezes at that point.

However, if I first access the local data handles, then everything works just fine:

    for (int i = 0; i < HANDLES; i++) {
        int owner = i % world_size;
        if (owner == my_rank) {
            starpu_data_acquire(handles[i], STARPU_R);
            starpu_data_release(handles[i]);
        }
    }

    printf("Rank %d: Wait for all\n", my_rank);
    starpu_mpi_wait_for_all(MPI_COMM_WORLD);

I have also experienced a similar problem with the starpu_mpi_barrier function, but that one is much harder to reproduce.

This problem occurs with StarPU 1.2.6 and with the latest StarPU 1.3 nightly tarball. I am using Open MPI. I tested the program on two machines, one with a quad-core CPU without hyper-threading and one with a six-core CPU with hyper-threading. I tried 4 and 6 MPI ranks. I set STARPU_NCPU to 1, but the problem also occurs when StarPU is initialized with default settings.
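For reference, the index arithmetic in the task insertion loop determines which tasks are placed on a rank other than the one that owns their data handle. The following standalone sketch works through that arithmetic without calling StarPU. The function name count_remote_tasks is hypothetical and is not part of the attached test program; it only mirrors the expressions i % world_size and i % HANDLES used there.

```c
/* Hypothetical helper, not part of the attached test program.
 * Counts the tasks that the insertion loop places on a rank other
 * than the rank that owns the task's data handle. */
int count_remote_tasks(int tasks, int handles, int world_size)
{
    int remote = 0;
    for (int i = 0; i < tasks; i++) {
        /* Rank passed as the STARPU_EXECUTE_ON_NODE argument. */
        int exec_rank = i % world_size;
        /* Task i uses handle (i % handles); that handle was
         * registered with owner (handle index % world_size). */
        int owner_rank = (i % handles) % world_size;
        if (exec_rank != owner_rank)
            remote++;
    }
    return remote;
}
```

With TASKS = 20 and HANDLES = 10, count_remote_tasks(20, 10, 4) and count_remote_tasks(20, 10, 6) both return 10: the first ten tasks run on the owner of their handle and the last ten do not. This only describes the access pattern of the test program; it does not by itself establish the cause of the freeze.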

Best Regards,
Mirko Myllykoski
#include <stdio.h>   // printf
#include <starpu.h>
#include <starpu_mpi.h>

#define HANDLES 10
#define TASKS 20

starpu_data_handle_t handles[HANDLES];

// dummy codelet
static void kernel(void *buffers[], void *cl_args) { }
static struct starpu_codelet codelet = {
    .cpu_funcs = { kernel },
    .nbuffers = 1,
    .modes = { STARPU_RW }
};

int main(int argc, char **argv)
{
    int thread_support;
    MPI_Init_thread(
        &argc, &argv, MPI_THREAD_MULTIPLE, &thread_support);
    starpu_init(NULL);
    starpu_mpi_init(&argc, &argv, 0);

    int world_size = starpu_mpi_world_size();
    int my_rank = starpu_mpi_world_rank();

    printf("Rank %d: Initializing data handles\n", my_rank);
    for (int i = 0; i < HANDLES; i++) {
        starpu_matrix_data_register(
            &handles[i], -1, 0, 128, 128, 128, sizeof(double));

        // register tag and owner
        int owner = i % world_size;
        starpu_mpi_data_register(handles[i], i, owner);

        // owner "initializes" the data handles
        if (owner == my_rank) {
            starpu_data_acquire(handles[i], STARPU_W);
            starpu_data_release(handles[i]);
        }
    }

    printf("Rank %d: Inserting tasks\n", my_rank); 
    for (int i = 0; i < TASKS; i++) {
        starpu_mpi_task_insert(
            MPI_COMM_WORLD, &codelet,
            STARPU_EXECUTE_ON_NODE, i % world_size,
            STARPU_RW, handles[i % HANDLES], 0);
    }

    //printf("Rank %d: Accessing data handles\n", my_rank);
    //for (int i = 0; i < HANDLES; i++) {
    //    int owner = i % world_size;
    //    if (owner == my_rank) {
    //        starpu_data_acquire(handles[i], STARPU_R);
    //        starpu_data_release(handles[i]);
    //    }
    //}

    printf("Rank %d: Wait for all\n", my_rank);
    starpu_mpi_wait_for_all(MPI_COMM_WORLD);
    
    //printf("Rank %d: Barrier\n", my_rank);
    //starpu_mpi_barrier(MPI_COMM_WORLD);

    printf("Rank %d: Unregistering data handles\n", my_rank); 
    for (int i = 0; i < HANDLES; i++)
        starpu_data_unregister(handles[i]);

    starpu_mpi_shutdown();
    starpu_shutdown();
    MPI_Finalize();

    printf("Rank %d: Ready\n", my_rank); 

    return 0;
}

