
starpu-devel - [Starpu-devel] Data Distribution using Starpumpi

Subject: Developers list for StarPU

  • From: Yizhou Qian <yizhou96@gmail.com>
  • To: starpu-devel@lists.gforge.inria.fr
  • Subject: [Starpu-devel] Data Distribution using Starpumpi
  • Date: Sat, 21 Dec 2019 00:28:38 +0800
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hi,

I am trying out a simple StarPU-MPI application that increments a value a on rank 0 and later adds it to another value b on rank 1. Following the documentation, I register a only on rank 0 and b only on rank 1, and I use the "lazy allocation" technique it mentions for variables that a rank only needs to read. However, I get an error suggesting that the data handle for a is uninitialized on rank 1. The exact message is:

[starpu][_starpu_select_src_node][assert failure] The data for the handle 0x2120a50 is requested, but the handle does not have a valid value. Perhaps some initialization task is missing?


Below is my full error message and source code:


Error message:

[adncat@sh-107-47 ~/yizhou/Starpu]$ mpirun -n 2 ./cholesky_mpi 2 2

[starpu][compare_value_and_recalibrate] Current configuration does not match the bus performance model (CPUS: (stored) 1 != (current) 24), recalibrating...

[starpu][compare_value_and_recalibrate] Current configuration does not match the bus performance model (CPUS: (stored) 1 != (current) 24), recalibrating...

[starpu][compare_value_and_recalibrate] ... done

[starpu][compare_value_and_recalibrate] ... done

[starpu][_starpu_mpi_print_thread_level_support] MPI_Init_thread level = MPI_THREAD_SERIALIZED; Multiple threads may make MPI calls, but only one at a time.

Rank 0 of 2 ranks

Incrementing *a, *a=2

[starpu][_starpu_mpi_print_thread_level_support] MPI_Init_thread level = MPI_THREAD_SERIALIZED; Multiple threads may make MPI calls, but only one at a time.

Rank 1 of 2 ranks

/home/users/adncat/starpu/lib/libstarpu-1.3.so.1(_starpu_select_src_node+0x348)[0x7fc9e2880c18]

/home/users/adncat/starpu/lib/libstarpu-1.3.so.1(_starpu_create_request_to_fetch_data+0xd0c)[0x7fc9e2881b6c]

/home/users/adncat/starpu/lib/libstarpu-1.3.so.1(_starpu_fetch_data_on_node+0x10e)[0x7fc9e2881d8e]

/home/users/adncat/starpu/lib/libstarpu-1.3.so.1(_starpu_fetch_task_input+0x113)[0x7fc9e2882c73]

/home/users/adncat/starpu/lib/libstarpu-1.3.so.1(_starpu_cpu_driver_run_once+0xd1)[0x7fc9e28c9af1]

/home/users/adncat/starpu/lib/libstarpu-1.3.so.1(_starpu_cpu_worker+0x2d)[0x7fc9e28c9d2d]

/lib64/libpthread.so.0(+0x7dd5)[0x7fc9e066cdd5]

/lib64/libc.so.6(clone+0x6d)[0x7fc9df5e402d]


[starpu][_starpu_select_src_node][assert failure] The data for the handle 0x2120a50 is requested, but the handle does not have a valid value. Perhaps some initialization task is missing?


cholesky_mpi: datawizard/coherency.c:70: _starpu_select_src_node: Assertion `src_node_mask != 0' failed.

[sh-107-47:284948] *** Process received signal ***

[sh-107-47:284948] Signal: Aborted (6)

[sh-107-47:284948] Signal code:  (-6)

[sh-107-47:284948] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7fc9e06745d0]

[sh-107-47:284948] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x7fc9df51c2c7]

[sh-107-47:284948] [ 2] /lib64/libc.so.6(abort+0x148)[0x7fc9df51d9b8]

[sh-107-47:284948] [ 3] /lib64/libc.so.6(+0x2f0e6)[0x7fc9df5150e6]

[sh-107-47:284948] [ 4] /lib64/libc.so.6(+0x2f192)[0x7fc9df515192]

[sh-107-47:284948] [ 5] /home/users/adncat/starpu/lib/libstarpu-1.3.so.1(_starpu_select_src_node+0x398)[0x7fc9e2880c68]

[sh-107-47:284948] [ 6] /home/users/adncat/starpu/lib/libstarpu-1.3.so.1(_starpu_create_request_to_fetch_data+0xd0c)[0x7fc9e2881b6c]

[sh-107-47:284948] [ 7] /home/users/adncat/starpu/lib/libstarpu-1.3.so.1(_starpu_fetch_data_on_node+0x10e)[0x7fc9e2881d8e]

[sh-107-47:284948] [ 8] /home/users/adncat/starpu/lib/libstarpu-1.3.so.1(_starpu_fetch_task_input+0x113)[0x7fc9e2882c73]

[sh-107-47:284948] [ 9] /home/users/adncat/starpu/lib/libstarpu-1.3.so.1(_starpu_cpu_driver_run_once+0xd1)[0x7fc9e28c9af1]

[sh-107-47:284948] [10] /home/users/adncat/starpu/lib/libstarpu-1.3.so.1(_starpu_cpu_worker+0x2d)[0x7fc9e28c9d2d]

[sh-107-47:284948] [11] /lib64/libpthread.so.0(+0x7dd5)[0x7fc9e066cdd5]

[sh-107-47:284948] [12] /lib64/libc.so.6(clone+0x6d)[0x7fc9df5e402d]

[sh-107-47:284948] *** End of error message ***

-------------------------------------------------------

Primary job  terminated normally, but 1 process returned

a non-zero exit code. Per user-direction, the job has been aborted.

-------------------------------------------------------

--------------------------------------------------------------------------

mpirun noticed that process rank 1 with PID 0 on node sh-107-47 exited on signal 6 (Aborted).





Source code:

void task1(void *buffers[], void *cl_arg) {
    int *A = (int *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    *A += 1;
    cout << "Incrementing *a, *a=" << *A << endl;
}
struct starpu_codelet cl1 = {
    .where = STARPU_CPU,
    .cpu_funcs = { task1, NULL },
    .nbuffers = 1,
    .modes = { STARPU_RW }
};

void task2(void *buffers[], void *cl_arg) {
    int *A0 = (int *)STARPU_VARIABLE_GET_PTR(buffers[0]);
    int *A1 = (int *)STARPU_VARIABLE_GET_PTR(buffers[1]);
    *A1 += *A0;
    cout << "*b + *a= " << *A1 << endl;
}
struct starpu_codelet cl2 = {
    .where = STARPU_CPU,
    .cpu_funcs = { task2, NULL },
    .nbuffers = 2,
    .modes = { STARPU_R, STARPU_RW }
};


void test(int rank)  {
    int* a=new int(1);
    int* b=new int(1);
    int* c=new int(1);
    starpu_data_handle_t data1, data2;
    if (rank==0) {
        starpu_variable_data_register(&data1, STARPU_MAIN_RAM, (uintptr_t)a, sizeof(int));
        starpu_variable_data_register(&data2, -1, (uintptr_t)NULL, sizeof(int));
    }
    else {
        starpu_variable_data_register(&data1, -1, (uintptr_t)NULL, sizeof(int));
        starpu_variable_data_register(&data2, STARPU_MAIN_RAM, (uintptr_t)b, sizeof(int));
    }
    starpu_mpi_data_register(data1, 0, rank);
    starpu_mpi_data_register(data2, 1, rank);

    if (rank==0) {
        starpu_mpi_task_insert(MPI_COMM_WORLD,&cl1, STARPU_RW, data1, 0);
    }
    else {
        starpu_mpi_task_insert(MPI_COMM_WORLD,&cl2, STARPU_R, data1,STARPU_RW, data2,0);
    }


    return;


}
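
To make the intended result concrete, here is the arithmetic I expect from the two tasks, written as plain C++ with no StarPU or MPI involved. It is only a sketch of the expected values: the names increment_a, add_a_to_b and expected_b are made up for this illustration and are not StarPU calls.

```cpp
// Plain C++ model of the two tasks above; no StarPU or MPI involved.
// All function names here are illustrative, not part of any library.

// Mirrors task1: increment the value a (owned by rank 0 in my program).
void increment_a(int *a) { *a += 1; }

// Mirrors task2: read a and add it to b (owned by rank 1 in my program).
void add_a_to_b(const int *a, int *b) { *b += *a; }

// Runs both steps in the order I expect them to happen and returns b.
int expected_b(int a, int b) {
    increment_a(&a);
    add_a_to_b(&a, &b);
    return b;
}
```

With a = 1 and b = 1, as in test(), expected_b(1, 1) returns 3, so I would expect rank 1 to print "*b + *a= 3" after rank 0 prints "Incrementing *a, *a=2".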



Best,


Yizhou



