
starpu-devel - Re: [Starpu-devel] Bug avec le prefetch

  • From: Cyril Roelandt <cyril.roelandt@inria.fr>
  • To: Stojce Nakov <stojce.nakov@inria.fr>
  • Cc: starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] Bug avec le prefetch
  • Date: Wed, 21 Sep 2011 22:58:18 +0200 (CEST)
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

I have a problem when I try to use the prefetch.
If I do an explicit prefetch with starpu_data_prefetch_on_node after a task that uses the same data, everything hangs (all the workers loop in while (machine_is_runing), but there are no tasks left; meanwhile the call to starpu_task_wait_for_all never returns).
This only happens when at least one of the tasks, before or after the prefetch, accesses the data in W / RW mode. If both tasks access it in R mode, everything works fine.

The attached archive contains a small example that illustrates the problem. If you change the access mode of the data v1 to R, everything works fine. (For the number of chunks, use a power of two.)
I tested it on the PlaFRIM platform, on the mirage machines, using CUDA 3.2.16 and gcc 4.3.2 as the compiler.


Do you think the following piece of code also highlights your issue? I think it's a bit simpler, mostly because it does not use CUDA (I'm not a CUDA expert myself, and it also shows that the issue is not CUDA-related).
#include <starpu.h>
#include <starpu_cuda.h>
#include <starpu_data_filters.h>

int n, nchunks;

static void dot_kernel_cpu(void *descr[], void *cl_arg)
{
  double *v1 = (double *)STARPU_VECTOR_GET_PTR(descr[0]);
  double *v2 = (double *)STARPU_VECTOR_GET_PTR(descr[1]);
  double *dot = (double *)STARPU_VARIABLE_GET_PTR(descr[2]);
  unsigned n = STARPU_VECTOR_GET_NX(descr[1]);
  unsigned i; 
 
  for( i = 0; i<n; i++)
    *dot += v1[i]* v2[i];
 
}
 
static struct starpu_perfmodel_t dot_model = {
  .type = STARPU_HISTORY_BASED,
  .symbol = "dot_vector"
};

 
static starpu_codelet dot_kernel_cl = {
  .where = STARPU_CPU,
  .cpu_func = dot_kernel_cpu,
  .nbuffers = 3,
  .model = &dot_model
};
 
void dot_kernel1(starpu_data_handle v1,
                 starpu_data_handle v2,
                 starpu_data_handle dot,
                 unsigned nchunks)
{
  unsigned i;
  for (i = 0; i < nchunks; i++)
    {
      struct starpu_task *task = starpu_task_create();
      task->cl=&dot_kernel_cl;
      task->buffers[0].handle = starpu_data_get_sub_data(v1,1,i);
      task->buffers[0].mode = STARPU_RW;
      task->buffers[1].handle = starpu_data_get_sub_data(v2,1,i);
      task->buffers[1].mode = STARPU_R;
      task->buffers[2].handle = dot;
      task->buffers[2].mode = STARPU_RW;

      starpu_task_submit(task);
    }
}
 
starpu_data_handle v1_handle, v2_handle, var_handle;
struct  starpu_data_filter vector_filter;

int
main(int argc, char **argv)
{
  int i;
  double *v1, *v2;
  double var = 0.0;
  n = 1024*256;
  nchunks = 1;


  starpu_init(NULL);
  starpu_helper_cublas_init();

  starpu_malloc((void **)&v1, n*sizeof(double));
  starpu_malloc((void **)&v2, n*sizeof(double));

  for(i=0; i<n; i++) {
    v1[i] = 1.0;
    v2[i] = 2.0;
  }

  printf("init vectors!\n");

  starpu_vector_data_register(&v1_handle, 0, (uintptr_t)v1, n, sizeof(double));
  starpu_vector_data_register(&v2_handle, 0, (uintptr_t)v2, n, sizeof(double));
  starpu_variable_data_register(&var_handle, 0, (uintptr_t)&var, sizeof(double));
  printf("data registered\n");

  vector_filter.filter_func = starpu_block_filter_func_vector;
  vector_filter.nchildren = nchunks;
  starpu_data_partition(v1_handle, &vector_filter);
  starpu_data_partition(v2_handle, &vector_filter);
  printf("data partitioned\n");


  for(i=0; i<n; i++) {
    v1[i] = 1.0;
    v2[i] = 2.0;
  }

//  dot_kernel(v1_handle, v2_handle, var_handle, nchunks);

  for ( i = 0; i < nchunks; i++)
    starpu_data_prefetch_on_node(starpu_data_get_sub_data( v1_handle, 1, i), 0, 1);

  dot_kernel1(v1_handle, v2_handle, var_handle, nchunks);
  printf("tasks submitted\n");
  starpu_task_wait_for_all();

  starpu_helper_cublas_shutdown();
  starpu_shutdown();

  return 0;
}

I can confirm this hangs on hannibal.


(gdb) info threads
  2 Thread 0x7ffff63ae700 (LWP 27109)  0x00007ffff7b99db0 in _starpu_execute_registered_progression_hooks@plt () from /home/croelandt/opt/lib/libstarpu.so.0
* 1 Thread 0x7ffff7fde700 (LWP 27106)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
(gdb) bt 
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007ffff7b9d80b in starpu_task_wait_for_all () at core/task.c:340
#2  0x0000000000400f19 in main (argc=1, argv=0x7fffffffe468) at test_prefetch.c:106
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff63ae700 (LWP 27109))]#0  0x00007ffff7b99db0 in _starpu_execute_registered_progression_hooks@plt () from /home/croelandt/opt/lib/libstarpu.so.0
(gdb) bt
#0  0x00007ffff7b99db0 in _starpu_execute_registered_progression_hooks@plt () from /home/croelandt/opt/lib/libstarpu.so.0
#1  0x00007ffff7bb3954 in _starpu_datawizard_progress (memory_node=0, may_alloc=1) at datawizard/progress.c:29
#2  0x00007ffff7bc6153 in _starpu_cpu_worker (arg=0x7ffff7dd65e0) at drivers/cpu/driver_cpu.c:130
#3  0x00007ffff6d35b40 in start_thread (arg=<value optimized out>) at pthread_create.c:304
#4  0x00007ffff74b736d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5  0x0000000000000000 in ?? ()



What's quite funny is that replacing:
starpu_data_prefetch_on_node(starpu_data_get_sub_data(v1_handle, 1, i), 0, 1);
with:
starpu_data_prefetch_on_node(starpu_data_get_sub_data(v1_handle, 1, i), 0, 0);

makes it work just fine, so we might want to look at the differences between the sync/non-sync code.

I'm also wondering why STARPU_R is hardcoded in starpu_data_prefetch_on_node:

int starpu_data_prefetch_on_node(starpu_data_handle handle, unsigned node, unsigned async)
{
        return _starpu_prefetch_data_on_node_with_mode(handle, node, async, STARPU_R);
}

Anyone ?


Cyril.



