
starpu-devel - Re: [Starpu-devel] Problem running CUDA task

Subject: Developers list for StarPU

List archives

Re: [Starpu-devel] Problem running CUDA task


  • From: Miguel Palhas <mpalhas@gmail.com>
  • To: starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] Problem running CUDA task
  • Date: Mon, 6 May 2013 12:47:14 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

An update: it seems the problem is that the kernel itself is not being called.

This is the code for both the CUDA kernel and the function associated with the cuda_funcs member of the codelet:

#include <stdio.h>
#include <starpu.h>
#include "cuPrintf.cu" // cuPrintf helper from the CUDA SDK samples

static __global__ void cuda_kernel_impl(float *val, unsigned n, float factor) {
  unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
  cuPrintf("got here %d %d\n", i, n);
  if (i < n) {
    val[i] *= factor;
    cuPrintf("%f\n", val[i]);
  }
}

extern "C" void cuda_kernel(void *buffers[], void *args) {
  float *factor = (float*) args;

  // length of the vector
  unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);

  // CUDA copy of the vector pointer
  float *val = (float*) STARPU_VECTOR_GET_PTR(buffers[0]);
  unsigned threads_per_block = 64;
  unsigned nblocks = (n + threads_per_block - 1) / threads_per_block;
  
  printf("i'm in the proxy function %d %d\n", nblocks, threads_per_block);
  cudaPrintfInit();
  cuda_kernel_impl<<<nblocks, threads_per_block, 0, starpu_cuda_get_local_stream()>>>(val, n, *factor);
  cudaStreamSynchronize(starpu_cuda_get_local_stream());
  cudaPrintfDisplay(stdout, true);
  cudaPrintfEnd();
}
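
For context, the codelet that points at this wrapper is declared roughly as follows (a sketch of the relevant fields only; it follows the vector scaling sample, restricted to CUDA as mentioned in my first email):

static struct starpu_codelet cl = {
  .where = STARPU_CUDA,           // run only on CUDA devices
  .cuda_funcs = { cuda_kernel },  // the wrapper shown above
  .nbuffers = 1,                  // a single vector handle
  .modes = { STARPU_RW }          // the vector is scaled in place
};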

As you can see, I'm printing "i'm in the proxy function" and then invoking the kernel, which also prints some output.

This is what I get when the result is correct:
i'm in the proxy function 1 64
[0, 0]: got here 0 5
[0, 32]: got here 32 5
[0, 1]: got here 1 5
... (64 lines like this)
[0, 62]: got here 62 5
[0, 63]: got here 63 5
[0, 0]: 0.000000
[0, 1]: 3.140000
[0, 2]: 6.280000
[0, 3]: 9.420000
[0, 4]: 12.560000
0.000000
3.140000
6.280000
9.420000
12.560000


Those last 5 lines are the output array printed at the end of the main function, as shown in my first email. They match the values printed by cuPrintf inside the kernel.

Sometimes, however, I get this:

i'm in the proxy function 1 64
0.000000
1.000000
2.000000
3.000000
4.000000

As you can see, I got to the proxy function without a problem, but I get no output at all from cuPrintf, which suggests the kernel is not invoked at all. Then the main function outputs the array with its initial values unchanged.

What could be the problem here? Since it's just a raw kernel invocation, I'm beginning to think the problem is not with StarPU but with the machine itself.
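
One thing I still have to try is checking the launch itself for errors; this is the kind of check I have in mind (a sketch using only plain CUDA runtime calls, to be dropped into the wrapper above):

  cuda_kernel_impl<<<nblocks, threads_per_block, 0, starpu_cuda_get_local_stream()>>>(val, n, *factor);

  // did the launch itself fail?
  cudaError_t launch_err = cudaGetLastError();
  if (launch_err != cudaSuccess)
    fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(launch_err));

  // did anything go wrong while the kernel was running?
  cudaError_t sync_err = cudaStreamSynchronize(starpu_cuda_get_local_stream());
  if (sync_err != cudaSuccess)
    fprintf(stderr, "stream synchronize failed: %s\n", cudaGetErrorString(sync_err));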



On Sun, May 5, 2013 at 11:43 PM, Miguel Palhas <mpalhas@gmail.com> wrote:
Greetings

I'm once again having problems, this time with the vector scaling sample provided in the docs.

The problem occurs when the task is run on a CUDA device. I've changed the codelet definition to use only STARPU_CUDA to better isolate the problem.

I'm doing everything as shown in the sample:

struct starpu_task *task = starpu_task_create();
task->synchronous = 1;                 // starpu_task_submit() blocks until the task has completed
task->cl = &cl;
task->handles[0] = vector_handle;
task->cl_arg = &factor;
task->cl_arg_size = sizeof(float);

starpu_task_submit(task);

starpu_data_unregister(vector_handle); // the data is put back into main memory before unregistering
starpu_shutdown();

for(i = 0; i < NX; ++i) printf("%f\n", values[i]);
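
For completeness, the vector_handle used above is registered earlier in main, essentially as in the sample; a sketch of that part, with NX and values as in my code:

float values[NX];
starpu_data_handle_t vector_handle;

// register the vector with StarPU; node 0 is main memory (the home node)
starpu_vector_data_register(&vector_handle, 0, (uintptr_t) values, NX, sizeof(values[0]));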


The problem is that 50% of the time I try this, the values array is printed unchanged at the end. This seems to be random, but somehow the data is not being synchronized back after the CUDA kernel executes. What did I do wrong here? If required, I can provide the full source code I used.

Also, another question: in an attempt to fix this, I tried placing a starpu_data_acquire(vector_handle, STARPU_R) call right before the starpu_data_unregister call, but then the program hangs completely at that point. What is going on here?
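
To be concrete, this is the pattern I was attempting (sketched here; the starpu_data_release call is what I understand the API expects after an acquire, and is the part my test did not have):

starpu_data_acquire(vector_handle, STARPU_R);  // block until the vector is available in main memory
/* ... read values here ... */
starpu_data_release(vector_handle);            // as I read the docs, the handle has to be released
                                               // again before it can be unregistered
starpu_data_unregister(vector_handle);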

--
Regards
Miguel Palhas



--
Regards
Miguel Palhas


