
starpu-devel - Re: [Starpu-devel] GPU issue with r12137

  • From: Samuel Thibault <samuel.thibault@ens-lyon.org>
  • To: Xavier Lacoste <xavier.lacoste@inria.fr>, starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] GPU issue with r12137
  • Date: Mon, 7 Apr 2014 18:37:42 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Samuel Thibault, on Thu 03 Apr 2014 14:24:59 +0200, wrote:
> Xavier Lacoste, on Thu 03 Apr 2014 13:44:00 +0200, wrote:
> > diff --git a/src/datawizard/coherency.c b/src/datawizard/coherency.c
> > index 6e67156..8e82163 100644
> > --- a/src/datawizard/coherency.c
> > +++ b/src/datawizard/coherency.c
> > @@ -172,13 +172,13 @@ static int worker_supports_direct_access(unsigned node, unsigned handling_node)
> >  	enum starpu_node_kind kind = starpu_node_get_kind(handling_node);
> >  	/* GPUs not always allow direct remote access: if CUDA4
> >  	 * is enabled, we allow two CUDA devices to communicate. */
> > -	return
> > -#if 0
> > +	return kind ==
> > +#if 1
> >  		/* CUDA does not seem very safe with concurrent
> >  		 * transfer queueing, avoid queueing from CPUs */
> > -		kind == STARPU_CPU_RAM ||
> > +		STARPU_CPU_RAM ||
> >  #endif
> > -		kind == STARPU_CUDA_RAM;
> > +		STARPU_CUDA_RAM;
> >  }
> >  #else
> >  	/* Direct GPU-GPU transfers are not allowed in general */
> >
> > solves the issue.
>
> Uh?!
>
> Since kind can not be both STARPU_CPU_RAM || STARPU_CUDA_RAM, it
> basically means always returning 0.

Ah, no, I misread that: STARPU_CPU_RAM || STARPU_CUDA_RAM is 1, which
is STARPU_CPU_RAM. So this change makes StarPU always use a CPU worker
for getting data back. The difference is that CPU workers use stream 0,
which synchronizes everything.

Now I just realized: you are using starpu_cuda_get_local_stream(), but I
don't see you calling
cudaStreamSynchronize(starpu_cuda_get_local_stream()); at the end of
your codelets. You have to: StarPU does not wait for completion of
kernels on the stream by itself (at least up to 1.2, which introduces a
STARPU_CUDA_ASYNC flag that enables it to do so). This was working just
by luck, because CPUs were requesting the data and using stream 0 to
fetch it. You can use the attached patch to work around the issue, but
your source code should really be doing it from its codelets; no version
of StarPU has ever done it for the application.

I have also added an assertion in StarPU which catches when the
application does not do it. I can see that at least
po_trfsp1d_sparse_gemm_starpu_cuda doesn't. I tried to add the calls to
my copy of ricar, but it didn't work for some reason, perhaps because
some paths still point into your tree.
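
For reference, a minimal sketch of what such a codelet could look like
(the kernel name and the vector data interface here are just
placeholders, not your actual code):

#include <starpu.h>
#include <starpu_cuda.h>

/* Placeholder kernel launcher, assumed to queue work on the given stream. */
extern void my_kernel(float *v, unsigned n, cudaStream_t stream);

static void my_codelet_cuda(void *buffers[], void *cl_arg)
{
	float *v = (float *)STARPU_VECTOR_GET_PTR(buffers[0]);
	unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);

	/* Queue the kernel on StarPU's per-worker stream... */
	my_kernel(v, n, starpu_cuda_get_local_stream());

	/* ...and wait for it here: StarPU will not wait for the stream
	 * by itself before transferring the data elsewhere. */
	cudaStreamSynchronize(starpu_cuda_get_local_stream());
}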

Samuel
Index: src/drivers/cuda/driver_cuda.c
===================================================================
--- src/drivers/cuda/driver_cuda.c	(revision 12580)
+++ src/drivers/cuda/driver_cuda.c	(working copy)
@@ -368,6 +368,7 @@
 		_starpu_simgrid_execute_job(j, args->perf_arch, NAN);
 #else
 		func(_STARPU_TASK_GET_INTERFACES(task), task->cl_arg);
+		cudaStreamSynchronize(starpu_cuda_get_local_stream());
 #endif
 	}



