Re: [Starpu-devel] StarPU fails when using dmda/pheft with more than 1 GPU
- From: Samuel Thibault <samuel.thibault@ens-lyon.org>
- To: David Pereira <david_sape@hotmail.com>
- Cc: starpu-devel@lists.gforge.inria.fr
- Subject: Re: [Starpu-devel] StarPU fails when using dmda/pheft with more than 1 GPU
- Date: Mon, 15 Sep 2014 13:00:10 +0200
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Hello,
David Pereira, on Sat 13 Sep 2014 20:25:37 +0000, wrote:
> I have a problem which I cannot identify. When using the "pheft" or "dmda"
> schedulers with only one GPU (STARPU_NCUDA=1), the program runs smoothly,
> but when I use the two GPUs that I have, an unspecified launch failure
> occurs in a kernel launched by a task. This error sometimes happens in one
> kernel but may also happen in other kernels.
>
> The "dm", "eager" and "peager" schedulers don't have this problem.
Hum. dm and dmda are very similar; the difference essentially lies in
the scheduling decision: dmda takes data transfer time into account,
while dm does not, so dmda tends to reduce the number of data transfers.
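For reference, here is a minimal sketch of selecting the policy from the
application itself, assuming the usual starpu_conf API (the STARPU_SCHED
environment variable achieves the same without recompiling):

    #include <starpu.h>

    int main(void)
    {
        struct starpu_conf conf;
        starpu_conf_init(&conf);
        /* Pick the dmda policy explicitly; equivalent to running the
         * program with STARPU_SCHED=dmda in the environment. */
        conf.sched_policy_name = "dmda";

        if (starpu_init(&conf) != 0)
            return 1;

        /* ... register data and submit tasks ... */

        starpu_shutdown();
        return 0;
    }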
> Also, is it possible that a CUDA program runs faster when using StarPU
> (using one GPU) than running the program without the framework? The
> StarPU version has parallel tasks unlike the version without the
> framework (which does not use streams).
StarPU implements a lot of optimisations such as overlapping data
transfers with computations, which are often not easy to code directly
in the application. The parallel implementation of tasks can also of
course considerably reduce the critical path in the task graph.
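To illustrate the overlapping, here is a sketch (not your code, and
assuming a StarPU version with the STARPU_CUDA_ASYNC codelet flag; with
1.1.x one would instead call cudaStreamSynchronize at the end of the
function). A CUDA implementation that launches its kernel on StarPU's
per-worker stream lets StarPU overlap the kernel with data transfers:

    #include <starpu.h>
    #include <cuda_runtime.h>

    /* Hypothetical host wrapper around the actual __global__ kernel. */
    void scal_kernel(float *v, unsigned n, float factor, cudaStream_t stream);

    static void scal_cuda_func(void *buffers[], void *cl_arg)
    {
        struct starpu_vector_interface *vec = buffers[0];
        float *v = (float *)STARPU_VECTOR_GET_PTR(vec);
        unsigned n = STARPU_VECTOR_GET_NX(vec);
        float factor = *(float *)cl_arg;

        /* Launch on the worker's own stream rather than the default
         * stream, so the kernel can run while other transfers proceed. */
        scal_kernel(v, n, factor, starpu_cuda_get_local_stream());
        /* No cudaStreamSynchronize() here: STARPU_CUDA_ASYNC below tells
         * StarPU to track completion through the stream. */
    }

    static struct starpu_codelet scal_cl =
    {
        .cuda_funcs = { scal_cuda_func },
        .cuda_flags = { STARPU_CUDA_ASYNC },
        .nbuffers = 1,
        .modes = { STARPU_RW },
    };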
> I'm thinking that StarPU may be launching concurrent kernels... Is
> that right?
It does not do that unless you explicitly set STARPU_NWORKER_PER_CUDA,
and that is implemented only in the trunk, not in 1.1.x.
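If you want to try it with the trunk, a hypothetical sketch (the
variable has no effect in 1.1.x):

    #include <stdlib.h>
    #include <starpu.h>

    int main(void)
    {
        /* Two workers per CUDA device, so kernels from independent tasks
         * may run concurrently on one GPU; same effect as running with
         * STARPU_NWORKER_PER_CUDA=2 in the environment. */
        setenv("STARPU_NWORKER_PER_CUDA", "2", 1); /* before starpu_init() */

        if (starpu_init(NULL) != 0)
            return 1;
        /* ... */
        starpu_shutdown();
        return 0;
    }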
> Last question: how can I know whether GPUDirect is being used? I was
> analyzing information from ViTE and I think the transfers go from one
> GPU's global memory to RAM first before going to the other GPU.
It's not simple. Seeing the data go through the RAM in ViTE
clearly means GPUDirect was not used. Which data interface are you
using (vector/matrix/something else)? The interface has to have a
cuda_to_cuda_async or any_to_any method for StarPU to be able to make
GPU-GPU transfers; then you will see transfers go from one GPU to the
other in the ViTE trace. This however does not necessarily mean GPUDirect
is enabled, notably when the two GPUs are not on the same NUMA node. One
can see whether GPU-Direct is enabled when the bus gets calibrated: after
CUDA 0 -> 1, one sees GPU-Direct 0 -> 1 when it was possible to
enable it. When it is not displayed, it means CUDA does not allow
GPU-Direct, and thus even if StarPU submits GPU-GPU transfers, and ViTE
shows them as such, the CUDA driver actually makes the data go through
the host (though more efficiently than if StarPU submitted separate
GPU-to-CPU and CPU-to-GPU transfers).
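In case you are using a custom data interface: what matters is that its
copy table provides such a method. A hedged sketch, assuming the usual
struct starpu_data_copy_methods layout (my_interface and
my_copy_any_to_any are hypothetical names):

    #include <stdint.h>
    #include <stddef.h>
    #include <starpu.h>

    /* Hypothetical custom interface: a flat buffer of 'size' bytes. */
    struct my_interface
    {
        uintptr_t ptr;  /* address of the buffer on the node it lives on */
        size_t size;    /* length in bytes */
    };

    /* starpu_interface_copy() picks the right underlying copy for the
     * two nodes, including a direct GPU-GPU copy when possible. */
    static int my_copy_any_to_any(void *src_interface, unsigned src_node,
                                  void *dst_interface, unsigned dst_node,
                                  void *async_channel)
    {
        struct my_interface *src = src_interface;
        struct my_interface *dst = dst_interface;
        return starpu_interface_copy(src->ptr, 0, src_node,
                                     dst->ptr, 0, dst_node,
                                     src->size, async_channel);
    }

    /* Providing any_to_any (or cuda_to_cuda_async) is what lets StarPU
     * submit GPU-GPU transfers for this interface at all. */
    static const struct starpu_data_copy_methods my_copy_methods =
    {
        .any_to_any = my_copy_any_to_any,
    };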
Samuel