
starpu-devel - Re: [Starpu-devel] Deadlock with starpu_calibrate

Subject: Developers list for StarPU




  • From: Mathieu Faverge <mathieu.faverge@inria.fr>
  • To: Ian Masliah <ian.masliah@inria.fr>, Samuel Thibault <samuel.thibault@inria.fr>, starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] Deadlock with starpu_calibrate
  • Date: Sun, 4 Dec 2016 20:29:41 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Is there a way to ask StarPU to calibrate only two GPUs, or just one?
Thanks
Mathieu
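A minimal sketch of one way to do this, assuming a POSIX shell. CUDA_VISIBLE_DEVICES is honored by the CUDA runtime itself, so devices left out of it are hidden from the process entirely, and STARPU_NCUDA caps how many CUDA workers StarPU starts. The with_gpus helper is hypothetical (not part of StarPU), and whether a given calibration program respects these settings is an assumption worth checking on your installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of StarPU): run a command so that only the
# listed CUDA devices are visible, e.g. to calibrate on one or two GPUs.
with_gpus() {
  ids="$1"   # comma-separated CUDA device IDs, e.g. "0,1"
  shift
  # One worker per device ID: count the commas and add one.
  n=$(( $(printf '%s' "$ids" | tr -cd ',' | wc -c) + 1 ))
  # CUDA_VISIBLE_DEVICES hides the other GPUs from the CUDA runtime;
  # STARPU_NCUDA caps the number of CUDA workers StarPU starts.
  CUDA_VISIBLE_DEVICES="$ids" STARPU_NCUDA="$n" "$@"
}

# Example (assumption: the calibration program is starpu_calibrate_bus):
#   with_gpus "0,1" starpu_calibrate_bus   # two GPUs
#   with_gpus "0" starpu_calibrate_bus     # one GPU
```

Hiding devices at the CUDA level, rather than relying only on StarPU's own worker count, also keeps the CUDA runtime from touching the other GPUs at all.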

On 04/12/2016 at 20:07, Ian Masliah wrote:
I didn't find a solution to the problem, unfortunately, and it happened on multiple machines with K40s and K20s. I tried recompiling with a different
CUDA toolkit, but it didn't change anything for me, so I'm not sure.

I also tried disabling GPU Direct with export STARPU_ENABLE_CUDA_GPU_GPU_DIRECT=0, but that didn't help
either at the time.

2016-12-04 20:02 GMT+01:00 Samuel Thibault <samuel.thibault@inria.fr>:
Mathieu Faverge, on Sun 04 Dec 2016 19:53:01 +0100, wrote:
> Here it is:
> #0  0x00007ffff7ff9b24 in clock_gettime ()
> #1  0x00007ffff7858ea6 in clock_gettime () from /lib64/librt.so.1
> #2  0x00007ffff3be2f1e in ?? () from /usr/lib64/nvidia/libcuda.so.1
> #3  0x00007ffff3c70325 in ?? () from /usr/lib64/nvidia/libcuda.so.1
> #4  0x00007ffff3ba890e in ?? () from /usr/lib64/nvidia/libcuda.so.1
> #5  0x00007ffff3ba8b01 in ?? () from /usr/lib64/nvidia/libcuda.so.1
> #6  0x00007ffff3ae192a in ?? () from /usr/lib64/nvidia/libcuda.so.1
> #7  0x00007ffff3c0b648 in cuCtxSynchronize () from /usr/lib64/nvidia/libcuda.so.1
> #8  0x00007ffff6dca179 in ?? () from /opt/cuda/8.0/lib64/libcudart.so.8.0
> #9  0x00007ffff6deb5f9 in cudaThreadSynchronize () from /opt/cuda/8.0/lib64/libcudart.so.8.0
...

> And if you do a strace, the number of calls to clock_gettime is just insane.

So it really looks like CUDA is getting stuck in there; not StarPU's fault,
then. You could try setting

export STARPU_ENABLE_CUDA_GPU_GPU_DIRECT=0

in case CUDA has trouble with GPU Direct.
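Since a hang inside cudaThreadSynchronize() never returns, one way to compare both settings without babysitting the run is to put it under a time limit. A minimal sketch, assuming a POSIX shell and GNU coreutils timeout (which exits with status 124 when the limit is hit); try_gpu_direct is a hypothetical helper, and the calibration command in the usage lines is only an example.

```shell
#!/bin/sh
# Hypothetical helper (not part of StarPU): run a command with StarPU's
# GPU-GPU direct transfers enabled (1) or disabled (0), killing it after a
# time limit so that a hang is reported as a non-zero exit status
# (124 with GNU coreutils timeout) instead of blocking forever.
try_gpu_direct() {
  direct="$1"   # value for STARPU_ENABLE_CUDA_GPU_GPU_DIRECT
  limit="$2"    # time limit in seconds
  shift 2
  STARPU_ENABLE_CUDA_GPU_GPU_DIRECT="$direct" timeout "$limit" "$@"
}

# Example: compare both settings with a 10-minute limit each.
#   try_gpu_direct 1 600 starpu_calibrate_bus; echo "direct on:  $?"
#   try_gpu_direct 0 600 starpu_calibrate_bus; echo "direct off: $?"
```

If both settings time out, GPU Direct is probably not the trigger, which matches what Ian observed.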

Mathieu Faverge, on Sun 04 Dec 2016 19:59:55 +0100, wrote:
> OK, I'll check, but it might just be that I didn't use the same CUDA on the
> compilation and compute nodes.

That could indeed be a problem; I'm not really surprised by CUDA bugs
any more :)

Samuel
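To rule out the mismatch Mathieu suspects, one can compare the toolkit release used at build time with the libcudart the binary resolves to on the compute node. A minimal sketch, assuming a POSIX shell, the usual "release X.Y," line in nvcc --version output, and a binary that links libcudart dynamically (as the backtrace above, showing libcudart.so.8.0, suggests); nvcc_release, linked_cudart and my_app are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers (not part of StarPU or CUDA): extract the CUDA version
# used by a toolkit or by a binary, so the two nodes can be compared.

# Read `nvcc --version` output on stdin and print the release, e.g. "8.0".
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Read `ldd <binary>` output on stdin and print the libcudart version
# the dynamic loader resolves, e.g. "8.0".
linked_cudart() {
  sed -n 's/.*libcudart\.so\.\([0-9][0-9.]*\) => .*/\1/p'
}

# On the compilation node:  nvcc --version | nvcc_release
# On the compute node:      ldd ./my_app | linked_cudart
```

Matching numbers rule out this particular mismatch; they say nothing about the driver version, which nvidia-smi reports separately.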



-- 
Mathieu Faverge
Maitre de conférence / Associate Professor
Institut Polytechnique de Bordeaux - ENSEIRB-Matmeca
INRIA Bordeaux - Sud-Ouest, HiePACS Team
200 avenue de la Vieille Tour
33405 Talence Cedex
Phone: (+33) 5 24 57 40 73



