
starpu-devel - Re: [Starpu-devel] Deadlock with starpu_calibrate

List subject: Developers list for StarPU




  • From: Mathieu Faverge <mathieu.faverge@inria.fr>
  • To: Samuel Thibault <samuel.thibault@inria.fr>, starpu-devel@lists.gforge.inria.fr, Ian Masliah <ian.masliah@inria.fr>
  • Subject: Re: [Starpu-devel] Deadlock with starpu_calibrate
  • Date: Sun, 4 Dec 2016 20:12:16 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

On Sun 04 Dec 2016 20:02, Samuel Thibault wrote:
> Mathieu Faverge, on Sun 04 Dec 2016 19:53:01 +0100, wrote:
>> Here it is:
>> #0 0x00007ffff7ff9b24 in clock_gettime ()
>> #1 0x00007ffff7858ea6 in clock_gettime () from /lib64/librt.so.1
>> #2 0x00007ffff3be2f1e in ?? () from /usr/lib64/nvidia/libcuda.so.1
>> #3 0x00007ffff3c70325 in ?? () from /usr/lib64/nvidia/libcuda.so.1
>> #4 0x00007ffff3ba890e in ?? () from /usr/lib64/nvidia/libcuda.so.1
>> #5 0x00007ffff3ba8b01 in ?? () from /usr/lib64/nvidia/libcuda.so.1
>> #6 0x00007ffff3ae192a in ?? () from /usr/lib64/nvidia/libcuda.so.1
>> #7 0x00007ffff3c0b648 in cuCtxSynchronize () from /usr/lib64/nvidia/libcuda.so.1
>> #8 0x00007ffff6dca179 in ?? () from /opt/cuda/8.0/lib64/libcudart.so.8.0
>> #9 0x00007ffff6deb5f9 in cudaThreadSynchronize () from /opt/cuda/8.0/lib64/libcudart.so.8.0
>> ...
>>
>> And if you do a strace, the number of calls to clock_gettime is just insane.
> So it really looks like CUDA is getting stuck in there. Not StarPU's fault,
> then. You could try to set
>
> export STARPU_ENABLE_CUDA_GPU_GPU_DIRECT=0
>
> in case CUDA has trouble with GPU direct.
I tried that: with the variable exported, it no longer tests the direct link, but it still gets stuck in the next step.
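A minimal sketch for repeating the strace measurement mentioned in the thread against a hung process. The function name `count_clock_gettime` is hypothetical (it is not a StarPU or CUDA tool); it assumes strace and coreutils `timeout` are installed and that you are allowed to ptrace the target. Note that strace only counts clock_gettime calls that actually enter the kernel: when the vDSO serves them in user space, strace does not see them.

```shell
#!/bin/sh
# Hypothetical helper, not part of StarPU or CUDA.
# Attaches strace to a running (possibly hung) process for a few seconds
# and prints how many clock_gettime system calls it made in that window.
# Assumes strace and coreutils timeout are installed and that the caller
# may ptrace the target (same user, or root).
count_clock_gettime() {
    pid="$1"
    secs="${2:-5}"
    if [ -z "$pid" ]; then
        echo "usage: count_clock_gettime PID [SECONDS]" >&2
        return 2
    fi
    # -f: attach to every thread of the (multithreaded) process
    # -c: print a per-syscall summary table when strace detaches
    # -e trace=clock_gettime: only count this one system call
    # timeout sends SIGTERM to strace, which detaches and prints its
    # summary on stderr; awk keeps the "calls" column (4th field) of the
    # row whose last field is the syscall name.
    timeout "$secs" strace -f -c -e trace=clock_gettime -p "$pid" 2>&1 |
        awk '$NF == "clock_gettime" { print $4 }'
}
```

For example, `count_clock_gettime 12345 10` (12345 being a placeholder PID) samples the process for ten seconds and prints the number of calls, or nothing if none were seen.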


> Mathieu Faverge, on Sun 04 Dec 2016 19:59:55 +0100, wrote:
>> Ok, I'll check, but it might just be that I didn't use the same CUDA on the
>> compilation and compute nodes.
> That could be a problem indeed; I'm not really surprised by CUDA bugs
> any more :)

I recompiled everything, and the behaviour is still the same :(


> Samuel
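A minimal sketch for checking the compilation-node versus compute-node hypothesis: run it against the same binary on both nodes and compare the output. The backtrace shows libcudart.so.8.0 loaded from /opt/cuda/8.0/lib64 and libcuda.so.1 from /usr/lib64/nvidia. The function name `show_cuda_libs` is hypothetical; it assumes a glibc-style `ldd`, and libraries loaded at run time with dlopen() (the CUDA runtime may load libcuda.so.1 this way) will not appear in its output.

```shell
#!/bin/sh
# Hypothetical helper, not part of StarPU or CUDA.
# Prints the CUDA shared libraries that BINARY resolves to on the current
# node, plus the loaded NVIDIA kernel driver version when available.
# Run it on both the compilation node and the compute node and compare.
show_cuda_libs() {
    bin="$1"
    if [ -z "$bin" ] || [ ! -e "$bin" ]; then
        echo "usage: show_cuda_libs BINARY" >&2
        return 2
    fi
    # ldd only lists libraries the dynamic linker resolves at load time;
    # anything opened later with dlopen() is not shown.
    ldd "$bin" 2>/dev/null | grep -E 'libcuda(rt)?\.so' || true
    # This file exists on Linux nodes where the NVIDIA kernel driver is loaded.
    if [ -r /proc/driver/nvidia/version ]; then
        head -n 1 /proc/driver/nvidia/version
    fi
}
```

For example, `show_cuda_libs ./my_app`, where `./my_app` is a placeholder for the program that hangs. A different libcudart path or driver version between the two nodes would support the mismatch hypothesis; identical output does not rule out other causes.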



--
Mathieu Faverge
Maitre de conférence / Associate Professor
Institut Polytechnique de Bordeaux - ENSEIRB-Matmeca
INRIA Bordeaux - Sud-Ouest, HiePACS Team
200 avenue de la Vieille Tour
33405 Talence Cedex
Phone: (+33) 5 24 57 40 73





