Subject: Developers list for StarPU
List archives
- From: Ian Masliah <ian.masliah@inria.fr>
- To: Samuel Thibault <samuel.thibault@inria.fr>, Mathieu Faverge <mathieu.faverge@inria.fr>, starpu-devel@lists.gforge.inria.fr, Ian Masliah <ian.masliah@inria.fr>
- Subject: Re: [Starpu-devel] Deadlock with starpu_calibrate
- Date: Sun, 4 Dec 2016 20:07:32 +0100
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Unfortunately, I didn't find a solution to the problem, and it happened on multiple machines with K40s and K20s. I tried recompiling with a different
CUDA toolkit, but that didn't change anything for me, so I'm not sure that is the cause.
I also tried disabling GPU direct with export STARPU_ENABLE_CUDA_GPU_GPU_DIRECT=0, but that didn't help
either at the time.
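
To check whether the hang persists with GPU direct disabled, without leaving a stuck process around, something like the following could be used. This is only a sketch: `run_with_watchdog` is a hypothetical helper name, and it assumes the coreutils `timeout` command is available. The environment variable is the one mentioned above.

```shell
# Hypothetical helper: run a command with GPU direct disabled and a time limit.
# Prints "hang" if the time limit was hit, otherwise "exit <status>".
run_with_watchdog() {
    limit="$1"
    shift
    # STARPU_ENABLE_CUDA_GPU_GPU_DIRECT=0 is set only for this one command.
    STARPU_ENABLE_CUDA_GPU_GPU_DIRECT=0 timeout "$limit" "$@"
    status=$?
    # coreutils timeout exits with status 124 when the limit expires.
    if [ "$status" -eq 124 ]; then
        echo "hang"
    else
        echo "exit $status"
    fi
}

# Usage (assumption: ./my_app stands for the application that deadlocks):
#   run_with_watchdog 300 ./my_app
```

A "hang" result with the variable set to 0 would suggest, as reported above, that GPU direct is not the cause.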
2016-12-04 20:02 GMT+01:00 Samuel Thibault <samuel.thibault@inria.fr>:
Mathieu Faverge, on Sun 04 Dec 2016 19:53:01 +0100, wrote:
> Here it is:
> #0 0x00007ffff7ff9b24 in clock_gettime ()
> #1 0x00007ffff7858ea6 in clock_gettime () from /lib64/librt.so.1
> #2 0x00007ffff3be2f1e in ?? () from /usr/lib64/nvidia/libcuda.so.1
> #3 0x00007ffff3c70325 in ?? () from /usr/lib64/nvidia/libcuda.so.1
> #4 0x00007ffff3ba890e in ?? () from /usr/lib64/nvidia/libcuda.so.1
> #5 0x00007ffff3ba8b01 in ?? () from /usr/lib64/nvidia/libcuda.so.1
> #6 0x00007ffff3ae192a in ?? () from /usr/lib64/nvidia/libcuda.so.1
> #7 0x00007ffff3c0b648 in cuCtxSynchronize () from
> /usr/lib64/nvidia/libcuda.so.1
> #8 0x00007ffff6dca179 in ?? () from /opt/cuda/8.0/lib64/libcudart.so.8.0
> #9 0x00007ffff6deb5f9 in cudaThreadSynchronize () from
> /opt/cuda/8.0/lib64/libcudart.so.8.0
...
> And if you do a strace, the number of calls to clock_gettime is just insane.
So it really looks like CUDA getting stuck in there. Not StarPU's fault
then. You could try to set
export STARPU_ENABLE_CUDA_GPU_GPU_DIRECT=0
in case CUDA has trouble with GPU direct.
Mathieu Faverge, on Sun 04 Dec 2016 19:59:55 +0100, wrote:
> Ok I'll check but it might just be that I didn't use the same cuda on the
> compilation and compute node.
That could be a problem indeed, I'm not really surprised by CUDA bugs
any more :)
Samuel
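
One way to test the hypothesis of a CUDA mismatch between the compilation node and the compute node is to compare which libcudart the binary resolves to on each machine. The sketch below is an illustration only: `cudart_version` is a hypothetical helper that reads `ldd`-style output on standard input and prints the version suffix of the first libcudart entry it finds.

```shell
# Hypothetical helper: print the libcudart version suffix found in
# ldd-style output read from stdin (for example "8.0").
cudart_version() {
    sed -n 's/.*libcudart\.so\.\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage (assumption: ./my_app stands for the application binary);
# run on both the compilation node and the compute node and compare:
#   ldd ./my_app | cudart_version
```

If the two nodes print different versions, or the compute node prints nothing, the runtime picked up at execution time is not the one the binary was built against.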