
starpu-devel - Re: [Starpu-devel] Deadlock with starpu_calibrate

Subject: Developers list for StarPU

  • From: Mathieu Faverge <mathieu.faverge@inria.fr>
  • To: Samuel Thibault <samuel.thibault@inria.fr>, starpu-devel@lists.gforge.inria.fr, Ian Masliah <ian.masliah@inria.fr>
  • Subject: Re: [Starpu-devel] Deadlock with starpu_calibrate
  • Date: Sun, 4 Dec 2016 19:53:01 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hello,

Here it is:
#0 0x00007ffff7ff9b24 in clock_gettime ()
#1 0x00007ffff7858ea6 in clock_gettime () from /lib64/librt.so.1
#2 0x00007ffff3be2f1e in ?? () from /usr/lib64/nvidia/libcuda.so.1
#3 0x00007ffff3c70325 in ?? () from /usr/lib64/nvidia/libcuda.so.1
#4 0x00007ffff3ba890e in ?? () from /usr/lib64/nvidia/libcuda.so.1
#5 0x00007ffff3ba8b01 in ?? () from /usr/lib64/nvidia/libcuda.so.1
#6 0x00007ffff3ae192a in ?? () from /usr/lib64/nvidia/libcuda.so.1
#7 0x00007ffff3c0b648 in cuCtxSynchronize () from /usr/lib64/nvidia/libcuda.so.1
#8 0x00007ffff6dca179 in ?? () from /opt/cuda/8.0/lib64/libcudart.so.8.0
#9 0x00007ffff6deb5f9 in cudaThreadSynchronize () from /opt/cuda/8.0/lib64/libcudart.so.8.0
#10 0x00007ffff7aa1285 in measure_bandwidth_between_dev_and_dev_cuda (src=0, dst=1) at core/perfmodel/perfmodel_bus.c:312
#11 0x00007ffff7aa17a9 in benchmark_all_gpu_devices () at core/perfmodel/perfmodel_bus.c:717
#12 0x00007ffff7aa1803 in generate_bus_affinity_file () at core/perfmodel/perfmodel_bus.c:925
#13 0x00007ffff7aa293e in _starpu_bus_force_sampling () at core/perfmodel/perfmodel_bus.c:2430
#14 0x00007ffff7aa29d4 in check_bus_config_file () at core/perfmodel/perfmodel_bus.c:1634
#15 _starpu_load_bus_performance_files () at core/perfmodel/perfmodel_bus.c:2450
#16 0x00007ffff7a88dff in starpu_initialize (user_conf=<optimized out>, argc=<optimized out>, argv=<optimized out>) at core/workers.c:1219
#17 0x00000000004008f8 in main (argc=<optimized out>, argv=<optimized out>) at starpu_calibrate_bus.c:81


And if you run strace on it, the number of calls to clock_gettime is just insane. I also forgot to mention: this is with the 1.2.0 tarball.

Mathieu

On 4 Dec 2016 at 19:44, Samuel Thibault wrote:
Hello,

Mathieu Faverge, on Sun 04 Dec 2016 19:39:07 +0100, wrote:
doesn't seem to make any progress. It is stuck after printing:

$ starpu_calibrate_bus
[starpu][check_bus_config_file] No performance model for the bus, calibrating...
[starpu][benchmark_all_gpu_devices] CUDA 0...
[starpu][benchmark_all_gpu_devices] CUDA 1...
[starpu][benchmark_all_gpu_devices] CUDA 2...
[starpu][benchmark_all_gpu_devices] CUDA 0 -> 1...
[starpu][measure_bandwidth_between_dev_and_dev_cuda] GPU-Direct 1 -> 0
[starpu][measure_bandwidth_between_dev_and_dev_cuda] GPU-Direct 0 -> 1

Could you get a backtrace? I'm afraid it's probably just stuck inside a CUDA call.

Samuel



--
Mathieu Faverge
Maitre de conférence / Associate Professor
Institut Polytechnique de Bordeaux - ENSEIRB-Matmeca
INRIA Bordeaux - Sud-Ouest, HiePACS Team
200 avenue de la vielle tour
33405 Talence Cedex
Phone: (+33) 5 24 57 40 73




