Subject: Developers list for StarPU
- From: Nathalie Furmento <nathalie.furmento@labri.fr>
- To: "starpu-devel@lists.gforge.inria.fr" <starpu-devel@lists.gforge.inria.fr>
- Subject: [Starpu-devel] Freebsd and pthread_barrier_wait
- Date: Tue, 16 Jul 2013 17:06:52 +0200
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
The test ./tests/parallel_tasks/spmd_peager deadlocks on FreeBSD.
(see https://ci.inria.fr/starpu/job/starpu-trunk/label=starpu-freebsd91amd64/490/consoleText
https://ci.inria.fr/starpu/job/starpu-trunk/label=starpu-freebsd91amd64/490/)
As the valgrind output below indicates, one of the workers belonging to the parallel job finishes (even though there is a barrier) and destroys the job structure while the other worker is still waiting on the barrier.
==47989== Thread 3:
==47989== Invalid read of size 8
==47989== at 0x1B55AC8: pthread_barrier_wait (in /lib/libthr.so.3)
==47989== by 0x1274D7B: execute_job_on_cpu (driver_cpu.c:172)
==47989== by 0x127559C: _starpu_cpu_driver_run_once (driver_cpu.c:325)
==47989== by 0x12756F0: _starpu_cpu_worker (driver_cpu.c:389)
==47989== by 0x1B550A3: ??? (in /lib/libthr.so.3)
==47989== Address 0x226e890 is 48 bytes inside a block of size 64 free'd
==47989== at 0x1005EDE: free (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==47989== by 0x1B55A42: pthread_barrier_destroy (in /lib/libthr.so.3)
==47989== by 0x12217E9: _starpu_job_destroy (jobs.c:106)
==47989== by 0x1222D70: starpu_task_clean (task.c:119)
==47989== by 0x1222E29: _starpu_task_destroy (task.c:155)
==47989== by 0x12221F5: _starpu_handle_job_termination (jobs.c:276)
==47989== by 0x1275635: _starpu_cpu_driver_run_once (driver_cpu.c:352)
==47989== by 0x12756F0: _starpu_cpu_worker (driver_cpu.c:389)
==47989== by 0x1B550A3: ??? (in /lib/libthr.so.3)
==47989==
I checked the size passed to pthread_barrier_init and the pointers for the barrier; everything seems to be all right. Note that the problem only seems to happen on FreeBSD.
If anybody has any idea what could be wrong, please let me know.
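For illustration, the failure mode described above (one worker tearing down the job while another is still inside pthread_barrier_wait) can be avoided in general by letting only the last worker to leave the barrier destroy it. The following is a minimal sketch of that idea and not StarPU's actual code: struct job, job_put, worker and run_parallel_job are hypothetical names, and the reference count is an assumption about one possible guard.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parallel job (not StarPU's job structure). */
struct job {
    pthread_barrier_t barrier;
    atomic_int refs;            /* workers that may still touch the job */
};

/* Drop one reference; whoever drops the last one destroys the job. */
static void job_put(struct job *j)
{
    int prev = atomic_fetch_sub(&j->refs, 1);
    assert(prev >= 1);
    if (prev == 1) {
        /* Every worker has already returned from pthread_barrier_wait. */
        pthread_barrier_destroy(&j->barrier);
        free(j);
    }
}

static void *worker(void *arg)
{
    struct job *j = arg;
    pthread_barrier_wait(&j->barrier);  /* all workers meet here */
    /* ... the parallel work would go here ... */
    job_put(j);  /* release only after having left the barrier */
    return NULL;
}

/* Run one job on nworkers threads; returns 0 on success, -1 on setup error. */
int run_parallel_job(unsigned nworkers)
{
    struct job *j = malloc(sizeof(*j));
    pthread_t *threads = malloc(nworkers * sizeof(*threads));
    if (!j || !threads ||
        pthread_barrier_init(&j->barrier, NULL, nworkers) != 0) {
        free(j);
        free(threads);
        return -1;
    }
    atomic_init(&j->refs, (int)nworkers);
    for (unsigned i = 0; i < nworkers; i++)
        if (pthread_create(&threads[i], NULL, worker, j) != 0)
            abort();  /* sketch only: no recovery from a partial start */
    for (unsigned i = 0; i < nworkers; i++)
        pthread_join(threads[i], NULL);
    free(threads);
    return 0;
}
```

Calling run_parallel_job(2) mimics a two-worker parallel task. Because each worker drops its reference only after pthread_barrier_wait has returned, pthread_barrier_destroy cannot run while a sibling is still inside the barrier.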
Here is the backtrace from gdb when interrupting the program.
(gdb) bt
#0 0x000000080116b87c in pthread_kill () from /lib/libthr.so.3
#1 0x0000000801164ae0 in pthread_barrier_wait () from /lib/libthr.so.3
#2 0x0000000800883d4c in execute_job_on_cpu (j=0x80193c580, worker_task=0x801944000, cpu_args=0x800ab8b40, rank=1, perf_arch=1) at ../../src/drivers/cpu/driver_cpu.c:172
#3 0x000000080088456d in _starpu_cpu_driver_run_once (d=0x7fffff9fcfb0) at ../../src/drivers/cpu/driver_cpu.c:325
#4 0x00000008008846c1 in _starpu_cpu_worker (arg=0x800ab8b40) at ../../src/drivers/cpu/driver_cpu.c:389
#5 0x00000008011640a4 in pthread_getprio () from /lib/libthr.so.3
#6 0x0000000000000000 in ?? ()
Error accessing memory address 0x7fffff9fd000: Bad address.
And the backtraces for all the threads.
(gdb) thread apply all bt
Thread 4 (Thread 801808400 (LWP 121406/spmd_peager)):
#0 0x000000080116b87c in pthread_kill () from /lib/libthr.so.3
#1 0x0000000801164ae0 in pthread_barrier_wait () from /lib/libthr.so.3
#2 0x0000000800883d4c in execute_job_on_cpu (j=0x80193c580, worker_task=0x801944000, cpu_args=0x800ab8b40, rank=1, perf_arch=1) at ../../src/drivers/cpu/driver_cpu.c:172
#3 0x000000080088456d in _starpu_cpu_driver_run_once (d=0x7fffff9fcfb0) at ../../src/drivers/cpu/driver_cpu.c:325
#4 0x00000008008846c1 in _starpu_cpu_worker (arg=0x800ab8b40) at ../../src/drivers/cpu/driver_cpu.c:389
#5 0x00000008011640a4 in pthread_getprio () from /lib/libthr.so.3
#6 0x0000000000000000 in ?? ()
Error accessing memory address 0x7fffff9fd000: Bad address.
Thread 3 (Thread 801808000 (LWP 100601/spmd_peager)):
#0 0x000000080116b87c in pthread_kill () from /lib/libthr.so.3
#1 0x0000000801164ae0 in pthread_barrier_wait () from /lib/libthr.so.3
#2 0x0000000800883b49 in execute_job_on_cpu (j=0x80193c700, worker_task=0x801932a00, cpu_args=0x800ab89b8, rank=0, perf_arch=1) at ../../src/drivers/cpu/driver_cpu.c:139
#3 0x000000080088456d in _starpu_cpu_driver_run_once (d=0x7fffffbfdfb0) at ../../src/drivers/cpu/driver_cpu.c:325
#4 0x00000008008846c1 in _starpu_cpu_worker (arg=0x800ab89b8) at ../../src/drivers/cpu/driver_cpu.c:389
#5 0x00000008011640a4 in pthread_getprio () from /lib/libthr.so.3
#6 0x0000000000000000 in ?? ()
Error accessing memory address 0x7fffffbfe000: Bad address.
Thread 2 (Thread 801807400 (LWP 139082/spmd_peager)):
#0 0x000000080116b87c in pthread_kill () from /lib/libthr.so.3
#1 0x0000000801165b25 in pthread_getschedparam () from /lib/libthr.so.3
#2 0x000000080116dc8d in pthread_cond_signal () from /lib/libthr.so.3
#3 0x00000008008301ed in starpu_pthread_cond_wait (cond=0x800af1e20, mutex=0x800af1e10) at ../../src/common/thread.c:308
#4 0x000000080082e9c5 in _starpu_barrier_counter_wait_for_empty_counter (barrier_c=0x800af1e00) at ../../src/common/barrier_counter.c:41
#5 0x000000080085309f in _starpu_wait_for_all_tasks_of_sched_ctx (sched_ctx_id=0) at ../../src/core/sched_ctx.c:776
#6 0x0000000800833c8d in starpu_task_wait_for_all () at ../../src/core/task.c:718
#7 0x0000000000400f2d in main (argc=1, argv=0x7fffffffda10) at ../../tests/parallel_tasks/spmd_peager.c:89
#0 0x000000080116b87c in pthread_kill () from /lib/libthr.so.3
Cheers,
Nathalie