
starpu-devel - Re: [Starpu-devel] increment_redux_v2

List subject: Developers list for StarPU

  • From: Olivier Aumage <olivier.aumage@inria.fr>
  • To: starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] increment_redux_v2
  • Date: Fri, 16 Sep 2011 10:53:02 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hi,

It appears that, for some reason, a handle may in some cases be freed in
_starpu_data_unregister() while another thread is still running the loop in
_starpu_notify_data_dependencies().

Attachment: starpu_4112_debug.patch
Description: Binary data



The enclosed patch shows that, when running the increment_redux or
increment_redux_v2 tests, _starpu_notify_data_dependencies() may perform a
_starpu_spin_lock(&handle->header_lock) on a handle that has already been freed.

So, what are the current assumptions/invariants that prevent
_starpu_data_unregister() from freeing a handle that is still being accessed?
--
Olivier
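
For illustration only, here is a minimal sketch of one invariant of the kind asked about above: a per-handle "busy" count that unregistration waits on before freeing. This is not StarPU's actual code. The demo_* names, the structure layout, and the use of a pthread mutex and condition variable instead of StarPU's spinlock are all assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define DEMO_MAX_WORKERS 16

/* Hypothetical stand-in for a data handle; not StarPU's real structure. */
struct demo_handle {
    pthread_mutex_t lock;       /* protects busy_count and counter */
    pthread_cond_t  idle;       /* signalled when busy_count drops to 0 */
    unsigned        busy_count; /* threads that may still touch the handle */
    int             counter;    /* payload updated by the workers */
};

static struct demo_handle *demo_handle_create(void)
{
    struct demo_handle *h = malloc(sizeof(*h));
    if (h == NULL)
        return NULL;
    pthread_mutex_init(&h->lock, NULL);
    pthread_cond_init(&h->idle, NULL);
    h->busy_count = 0;
    h->counter = 0;
    return h;
}

/* Taken on behalf of a user BEFORE the handle is handed to that user. */
static void demo_handle_acquire(struct demo_handle *h)
{
    pthread_mutex_lock(&h->lock);
    h->busy_count++;
    pthread_mutex_unlock(&h->lock);
}

/* Must be the user's very last access to the handle. */
static void demo_handle_release(struct demo_handle *h)
{
    pthread_mutex_lock(&h->lock);
    assert(h->busy_count > 0);
    if (--h->busy_count == 0)
        pthread_cond_broadcast(&h->idle);
    pthread_mutex_unlock(&h->lock);
}

/* Waits until no user is left, then frees. Returns the final counter. */
static int demo_handle_unregister(struct demo_handle *h)
{
    pthread_mutex_lock(&h->lock);
    while (h->busy_count > 0)
        pthread_cond_wait(&h->idle, &h->lock);
    int final_count = h->counter;
    pthread_mutex_unlock(&h->lock);
    pthread_cond_destroy(&h->idle);
    pthread_mutex_destroy(&h->lock);
    free(h);
    return final_count;
}

static void *demo_worker(void *arg)
{
    struct demo_handle *h = arg;
    pthread_mutex_lock(&h->lock);
    h->counter++;
    pthread_mutex_unlock(&h->lock);
    demo_handle_release(h); /* no access to h after this point */
    return NULL;
}

/* Starts the workers, then unregisters while they may still be running. */
int demo_run(int nworkers)
{
    pthread_t tids[DEMO_MAX_WORKERS];
    int started = 0;
    struct demo_handle *h = demo_handle_create();

    if (h == NULL)
        return -1;
    if (nworkers > DEMO_MAX_WORKERS)
        nworkers = DEMO_MAX_WORKERS;
    for (int i = 0; i < nworkers; i++) {
        demo_handle_acquire(h);
        if (pthread_create(&tids[started], NULL, demo_worker, h) != 0) {
            demo_handle_release(h); /* thread never started */
            continue;
        }
        started++;
    }
    int final_count = demo_handle_unregister(h);
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return final_count;
}
```

Called from a main() and built with -pthread, demo_run(4) is expected to return 4 when all four threads start. The point of the sketch is the ordering: the count is incremented before a worker can see the handle, the release is the worker's last access, and the free happens only once the count has reached zero.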

On 9 Sep 2011, at 15:50, Olivier Aumage wrote:

> Hi,
>
> I am frequently experiencing hangs with the 'datawizard/increment_redux_v2'
> test of the StarPU test suite.
> The hang frequency is about 1 hang out of 5 runs with '--enable-debug',
> and 1 hang out of 3 runs without '--enable-debug'.
>
> The machine used is a regular node from the PlaFRIM machine without GPU
> (e.g. a 'fourmi' node).
>
> Module list:
> ------------
> 1) tools/svn/1.6.11      6) compiler/gcc/4.6.0
> 2) lib/gmp/5.0.1         7) tools/autoconf/2.68
> 3) lib/mpfr/3.0.0        8) tools/automake/1.11.1
> 4) lib/mpc/0.8.2         9) tools/libtool/2.4
> 5) lib/libelf/0.8.13
>
> StarPU SVN revision:
> --------------------
> 4083
>
> Configure settings:
> -------------------
> ../trunk/configure --prefix=/home/aumage/SVN/StarPU/install --disable-cuda
> --enable-debug
>
> Symptoms
> --------
> - The program hangs
> - Analysing the CTRL-\ core dump always shows 2 threads left. The main
> thread is waiting on pthread_join. The other thread is waiting on a
> spinlock.
>
> GDB core dump analysis after CTRL-\:
> ------------------------------------
> Core was generated by `./increment_redux_v2'.
> Program terminated with signal 3, Quit.
> #0 0x00007f0b61ca37b5 in pthread_join () from /lib64/libpthread.so.0
> (gdb) info threads
> 2 Thread 606 0x00007f0b61ca7672 in ?? () from /lib64/libpthread.so.0
> * 1 Thread 605 0x00007f0b61ca37b5 in pthread_join () from
> /lib64/libpthread.so.0
> (gdb) bt
> #0 0x00007f0b61ca37b5 in pthread_join () from /lib64/libpthread.so.0
> #1 0x00007f0b62899eaa in _starpu_terminate_workers (config=0x7f0b62ad1480)
> at ../../trunk/src/core/workers.c:405
> #2 0x00007f0b6289a09a in starpu_shutdown () at
> ../../trunk/src/core/workers.c:481
> #3 0x0000000000400bd1 in main ()
> (gdb) thread 2
> [Switching to thread 2 (Thread 606)]#0 0x00007f0b61ca7672 in ?? () from
> /lib64/libpthread.so.0
> (gdb) bt
> #0 0x00007f0b61ca7672 in ?? () from /lib64/libpthread.so.0
> #1 0x00007f0b628966ff in _starpu_spin_lock (lock=0x603b40) at
> ../../trunk/src/common/starpu_spinlock.c:72
> #2 0x00007f0b6289ebce in _starpu_notify_data_dependencies
> (handle=0x603b30) at
> ../../trunk/src/core/dependencies/data_concurrency.c:336
> #3 0x00007f0b628abc18 in _starpu_release_data_on_node (handle=0x603b30,
> default_wt_mask=0, replicate=0x603b70) at
> ../../trunk/src/datawizard/coherency.c:474
> #4 0x00007f0b628abfd9 in _starpu_push_task_output (task=0x7f0b4c1f1310,
> mask=0) at ../../trunk/src/datawizard/coherency.c:616
> #5 0x00007f0b628bfd5e in execute_job_on_cpu (j=0x7f0b4c1f1460,
> cpu_args=0x7f0b62ad1640, is_parallel_task=0, rank=0,
> perf_arch=STARPU_CPU_DEFAULT)
> at ../../trunk/src/drivers/cpu/driver_cpu.c:72
> #6 0x00007f0b628c02ce in _starpu_cpu_worker (arg=0x7f0b62ad1640) at
> ../../trunk/src/drivers/cpu/driver_cpu.c:182
> #7 0x00007f0b61ca3070 in start_thread () from /lib64/libpthread.so.0
> #8 0x00007f0b6260110d in clone () from /lib64/libc.so.6
> #9 0x0000000000000000 in ?? ()
> (gdb)
> %-----
>
> --
> Olivier



