starpu-devel - [Starpu-devel] segfault with dynamic partitioning and STARPU_PROFILING=1

List subject: Developers list for StarPU

  • From: Alfredo Buttari <alfredo.buttari@enseeiht.fr>
  • To: starpu-devel@lists.gforge.inria.fr
  • Subject: [Starpu-devel] segfault with dynamic partitioning and STARPU_PROFILING=1
  • Date: Wed, 31 Aug 2016 11:24:57 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hello,
I am observing segfaults in a code that uses dynamic partitioning (partition_plan -> partition_submit -> unpartition_submit) when STARPU_PROFILING is set to 1.
Here's the gdb backtrace:


(gdb) bt
#0  0x00007ffff4aaf547 in _starpu_fetch_nowhere_task_input (j=<optimized out>) at datawizard/coherency.c:1169
#1  0x00007ffff4a90d75 in _starpu_repush_task (j=j@entry=0xcb5c30) at core/sched_policy.c:463
#2  0x00007ffff4a90f76 in _starpu_push_task (j=j@entry=0xcb5c30) at core/sched_policy.c:402
#3  0x00007ffff4a6ae97 in _starpu_enforce_deps_starting_from_task (j=j@entry=0xcb5c30) at core/jobs.c:667
#4  0x00007ffff4a78d51 in _starpu_notify_cg (cg=cg@entry=0xaf4b30) at core/dependencies/cg.c:232
#5  0x00007ffff4a78fd7 in _starpu_notify_cg_list (successors=successors@entry=0xc8e240) at core/dependencies/cg.c:281
#6  0x00007ffff4a7e6dc in _starpu_notify_task_dependencies (j=j@entry=0xc8e080) at core/dependencies/task_deps.c:57
#7  0x00007ffff4a79247 in _starpu_notify_dependencies (j=j@entry=0xc8e080) at core/dependencies/dependencies.c:33
#8  0x00007ffff4a69de5 in _starpu_handle_job_termination (j=j@entry=0xc8e080) at core/jobs.c:376
#9  0x00007ffff4af8bae in _starpu_cpu_driver_run_once (cpu_worker=cpu_worker@entry=0x7ffff4d83090 <_starpu_config+2224>) at drivers/cpu/driver_cpu.c:345
#10 0x00007ffff4af96dd in _starpu_cpu_worker (arg=0x7ffff4d83090 <_starpu_config+2224>) at drivers/cpu/driver_cpu.c:376
#11 0x00007ffff482c6fa in start_thread (arg=0x7fffebfb2700) at pthread_create.c:333
#12 0x00007ffff3f2eb5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109


I have reproduced the problem in the following simple code:


program main
  use fstarpu_mod
  use iso_c_binding
  implicit none

  type(c_ptr), allocatable              :: shdlsa(:)
  type(c_ptr)                           :: hdla, filter
  real(kind(1.d0)), allocatable, target :: a(:,:)
  integer, target                       :: bsizea
  integer                               :: m, n, npartsa
  integer(c_int)                        :: err   ! return code of fstarpu_init
  
  err = fstarpu_init(C_NULL_PTR)

  m = 16
  n = 16
  allocate(a(m,n))

  call fstarpu_matrix_data_register(hdla, 0, &
       c_loc(a(1,1)),                        &
       size(a,1),                            &
       size(a,1),                            &
       size(a,2),                            &
       c_sizeof(a(1,1)))

  bsizea = 4
  npartsa = (n-1)/bsizea+1
  allocate(shdlsa(npartsa))

  filter = fstarpu_df_alloc_matrix_filter_vertical_block()
  call fstarpu_data_filter_set_nchildren(filter, npartsa)
  call fstarpu_data_partition_plan(hdla, filter, shdlsa)

  call fstarpu_data_partition_submit(hdla, npartsa, shdlsa)
  call fstarpu_data_unpartition_submit(hdla, 0, shdlsa, 0)
  call fstarpu_data_partition_clean(hdla,npartsa,shdlsa)
  call fstarpu_task_wait_for_all()
  call fstarpu_data_unregister(hdla)

  call fstarpu_shutdown()

  stop

end program main


and here is what valgrind --tool=memcheck says about its execution:


==2241== Invalid read of size 8
==2241==    at 0x7EBA547: _starpu_fetch_nowhere_task_input (coherency.c:1169)
==2241==    by 0x7E9BD74: _starpu_repush_task (sched_policy.c:463)
==2241==    by 0x7E75AC4: _starpu_enforce_deps_and_schedule (jobs.c:644)
==2241==    by 0x7E76D58: _starpu_submit_job (task.c:372)
==2241==    by 0x7E79418: starpu_task_submit (task.c:682)
==2241==    by 0x7EDBAD0: _starpu_task_insert_v (starpu_task_insert.c:137)
==2241==    by 0x7EDBFF8: starpu_task_insert (starpu_task_insert.c:159)
==2241==    by 0x7EC28CA: starpu_data_partition_submit (filters.c:604)
==2241==    by 0x404E7E: __dqrm_dsmat_mod_MOD_dqrm_block_partition1 (dqrm_dsmat_mod.F90:1294)
==2241==    by 0x40B695: __dqrm_dsmat_mod_MOD_dqrm_dsmat_qr_facto_async (dqrm_dsmat_mod.F90:505)
==2241==    by 0x40B7E8: __dqrm_dsmat_mod_MOD_dqrm_dsmat_qr_facto (dqrm_dsmat_mod.F90:384)
==2241==    by 0x403A3B: MAIN__ (dqrm_test_dns.F90:94)



If STARPU_PROFILING is not set, the error above disappears.
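
For reference, the two cases can be compared with a small shell wrapper along the following lines. This is only a sketch: `./repro` is a placeholder name for the compiled reproducer (not a name from the actual build), `run_case` is a hypothetical helper, and `env -u` assumes a GNU or BSD `env`.

```shell
#!/bin/sh
# Sketch only: REPRO is a placeholder path for the compiled reproducer.
REPRO="${REPRO:-./repro}"

# Run a command quietly and report whether it exited cleanly.
# A segfault typically shows up as exit status 139 (128 + SIGSEGV).
run_case() {
  label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    status=0
  else
    status=$?
  fi
  if [ "$status" -eq 0 ]; then
    echo "$label: ok"
  else
    echo "$label: failed (exit status $status)"
  fi
}

# Only run the comparison when the placeholder binary actually exists.
if [ -x "$REPRO" ]; then
  run_case "STARPU_PROFILING unset" env -u STARPU_PROFILING "$REPRO"
  run_case "STARPU_PROFILING=1" env STARPU_PROFILING=1 "$REPRO"
fi
```

The same `env STARPU_PROFILING=1` prefix can be put in front of `gdb` or `valgrind --tool=memcheck` to collect traces like the ones above.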


Ciao
Alfredo





--
-----------------------------------------
Alfredo Buttari, PhD
CNRS-IRIT
2 rue Camichel, 31071 Toulouse, France
http://buttari.perso.enseeiht.fr


