[Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency
- From: Marc Sergent <marc.sergent@inria.fr>
- To: starpu-devel@lists.gforge.inria.fr
- Subject: [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency
- Date: Thu, 28 Mar 2013 15:32:44 +0100 (CET)
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Hi!
When executing mpi/tests/user_defined_datatype with the latest version of the trunk and MPICH2 version 1.4.1, I sometimes get the following error:
(the test is run on my laptop, which runs Debian; StarPU is configured with FxT and MKL)
[starpu][_starpu_mpi_print_thread_level_support] MPI_Init_thread level = MPI_THREAD_SERIALIZED; Multiple threads may make MPI calls, but only one at a time.
Testing with function 0x401b30
Testing with function 0x401870
core/dependencies/implicit_data_deps.c:355 pthread_mutex_lock: Invalid argument
[starpu][abort] core/dependencies/implicit_data_deps.c:355 _starpu_release_data_enforce_sequential_consistency
Program received signal SIGABRT, Aborted.
starpu-mpi is handling a detached request and calls starpu_data_release on the data. That call reaches _starpu_unlock_post_sync_tasks, which, through the task submission path, ends up in _starpu_release_data_enforce_sequential_consistency. This function is able to take the sequential_consistency_mutex lock but fails to unlock it before exiting.
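As an illustration of where a message such as "pthread_mutex_lock: Invalid argument" comes from: the pthread locking functions return an error number instead of setting errno, and "Invalid argument" is the strerror() text for EINVAL. Below is a minimal sketch of a checked lock/unlock pair that reports failures in the same "file:line function: message" style. The helper names (checked_lock, checked_unlock, init_errorcheck_mutex) are hypothetical and are not StarPU's actual macros. The demo uses a PTHREAD_MUTEX_ERRORCHECK mutex so that misuse yields a defined error code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not StarPU code): lock the mutex and, on failure,
 * print the error in a "file:line pthread_mutex_lock: message" style. */
int checked_lock(pthread_mutex_t *m, const char *file, int line)
{
    int ret = pthread_mutex_lock(m);
    if (ret != 0)
        fprintf(stderr, "%s:%d pthread_mutex_lock: %s\n",
                file, line, strerror(ret));
    return ret;
}

/* Hypothetical helper (not StarPU code): same idea for unlocking. */
int checked_unlock(pthread_mutex_t *m, const char *file, int line)
{
    int ret = pthread_mutex_unlock(m);
    if (ret != 0)
        fprintf(stderr, "%s:%d pthread_mutex_unlock: %s\n",
                file, line, strerror(ret));
    return ret;
}

/* Initialise an error-checking mutex: relocking from the owning thread
 * and unlocking an unowned mutex then return error codes. */
int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);
    if (ret != 0)
        return ret;
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (ret == 0)
        ret = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

#ifdef CHECKED_LOCK_DEMO
int main(void)
{
    pthread_mutex_t m;
    assert(init_errorcheck_mutex(&m) == 0);
    assert(checked_lock(&m, __FILE__, __LINE__) == 0);
    /* Second lock from the same thread is reported, not deadlocked. */
    assert(checked_lock(&m, __FILE__, __LINE__) == EDEADLK);
    assert(checked_unlock(&m, __FILE__, __LINE__) == 0);
    /* Unlocking a mutex this thread does not hold is reported too. */
    assert(checked_unlock(&m, __FILE__, __LINE__) == EPERM);
    return pthread_mutex_destroy(&m);
}
#endif
```

Compiling with -DCHECKED_LOCK_DEMO -pthread runs the small demo. In real code the caller would typically abort on a non-zero return, as the [starpu][abort] line in the log suggests.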
Here is the backtrace, along with some values that could be of interest. The problem can be reproduced quite easily; I can send any other data if needed.
(gdb) bt
#0 0x00007ffff43d7475 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff43da6f0 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff5a07ec4 in _starpu_release_data_enforce_sequential_consistency (task=0x1fd55d0, handle=0x158f7a0) at core/dependencies/implicit_data_deps.c:437
#3 0x00007ffff5a00515 in _starpu_handle_job_termination (j=j@entry=0x1fd5960) at core/jobs.c:177
#4 0x00007ffff5a11ae0 in _starpu_push_task (j=j@entry=0x1fd5960) at core/sched_policy.c:357
#5 0x00007ffff5a00af3 in _starpu_enforce_deps_and_schedule (j=j@entry=0x1fd5960) at core/jobs.c:381
#6 0x00007ffff5a014ec in _starpu_submit_job (j=j@entry=0x1fd5960) at core/task.c:249
#7 0x00007ffff5a02984 in starpu_task_submit (task=0x1fd55d0) at core/task.c:473
#8 0x00007ffff5a02c02 in _starpu_task_submit_internally (task=<optimized out>) at core/task.c:489
#9 0x00007ffff5a081cf in _starpu_unlock_post_sync_tasks (handle=0x158f7a0) at core/dependencies/implicit_data_deps.c:525
#10 0x00007ffff5a2402b in starpu_data_release_on_node (handle=<optimized out>, node=node@entry=0) at datawizard/user_interactions.c:330
#11 0x00007ffff5a24057 in starpu_data_release (handle=<optimized out>) at datawizard/user_interactions.c:335
#12 0x00007ffff5c97b1c in _starpu_mpi_handle_request_termination (req=req@entry=0x1fd5250) at starpu_mpi.c:677
#13 0x00007ffff5c97f5e in _starpu_mpi_test_detached_requests () at starpu_mpi.c:767
#14 _starpu_mpi_progress_thread_func (arg=0x60c530) at starpu_mpi.c:915
#15 0x00007ffff534ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007ffff447fa7d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000000000000000 in ?? ()
(gdb) f 9
#9 0x00007ffff5a081cf in _starpu_unlock_post_sync_tasks (handle=0x158f7a0)
at core/dependencies/implicit_data_deps.c:525
525 int ret = _starpu_task_submit_internally(link->task);
(gdb) p *link->task
$3 = {cl = 0x0, buffers = {{handle = 0x0, mode = STARPU_NONE}, {handle = 0x0,
mode = STARPU_NONE}, {handle = 0x0, mode = STARPU_NONE}, {handle = 0x0,
mode = STARPU_NONE}, {handle = 0x0, mode = STARPU_NONE}, {handle = 0x0,
mode = STARPU_NONE}, {handle = 0x0, mode = STARPU_NONE}, {handle = 0x0,
mode = STARPU_NONE}}, handles = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0}, interfaces = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, cl_arg = 0x0,
cl_arg_size = 0, callback_func = 0, callback_arg = 0x0, use_tag = 0,
tag_id = 0, sequential_consistency = 1, synchronous = 0, priority = 0,
execute_on_a_specific_worker = 0, workerid = 0, bundle = 0x0, detach = 1,
destroy = 1, regenerate = 0, status = STARPU_TASK_FINISHED,
profiling_info = 0x0, predicted = nan(0x8000000000000),
predicted_transfer = nan(0x8000000000000), prev = 0x0, next = 0x0,
mf_skip = 0, starpu_private = 0x1fd5960, magic = 42, sched_ctx = 0,
hypervisor_tag = 0, flops = 0, scheduled = 0}
(gdb) f 13
#13 0x00007ffff5c97f5e in _starpu_mpi_test_detached_requests () at starpu_mpi.c:767
767 _starpu_mpi_handle_request_termination(req);
(gdb) p *req->data_handle
$6 = {req_list = 0x1591540, refcnt = 0, current_mode = STARPU_R,
header_lock = {lock = 1}, busy_count = 0, busy_waiting = 1, busy_mutex = {
__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1,
__spins = 0, __list = {__prev = 0x0, __next = 0x0}},
__size = '\000' <repeats 16 times>"\377, \377\377\377", '\000' <repeats 19 times>, __align = 0}, busy_cond = {__data = {__lock = 1, __futex = 0,
__total_seq = 18446744073709551615, __wakeup_seq = 0, __woken_seq = 0,
__mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
__size = "\001\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377", '\000' <repeats 31 times>, __align = 1}, root_handle = 0x158f7a0,
father_handle = 0x0, sibling_index = 0, depth = 1, children = 0x0,
nchildren = 0, per_node = {{_prev = 0x0, _next = 0x0, handle = 0x158f7a0,
data_interface = 0x15914e0, memory_node = 0, relaxed_coherency = 0,
initialized = 0, state = STARPU_OWNER, refcnt = 0, allocated = 1 '\001',
automatically_allocated = 0 '\000', mc = 0x0, requested = "", request = {
0x0}}}, per_worker = {{_prev = 0x0, _next = 0x0, handle = 0x158f7a0,
data_interface = 0x1591500, memory_node = 0, relaxed_coherency = 1,
initialized = 0, state = STARPU_INVALID, refcnt = 0,
allocated = 0 '\000', automatically_allocated = 0 '\000', mc = 0x0,
requested = "", request = {0x0}}, {_prev = 0x0, _next = 0x0,
handle = 0x158f7a0, data_interface = 0x1591520, memory_node = 0,
relaxed_coherency = 1, initialized = 0, state = STARPU_INVALID,
refcnt = 0, allocated = 0 '\000', automatically_allocated = 0 '\000',
mc = 0x0, requested = "", request = {0x0}}, {_prev = 0x0, _next = 0x0,
handle = 0x0, data_interface = 0x0, memory_node = 0,
relaxed_coherency = 0, initialized = 0, state = STARPU_OWNER,
refcnt = 0, allocated = 0 '\000', automatically_allocated = 0 '\000',
mc = 0x0, requested = "", request = {0x0}} <repeats 78 times>},
ops = 0x6048a0, footprint = 3655735684, home_node = 0, wt_mask = 0,
is_readonly = 0 '\000', is_not_important = 0, sequential_consistency = 1,
sequential_consistency_mutex = {__data = {__lock = 0, __count = 0,
__owner = 0, __nusers = 0, __kind = -1, __spins = 0, __list = {
__prev = 0x0, __next = 0x0}},
__size = '\000' <repeats 16 times>"\377, \377\377\377", '\000' <repeats 19 times>, __align = 0}, last_submitted_mode = STARPU_RW,
last_submitted_writer = 0x0, last_submitted_readers = 0x0,
last_submitted_ghost_writer_id_is_valid = 1,
last_submitted_ghost_writer_id = 29305,
last_submitted_ghost_readers_id = 0x0, post_sync_tasks = 0x0,
post_sync_tasks_cnt = 0, redux_cl = 0x0, init_cl = 0x0,
reduction_refcnt = 0, reduction_req_list = 0x1591560,
reduction_tmp_handles = {0x0 <repeats 80 times>}, lazy_unregister = 0,
rank = 1, tag = 4813, memory_stats = 0x0, mf_node = 6525744}
Thanks for any help!
Marc
- [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency, Marc Sergent, 28/03/2013
- Re: [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency, Benoît Lizé, 28/03/2013
- Re: [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency, Nathalie Furmento, 29/03/2013
- Re: [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency, Benoît Lizé, 29/03/2013
- Re: [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency, Nathalie Furmento, 29/03/2013
- Re: [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency, Benoît Lizé, 28/03/2013