Subject: Developers list for StarPU
List archives
Re: [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency
- From: Nathalie Furmento <nathalie.furmento@inria.fr>
- To: Benoît Lizé <benoit.lize@gmail.com>
- Cc: starpu-devel@lists.gforge.inria.fr, Marc Sergent <marc.sergent@inria.fr>
- Subject: Re: [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency
- Date: Fri, 29 Mar 2013 11:07:49 +0100
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Benoit,
The bug should be fixed by r9056. Could you please try and let us know if it works now? Marc has not been able to reproduce the bug with the fix.
Thanks,
Nathalie
On 28/03/2013 15:39, Benoît Lizé wrote:
Hello,
I have the same issue with a much larger code, and was
trying to find the root cause before posting a message to this
mailing list.
Here is the stack trace I get:
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe3760700 (LWP 19450)]
0x00007ffff4bb5475 in *__GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff4bb5475 in *__GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff4bb86f0 in *__GI_abort () at abort.c:92
#2  0x00007ffff5871443 in _starpu_release_data_enforce_sequential_consistency (task=0x7fffd7381b90, handle=0x7546e70) at core/dependencies/implicit_data_deps.c:355
#3  0x00007ffff5871b28 in _starpu_unlock_post_sync_tasks (handle=0x7546e70) at core/dependencies/implicit_data_deps.c:523
#4  0x00007ffff589ae6f in starpu_data_release_on_node (handle=0x7546e70, node=0) at datawizard/user_interactions.c:328
#5  0x00007ffff589ae8e in starpu_data_release (handle=0x7546e70) at datawizard/user_interactions.c:333
#6  0x00007ffff5643fd3 in _starpu_mpi_handle_request_termination (req=0x7fffd43288f0) at starpu_mpi.c:666
#7  0x00007ffff56446d8 in _starpu_mpi_test_detached_requests () at starpu_mpi.c:755
#8  0x00007ffff5645207 in _starpu_mpi_progress_thread_func (arg=0x102a800) at starpu_mpi.c:904
#9  0x00007ffff4f13b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#10 0x00007ffff4c5da7d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()
Here is some context:
Unfortunately, I don't have anything pointing to the
root cause...
--
Benoît Lizé
On Thu, Mar 28, 2013 at 3:32 PM, Marc Sergent <marc.sergent@inria.fr> wrote:
Here is the backtrace, along with some values which could be interesting. The problem can be reproduced quite easily; I can send any other data if needed.
Hi!
When executing mpi/tests/user_defined_datatype with the latest version of the trunk and MPICH2 version 1.4.1, I sometimes get the following error:
starpu-mpi is dealing with a detached request, and is calling starpu_data_release on the data, which calls _starpu_unlock_post_sync_tasks, which itself calls _starpu_release_data_enforce_sequential_consistency. This function is able to take the lock sequential_consistency_mutex but fails to unlock it before exiting. (The test is run on my laptop, which runs Debian, and StarPU is configured with FxT and MKL.)
[starpu][_starpu_mpi_print_thread_level_support] MPI_Init_thread level = MPI_THREAD_SERIALIZED; Multiple threads may make MPI calls, but only one at a time.
Testing with function 0x401b30
Testing with function 0x401870
core/dependencies/implicit_data_deps.c:355 pthread_mutex_lock: Invalid argument
[starpu][abort] core/dependencies/implicit_data_deps.c:355 _starpu_release_data_enforce_sequential_consistency
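The log above reports pthread_mutex_lock failing with "Invalid argument", followed by an abort. As a rough illustration of how a checked lock call can surface a mutex error together with its source location, here is a minimal C sketch. It is not StarPU's code: report_pthread_error, CHECKED_MUTEX_LOCK, CHECKED_MUTEX_UNLOCK, init_errorcheck_mutex and the two demo_* functions are hypothetical names. It also does not reproduce the EINVAL seen in this thread; it assumes a POSIX error-checking mutex so that misuse is reported in a well-defined way (EDEADLK on relock, EPERM on unlocking a mutex the thread does not own).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not StarPU's macro): print a failed pthread call as
 * "file:line call: reason" and hand the error code back to the caller. */
static int report_pthread_error(const char *file, int line,
                                const char *call, int err)
{
    if (err != 0)
        fprintf(stderr, "%s:%d %s: %s\n", file, line, call, strerror(err));
    return err;
}

#define CHECKED_MUTEX_LOCK(m) \
    report_pthread_error(__FILE__, __LINE__, "pthread_mutex_lock", \
                         pthread_mutex_lock(m))
#define CHECKED_MUTEX_UNLOCK(m) \
    report_pthread_error(__FILE__, __LINE__, "pthread_mutex_unlock", \
                         pthread_mutex_unlock(m))

/* Initialise an error-checking mutex, so that misuse yields an error code
 * instead of undefined behaviour. Returns 0 on success. */
static int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);
    if (err != 0)
        return err;
    err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (err == 0)
        err = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return err;
}

/* Lock the same mutex twice from one thread; returns the second lock's
 * error code. */
int demo_relock(void)
{
    pthread_mutex_t m;
    if (init_errorcheck_mutex(&m) != 0)
        return -1;
    int first = CHECKED_MUTEX_LOCK(&m);
    assert(first == 0);
    (void)first;
    int second = CHECKED_MUTEX_LOCK(&m);   /* reported on stderr */
    CHECKED_MUTEX_UNLOCK(&m);
    pthread_mutex_destroy(&m);
    return second;
}

/* Unlock a mutex that the calling thread never locked; returns the error
 * code of that unlock. */
int demo_unlock_unowned(void)
{
    pthread_mutex_t m;
    if (init_errorcheck_mutex(&m) != 0)
        return -1;
    int err = CHECKED_MUTEX_UNLOCK(&m);    /* reported on stderr */
    pthread_mutex_destroy(&m);
    return err;
}
```

Calling demo_relock() and demo_unlock_unowned() from a small main() prints one diagnostic line each on stderr. Whether a given failure should then abort, as in the log above, or be returned to the caller is a design choice this sketch leaves open.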
Program received signal SIGABRT, Aborted.
(gdb) bt
#0  0x00007ffff43d7475 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff43da6f0 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff5a07ec4 in _starpu_release_data_enforce_sequential_consistency (task=0x1fd55d0, handle=0x158f7a0) at core/dependencies/implicit_data_deps.c:437
#3  0x00007ffff5a00515 in _starpu_handle_job_termination (j=j@entry=0x1fd5960) at core/jobs.c:177
#4  0x00007ffff5a11ae0 in _starpu_push_task (j=j@entry=0x1fd5960) at core/sched_policy.c:357
#5  0x00007ffff5a00af3 in _starpu_enforce_deps_and_schedule (j=j@entry=0x1fd5960) at core/jobs.c:381
#6  0x00007ffff5a014ec in _starpu_submit_job (j=j@entry=0x1fd5960) at core/task.c:249
#7  0x00007ffff5a02984 in starpu_task_submit (task=0x1fd55d0) at core/task.c:473
#8  0x00007ffff5a02c02 in _starpu_task_submit_internally (task=<optimized out>) at core/task.c:489
#9  0x00007ffff5a081cf in _starpu_unlock_post_sync_tasks (handle=0x158f7a0) at core/dependencies/implicit_data_deps.c:525
#10 0x00007ffff5a2402b in starpu_data_release_on_node (handle=<optimized out>, node=node@entry=0) at datawizard/user_interactions.c:330
#11 0x00007ffff5a24057 in starpu_data_release (handle=<optimized out>) at datawizard/user_interactions.c:335
#12 0x00007ffff5c97b1c in _starpu_mpi_handle_request_termination (req=req@entry=0x1fd5250) at starpu_mpi.c:677
#13 0x00007ffff5c97f5e in _starpu_mpi_test_detached_requests () at starpu_mpi.c:767
#14 _starpu_mpi_progress_thread_func (arg=0x60c530) at starpu_mpi.c:915
#15 0x00007ffff534ab50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007ffff447fa7d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000000000000000 in ?? ()
(gdb) f 9
#9  0x00007ffff5a081cf in _starpu_unlock_post_sync_tasks (handle=0x158f7a0) at core/dependencies/implicit_data_deps.c:525
525 int ret = _starpu_task_submit_internally(link->task);
(gdb) p *link->task
$3 = {cl = 0x0, buffers = {{handle = 0x0, mode = STARPU_NONE}, {handle = 0x0, mode = STARPU_NONE},
    {handle = 0x0, mode = STARPU_NONE}, {handle = 0x0, mode = STARPU_NONE},
    {handle = 0x0, mode = STARPU_NONE}, {handle = 0x0, mode = STARPU_NONE},
    {handle = 0x0, mode = STARPU_NONE}, {handle = 0x0, mode = STARPU_NONE}},
  handles = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
  interfaces = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, cl_arg = 0x0, cl_arg_size = 0,
  callback_func = 0, callback_arg = 0x0, use_tag = 0, tag_id = 0, sequential_consistency = 1,
  synchronous = 0, priority = 0, execute_on_a_specific_worker = 0, workerid = 0, bundle = 0x0,
  detach = 1, destroy = 1, regenerate = 0, status = STARPU_TASK_FINISHED, profiling_info = 0x0,
  predicted = nan(0x8000000000000), predicted_transfer = nan(0x8000000000000), prev = 0x0,
  next = 0x0, mf_skip = 0, starpu_private = 0x1fd5960, magic = 42, sched_ctx = 0,
  hypervisor_tag = 0, flops = 0, scheduled = 0}
(gdb) f 13
#13 0x00007ffff5c97f5e in _starpu_mpi_test_detached_requests () at starpu_mpi.c:767
767 _starpu_mpi_handle_request_termination(req);
(gdb) p *req->data_handle
$6 = {req_list = 0x1591540, refcnt = 0, current_mode = STARPU_R, header_lock = {lock = 1},
  busy_count = 0, busy_waiting = 1,
  busy_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = -1,
      __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>,
    __align = 0},
  busy_cond = {__data = {__lock = 1, __futex = 0, __total_seq = 18446744073709551615,
      __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
    __size = "\001\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377", '\000' <repeats 31 times>,
    __align = 1},
  root_handle = 0x158f7a0, father_handle = 0x0, sibling_index = 0, depth = 1, children = 0x0,
  nchildren = 0,
  per_node = {{_prev = 0x0, _next = 0x0, handle = 0x158f7a0, data_interface = 0x15914e0,
      memory_node = 0, relaxed_coherency = 0, initialized = 0, state = STARPU_OWNER, refcnt = 0,
      allocated = 1 '\001', automatically_allocated = 0 '\000', mc = 0x0, requested = "",
      request = {0x0}}},
  per_worker = {{_prev = 0x0, _next = 0x0, handle = 0x158f7a0, data_interface = 0x1591500,
      memory_node = 0, relaxed_coherency = 1, initialized = 0, state = STARPU_INVALID, refcnt = 0,
      allocated = 0 '\000', automatically_allocated = 0 '\000', mc = 0x0, requested = "",
      request = {0x0}},
    {_prev = 0x0, _next = 0x0, handle = 0x158f7a0, data_interface = 0x1591520, memory_node = 0,
      relaxed_coherency = 1, initialized = 0, state = STARPU_INVALID, refcnt = 0,
      allocated = 0 '\000', automatically_allocated = 0 '\000', mc = 0x0, requested = "",
      request = {0x0}},
    {_prev = 0x0, _next = 0x0, handle = 0x0, data_interface = 0x0, memory_node = 0,
      relaxed_coherency = 0, initialized = 0, state = STARPU_OWNER, refcnt = 0,
      allocated = 0 '\000', automatically_allocated = 0 '\000', mc = 0x0, requested = "",
      request = {0x0}} <repeats 78 times>},
  ops = 0x6048a0, footprint = 3655735684, home_node = 0, wt_mask = 0, is_readonly = 0 '\000',
  is_not_important = 0, sequential_consistency = 1,
  sequential_consistency_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
      __kind = -1, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = '\000' <repeats 16 times>, "\377\377\377\377", '\000' <repeats 19 times>,
    __align = 0},
  last_submitted_mode = STARPU_RW, last_submitted_writer = 0x0, last_submitted_readers = 0x0,
  last_submitted_ghost_writer_id_is_valid = 1, last_submitted_ghost_writer_id = 29305,
  last_submitted_ghost_readers_id = 0x0, post_sync_tasks = 0x0, post_sync_tasks_cnt = 0,
  redux_cl = 0x0, init_cl = 0x0, reduction_refcnt = 0, reduction_req_list = 0x1591560,
  reduction_tmp_handles = {0x0 <repeats 80 times>}, lazy_unregister = 0, rank = 1, tag = 4813,
  memory_stats = 0x0, mf_node = 6525744}
Marc
_______________________________________________
Starpu-devel mailing list
Starpu-devel@lists.gforge.inria.fr
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/starpu-devel
- [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency, Marc Sergent, 28/03/2013
- Re: [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency, Benoît Lizé, 28/03/2013
- Re: [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency, Nathalie Furmento, 29/03/2013
- Re: [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency, Benoît Lizé, 29/03/2013
- Re: [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency, Nathalie Furmento, 29/03/2013
- Re: [Starpu-devel] Error with MPI and _starpu_release_data_enforce_sequential_consistency, Benoît Lizé, 28/03/2013