
starpu-devel - Re: [Starpu-devel] possible source of : ../../src/datawizard/coherency.c:60: _starpu_select_src_node: Assertion `src_node_mask != 0' failed.

Subject: Developers list for StarPU




  • From: Xavier Lacoste <xl64100@gmail.com>
  • To: starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] possible source of : ../../src/datawizard/coherency.c:60: _starpu_select_src_node: Assertion `src_node_mask != 0' failed.
  • Date: Thu, 9 Oct 2014 10:50:36 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hello,

I think I have identified the handle responsible for the error: it is a remote
data handle that was registered with an incorrect rank.
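
To illustrate the kind of mistake described above, here is a minimal sketch in
plain C (it makes no StarPU calls) of a consistency check between the owner rank
recorded for each block at registration time and the rank the data distribution
expects. The struct `block_reg`, the function `find_rank_mismatch`, and the idea
of keeping such a table are hypothetical illustrations, not taken from PaStiX or
StarPU.

```c
#include <stddef.h>

/* Hypothetical record kept next to each registered data handle. */
struct block_reg {
    int block_id;        /* column block index */
    int registered_rank; /* MPI rank given when the handle was registered */
};

/* Returns the index of the first entry whose registered rank differs from
 * the expected owner, or -1 if every entry agrees.
 * expected_owner[i] is the rank that should own regs[i]. */
int find_rank_mismatch(const struct block_reg *regs,
                       const int *expected_owner, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (regs[i].registered_rank != expected_owner[i])
            return (int)i;
    }
    return -1;
}
```

Running such a check right after data registration, before any task is
inserted, would point at the offending block instead of failing later inside
the runtime.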

Sorry for the disturbance,
Regards,

XL.


On 8 Oct 2014, at 15:43, Xavier Lacoste <xl64100@gmail.com> wrote:

> Hello,
>
> I recently copied my (working) task insertion algorithm from a recent
> version of PaStiX to an older one and got this error when inserting MPI
> tasks:
>
> [starpu][_starpu_select_src_node][assert failure] The data for this handle
> is requested, but this handle does not have a valid value. Perhaps some
> initialization task is missing?
> ../../src/datawizard/coherency.c:60: _starpu_select_src_node: Assertion
> `src_node_mask != 0' failed.
>
> As the message says, I suppose I'm doing something wrong but I can't find
> what...
>
> I have been searching for an initialization error for 2 days but can't find
> it...
> I copied both the data registration and task submission routines (only the
> codelets differ), so they are nearly the same...
>
> I checked that the handles involved in the inserted task are all registered.
> I'm inserting a task using these handles:
> - remote column block 4 : handle value : 0xeed8a0
> - local column block 10 : 0xee8f20
> - Work array data : 0xefb520
> - block sparsity information array : 0xef6ba0
>
> And registered local data:
> registered 5 0xed1f80
> registered 6 0xed6920
> registered 7 0xedb2a0
> registered 8 0xedfc20
> registered 9 0xee45a0
> registered 10 0xee8f20
>
> Remote data
> registered halo : 4 0xeed8a0
>
> Block sparsity information arrays
> registered bloktab 0 0xef2220
> registered bloktab 1 0xef6ba0
>
> Working data array
> registered work_handle 0xefb520
>
> The handle in _starpu_select_src_node() (handle=0x32b7) does not correspond
> to any registered handle, so gdb must be lost:
>
> #0 0x00007fffefb598e5 in raise () from /lib64/libc.so.6
> #1 0x00007fffefb5b0c5 in abort () from /lib64/libc.so.6
> #2 0x00007fffefb52a0e in __assert_fail_base () from /lib64/libc.so.6
> #3 0x00007fffefb52ad0 in __assert_fail () from /lib64/libc.so.6
> #4 0x00007ffff5b1f870 in _starpu_select_src_node (handle=0x32b7,
> destination=12983) at ../../src/datawizard/coherency.c:60
> #5 0x00007ffff5b1eef5 in _starpu_create_request_to_fetch_data
> (handle=0x32b7,
> dst_replicate=0x32b7, mode=6, is_prefetch=4294967295, async=15428208,
> callback_func=0x50, callback_arg=0x0)
> at ../../src/datawizard/coherency.c:430
> #6 0x00007ffff5b1ed94 in _starpu_fetch_data_on_node (handle=0x32b7,
> dst_replicate=0x32b7, mode=6, detached=4294967295, async=15428208,
> callback_func=0x50, callback_arg=0xeed940)
> at ../../src/datawizard/coherency.c:568
> #7 0x00007ffff5b1ec16 in prefetch_data_on_node (task=0x32b7, node=12983)
> at ../../src/datawizard/coherency.c:586
> #8 starpu_prefetch_task_input_on_node (task=0x32b7, node=12983)
> at ../../src/datawizard/coherency.c:683
> #9 0x00007ffff5b10299 in _starpu_push_task_to_workers (task=0x32b7)
> at ../../src/core/sched_policy.c:448
> #10 0x00007ffff5b1000f in _starpu_push_task (j=0x32b7)
> at ../../src/core/sched_policy.c:385
> #11 0x00007ffff5af0c4f in starpu_task_submit (task=0x32b7)
> at ../../src/core/task.c:513
> #12 0x00007ffff5b379b9 in _starpu_insert_task_create_and_submit (
> arg_buffer=0x32b7, arg_buffer_size=12983, cl=0x6,
> task=0xffffffffffffffff,
> varg_list=0xeb6a70) at ../../src/util/starpu_insert_task_utils.c:418
> #13 0x00007ffff5df5f91 in starpu_mpi_insert_task (comm=0x32b7,
> codelet=0x32b7)
> at ../../../mpi/src/starpu_mpi_insert_task.c:457
> #14 0x00000000004fabc5 in halo_submit (sopalin_data=0xe4eee8)
> at ./sopalin/src/starpu_submit_tasks.c:676
> #15 0x00000000004fe34d in sy_starpu_submit_tasks (sopalin_data=0xe4eee8)
> at ./sopalin/src/starpu_submit_tasks.c:1518
> #16 0x00000000004f7cf4 in sy_sopalin_thread (m=0xe445e8, sopaparam=0xe447d8)
> at ./sopalin/src/sopalin3d.c:1376
> #17 0x000000000045c5a3 in pastix_task_sopalin (pastix_data=0xe445e8,
> pastix_comm=0xa1d000, n=150, colptr=0xe41f00, row=0xe421c0,
> avals=0xe42920, b=0xe43780, rhsnbr=1, loc2glob=0x0)
> at sopalin/src/pastix.c:3533
> #18 0x0000000000460ccb in pastix (pastix_data=0x7fffffffa630,
> pastix_comm=0xa1d000, n=150, colptr=0xe41f00, row=0xe421c0,
> avals=0xe42920, perm=0xe44120, invp=0xe44380, b=0xe43780, rhs=1,
> iparm=0x7fffffffa3e8, dparm=0x7fffffffa6a0) at sopalin/src/pastix.c:4998
> #19 0x00000000004098c4 in main (argc=8, argv=0x7fffffffaa48) at simple.c:207
>
> If you have any idea of the source of such an issue, please tell me.
>
> Regards,
>
> XL.
>
>
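
A note on the "gdb must be lost" remark in the quoted message: several
arguments in the backtrace are the same number printed in different bases.
`handle=0x32b7` equals `destination=12983` in frame #4, and `async=15428208` in
frame #5 equals `varg_list=0xeb6a70` in frame #12. This suggests that gdb is
printing stale or shared register contents rather than the real arguments. One
possible cause is an optimized build, but that is an assumption, since the
thread does not state the build flags. The small C sketch below only verifies
the arithmetic; the variable and function names are made up for illustration.

```c
#include <stdint.h>

/* Values copied from the quoted backtrace. */
static const uintptr_t frame4_handle = 0x32b7;       /* handle=0x32b7 */
static const unsigned frame4_destination = 12983;    /* destination=12983 */
static const unsigned frame5_async = 15428208;       /* async=15428208 */
static const uintptr_t frame12_varg_list = 0xeb6a70; /* varg_list=0xeb6a70 */

/* Returns 1 when both pairs are the same number written in different bases. */
int backtrace_args_alias(void)
{
    return frame4_handle == frame4_destination
        && frame5_async == frame12_varg_list;
}
```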




