Subject: Developers list for StarPU
[Starpu-devel] Issue with distributed NUMA-aware StarPU and dmda scheduler
- From: Philippe SWARTVAGHER <philippe.swartvagher@inria.fr>
- To: starpu-devel@lists.gforge.inria.fr
- Subject: [Starpu-devel] Issue with distributed NUMA-aware StarPU and dmda scheduler
- Date: Fri, 13 Mar 2020 14:02:38 +0100
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Hello,
I am hitting crashes when combining StarPU-MPI, NUMA support and the dmda scheduler.
mpirun -DNMAD_DRIVER=tcp -DSTARPU_USE_NUMA=0 -DSTARPU_SCHED=dmda -DSTARPU_RESERVE_NCPU=2 -DSTARPU_MPI_COOP_SENDS=0 -n 2 -nodelist henri0,henri1 ~/chameleon/build/testing/chameleon_stesting -o potrf -H -n 4800:50000:8000 --mtxfmt 1
=> works fine
mpirun -DNMAD_DRIVER=tcp -DSTARPU_USE_NUMA=1 -DSTARPU_RESERVE_NCPU=2 -DSTARPU_MPI_COOP_SENDS=0 -n 2 -nodelist henri0,henri1 ~/chameleon/build/testing/chameleon_stesting -o potrf -H -n 4800:50000:8000 --mtxfmt 1
=> get the error: free(): invalid size
GDB backtrace:
#0 0x00007ffff10ef081 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff10da535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff1130db8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007ffff113748a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007ffff1138c0c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007ffff7fb0053 in _starpu_mpi_handle_request_termination (req=req@entry=0x55555581b640) at ../../../mpi/src/mpi/starpu_mpi_mpi.c:861
#6 0x00007ffff7fb21a1 in _starpu_mpi_test_detached_requests () at ../../../mpi/src/mpi/starpu_mpi_mpi.c:1033
#7 _starpu_mpi_progress_thread_func (arg=0x5555556cfa00) at ../../../mpi/src/mpi/starpu_mpi_mpi.c:1337
#8 0x00007ffff16b6fb7 in start_thread (arg=<optimized out>) at pthread_create.c:486
#9 0x00007ffff11af2ef in clone () from /lib/x86_64-linux-gnu/libc.so.6
mpirun -DNMAD_DRIVER=tcp -DSTARPU_USE_NUMA=1 -DSTARPU_SCHED=dmda -DSTARPU_RESERVE_NCPU=2 -DSTARPU_MPI_COOP_SENDS=0 -n 2 -nodelist henri0,henri1 ~/chameleon/build/testing/chameleon_stesting -o potrf -H -n 4800:50000:8000 --mtxfmt 1
=> get the error: munmap_chunk(): invalid pointer (but sometimes it's free(): invalid size)
The backtrace is the same as above.
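For context, glibc prints both messages from inside free() when the pointer it is given, or the heap metadata next to it, is not what malloc() produced. One common way to end up there is releasing a buffer with a routine that does not match the one that allocated it. This is only an assumption about the class of bug, not a diagnosis of what happens at starpu_mpi_mpi.c:861. A minimal sketch of the usual defensive pattern, where the buffer remembers its own allocator, could look like this. Every name below (tagged_buf, buf_alloc_malloc, buf_alloc_mmap, buf_release) is hypothetical and not part of StarPU:

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical illustration only: none of these names exist in StarPU. */
enum buf_origin { BUF_FROM_MALLOC, BUF_FROM_MMAP };

struct tagged_buf {
    void *ptr;              /* start of the buffer, NULL when released */
    size_t size;            /* size in bytes, needed again by munmap() */
    enum buf_origin origin; /* which allocator produced ptr */
};

/* Allocate with malloc(). Returns 0 on success, -1 on failure. */
static int buf_alloc_malloc(struct tagged_buf *b, size_t size)
{
    b->ptr = malloc(size);
    b->size = size;
    b->origin = BUF_FROM_MALLOC;
    return b->ptr ? 0 : -1;
}

/* Allocate with an anonymous private mapping, standing in here for any
 * allocator that bypasses the malloc heap. Returns 0 on success, -1 on failure. */
static int buf_alloc_mmap(struct tagged_buf *b, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        b->ptr = NULL;
        return -1;
    }
    b->ptr = p;
    b->size = size;
    b->origin = BUF_FROM_MMAP;
    return 0;
}

/* Release with the routine that matches the recorded origin.
 * Calling free() on a BUF_FROM_MMAP pointer is undefined behaviour and is
 * the kind of mistake glibc's free() checks may abort on.
 * Returns 0 on success, or the munmap() error code. */
static int buf_release(struct tagged_buf *b)
{
    int ret = 0;
    assert(b != NULL);
    if (b->ptr == NULL)
        return 0; /* already released, nothing to do */
    if (b->origin == BUF_FROM_MMAP)
        ret = munmap(b->ptr, b->size);
    else
        free(b->ptr);
    b->ptr = NULL;
    return ret;
}
```

With this layout, the code that releases a buffer never has to guess where it came from: it calls buf_release() and the tag selects free() or munmap().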
I use MadMPI, but the same error occurs with NewMadeleine (I have not tried with Open MPI).
Any ideas?
--
Philippe SWARTVAGHER
PhD student, TADaaM team, Inria Bordeaux Sud-Ouest