starpu-devel - [Starpu-devel] Issue with distributed NUMA-aware StarPU and dmda scheduler

Subject: Developers list for StarPU

  • From: Philippe SWARTVAGHER <philippe.swartvagher@inria.fr>
  • To: starpu-devel@lists.gforge.inria.fr
  • Subject: [Starpu-devel] Issue with distributed NUMA-aware StarPU and dmda scheduler
  • Date: Fri, 13 Mar 2020 14:02:38 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hello,

I am getting crashes with StarPU-MPI when NUMA support is enabled (STARPU_USE_NUMA=1), both with and without STARPU_SCHED=dmda.


mpirun -DNMAD_DRIVER=tcp -DSTARPU_USE_NUMA=0 -DSTARPU_SCHED=dmda -DSTARPU_RESERVE_NCPU=2 -DSTARPU_MPI_COOP_SENDS=0 -n 2 -nodelist henri0,henri1 ~/chameleon/build/testing/chameleon_stesting -o potrf -H -n 4800:50000:8000 --mtxfmt 1

=> works fine


mpirun -DNMAD_DRIVER=tcp -DSTARPU_USE_NUMA=1 -DSTARPU_RESERVE_NCPU=2 -DSTARPU_MPI_COOP_SENDS=0 -n 2 -nodelist henri0,henri1 ~/chameleon/build/testing/chameleon_stesting -o potrf -H -n 4800:50000:8000 --mtxfmt 1

=> fails with the error: free(): invalid size

GDB backtrace:

#0  0x00007ffff10ef081 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff10da535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff1130db8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007ffff113748a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007ffff1138c0c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007ffff7fb0053 in _starpu_mpi_handle_request_termination (req=req@entry=0x55555581b640) at ../../../mpi/src/mpi/starpu_mpi_mpi.c:861
#6  0x00007ffff7fb21a1 in _starpu_mpi_test_detached_requests () at ../../../mpi/src/mpi/starpu_mpi_mpi.c:1033
#7  _starpu_mpi_progress_thread_func (arg=0x5555556cfa00) at ../../../mpi/src/mpi/starpu_mpi_mpi.c:1337
#8  0x00007ffff16b6fb7 in start_thread (arg=<optimized out>) at pthread_create.c:486
#9  0x00007ffff11af2ef in clone () from /lib/x86_64-linux-gnu/libc.so.6


mpirun -DNMAD_DRIVER=tcp -DSTARPU_USE_NUMA=1 -DSTARPU_SCHED=dmda -DSTARPU_RESERVE_NCPU=2 -DSTARPU_MPI_COOP_SENDS=0 -n 2 -nodelist henri0,henri1 ~/chameleon/build/testing/chameleon_stesting -o potrf -H -n 4800:50000:8000 --mtxfmt 1

=> fails with the error: munmap_chunk(): invalid pointer (but sometimes it is free(): invalid size)

The backtrace is the same.
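
The three runs above differ only in the value of STARPU_USE_NUMA and in whether STARPU_SCHED=dmda is set. As a minimal sketch for regenerating them side by side, the script below only prints the command lines and does not launch anything. The helper name build_cmd is hypothetical and not part of StarPU, Chameleon or MadMPI; all flags are copied verbatim from the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of StarPU or Chameleon): prints the mpirun
# command line for one configuration, without running it.
#   $1 = value for STARPU_USE_NUMA
#   $2 = scheduler name, or empty to leave STARPU_SCHED unset
build_cmd() {
    numa=$1
    sched=$2
    sched_opt=""
    if [ -n "$sched" ]; then
        sched_opt=" -DSTARPU_SCHED=$sched"
    fi
    # Everything else is identical across the three runs reported above.
    printf '%s\n' "mpirun -DNMAD_DRIVER=tcp -DSTARPU_USE_NUMA=${numa}${sched_opt} -DSTARPU_RESERVE_NCPU=2 -DSTARPU_MPI_COOP_SENDS=0 -n 2 -nodelist henri0,henri1 ~/chameleon/build/testing/chameleon_stesting -o potrf -H -n 4800:50000:8000 --mtxfmt 1"
}

build_cmd 0 dmda   # reported above as working
build_cmd 1 ""     # reported above as: free(): invalid size
build_cmd 1 dmda   # reported above as: munmap_chunk(): invalid pointer
```

Piping one of the printed lines to a shell would launch that run; this assumes the same MadMPI mpirun and the same hosts as above.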


I use MadMPI, but the same error occurs with NewMadeleine (I have not tried with Open MPI).


Any ideas?

-- 
Philippe SWARTVAGHER

PhD student
TADaaM team, Inria Bordeaux Sud-Ouest


