
starpu-devel - [Starpu-devel] segfault in StarPU memory allocator

Subject: Developers list for StarPU

  • From: Alfredo Buttari <alfredo.buttari@enseeiht.fr>
  • To: starpu-devel@lists.gforge.inria.fr
  • Subject: [Starpu-devel] segfault in StarPU memory allocator
  • Date: Mon, 2 Jul 2018 16:57:27 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Dear StarPU developers/users,
I need some help debugging a code that segfaults in a worker thread when
starpu_shutdown() is called, with the following backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe0b52700 (LWP 166825)]
starpu_free_on_node_flags (dst_node=1, addr=140735338284336, size=0, flags=6) at datawizard/malloc.c:1061
1061 int block = ((addr - chunk->base) / CHUNK_ALLOC_MIN) + 1, prevblock, nextblock;
Missing separate debuginfos, use: debuginfo-install
glibc-2.17-196.el7_4.2.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64
numactl-libs-2.0.9-6.el7_2.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64
zlib-1.2.7-15.el7.x86_64
(gdb) backtrace
#0 starpu_free_on_node_flags (dst_node=1, addr=140735338284336, size=0, flags=6) at datawizard/malloc.c:1061
#1 0x00007ffff74810f3 in free_memory_on_node (mc=<optimized out>, node=<optimized out>) at datawizard/memalloc.c:390
#2 flush_memchunk_cache (node=1, reclaim=140735338284336) at datawizard/memalloc.c:845
#3 0x00007ffff74811c6 in _starpu_memory_reclaim_generic (node=1, force=2144896304, reclaim=0) at datawizard/memalloc.c:971
#4 0x00007ffff74b3a86 in _starpu_cuda_driver_deinit (worker_set=0x1) at drivers/cuda/driver_cuda.c:1027
#5 0x00007ffff74b27df in _starpu_cuda_worker (_arg=0x1) at drivers/cuda/driver_cuda.c:1070
#6 0x00007ffff3883e25 in start_thread () from /lib64/libpthread.so.0
#7 0x00007fffef88f34d in clone () from /lib64/libc.so.6
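
For reference, the statement at datawizard/malloc.c:1061 in the trace above turns an address into a block index relative to the base of a chunk. The only memory access in that expression is the read of chunk->base, so an invalid chunk pointer is one plausible (unconfirmed) source of the fault. Below is a minimal sketch of that index computation with defensive checks added. The struct layout, the value of CHUNK_ALLOC_MIN, and the helper name block_index are assumptions made for illustration and are not StarPU's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size, for illustration only (not StarPU's value). */
#define CHUNK_ALLOC_MIN ((uintptr_t)16384)

/* Hypothetical stand-in for the allocator's chunk descriptor. */
struct chunk {
    uintptr_t base; /* start address of the memory covered by this chunk */
};

/*
 * Same arithmetic as the faulting line: a 1-based block index of addr
 * inside chunk. Returns -1 instead of dereferencing a NULL chunk or
 * underflowing when addr lies below the chunk base.
 */
static int block_index(const struct chunk *chunk, uintptr_t addr)
{
    if (chunk == NULL || addr < chunk->base)
        return -1;
    return (int)((addr - chunk->base) / CHUNK_ALLOC_MIN) + 1;
}
```

In gdb, printing chunk and *chunk in frame 0 would show whether the pointer is valid at the time of the crash.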


Unfortunately valgrind does not like CUDA code:

==1292== Memcheck, a memory error detector
==1292== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==1292== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==1292== Command: ./dqrm_test
==1292==
vex amd64->IR: unhandled instruction bytes: 0xF 0xAE 0x64 0x24 0x40 0x48 0x8B 0x73
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==1292== valgrind: Unrecognised instruction at address 0x4015f48.
==1292== at 0x4015F48: _dl_runtime_resolve_xsave (in /usr/lib64/ld-2.17.so)
[cut]


and cuda-memcheck reports no errors.

Do you have any idea how to proceed?



ciao
Alfredo




-----------------------------------------
Alfredo Buttari, PhD
CNRS-IRIT
2 rue Camichel, 31071 Toulouse, France
http://buttari.perso.enseeiht.fr



