Subject: Developers list for StarPU
[Starpu-devel] starpu / guix on plafrim partition dedicated for the November HPC school
- From: Emmanuel Agullo <Emmanuel.Agullo@inria.fr>
- To: "starpu-devel@lists.gforge.inria.fr" <starpu-devel@lists.gforge.inria.fr>, Florent Pruvost <florent.pruvost@inria.fr>, Ludovic Courtès <ludovic.courtes@inria.fr>
- Subject: [Starpu-devel] starpu / guix on plafrim partition dedicated for the November HPC school
- Date: Tue, 22 Oct 2019 21:25:22 +0200
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Dear StarPU team,
I succeeded in building StarPU with Guix on:
- my laptop,
- the plafrim front-end node (devel13),
- a computing node on plafrim of type miriel (miriel015).
To do that, on each of those machines, I run:
guix build starpu --no-grafts --check
And it all works fine.
However, when I try to build it on the plafrim-hpcs
machine (the partition of plafrim that we'll use for the HPC school in
November), I get an error during the starpu check step. This happens
both on the front-end node of plafrim-hpcs (mistral01) and on
a compute node (miriel001).
The failure is related to starpu_machine_display:
FAIL: starpu_machine_display
============================
[starpu][check_bus_config_file] No performance model for the bus,
calibrating...
[starpu][check_bus_config_file] ... done
[error] `./starpu_machine_display' killed with signal 11; test marked as
failed
warning: core file may not match specified executable file.
[New LWP 24848]
[Thread debugging using libthread_db enabled]
Using host libthread_db library
"/gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libthread_db.so.1".
Core was generated by
`/tmp/guix-build-starpu-1.3.2.drv-0/source/tools/.libs/starpu_machine_display'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 starpu_tree_get_neighbour (tree=tree@entry=0x41d400,
node=node@entry=0x41d400, visited=visited@entry=0x7fffffffc5e8 "\001",
present=present@entry=0x41d594 "\001") at core/tree.c:113
113 for(st = 0; st < father->arity; st++)
make check-TESTS
make[3]: Entering directory '/tmp/guix-build-starpu-1.3.2.drv-0/source/tools'
make[4]: Entering directory '/tmp/guix-build-starpu-1.3.2.drv-0/source/tools'
FAIL: starpu_machine_display
PASS: starpu_sched_display
PASS: starpu_perfmodel_display
PASS: starpu_perfmodel_plot
============================================================================
Testsuite summary for StarPU 1.3.1.99
============================================================================
# TOTAL: 4
# PASS: 3
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
============================================================================
See tools/test-suite.log
Please report to starpu-devel@lists.gforge.inria.fr
============================================================================
make[4]: *** [Makefile:1713: test-suite.log] Error 1
make[4]: Leaving directory '/tmp/guix-build-starpu-1.3.2.drv-0/source/tools'
make[3]: *** [Makefile:1821: check-TESTS] Error 2
make[3]: Leaving directory '/tmp/guix-build-starpu-1.3.2.drv-0/source/tools'
make[2]: *** [Makefile:1969: check-am] Error 2
make[2]: Leaving directory '/tmp/guix-build-starpu-1.3.2.drv-0/source/tools'
make[1]: *** [Makefile:1605: check-recursive] Error 1
make[1]: Leaving directory '/tmp/guix-build-starpu-1.3.2.drv-0/source/tools'
make: *** [Makefile:833: check-recursive] Error 1
Test suite failed, dumping logs.
--- ./tools/test-suite.log --------------------------------------------------
===========================================
StarPU 1.3.1.99: tools/test-suite.log
===========================================
# TOTAL: 4
# PASS: 3
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
.. contents:: :depth: 2
FAIL: starpu_machine_display
============================
[starpu][check_bus_config_file] No performance model for the bus,
calibrating...
[starpu][check_bus_config_file] ... done
[error] `./starpu_machine_display' killed with signal 11; test marked as
failed
warning: core file may not match specified executable file.
[New LWP 24848]
[Thread debugging using libthread_db enabled]
Using host libthread_db library
"/gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libthread_db.so.1".
Core was generated by
`/tmp/guix-build-starpu-1.3.2.drv-0/source/tools/.libs/starpu_machine_display'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 starpu_tree_get_neighbour (tree=tree@entry=0x41d400,
node=node@entry=0x41d400, visited=visited@entry=0x7fffffffc5e8 "\001",
present=present@entry=0x41d594 "\001") at core/tree.c:113
113 for(st = 0; st < father->arity; st++)
warning: File
"/gnu/store/2plcy91lypnbbysb18ymnhaw3zwk8pg1-gcc-7.4.0-lib/lib/libstdc++.so.6.0.24-gdb.py"
auto-loading has been declined by your `auto-load safe-path' set to
"$debugdir:$datadir/auto-load".
To enable execution of this file add
add-auto-load-safe-path
/gnu/store/2plcy91lypnbbysb18ymnhaw3zwk8pg1-gcc-7.4.0-lib/lib/libstdc++.so.6.0.24-gdb.py
line to your configuration file
"/tmp/guix-build-starpu-1.3.2.drv-0/source/.gdbinit".
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file
"/tmp/guix-build-starpu-1.3.2.drv-0/source/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
info "(gdb)Auto-loading safe path"
#0 starpu_tree_get_neighbour (tree=tree@entry=0x41d400,
node=node@entry=0x41d400, visited=visited@entry=0x7fffffffc5e8 "\001",
present=present@entry=0x41d594 "\001") at core/tree.c:113
father = 0x0
st = 0
n = <optimized out>
#1 0x00007ffff7f2c290 in tree_has_next (workers=0x41d560, it=0x7fffffffc5d0)
at worker_collection/worker_tree.c:205
tree = 0x41d400
workerids = 0x41d7b0
nworkers = <optimized out>
w = <optimized out>
neighbour = <optimized out>
id = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
#2 0x00007ffff7eeb522 in lws_add_workers (sched_ctx_id=<optimized out>,
workerids=<optimized out>, nworkers=<optimized out>) at
sched_policies/work_stealing_policy.c:820
neighbour = <optimized out>
neigh_workerids = 0x41d7b0
neigh_nworkers = <optimized out>
w = <optimized out>
workerid = <optimized out>
bindid = <optimized out>
it = {cursor = 0, value = 0x41d400, possible_value = 0x0, visited =
"\001", '\000' <repeats 62 times>, possibly_parallel = -1}
cnt = 1
ws = 0x41f020
workers = 0x41d560
tree = 0x41d400
i = <optimized out>
__func__ = "lws_add_workers"
__PRETTY_FUNCTION__ = "lws_add_workers"
#3 0x00007ffff7edf28a in _starpu_add_workers_to_new_sched_ctx (nworkers=1,
workerids=<optimized out>, sched_ctx=0x7ffff7fc8958 <_starpu_config+110808>)
at core/sched_ctx.c:404
workers = <optimized out>
config = <optimized out>
_workerids = {0}
i = <optimized out>
workers = <optimized out>
config = <optimized out>
_workerids = <optimized out>
i = <optimized out>
workerid = <optimized out>
worker = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
#4 _starpu_create_sched_ctx (policy=<optimized out>,
workerids=workerids@entry=0x0, nworkers_ctx=nworkers_ctx@entry=-1,
is_initial_sched=is_initial_sched@entry=1,
sched_ctx_name=sched_ctx_name@entry=0x7ffff7f624c3 "init",
min_prio_set=<optimized out>, min_prio=-1, max_prio_set=0, max_prio=-1,
awake_workers=1, sched_policy_init=0x0, user_data=0x0, nsub_ctxs=0,
sub_ctxs=0x0, nsms=0) at core/sched_ctx.c:642
config = <optimized out>
__PRETTY_FUNCTION__ = "_starpu_create_sched_ctx"
id = <optimized out>
sched_ctx = 0x7ffff7fc8958 <_starpu_config+110808>
nworkers = 1
i = <optimized out>
__func__ = "_starpu_create_sched_ctx"
#5 0x00007ffff7eb0885 in starpu_initialize
(user_conf=user_conf@entry=0x7fffffffcd60, argv=<optimized out>,
argc=<optimized out>) at core/workers.c:1404
selected_policy = <optimized out>
is_a_sink = <optimized out>
worker = <optimized out>
ret = <optimized out>
main_thread_cpuid = <optimized out>
main_thread_bind = <optimized out>
main_thread_activity = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
#6 0x00007ffff7eb1299 in starpu_initialize (user_conf=0x7fffffffcd60,
argc=<optimized out>, argv=<optimized out>) at core/workers.c:1194
worker = <optimized out>
ret = <optimized out>
main_thread_cpuid = <optimized out>
main_thread_bind = <optimized out>
main_thread_activity = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
buf = <optimized out>
size = <optimized out>
copy = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
size = <optimized out>
copy = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
p_ret = <optimized out>
selected_policy = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
#7 0x0000000000401275 in main (argc=<optimized out>, argv=<optimized out>)
at starpu_machine_display.c:130
ret = <optimized out>
force = 0
info = 0
conf = {magic = 42, sched_policy_name = 0x0, sched_policy = 0x0,
sched_policy_init = 0x0, ncpus = -1, reserve_ncpus = -1, ncuda = -1, nopencl
= -1, nmic = -1, nmpi_ms = -1, use_explicit_workers_bindid = 0,
workers_bindid = {0 <repeats 64 times>}, use_explicit_workers_cuda_gpuid = 0,
workers_cuda_gpuid = {0 <repeats 64 times>},
use_explicit_workers_opencl_gpuid = 0, workers_opencl_gpuid = {0 <repeats 64
times>}, use_explicit_workers_mic_deviceid = 0, workers_mic_deviceid = {0
<repeats 64 times>}, use_explicit_workers_mpi_ms_deviceid = 0,
workers_mpi_ms_deviceid = {0 <repeats 64 times>}, bus_calibrate = 0,
calibrate = 0, single_combined_worker = 0, mic_sink_program_path = 0x0,
disable_asynchronous_copy = 0, disable_asynchronous_cuda_copy = 0,
disable_asynchronous_opencl_copy = 0, disable_asynchronous_mic_copy = 0,
disable_asynchronous_mpi_ms_copy = 0, cuda_opengl_interoperability = 0x0,
n_cuda_opengl_interoperability = 0, not_launched_drivers = 0x0,
n_not_launched_drivers = 0, trace_buffer_size = 67108864,
global_sched_ctx_min_priority = -1, global_sched_ctx_max_priority = -1,
catch_signals = 1}
Thread 1 (Thread 0x7ffff7925b80 (LWP 24848)):
#0 starpu_tree_get_neighbour (tree=tree@entry=0x41d400,
node=node@entry=0x41d400, visited=visited@entry=0x7fffffffc5e8 "\001",
present=present@entry=0x41d594 "\001") at core/tree.c:113
father = 0x0
st = 0
n = <optimized out>
#1 0x00007ffff7f2c290 in tree_has_next (workers=0x41d560, it=0x7fffffffc5d0)
at worker_collection/worker_tree.c:205
tree = 0x41d400
workerids = 0x41d7b0
nworkers = <optimized out>
w = <optimized out>
neighbour = <optimized out>
id = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
#2 0x00007ffff7eeb522 in lws_add_workers (sched_ctx_id=<optimized out>,
workerids=<optimized out>, nworkers=<optimized out>) at
sched_policies/work_stealing_policy.c:820
neighbour = <optimized out>
neigh_workerids = 0x41d7b0
neigh_nworkers = <optimized out>
w = <optimized out>
workerid = <optimized out>
bindid = <optimized out>
it = {cursor = 0, value = 0x41d400, possible_value = 0x0, visited =
"\001", '\000' <repeats 62 times>, possibly_parallel = -1}
cnt = 1
ws = 0x41f020
workers = 0x41d560
tree = 0x41d400
i = <optimized out>
__func__ = "lws_add_workers"
__PRETTY_FUNCTION__ = "lws_add_workers"
#3 0x00007ffff7edf28a in _starpu_add_workers_to_new_sched_ctx (nworkers=1,
workerids=<optimized out>, sched_ctx=0x7ffff7fc8958 <_starpu_config+110808>)
at core/sched_ctx.c:404
workers = <optimized out>
config = <optimized out>
_workerids = {0}
i = <optimized out>
workers = <optimized out>
config = <optimized out>
_workerids = <optimized out>
i = <optimized out>
workerid = <optimized out>
worker = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
#4 _starpu_create_sched_ctx (policy=<optimized out>,
workerids=workerids@entry=0x0, nworkers_ctx=nworkers_ctx@entry=-1,
is_initial_sched=is_initial_sched@entry=1,
sched_ctx_name=sched_ctx_name@entry=0x7ffff7f624c3 "init",
min_prio_set=<optimized out>, min_prio=-1, max_prio_set=0, max_prio=-1,
awake_workers=1, sched_policy_init=0x0, user_data=0x0, nsub_ctxs=0,
sub_ctxs=0x0, nsms=0) at core/sched_ctx.c:642
config = <optimized out>
__PRETTY_FUNCTION__ = "_starpu_create_sched_ctx"
id = <optimized out>
sched_ctx = 0x7ffff7fc8958 <_starpu_config+110808>
nworkers = 1
i = <optimized out>
__func__ = "_starpu_create_sched_ctx"
#5 0x00007ffff7eb0885 in starpu_initialize
(user_conf=user_conf@entry=0x7fffffffcd60, argv=<optimized out>,
argc=<optimized out>) at core/workers.c:1404
selected_policy = <optimized out>
is_a_sink = <optimized out>
worker = <optimized out>
ret = <optimized out>
main_thread_cpuid = <optimized out>
main_thread_bind = <optimized out>
main_thread_activity = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
#6 0x00007ffff7eb1299 in starpu_initialize (user_conf=0x7fffffffcd60,
argc=<optimized out>, argv=<optimized out>) at core/workers.c:1194
worker = <optimized out>
ret = <optimized out>
main_thread_cpuid = <optimized out>
main_thread_bind = <optimized out>
main_thread_activity = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
buf = <optimized out>
size = <optimized out>
copy = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
size = <optimized out>
copy = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
p_ret = <optimized out>
selected_policy = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
p_ret = <optimized out>
__ptrs = <optimized out>
__n = <optimized out>
#7 0x0000000000401275 in main (argc=<optimized out>, argv=<optimized out>)
at starpu_machine_display.c:130
ret = <optimized out>
force = 0
info = 0
conf = {magic = 42, sched_policy_name = 0x0, sched_policy = 0x0,
sched_policy_init = 0x0, ncpus = -1, reserve_ncpus = -1, ncuda = -1, nopencl
= -1, nmic = -1, nmpi_ms = -1, use_explicit_workers_bindid = 0,
workers_bindid = {0 <repeats 64 times>}, use_explicit_workers_cuda_gpuid = 0,
workers_cuda_gpuid = {0 <repeats 64 times>},
use_explicit_workers_opencl_gpuid = 0, workers_opencl_gpuid = {0 <repeats 64
times>}, use_explicit_workers_mic_deviceid = 0, workers_mic_deviceid = {0
<repeats 64 times>}, use_explicit_workers_mpi_ms_deviceid = 0,
workers_mpi_ms_deviceid = {0 <repeats 64 times>}, bus_calibrate = 0,
calibrate = 0, single_combined_worker = 0, mic_sink_program_path = 0x0,
disable_asynchronous_copy = 0, disable_asynchronous_cuda_copy = 0,
disable_asynchronous_opencl_copy = 0, disable_asynchronous_mic_copy = 0,
disable_asynchronous_mpi_ms_copy = 0, cuda_opengl_interoperability = 0x0,
n_cuda_opengl_interoperability = 0, not_launched_drivers = 0x0,
n_not_launched_drivers = 0, trace_buffer_size = 67108864,
global_sched_ctx_min_priority = -1, global_sched_ctx_max_priority = -1,
catch_signals = 1}
#Execution_time_in_seconds 0.562179 ./starpu_machine_display
FAIL starpu_machine_display (exit status: 1)
command "make" "check" failed with status 2
builder for `/gnu/store/3ib79dl1wqrg5whvnf109awkcbrijn1y-starpu-1.3.2.drv'
failed with exit code 1
la compilation de
/gnu/store/3ib79dl1wqrg5whvnf109awkcbrijn1y-starpu-1.3.2.drv a échoué
Aucun journal de compilation pour «
/gnu/store/3ib79dl1wqrg5whvnf109awkcbrijn1y-starpu-1.3.2.drv ».
guix build: erreur : build of
`/gnu/store/3ib79dl1wqrg5whvnf109awkcbrijn1y-starpu-1.3.2.drv' failed
I'm attaching the log of the error (starting at the check step).
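For readers of the backtrace: frame #0 reports father = 0x0, and tree and node are the same pointer (0x41d400), with nworkers = 1 in the frames above. That is consistent with (though it does not prove) a parent-pointer dereference on the root of a single-node tree at the `father->arity` loop. The following is only a minimal sketch of that failure mode and of a possible guard. The struct `sketch_node` and the function `sketch_next_sibling` are hypothetical names invented for illustration; they are not StarPU's actual data structures or code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node, loosely modelled on the fields visible in the
 * backtrace (father, arity). This is NOT the real StarPU tree structure. */
struct sketch_node {
    struct sketch_node *father;   /* NULL for the root node */
    struct sketch_node *children; /* array of `arity` children, or NULL */
    int arity;                    /* number of children */
    int id;
};

/* Return the sibling that follows `node` under its father,
 * or NULL when there is no such sibling. */
struct sketch_node *sketch_next_sibling(const struct sketch_node *node)
{
    struct sketch_node *father;
    int st;

    assert(node != NULL);
    father = node->father;

    /* Guard: the root has no father. Without this check, reading
     * father->arity in the loop below dereferences a NULL pointer. */
    if (father == NULL)
        return NULL;

    for (st = 0; st < father->arity; st++) {
        if (&father->children[st] == node)
            return (st + 1 < father->arity) ? &father->children[st + 1]
                                            : NULL;
    }
    return NULL;
}
```

With a lone root node (father set to NULL, arity 0), the guarded function returns NULL instead of faulting, which is the situation a machine exposing a single worker would presumably produce.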
Here is my Guix environment (the same on all 5 machines), in case it
helps.
eagullo@tek9:/tmp/toto$ guix describe --format=channels
(list (channel
        (name 'guix-hpc-non-free)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc-non-free.git")
        (commit
          "0822b7eebb63e2360125e017934ba817686d9669"))
      (channel
        (name 'guix-hpc)
        (url "https://gitlab.inria.fr/guix-hpc/guix-hpc.git")
        (commit
          "b24a8fa55bfd4a0987320e0e45d3a85f0920427c"))
      (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit
          "bfaa06171842225c8bf4c839e35ac36a0b2c2d59")))
In case you cannot reproduce the bug on one of your machines
and need to look at it more closely, we can do that with my
account on this particular plafrim partition, if you want.
Note that there is no urgency in tracking this down if it's not an
obvious issue that I am missing (I can simply retrieve the pre-built
binaries with "guix build starpu", i.e. without the "--no-grafts --check"
options).
Thanks a lot in advance for your help once again!
Best,
Manu
- [Starpu-devel] starpu / guix on plafrim partition dedicated for the November HPC school, Emmanuel Agullo, 22/10/2019
- Re: [Starpu-devel] starpu / guix on plafrim partition dedicated for the November HPC school, Nathalie Furmento, 23/10/2019
- Re: [Starpu-devel] starpu / guix on plafrim partition dedicated for the November HPC school, Emmanuel Agullo, 23/10/2019
- Re: [Starpu-devel] starpu / guix on plafrim partition dedicated for the November HPC school, Emmanuel Agullo, 24/10/2019
- Re: [Starpu-devel] starpu / guix on plafrim partition dedicated for the November HPC school, Nathalie Furmento, 25/10/2019
- Re: [Starpu-devel] starpu / guix on plafrim partition dedicated for the November HPC school, Emmanuel Agullo, 25/10/2019
- Re: [Starpu-devel] starpu / guix on plafrim partition dedicated for the November HPC school, Ludovic Courtès, 25/10/2019
- Re: [Starpu-devel] starpu / guix on plafrim partition dedicated for the November HPC school, Nathalie Furmento, 25/10/2019
- Re: [Starpu-devel] starpu / guix on plafrim partition dedicated for the November HPC school, Emmanuel Agullo, 24/10/2019
- Re: [Starpu-devel] starpu / guix on plafrim partition dedicated for the November HPC school, Emmanuel Agullo, 23/10/2019
- Re: [Starpu-devel] starpu / guix on plafrim partition dedicated for the November HPC school, Nathalie Furmento, 23/10/2019