
starpu-devel - [Starpu-devel] Problem with inserting tasks to default scheduling context after a parallel scheduling context has been created

Subject: Developers list for StarPU

List archives

  • From: Mirko Myllykoski <mirkom@cs.umu.se>
  • To: Starpu Devel <starpu-devel@lists.gforge.inria.fr>
  • Subject: [Starpu-devel] Problem with inserting tasks to default scheduling context after a parallel scheduling context has been created
  • Date: Fri, 26 Oct 2018 15:53:47 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hi,

I am experiencing problems when inserting tasks into the default scheduling context after I have created and destroyed a parallel scheduling context.

Here is how to replicate the bug with StarPU 1.2.6:

1) Initialize StarPU with conf::sched_policy_name set to "dmdas" or to some other scheduling policy that takes advantage of StarPU's performance models.

2) Create a scheduling context that uses either the peager or the pheft scheduling policy.

3) Destroy the newly created scheduling context. You don't have to insert any tasks into this context.

4) Insert tasks into the default scheduling context. This triggers the following assertion failure:
a.out: sched_policies/deque_modeling_policy_data_aware.c:590: compute_all_performance_predictions: Assertion `fifo != ((void *)0)' failed.

I have attached a test program and a matching output/backtrace to this email.

I have been told that the pheft scheduling policy is considered deprecated, but I was hoping that at least the peager scheduling policy would remain usable. I am using both scheduling policies in a library that I (and many others) have been developing.

Best Regards,
Mirko Myllykoski

#include <unistd.h>
#include <starpu.h>

static void kernel(void *buffers[], void *cl_args) {}

static struct starpu_codelet codelet = {
    .name = "codelet",
    .cpu_funcs = { kernel },
    .cuda_funcs = { kernel }
};

int main()
{
    struct starpu_conf conf;
    starpu_conf_init(&conf);
    
    // the default scheduling context must take advantage of StarPU's 
    // performance modelling
    conf.sched_policy_name = "dmdas";
    
    if (starpu_init(&conf) != 0)
        return 1;
    
    {
        struct starpu_worker_collection *collection = 
            starpu_sched_ctx_get_worker_collection(0);
        printf("collection->nworkers = %d\n", collection->nworkers);
    }
    
    //
    // insert some tasks to the default scheduling context
    //
    
    for (int i = 0; i < 10; i++)
        starpu_task_insert(&codelet, 0);
        
    starpu_task_wait_for_all();
    
    //
    // create a parallel scheduler (peager or pheft) that contains all workers
    //
    
    int workers[STARPU_NMAXWORKERS];
    int worker_count = starpu_worker_get_ids_by_type(
        STARPU_CPU_WORKER, workers, STARPU_NMAXWORKERS);
 
    unsigned ctx = starpu_sched_ctx_create(workers, worker_count, "ctx", 
        STARPU_SCHED_CTX_POLICY_NAME, "peager", 0);

    {
        struct starpu_worker_collection *collection = 
            starpu_sched_ctx_get_worker_collection(0);
        printf("collection->nworkers = %d\n", collection->nworkers);
    }

    starpu_task_wait_for_all_in_ctx(ctx);
    starpu_sched_ctx_delete(ctx);

    {
        struct starpu_worker_collection *collection = 
            starpu_sched_ctx_get_worker_collection(0);
        printf("collection->nworkers = %d\n", collection->nworkers);
    }
    
    //
    // insert some tasks to the default scheduling context
    //
    // THE FAILURE OCCURS HERE!
    //
    
    for (int i = 0; i < 10; i++)
        starpu_task_insert(&codelet, 0);
        
    starpu_task_wait_for_all();

    starpu_shutdown();

    return 0;
}
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...done.
Starting program: /tmp/test/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[starpu][starpu_initialize] Warning: StarPU was configured with
--enable-debug (-O0), and is thus not optimized
[starpu][starpu_initialize] Warning: StarPU was configured with
--enable-spinlock-check, which slows down a bit
[starpu][starpu_initialize] Warning: StarPU was configured with
--enable-verbose, which slows down a bit
[starpu][starpu_initialize] Warning: StarPU was configured with --with-fxt,
which slows down a bit, limits scalability and makes worker initialization
sequential
[New Thread 0x7fffdfede700 (LWP 16633)]
[starpu][load_bus_affinity_file_content] loading affinities from
/home/mirkom/.starpu/sampling//bus/gandalf.affinity
[starpu][load_bus_latency_file_content] loading latencies from
/home/mirkom/.starpu/sampling//bus/gandalf.latency
[starpu][load_bus_bandwidth_file_content] loading bandwidth from
/home/mirkom/.starpu/sampling//bus/gandalf.bandwidth
[starpu][_starpu_init_workers_binding] worker 0 type 1 devid 0 bound to cpu
0, STARPU memory node 1
[starpu][_starpu_init_workers_binding] worker 1 type 0 devid 0 bound to cpu
1, STARPU memory node 0
[starpu][_starpu_init_workers_binding] worker 2 type 0 devid 1 bound to cpu
2, STARPU memory node 0
[starpu][_starpu_init_workers_binding] worker 3 type 0 devid 2 bound to cpu
3, STARPU memory node 0
[starpu][load_sched_policy] Use dmdas scheduler (data-aware performance model
(sorted))
[starpu][_starpu_launch_drivers] initialising worker 0/4
[New Thread 0x7fffd7fff700 (LWP 16634)]
[starpu][_starpu_driver_start] worker 0x7ffff7b7a6f0 0 for dev 0 is ready on
logical cpu 0
[starpu][_starpu_driver_start] worker 0x7ffff7b7a6f0 0 cpuset start at 0
[starpu][_starpu_launch_drivers] initialising worker 1/4
[New Thread 0x7fffd77fe700 (LWP 16635)]
[New Thread 0x7fffd6ffd700 (LWP 16636)]
[starpu][_starpu_launch_drivers] initialising worker 2/4
[starpu][_starpu_driver_start] worker 0x7ffff7b7ab18 1 for dev 0 is ready on
logical cpu 1
[starpu][_starpu_driver_start] worker 0x7ffff7b7ab18 1 cpuset start at 1
[starpu][_starpu_memory_manager_set_global_memory_size] Global size for node
0 is 67416805376
[New Thread 0x7fffd67fc700 (LWP 16637)]
[starpu][_starpu_launch_drivers] initialising worker 3/4
[starpu][_starpu_driver_start] worker 0x7ffff7b7af40 2 for dev 1 is ready on
logical cpu 2
[starpu][_starpu_driver_start] worker 0x7ffff7b7af40 2 cpuset start at 2
[New Thread 0x7fffd5ffb700 (LWP 16638)]
[starpu][_starpu_launch_drivers] waiting for worker 0 initialization
[starpu][_starpu_driver_start] worker 0x7ffff7b7b368 3 for dev 2 is ready on
logical cpu 3
[starpu][_starpu_driver_start] worker 0x7ffff7b7b368 3 cpuset start at 3
[New Thread 0x7fffd57fa700 (LWP 16639)]
[starpu][_starpu_cuda_limit_gpu_mem_if_needed] CUDA device 0: Wasting 404 MB
/ Limit 3633 MB / Total 4037 MB / Remains 3633 MB
[starpu][_starpu_memory_manager_set_global_memory_size] Global size for node
1 is 3809476608
[starpu][_starpu_cuda_driver_init] cuda (GeForce GTX 1050 Ti) dev id 0 worker
0 thread is ready to run on CPU 0 !
[starpu][_starpu_launch_drivers] waiting for worker 1 initialization
[starpu][_starpu_launch_drivers] waiting for worker 2 initialization
[starpu][_starpu_launch_drivers] waiting for worker 3 initialization
[starpu][_starpu_launch_drivers] finished launching drivers
[starpu][starpu_initialize] Initialisation finished
collection->nworkers = 4
[starpu][_starpu_task_wait_for_all_and_return_nb_waited_tasks] Waiting for
tasks submitted to context 0
[starpu][execute_job_on_cuda] Warning: starpu_cuda_get_local_stream() was not
used to submit kernel to CUDA on worker 0. CUDA will thus introduce a lot of
useless synchronizations, which will prevent proper overlapping of data
transfers and kernel execution. See the CUDA-specific part of the 'Check List
When Performance Are Not There' of the StarPU handbook
[starpu][load_sched_policy] Use peager scheduler (parallel eager policy)
[starpu][find_and_assign_combinations] Looking at 63GB
[starpu][find_and_assign_combinations] Looking at
[starpu][find_and_assign_combinations] Looking at 6144KB
[starpu][find_workers] worker 1 is part of it
[starpu][find_workers] worker 2 is part of it
[starpu][find_workers] worker 3 is part of it
[starpu][find_and_assign_combinations] Adding it
[starpu][synthesize_intermediate_workers] 3 children > 2, synthesizing
intermediate combined workers of size 2
[starpu][synthesize_intermediate_workers] child 1
[starpu][find_workers] worker 1 is part of it
[starpu][synthesize_intermediate_workers] child 2
[starpu][find_workers] worker 2 is part of it
[starpu][synthesize_intermediate_workers] Adding it
[starpu][synthesize_intermediate_workers] child 3
[starpu][find_workers] worker 3 is part of it
[starpu][find_and_assign_combinations] Looking at 256KB
[starpu][find_and_assign_combinations] Looking at 32KB
[starpu][find_and_assign_combinations] Looking at
[starpu][find_and_assign_combinations] Looking at
[starpu][find_workers] worker 1 is part of it
[starpu][find_and_assign_combinations] Looking at 256KB
[starpu][find_and_assign_combinations] Looking at 32KB
[starpu][find_and_assign_combinations] Looking at
[starpu][find_and_assign_combinations] Looking at
[starpu][find_workers] worker 2 is part of it
[starpu][find_and_assign_combinations] Looking at 256KB
[starpu][find_and_assign_combinations] Looking at 32KB
[starpu][find_and_assign_combinations] Looking at
[starpu][find_and_assign_combinations] Looking at
[starpu][find_workers] worker 3 is part of it
collection->nworkers = 6
collection->nworkers = 6
/home/mirkom/.starpu_install/starpu_1.2.6_debug/lib/libstarpu-1.2.so.5(+0x9e401)[0x7ffff7803401]
/home/mirkom/.starpu_install/starpu_1.2.6_debug/lib/libstarpu-1.2.so.5(+0x9f1e6)[0x7ffff78041e6]
/home/mirkom/.starpu_install/starpu_1.2.6_debug/lib/libstarpu-1.2.so.5(+0x9f81d)[0x7ffff780481d]
/home/mirkom/.starpu_install/starpu_1.2.6_debug/lib/libstarpu-1.2.so.5(_starpu_push_task_to_workers+0xbb1)[0x7ffff77e7635]
/home/mirkom/.starpu_install/starpu_1.2.6_debug/lib/libstarpu-1.2.so.5(_starpu_repush_task+0x598)[0x7ffff77e6a4f]
/home/mirkom/.starpu_install/starpu_1.2.6_debug/lib/libstarpu-1.2.so.5(_starpu_push_task+0x5e)[0x7ffff77e64a1]
/home/mirkom/.starpu_install/starpu_1.2.6_debug/lib/libstarpu-1.2.so.5(_starpu_enforce_deps_and_schedule+0x374)[0x7ffff7798a55]
/home/mirkom/.starpu_install/starpu_1.2.6_debug/lib/libstarpu-1.2.so.5(_starpu_submit_job+0x2be)[0x7ffff779b1d7]
/home/mirkom/.starpu_install/starpu_1.2.6_debug/lib/libstarpu-1.2.so.5(starpu_task_submit+0x9b2)[0x7ffff779ca42]
/home/mirkom/.starpu_install/starpu_1.2.6_debug/lib/libstarpu-1.2.so.5(+0xfcf83)[0x7ffff7861f83]
/home/mirkom/.starpu_install/starpu_1.2.6_debug/lib/libstarpu-1.2.so.5(starpu_task_insert+0xb1)[0x7ffff7862137]
/tmp/test/a.out(+0xd02)[0x555555554d02]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7ffff7395b97]
/tmp/test/a.out(+0xa6a)[0x555555554a6a]

[starpu][compute_all_performance_predictions][assert failure] worker 4 ctx 0


a.out: sched_policies/deque_modeling_policy_data_aware.c:590:
compute_all_performance_predictions: Assertion `fifo != ((void *)0)' failed.

Thread 1 "a.out" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff73b4801 in __GI_abort () at abort.c:79
#2 0x00007ffff73a439a in __assert_fail_base (fmt=0x7ffff752b7d8 "%s%s%s:%u:
%s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7ffff78ca74a
"fifo != ((void *)0)", file=file@entry=0x7ffff78ca588
"sched_policies/deque_modeling_policy_data_aware.c", line=line@entry=590,
function=function@entry=0x7ffff78cab60 <__PRETTY_FUNCTION__.28490>
"compute_all_performance_predictions") at assert.c:92
#3 0x00007ffff73a4412 in __GI___assert_fail (assertion=0x7ffff78ca74a "fifo
!= ((void *)0)", file=0x7ffff78ca588
"sched_policies/deque_modeling_policy_data_aware.c", line=590,
function=0x7ffff78cab60 <__PRETTY_FUNCTION__.28490>
"compute_all_performance_predictions") at assert.c:101
#4 0x00007ffff7803482 in compute_all_performance_predictions
(task=0x55555589a930, nworkers=6, local_task_length=0x7fffffffc5b0,
exp_end=0x7fffffffc340, max_exp_endp=0x7fffffffc758,
best_exp_endp=0x7fffffffc750, local_data_penalty=0x7fffffffc4e0,
local_energy=0x7fffffffc410, forced_worker=0x7fffffffc720,
forced_impl=0x7fffffffc724, sched_ctx_id=0,
sorted_decision=0) at
sched_policies/deque_modeling_policy_data_aware.c:590
#5 0x00007ffff78041e6 in _dmda_push_task (task=0x55555589a930, prio=1,
sched_ctx_id=0, simulate=0, sorted_decision=0) at
sched_policies/deque_modeling_policy_data_aware.c:765
#6 0x00007ffff780481d in dmda_push_sorted_task (task=0x55555589a930) at
sched_policies/deque_modeling_policy_data_aware.c:877
#7 0x00007ffff77e7635 in _starpu_push_task_to_workers (task=0x55555589a930)
at core/sched_policy.c:586
#8 0x00007ffff77e6a4f in _starpu_repush_task (j=0x55555589aad0) at
core/sched_policy.c:466
#9 0x00007ffff77e64a1 in _starpu_push_task (j=0x55555589aad0) at
core/sched_policy.c:399
#10 0x00007ffff7798a55 in _starpu_enforce_deps_and_schedule
(j=0x55555589aad0) at core/jobs.c:636
#11 0x00007ffff779b1d7 in _starpu_submit_job (j=0x55555589aad0) at
core/task.c:370
#12 0x00007ffff779ca42 in starpu_task_submit (task=0x55555589a930) at
core/task.c:700
#13 0x00007ffff7861f83 in _starpu_task_insert_v (cl=0x555555756020 <codelet>,
varg_list=0x7fffffffd1f0) at util/starpu_task_insert.c:142
#14 0x00007ffff7862137 in starpu_task_insert (cl=0x555555756020 <codelet>) at
util/starpu_task_insert.c:164
#15 0x0000555555554d02 in main () at main.c:72


