
starpu-devel - Re: [Starpu-devel] Issue with lws

  • From: Mathieu Faverge <mathieu.faverge@inria.fr>
  • To: Samuel Thibault <samuel.thibault@inria.fr>, starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] Issue with lws
  • Date: Thu, 20 Sep 2018 15:15:02 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Ok, let's try to summarize correctly :)

With --enable-debug and everything else left at the defaults on the 1.2.5 branch: over 120 runs, I still triggered the same assert 4 times, and got 1 deadlock.

With the second solution you suggest (select_victim returns -1, and the test in lws_push_task commented out so that select_worker is always called):
      I didn't get a single error in 200 runs, with or without --enable-debug.

With the 1.3 branch, it seems to work but it is so slow!!!!! I went from 2-3 s per run to 10 s.

Mathieu

On 18/09/2018 at 11:22, Samuel Thibault wrote:
Mathieu Faverge, on Sat. 15 Sept. 2018 15:10:22 +0200, wrote:
Still the same asserts, the others you added are not triggered.
Ok... Did you try with --enable-debug in case the additional assertions
catch something?

Another test worth trying is to disable task stealing by making
select_victim() return -1, while using round-robin push instead of
locality push: in lws_push_task, comment out the if this way:

	if (workerid == -1)
		workerid = starpu_worker_get_id();

	/* If the current thread is not a worker but
	 * the main thread (-1) or the current worker is not in the target
	 * context, we find the better one to put task on its queue */
//	if (workerid == -1 || !starpu_sched_ctx_contains_worker(workerid, sched_ctx_id) ||
//			!starpu_worker_can_execute_task_first_impl(workerid, task, NULL))
		workerid = select_worker(task, sched_ctx_id);

so that lws will always select various workers, thus scrambling the task
distribution. If that works too, it's really the task stealing mechanism
which has an issue, even if my staring at the source code didn't let me
find anything odd with it :)

Samuel



--
Mathieu Faverge
Maitre de conférence / Associate Professor
Institut Polytechnique de Bordeaux - ENSEIRB-Matmeca
INRIA Bordeaux - Sud-Ouest, HiePACS Team
200 avenue de la vielle tour
33405 Talence Cedex
Phone: (+33) 5 24 57 40 73




