Subject: Developers list for StarPU
List archives
- From: Xavier Lacoste <xavier.lacoste@inria.fr>
- To: Samuel Thibault <samuel.thibault@ens-lyon.org>
- Cc: starpu-devel@lists.gforge.inria.fr
- Subject: Re: [Starpu-devel] hang with 8659
- Date: Tue, 19 Feb 2013 15:34:50 +0100
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
If I understand the StarPU code correctly, shouldn't it be _starpu_pop_task
that decrements the nready counter?
Maybe it could be done like this:
_starpu_push_task() increments nready
_starpu_pop_task() decrements nready
_starpu_cuda_driver_run_once() increments nready if the task can be executed locally
_starpu_handle_job_termination() decrements nready
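
To make the proposed accounting concrete, here is a minimal sketch in C. It is a toy model and not StarPU code: the bal_* functions are hypothetical stand-ins for the four functions listed above, and bal_nready is a plain int, not StarPU's real counter. It only checks that the four steps cancel out, however many times a worker that cannot run the task pops it and pushes it back.

```c
#include <assert.h>

/* Toy model only: a plain int standing in for the nready counter. */
static int bal_nready = 0;

/* Stand-in for _starpu_push_task(): ++ */
static void bal_push_task(void) { bal_nready++; }

/* Stand-in for _starpu_pop_task(): -- */
static void bal_pop_task(void) { assert(bal_nready > 0); bal_nready--; }

/* Stand-in for the driver accepting a task it can execute locally: ++ */
static void bal_driver_accepts(void) { bal_nready++; }

/* Stand-in for _starpu_handle_job_termination(): -- */
static void bal_job_terminated(void) { assert(bal_nready > 0); bal_nready--; }

/* One task is submitted, popped and pushed back `rejections` times by a
 * worker that cannot run it, then popped, accepted and executed once.
 * Returns the final counter value; 0 means the accounting is balanced. */
int bal_final_nready(int rejections)
{
    bal_nready = 0;
    bal_push_task();                 /* initial submission */
    for (int i = 0; i < rejections; i++) {
        bal_pop_task();              /* incapable worker takes the task */
        bal_push_task();             /* and gives it back */
    }
    bal_pop_task();                  /* capable worker takes the task */
    bal_driver_accepts();            /* task can be executed locally */
    bal_job_terminated();            /* task is done */
    return bal_nready;
}
```

In this model bal_final_nready() returns 0 for any number of rejections, because each pop undoes the push that preceded it.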
XL.
On 19 Feb 2013, at 15:16, Xavier Lacoste wrote:
> Hello,
>
> I now understand my deadlock (after fixing some problems in my own code):
>
> The CUDA driver steals a task that it cannot run.
>
> It then enters this part of the code of _starpu_cuda_driver_run_once:
>
> /* can CUDA do that task ? */
> if (!_STARPU_CUDA_MAY_PERFORM(j))
> {
>         /* this is neither a cuda or a cublas task */
>         _starpu_push_task(j);
>         return 0;
> }
>
> The task is pushed into the local queue and nready is incremented, but it
> is never decremented.
> (In the code I sent, the task was put back into its original queue. I changed
> it to go to the stealer's queue instead; I don't know which is best. The
> change is here:
> int workerid = task->workerid;
> - if (workerid == -1)
> + if (starpu_worker_get_id() != -1)
> workerid = starpu_worker_get_id();
> )
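
The two variants of that diff can be compared side by side with a small sketch. The function names are hypothetical, and current_workerid stands for the value returned by starpu_worker_get_id(), which is -1 when the caller is not a StarPU worker. This only mirrors the quoted condition; it is not the scheduler code itself.

```c
#include <assert.h>

/* Variant before the change: keep the worker recorded in the task,
 * and fall back to the calling worker only when the task has none (-1). */
int target_worker_original(int task_workerid, int current_workerid)
{
    assert(task_workerid >= -1 && current_workerid >= -1);
    int workerid = task_workerid;
    if (workerid == -1)
        workerid = current_workerid;
    return workerid;
}

/* Variant after the change: whenever the caller is a worker, the task
 * goes to the caller's queue (the stealer), whatever the task recorded. */
int target_worker_stealer(int task_workerid, int current_workerid)
{
    assert(task_workerid >= -1 && current_workerid >= -1);
    int workerid = task_workerid;
    if (current_workerid != -1)
        workerid = current_workerid;
    return workerid;
}
```

For a task recorded on worker 2 and pushed back by worker 5, the first variant returns 2 and the second returns 5.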
>
> So the problem is that the CUDA worker keeps incrementing nready by calling
> _starpu_push_task.
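
The imbalance described here can be reproduced with a toy model. This is a sketch under assumptions, not StarPU code: leak_push_task() and leak_job_terminated() are hypothetical stand-ins, and the model assumes that the push increments the counter and that only job termination decrements it, as observed above.

```c
#include <assert.h>

/* Toy model only: a plain int standing in for the nready counter. */
static int leak_nready = 0;

/* Stand-in for _starpu_push_task(): every push counts the task as ready. */
static void leak_push_task(void) { leak_nready++; }

/* Stand-in for job termination: the only decrement in this model. */
static void leak_job_terminated(void) { assert(leak_nready > 0); leak_nready--; }

/* One task is submitted, pushed back `retries` times by a worker that
 * cannot run it, and finally executed once by a worker that can.
 * Returns what is left in the counter; 0 would mean balanced. */
int leak_leftover_nready(int retries)
{
    leak_nready = 0;
    leak_push_task();                /* initial submission */
    for (int i = 0; i < retries; i++)
        leak_push_task();            /* re-push: ++ with no matching -- */
    leak_job_terminated();           /* the task runs exactly once */
    return leak_nready;
}
```

In this model every re-push leaves one extra unit in the counter, so leak_leftover_nready(3) returns 3 although the only task has completed.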
>
> Do you think something is missing in my scheduler that would solve this?
>
> Thanks,
>
> XL.
>
>
> On 19 Feb 2013, at 11:05, Xavier Lacoste wrote:
>
>> Hello,
>>
>> More info about my scheduler deadlock.
>>
>> When I trace the _starpu_increment_nready / _starpu_decrement_nready calls, I
>> see that the CUDA worker increments the counter but never decrements it,
>> whereas the CPU workers always do matched (++, --) pairs.
>>
>> In that small case, the GPU stole 3 tasks from the CPUs and did 6 increments
>> of nready and no decrement.
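
This kind of trace can be kept as per-worker counters. The sketch below is hypothetical: the trace_* names and TRACE_MAX_WORKERS are made up for illustration, nothing here is part of StarPU, and the plain int arrays are not thread-safe, so a real multi-worker trace would need atomics or a lock.

```c
#include <assert.h>

#define TRACE_MAX_WORKERS 16   /* hypothetical bound for this sketch */

/* Per-worker tallies of increments and decrements of the counter. */
static int trace_incr[TRACE_MAX_WORKERS];
static int trace_decr[TRACE_MAX_WORKERS];

static void trace_reset(void)
{
    for (int w = 0; w < TRACE_MAX_WORKERS; w++)
        trace_incr[w] = trace_decr[w] = 0;
}

/* To be called next to each increment of the counter. */
static void trace_increment(int workerid)
{
    assert(workerid >= 0 && workerid < TRACE_MAX_WORKERS);
    trace_incr[workerid]++;
}

/* To be called next to each decrement of the counter. */
static void trace_decrement(int workerid)
{
    assert(workerid >= 0 && workerid < TRACE_MAX_WORKERS);
    trace_decr[workerid]++;
}

/* Increments minus decrements for one worker; 0 means matched pairs. */
int trace_imbalance(int workerid)
{
    assert(workerid >= 0 && workerid < TRACE_MAX_WORKERS);
    return trace_incr[workerid] - trace_decr[workerid];
}

/* Replays a recorded pattern for one worker and reports its imbalance. */
int trace_replay(int workerid, int increments, int decrements)
{
    trace_reset();
    for (int i = 0; i < increments; i++)
        trace_increment(workerid);
    for (int i = 0; i < decrements; i++)
        trace_decrement(workerid);
    return trace_imbalance(workerid);
}
```

Replaying the pattern reported above (6 increments, no decrement) gives an imbalance of 6 for the GPU worker, while a CPU worker doing matched pairs stays at 0.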
>>
>> Any idea what I missed?
>>
>> XL.
>>
>>
>> On 15 Feb 2013, at 15:26, Xavier Lacoste wrote:
>>
>>> My deadlock comes from the fact that the CUDA worker starts with no
>>> task (I think; I'm not entirely sure).
>>> I can't explain why this is an issue with my modified work-stealing
>>> policy, but that is my conclusion so far (only CPUs => OK, 1 CUDA =>
>>> deadlock).
>>>
>>> I didn't get any deadlock with my scheduler before because I wasn't using
>>> GPUs; it is not linked to the StarPU revision. Sorry for my bad report...
>>>
>>> XL.
>>>
>>> On 15 Feb 2013, at 11:28, Xavier Lacoste wrote:
>>>
>>>> Here is my code.
>>>>
>>>> I have lost the working revision number; I'll try to find it.
>>>>
>>>> I don't know how to call this, as I cannot call underscore-prefixed
>>>> functions from outside StarPU, but if it doesn't really matter I can do
>>>> without it.
>>>>
>>>> XL.
>>>>
>>>> <starpu_pastix_sched_policy.c>
>>>> On 15 Feb 2013, at 11:11, Samuel Thibault wrote:
>>>>
>>>>> Xavier Lacoste, on Fri 15 Feb 2013 11:02:35 +0100, wrote:
>>>>>> I see that
>>>>>>
>>>>>> _starpu_push_task_end(task);
>>>>>>
>>>>>> was added in commit 8462
>>>>>>
>>>>>> Do I need a publicly available equivalent in my push_task() code?
>>>>>
>>>>> Ideally yes, but it probably doesn't have to do with the bug at stake,
>>>>> as it's only about some accounting, and about bundles. Put it right
>>>>> after you have pushed the task to some list.
>>>>>
>>>>> Since when are you getting this hang, exactly?
>>>>>
>>>>> It would probably be useful if you showed us your scheduler code.
>>>>>
>>>>> Samuel
>>>>
>>>> _______________________________________________
>>>> Starpu-devel mailing list
>>>> Starpu-devel@lists.gforge.inria.fr
>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/starpu-devel