
starpu-devel - Re: [Starpu-devel] Strange behaviour using GPUs

Subject: Developers list for StarPU

List archives

Re: [Starpu-devel] Strange behaviour using GPUs


  • From: Xavier Lacoste <xavier.lacoste@inria.fr>
  • To: Nathalie Furmento <nathalie.furmento@labri.fr>
  • Cc: Mathieu Faverge <Mathieu.Faverge@inria.fr>, starpu-devel@lists.gforge.inria.fr, Pierre Ramet <ramet@labri.fr>
  • Subject: Re: [Starpu-devel] Strange behaviour using GPUs
  • Date: Mon, 15 Jul 2013 08:29:00 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hello,

I regenerated the DGEMM models with StarPU r10552 and I still see the same
strange behaviour with the CUDA kernel:
http://img5.imageshack.us/img5/5117/j9o.png,
which explains why StarPU cannot find a good scheduling when left to run
dynamically.
But when I force the scheduling, I am still far from what ParSEC can do with
nearly the same GPU scheduling.
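(For reference, here is how I pin the worker split for these comparisons. This is only a sketch: `STARPU_NCPU` / `STARPU_NCUDA` are the environment variables I believe recent StarPU reads at starpu_init() time, with `STARPU_NCPUS` the older spelling, and `./pastix_app` stands in for the actual solver binary.)

```shell
#!/bin/sh
# Pin the worker split StarPU creates at starpu_init() time.
# 9 CPU workers + 3 CUDA workers (the slow case from the traces):
export STARPU_NCPU=9      # spelled STARPU_NCPUS on older releases
export STARPU_NCUDA=3
echo "running with $STARPU_NCPU CPU + $STARPU_NCUDA CUDA workers"
# ./pastix_app ...        # placeholder for the actual solver binary
```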

I don't think there should be so much variation in the GEMM kernel on the
GPU...

Maybe some issue with memory transfers? As suspected from the traces in the
previous mail: http://img46.imageshack.us/img46/7026/ojb.png
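(One thing worth checking on the transfer side, sketched under assumptions: I believe StarPU honours `STARPU_BUS_CALIBRATE` to force re-sampling of the host/GPU bus bandwidth model, and `STARPU_PREFETCH` to toggle data prefetching in the dmda scheduler family; if either is off or stale, transfer-cost estimates and overlap can suffer.)

```shell
#!/bin/sh
# Re-sample the host<->GPU bus bandwidth model before the next run,
# and make sure data prefetching is enabled for the dmda scheduler.
export STARPU_BUS_CALIBRATE=1   # force re-calibration of transfer costs
export STARPU_PREFETCH=1        # prefetch task data ahead of execution
export STARPU_SCHED=dmda
echo "sched=$STARPU_SCHED calibrate=$STARPU_BUS_CALIBRATE prefetch=$STARPU_PREFETCH"
```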

XL.

On 10 July 2013, at 11:43, Xavier Lacoste wrote:

> Hello again,
>
> I still have some things I can't explain in my traces (e.g.
> http://img46.imageshack.us/img46/7026/ojb.png)
>
> I can see a long period where nothing is done (around 50000) and a data
> movement with timestamps 63306.3 - 28261, which is killing the
> performance.
>
> I also had a question about FxT: how do I set the buffer size to be sure
> all the events will be recorded without having to flush to disk? Is
> there a macro or a compilation option?
>
> XL.
>
>
> On 10 July 2013, at 09:43, Xavier Lacoste wrote:
>
>> Hello,
>>
>> Indeed, the behaviour is much better with r10552. I still lose some
>> time when I use 9 CPUs + 3 GPUs instead of 10 + 2, but not too much (84.2 s vs
>> 62.4 s, whereas I had 750 s before)...
>>
>> Thanks,
>>
>> XL.
>>
>>
>> On 9 July 2013, at 15:54, Xavier Lacoste wrote:
>>
>>> Thanks Nathalie,
>>>
>>> I'll try this when I happen to be scheduled on a GPU node... (the plafrim
>>> scheduler is quite strange today...)
>>>
>>> XL.
>>>
>>> On 9 July 2013, at 14:38, Nathalie Furmento wrote:
>>>
>>>> Xavier,
>>>>
>>>> There was indeed a bug in 1.1.0rc1 which made the dmda queues LIFOs. This
>>>> has been fixed in the branch. We actually found out by seeing MAGMA
>>>> performance drop.
>>>>
>>>> Could you please try it and let us know whether that also fixes your
>>>> problem? I am planning to release 1.1.0rc2 tomorrow, but you could also
>>>> try by checking out the svn repository.
>>>>
>>>> Thanks,
>>>>
>>>> Nathalie
>>>>
>>>> On 09/07/2013 14:19, Xavier Lacoste wrote:
>>>>> Hello,
>>>>>
>>>>> I tried to compare ParSEC and StarPU using full machines (mirage) on
>>>>> PaStiX.
>>>>> We achieve good scaling with ParSEC by statically scheduling
>>>>> tasks on the GPUs.
>>>>> So I tried to reproduce the same scheduling with StarPU 1.1.0rc1 and
>>>>> obtained strange behaviour:
>>>>> my GPUs spend all their time in FetchingInput and PushingOutputs.
>>>>>
>>>>> The output trace can be seen here :
>>>>> http://img401.imageshack.us/img401/5213/1pb.png
>>>>> Or, on plafrim cluster :
>>>>> 9 CPUs + 3 GPUs
>>>>> /lustre/lacoste/rsync/log_mirage_AUDI_dmda+LLT+flop+cmin20+frat8+distFlop+selfCopy+fxt_9_3_4231_20130709_120332_crit2_0.5GPUmem.trace
>>>>> 10 + 2 :
>>>>> /lustre/lacoste/rsync/log_mirage_AUDI_dmda+LLT+flop+cmin20+frat8+distFlop+selfCopy+fxt_10_2_4231_20130709_120332_crit2_0.5GPUmem.trace
>>>>> 11 + 1 :
>>>>> /lustre/lacoste/rsync/log_mirage_AUDI_dmda+LLT+flop+cmin20+frat8+distFlop+selfCopy+fxt_11_1_4231_20130709_120332_crit2_0.5GPUmem.trace
>>>>> 12 + 0 :
>>>>> /lustre/lacoste/rsync/log_mirage_AUDI_dmda+LLT+flop+cmin20+frat8+distFlop+selfCopy+fxt_12_0_4231_20130709_120332_crit2_0.5GPUmem.trace
>>>>>
>>>>> The results are good with 12 CPUs + 0 GPUs; adding 1 or 2 GPUs brings no
>>>>> loss, but no gain either, while adding three slows down the run time by a
>>>>> factor of 10.
>>>>>
>>>>> Do you have any idea what could explain this behaviour?
>>>>>
>>>>> With dynamic scheduling, GPUs are useful with a small number of CPUs
>>>>> but not with the whole machine.
>>>>> (My cost model may not be accurate enough for the moment...
>>>>> http://img515.imageshack.us/img515/2348/njz.png)
>>>>>
>>>>> Thanks,
>>>>>
>>>>> XL.
>>>>> _______________________________________________
>>>>> Starpu-devel mailing list
>>>>> Starpu-devel@lists.gforge.inria.fr
>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/starpu-devel
>>>>
>>>
>>>
>>
>>
>
>

Archives managed by MHonArc 2.6.19+.
