
starpu-devel - Re: [Starpu-devel] Strange behaviour using GPUs

  • From: Nathalie Furmento <nathalie.furmento@labri.fr>
  • To: Xavier Lacoste <xavier.lacoste@inria.fr>
  • Cc: Mathieu Faverge <Mathieu.Faverge@inria.fr>, starpu-devel@lists.gforge.inria.fr, Pierre Ramet <ramet@labri.fr>
  • Subject: Re: [Starpu-devel] Strange behaviour using GPUs
  • Date: Tue, 09 Jul 2013 14:38:42 +0200
  • Authentication-results: iona.labri.fr (amavisd-new); dkim=pass (1024-bit key) reason="pass (just generated, assumed good)" header.d=labri.fr
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Xavier,

There was indeed a bug in 1.1.0rc1 which made the dmda queues behave as LIFOs. This has been fixed in the branch. We actually found it when we saw MAGMA performance drop.
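
To illustrate why the queue order can matter, here is a toy model in Python. It is not StarPU code: the function `simulate` and its parameters are made up for this sketch, and it assumes that input transfers are requested in submission order and served one at a time. Under those assumptions, a worker that pops the most recently pushed task first has to wait for the last transfer before it can start anything.

```python
from collections import deque


def simulate(n_tasks, order, transfer_time=1.0, compute_time=1.0):
    """Return (makespan, time spent waiting for input) for a single worker.

    Hypothetical toy model: all tasks are queued up front, and the input of
    task i becomes available at (i + 1) * transfer_time because transfers
    are assumed to be served one after another in submission order.
    """
    ready_at = [(i + 1) * transfer_time for i in range(n_tasks)]
    queue = deque(range(n_tasks))
    now = 0.0
    waiting = 0.0
    while queue:
        # FIFO takes the oldest task, LIFO takes the most recent one.
        task = queue.popleft() if order == "fifo" else queue.pop()
        if ready_at[task] > now:
            # The worker stalls until the input of this task has arrived.
            waiting += ready_at[task] - now
            now = ready_at[task]
        now += compute_time
    return now, waiting


fifo = simulate(8, "fifo")
lifo = simulate(8, "lifo")
print("FIFO makespan/wait:", fifo)
print("LIFO makespan/wait:", lifo)
```

In this model the FIFO worker overlaps transfers with computation and only waits for the first input, while the LIFO worker waits for every transfer to complete before running its first task. Real behaviour depends on the actual scheduler, the prefetching and the memory pressure, so this is only an intuition, not an analysis of the bug.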

Could you please try the fix and let us know whether it also solves your problem? I am planning to release 1.1.0rc2 tomorrow, but in the meantime you can also check out the svn repository.

Thanks,

Nathalie

On 09/07/2013 14:19, Xavier Lacoste wrote:
Hello,

I tried to compare ParSEC and StarPU on full machines (mirage) with PaStiX.
We achieve good scaling with ParSEC by statically scheduling tasks on
GPUs.
So I tried to reproduce the same scheduling with StarPU 1.1.0rc1 and
observed strange behaviour:
my GPUs spend all their time in FetchingInput and PushingOutputs.

The output trace can be seen here:
http://img401.imageshack.us/img401/5213/1pb.png
Or, on the plafrim cluster:
9 CPUs + 3 GPUs:
/lustre/lacoste/rsync/log_mirage_AUDI_dmda+LLT+flop+cmin20+frat8+distFlop+selfCopy+fxt_9_3_4231_20130709_120332_crit2_0.5GPUmem.trace
10 CPUs + 2 GPUs:
/lustre/lacoste/rsync/log_mirage_AUDI_dmda+LLT+flop+cmin20+frat8+distFlop+selfCopy+fxt_10_2_4231_20130709_120332_crit2_0.5GPUmem.trace
11 CPUs + 1 GPU:
/lustre/lacoste/rsync/log_mirage_AUDI_dmda+LLT+flop+cmin20+frat8+distFlop+selfCopy+fxt_11_1_4231_20130709_120332_crit2_0.5GPUmem.trace
12 CPUs + 0 GPUs:
/lustre/lacoste/rsync/log_mirage_AUDI_dmda+LLT+flop+cmin20+frat8+distFlop+selfCopy+fxt_12_0_4231_20130709_120332_crit2_0.5GPUmem.trace

The results are good with 12 CPUs and no GPU. Adding 1 or 2 GPUs brings no
loss but no gain either, and adding a third slows the run down by a factor of 10.

Do you have any idea what could explain this behaviour?

With dynamic scheduling, GPUs are useful with a small number of CPUs but
not with the whole machine.
(My cost model may not be accurate enough for the moment...
http://img515.imageshack.us/img515/2348/njz.png)

Thanks,

XL.





