
starpu-devel - Re: [Starpu-devel] Performance decreasing by adding empty tasks

Subject: Developers list for StarPU

List archives

  • From: Samuel Thibault <samuel.thibault@ens-lyon.org>
  • To: Xavier Lacoste <xavier.lacoste@inria.fr>
  • Cc: starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] Performance decreasing by adding empty tasks
  • Date: Thu, 23 Feb 2012 18:31:50 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Xavier Lacoste, on Tue 21 Feb 2012 10:25:42 +0100, wrote:
> I can understand that most of the time is spent with data management,
> but that's all I can say...

Most of these are actually idle handlers, so it is "normal" that they
take most of the time. The processors are simply not fed enough tasks
to process.

> I found a workaround, by managing this buffer myself, copying it on all
> CUDA devices at the beginning of the run and getting the good one in the
> CUDA kernel (I give it an array d_blocktab of ndevices cuda ptr, and choose
> the right one in the kernel with cudaGetDevice).
>
> With this workaround I lose no time adding my global read-only buffer.

Ok, so it's really an issue in StarPU. I can see several potential
explanations:

* The read-only buffer gets evicted from GPU memory when StarPU has to
make room for new data, and thus has to be brought back in. I guess
the data of Pastix do not fit entirely in GPU memory, so eviction is
necessary.

A way to avoid this is to use the write-through mask: call
starpu_data_set_wt_mask() on the data, and the data will be replicated
on all memory nodes, kept there, and prevented from eviction. I
have just committed the last property (which is the one we need here)
in revision r5790, so it's not available in earlier revisions, but the
change is trivial to backport.
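To make the suggestion concrete, here is a minimal sketch of what the registration could look like. It is not taken from the original mail: the names blocktab and register_blocktab are hypothetical stand-ins for the Pastix buffer discussed above, and it assumes a StarPU revision (r5790 or later) where a write-through replica is also protected from eviction.

```c
/* Hypothetical sketch: register a global read-only buffer and ask
 * StarPU to write it through to every memory node, where it is then
 * kept and protected from eviction (behavior added in r5790). */
#include <stdint.h>
#include <starpu.h>

static starpu_data_handle_t blocktab_handle;

void register_blocktab(int *blocktab, size_t nblocks)
{
    /* Node 0 is main RAM, where the buffer initially lives. */
    starpu_vector_data_register(&blocktab_handle, 0,
                                (uintptr_t) blocktab, nblocks,
                                sizeof(blocktab[0]));

    /* Write-through mask with all bits set: request a replica on
     * every memory node (one bit per node). */
    starpu_data_set_wt_mask(blocktab_handle, ~0u);
}
```

With this in place, each GPU keeps its own copy of the buffer, so tasks no longer trigger transfers to fetch it back after eviction.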

* Since all tasks access the same data in read mode, there is
contention on the spinlock managing the access mode of the data. I
guess, however, that your tasks are long enough for that contention to
remain low.

* All tasks register themselves as readers for
the implicit dependencies on the data. This makes
_starpu_release_data_enforce_sequential_consistency browse a
very big list while everybody waits for it. Please try calling
starpu_data_set_sequential_consistency_flag(handle, 0) on that data, to
disable implicit dependency computation.
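As a sketch of that last suggestion (again with the hypothetical blocktab_handle name standing in for the buffer's handle): disabling sequential consistency on just this handle stops every reader task from being appended to its dependency list, while leaving implicit dependencies intact for all other data.

```c
/* Hypothetical sketch: since the buffer is only ever read, implicit
 * dependency tracking on it is pure overhead. Disable it for this
 * handle only; other handles keep sequential consistency. */
starpu_data_set_sequential_consistency_flag(blocktab_handle, 0);
```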

Samuel





Archives managed by MHonArc 2.6.19+.
