
Re: [Starpu-devel] Problem with performance model calibration


  • From: Samuel Thibault <samuel.thibault@inria.fr>
  • To: Mirko Myllykoski <mirkom@cs.umu.se>
  • Cc: starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] Problem with performance model calibration
  • Date: Wed, 20 May 2020 12:44:09 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
  • Organization: I am not organized

Re,

Mirko Myllykoski, on Mon. 18 May 2020 16:42:29 +0200, wrote:
> The ***main problem*** is that the performance is very unpredictable.

Did you have a look at the output of starpu_perfmodel_plot? I am getting
something like the attached.

For CPUs, it shows that the linear regression indeed has trouble
capturing anything for small sizes, but a non-linear regression should
be able to fit it fine; it would be useful to see what it produces. The
variability of the measurements is quite high, though, up to a 10x
difference for the same size :/ So it looks like there is some parameter
other than just M and N that you would want to pass to the MRM.
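
In case it helps, here is a minimal sketch (the symbol name is made up,
not taken from your code) of what switching that codelet to StarPU's
built-in non-linear regression would look like; once a few executions
have been measured you can inspect the fitted curve with
starpu_perfmodel_plot again:

    #include <starpu.h>

    /* Sketch only: non-linear regression model (fits a * size^b + c).
     * The symbol name is hypothetical; attach the model to the codelet
     * with cl.model = &gemm_update_nl_model. */
    static struct starpu_perfmodel gemm_update_nl_model =
    {
        .type   = STARPU_NL_REGRESSION_BASED,
        .symbol = "gemm_update_nl",
    };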

> The GEMM codelets use the flop count as a "size base".

> It appears that MRMs perform even worse than the linear models.

That's surprising; it should be at least as good. I see in your source
code that the MRM includes only the M^2*N and M*N terms. I guess your
flop count formula uses only these terms as well? Does the MRM not
manage to get even close to the right constants for them?
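
To make the term encoding concrete, here is a rough sketch of such a
model, following the multiple-regression example from the StarPU
documentation. The names are made up, and the parameter extraction
assumes the first handle of the task is the M x N block registered with
the matrix interface; adapt it to however your codelet actually stores
M and N:

    #include <starpu.h>

    /* Hypothetical sketch, not your actual code: extract M and N from
     * the task so the regression can fit c1*M^2*N + c2*M*N. */
    static void gemm_update_params(struct starpu_task *task, double *parameters)
    {
        parameters[0] = (double) starpu_matrix_get_nx(task->handles[0]); /* M */
        parameters[1] = (double) starpu_matrix_get_ny(task->handles[0]); /* N */
        /* A further slot could be added here if some other quantity
         * turns out to drive the runtime, as suggested above. */
    }

    static const char *gemm_update_parameter_names[] = { "M", "N" };
    static unsigned gemm_update_combi1[2] = { 2, 1 };  /* M^2 * N */
    static unsigned gemm_update_combi2[2] = { 1, 1 };  /* M   * N */
    static unsigned *gemm_update_combinations[] = { gemm_update_combi1,
                                                    gemm_update_combi2 };

    static struct starpu_perfmodel gemm_update_mrm_model =
    {
        .type             = STARPU_MULTIPLE_REGRESSION_BASED,
        .symbol           = "gemm_update_mrm",
        .parameters       = gemm_update_params,
        .nparameters      = 2,
        .parameters_names = gemm_update_parameter_names,
        .combinations     = gemm_update_combinations,
        .ncombinations    = 2,
    };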


For GPUs, the story is very different. The variability is just as high
(10x again), but it does not seem to depend much on the data size. That
is surprising, and it could be useful to investigate.

> even if I first calibrate the models by setting STARPU_CALIBRATE=1

Which scheduler are you using? A dmda-type scheduler uses the
performance model to optimize execution as soon as the model provides
information. With linear regression models, the model can produce an
estimation quite early, and the scheduler will then take its decisions
accordingly. That could explain why, on the attached graph, only the
big sizes were run on the GPU: the perfmodel already says it is better
to do so. A way to avoid this is to use an eager or random scheduler,
which lets tasks run without consulting the perfmodel.
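
For instance (just a sketch, nothing specific to your code), setting
STARPU_SCHED=eager in the environment, or selecting the policy
programmatically, makes the calibration runs independent of the
perfmodel:

    #include <starpu.h>

    /* Sketch: force the eager scheduler so tasks are dispatched without
     * consulting the (still-calibrating) performance model.  Running
     * with STARPU_SCHED=eager in the environment has the same effect. */
    int main(void)
    {
        struct starpu_conf conf;
        starpu_conf_init(&conf);
        conf.sched_policy_name = "eager";
        if (starpu_init(&conf) != 0)
            return 1;

        /* ... submit the tasks to be calibrated here ... */

        starpu_shutdown();
        return 0;
    }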

Samuel

Attachment: starpu_starneig_left_gemm_update_pm.kebnekaise-v100.eps
Description: PostScript document



