
starpu-devel - Re: [Starpu-devel] Problem with performance model calibration

  • From: Mirko Myllykoski <mirkom@cs.umu.se>
  • To: Samuel Thibault <samuel.thibault@inria.fr>, starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] Problem with performance model calibration
  • Date: Wed, 20 May 2020 17:13:50 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
  • Organization: Umeå University / HPC2N

Hi,

Thank you for your reply!

On 2020-05-20 12:44, Samuel Thibault wrote:

The ***main problem*** is that the performance is very unpredictable.

Did you have a look at the output of starpu_perfmodel_plot? I am getting
something like the attached.

For CPUs, it shows that the linear regression indeed has trouble
catching anything for small sizes, but a non-linear regression should
be able to catch it fine; it would be useful to see what it gets.

I pulled the latest version of StarPU from the Git repository, changed the GEMM tasks to use non-linear models (STARPU_NL_REGRESSION_BASED) and re-calibrated with STARPU_CALIBRATE=1 STARPU_SCHED=random. The calibration files can be found at [1] and the starpu_perfmodel_plot output at [3]. There is more data now, but it is quite noisy. Oddly, the code still appears to default to the linear models...

I did not see any improvement in performance during my actual test runs.
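
For reference, a minimal sketch of the two regression shapes involved, assuming the forms given in the StarPU documentation: STARPU_REGRESSION_BASED fits a * size^b and STARPU_NL_REGRESSION_BASED fits a * size^b + c. The constants below are made up for illustration and are not taken from the calibration files in [1]. The only point is that a shape without the constant term cannot represent a fixed per-task overhead, which matters most at small sizes.

```python
def regression_based(size, a, b):
    """a * size**b, the shape assumed here for STARPU_REGRESSION_BASED."""
    return a * size ** b


def nl_regression_based(size, a, b, c):
    """a * size**b + c, the shape assumed here for STARPU_NL_REGRESSION_BASED."""
    return a * size ** b + c


# Made-up constants: 1e-3 time units per unit of size plus a fixed overhead of 50.
A, B, C = 1e-3, 1.0, 50.0

for size in (1e2, 1e4, 1e6, 1e8):
    without_c = regression_based(size, A, B)
    with_c = nl_regression_based(size, A, B, C)
    # The relative gap is large for small tasks and negligible for large ones.
    gap = (with_c - without_c) / with_c
    print(f"size={size:.0e}  a*s^b={without_c:12.3f}  a*s^b+c={with_c:12.3f}  gap={gap:.2%}")
```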

The variability of the measurements is quite high, though, like 10x
difference for the same size :/ So it looks like there is some parameter
other than just M and N that you would want to pass to the MRM.

The GEMM codelets use the flop count as a "size base".

It appears that MRMs perform even worse than the linear models.

That's surprising, it should be at least as good. I see in your source
code that in MRM you included only M^2*N and M*N. I guess your flop
count formula uses only these factors? Does MRM not manage to get even
close to the constants for these factors?

The flop count is 2 * M^2*N (an M x M matrix times an M x N matrix). The M*N factor accounts for a copy operation. You can find the MRM calibration files at [2]. The main problem with MRMs is that I get some speedup with the linear models, whereas the MRMs seem to make the code slower when GPUs are used. Also, the models in [2] look wrong (negative coefficients).
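
To make the model shape concrete: with the terms above, the MRM is Duration = c0 + c1 * M^2*N + c2 * M*N. The sketch below fits that shape by ordinary least squares on synthetic, noise-free samples generated from made-up non-negative coefficients. It is not StarPU's fitting code, and exact rational arithmetic is used only to keep this toy example free of rounding issues. On data that really follows the model, the fit returns the generating coefficients, so negative coefficients suggest that the measurements deviate from the assumed shape (through noise, missing parameters, or both).

```python
from fractions import Fraction
from itertools import product


def features(m, n):
    """Regressors for an M x M times M x N update: constant, M^2*N, M*N."""
    return [Fraction(1), Fraction(m * m * n), Fraction(m * n)]


def solve(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination (exact with Fractions)."""
    size = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        # Pick any row with a non-zero pivot (fails if the regressors are collinear).
        pivot_row = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def fit(samples):
    """Ordinary least squares via the normal equations (X^T X) c = X^T y."""
    rows = [features(m, n) for m, n, _ in samples]
    ys = [Fraction(d) for _, _, d in samples]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Made-up generating coefficients (c0, c1, c2); not taken from [2].
TRUE = [Fraction(100), Fraction(1, 20000), Fraction(1, 100)]

# Synthetic, noise-free durations on a small hypothetical grid of (M, N).
samples = [(m, n, sum(c * f for c, f in zip(TRUE, features(m, n))))
           for m, n in product((64, 128, 256, 512), (64, 256, 1024))]

coeffs = fit(samples)
print([float(c) for c in coeffs])
```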

The MRM model for the CPU-only starneig_push_bulges tasks seems to work quite well:

_________________________________________________________________________
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

$ df <- read.csv("starneig_push_bulges_pm.out")
$ model1 <- lm(data=df, Duration ~ I(N^2*S) + I(N*N))
$ summary(model1)

Call:
lm(formula = Duration ~ I(N^2 * S) + I(N * N), data = df)

Residuals:
     Min       1Q   Median       3Q      Max
 -480430     -107      283      404  1748349

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.817e+02  2.979e+01  -16.17   <2e-16 ***
I(N^2 * S)   7.252e-04  1.159e-06  625.57   <2e-16 ***
I(N * N)     5.265e-02  6.787e-04   77.57   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11820 on 172325 degrees of freedom
Multiple R-squared: 0.9813, Adjusted R-squared: 0.9813
F-statistic: 4.516e+06 on 2 and 172325 DF, p-value: < 2.2e-16

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

However, the MRM model for the starneig_left_gemm_update tasks seems to fail:

_________________________________________________________________________
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

$ df <- read.csv("~/codelets/tmp/starneig_left_gemm_update_pm.out")
$ model1 <- lm(data=df, Duration ~ I(M^2*N) + I(M*N))
$ summary(model1)

Call:
lm(formula = Duration ~ I(M^2 * N) + I(M * N), data = df)

Residuals:
     Min       1Q   Median       3Q      Max
 -740061     -965     -719      173   476977

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.109e+03  1.675e+01   66.19   <2e-16 ***
I(M^2 * N)   5.369e-05  8.652e-08  620.49   <2e-16 ***
I(M * N)    -2.907e-02  1.692e-04 -171.79   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 15180 on 921547 degrees of freedom
Multiple R-squared: 0.6984, Adjusted R-squared: 0.6984
F-statistic: 1.067e+06 on 2 and 921547 DF, p-value: < 2.2e-16

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
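
One way to see why the negative I(M * N) coefficient looks wrong is to evaluate the fitted formula directly. The sketch below plugs the coefficients from the summary above into the model; the task sizes are hypothetical and chosen only for illustration. Because the M*N term is negative, the size-dependent part of the prediction is negative whenever M is below the ratio of the two coefficients, and for wide enough tasks the whole prediction drops below zero.

```python
# Coefficients copied from the lm() summary for starneig_left_gemm_update.
INTERCEPT = 1.109e+03
COEF_M2N = 5.369e-05
COEF_MN = -2.907e-02


def predict(m, n):
    """Fitted Duration for given M and N, in the units of the CSV's Duration column."""
    return INTERCEPT + COEF_M2N * m * m * n + COEF_MN * m * n


# Below this M the two size-dependent terms sum to a negative value.
crossover_m = -COEF_MN / COEF_M2N
print(f"size-dependent part is negative for M < {crossover_m:.1f}")

# Hypothetical task sizes (M, N); not taken from the calibration data.
for m, n in [(256, 1024), (2048, 1024)]:
    print(f"M={m:5d} N={n:5d} predicted Duration = {predict(m, n):12.1f}")
```

With these coefficients the first size yields a negative predicted duration and the second a positive one.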

It looks like both CPU and GPU measurements are mixed in the *.out files... Is this true? I am just looking at the plot(df) output [4].
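
If the files do mix devices (an open question here), a single fit over the pooled rows would describe neither device. The sketch below illustrates this with made-up per-device cost lines and a hypothetical device label on each row; the actual *.out files may not carry such a column. It uses a one-variable fit (Duration against the flop count) to stay short.

```python
def fit_line(points):
    """Least-squares fit of duration = intercept + slope * work."""
    count = len(points)
    mean_x = sum(x for x, _ in points) / count
    mean_y = sum(y for _, y in points) / count
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical (intercept, slope) per device; these are not measured values.
DEVICES = {"cpu": (10.0, 1e-3), "gpu": (500.0, 5e-5)}
SIZES = [64, 128, 256, 512]

# One synthetic row per (device, M, N): device label, flop count 2*M^2*N, duration.
rows = [(dev, 2.0 * m * m * n, a + b * 2.0 * m * m * n)
        for dev, (a, b) in DEVICES.items() for m in SIZES for n in SIZES]

pooled = fit_line([(work, dur) for _, work, dur in rows])
per_device = {dev: fit_line([(work, dur) for name, work, dur in rows if name == dev])
              for dev in DEVICES}

print("pooled :", pooled)
for dev, line in per_device.items():
    print(f"{dev:7s}:", line)
```

In this toy setup the pooled slope lands between the two device slopes, while the per-device fits recover the generating lines. Splitting the rows by device before fitting would be the corresponding check on the real data, if the device can be identified.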

For GPUs, the story is very different. The variability is as high (10x
again) but doesn't seem to depend very much on the data size. That's
surprising and it could be useful to investigate it.

Yes, this is quite weird. The same pattern can be seen in the new calibration data [1]. I have also seen the variability in FxT traces. I don't see this behavior on my desktop machine but the V100s in our cluster seem to suffer from this problem. The situation was even worse with K80s. I could not get any reliable performance out of those GPUs.


even if I first calibrate the models by setting STARPU_CALIBRATE=1

Which scheduler are you using? When using a dmda scheduler, it will
use the performance model to optimize execution as soon as the model
provides information. For linear regression models, the model can have
an estimate quite early, and the scheduler will then make decisions
accordingly. That could explain why on the attached
graph only big sizes were run on the GPU, because the perfmodel already
says it's better to do so. A way to avoid this is to just use an eager
or random scheduler to let tasks just go without caring about the
perfmodel.

If GPU(s) are detected, the code defaults to dmdas; otherwise, it defaults to prio. As I said above, I reran the calibration with STARPU_CALIBRATE=1 STARPU_SCHED=random. See [1].

- Mirko

[1]: https://drive.google.com/open?id=1_BwOxTGo8xr9LyI4i9EUIPerjttiZMyj
[2]: https://drive.google.com/open?id=1wDCjFEteBGNQF2-MPgCjRpTaSdFrSAC1
[3]: https://drive.google.com/open?id=1FJqWMfgoQ5jt4DlPwFbNePSP-9CHGdlk
[4]: https://drive.google.com/open?id=1KV7uwE_E9TcRjlkBunRuP0Q-AXafMc7E

--
Mirko Myllykoski, Ph.D.
Senior research engineer, Department of Computing Science
Application expert, HPC2N
Umeå University




