starpu-devel - Re: [Starpu-devel] StarPU's limitations

From: Samuel Thibault <samuel.thibault@ens-lyon.org>
To: Chris Hennick <christopherhe@trentu.ca>
Cc: starpu-devel@lists.gforge.inria.fr
Subject: Re: [Starpu-devel] StarPU's limitations
Date: Tue, 21 Feb 2012 17:22:51 +0100
List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Chris Hennick, on Tue 21 Feb 2012 08:02:17 -0500, wrote:
> On 21 February 2012 05:59, Samuel Thibault <samuel.thibault@ens-lyon.org> wrote:
>
> - We don't need to add new drivers that often.
>
> But will that remain the case as HPC branches into quantum computing,
> neuromorphic computing and whatever else may be needed at yottascale?

I wouldn't call these "often" :)

> You mean machine failover? Well, yes, and I believe we'd need to
> rethink quite a bit the structure of StarPU if it had to cope with tasks
> which failed.
>
> Isn't recovering from machine failure a fairly basic requirement at
> petascale and above? That was my understanding.

Yes it is.

StarPU is not ready for that yet, and I just meant that it would be a
big step to take.

> > • It can't optimize the choice of scheduling policy or performance
> > models based on the trade-off between accuracy and overhead.
>
> Do you mean e.g. single precision vs double precision? A problem I see
> is that this involves the application quite a bit, making the interface
> more complex. But that could be interesting to have a look at for e.g.
> cg/gmres etc.
>
> No, I mean e.g. pheft vs pgreedy, or STARPU_HISTORY_BASED vs
> STARPU_NL_REGRESSION_BASED. For example, if we can eliminate 5 GFLOPS of
> idle time by using a more accurate performance model and more
> sophisticated scheduling algorithm, but the algorithm changes require an
> extra 10 GFLOPS of overhead, then the change isn't worthwhile.

Ok. So far we have believed that the cost of regression vs history is
completely negligible, and thus you'd just take what works best
(some functions have predictable, but not regression-able times, some
applications don't behave well at all with heft, etc).

When that is not true (we could want to run a very smart scheduler
for some time to find very good schedules), it would indeed be more
questionable. I however believe that StarPU can already allow it: you
can write a scheduler which decides whether to run lengthy computations,
or just revert to basic rules. Helpers for combining schedulers could
however be added to StarPU to make it easier.
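
A minimal sketch of the "run the lengthy computation or revert to basic
rules" decision described above. It is plain C and does not use the StarPU
API; the struct, enum and function names are hypothetical, and it assumes
the scheduler can somehow estimate both the gain and the overhead of the
smart policy in the same unit.

```c
/* Hypothetical estimates, both expressed in the same unit
 * (seconds, or work units as in the 5-vs-10 example quoted above). */
struct sched_estimate {
    double expected_gain;  /* idle time the smart policy should remove */
    double expected_cost;  /* overhead the smart policy itself adds    */
};

enum sched_choice { SCHED_BASIC, SCHED_SMART };

/* Pick the expensive policy only when it is expected to pay for
 * itself; otherwise fall back to the cheap basic rules. */
static enum sched_choice choose_policy(const struct sched_estimate *e)
{
    return (e->expected_gain > e->expected_cost) ? SCHED_SMART
                                                 : SCHED_BASIC;
}
```

With a gain of 5 and an overhead of 10, as in the quoted example,
choose_policy() returns SCHED_BASIC.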

> > nor for optional best-effort tasks that aren't always
> > worth their power cost.
>
> I've been wondering about possibly cancelling tasks for some time. The
> difficult part here is how to trade "worth" and "power cost". How to
> quantify "worth"? :)
>
> Most corporate and government users probably have accountants who can figure
> that out.

Ok, so it seems it would be a matter of adding a term in the cost
function, -delta*execute, which reduces the cost when execute is 1, and
just not execute the task if that seems better.
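
The extra term can be sketched as follows. This is an illustration only,
not StarPU code: it assumes the rest of the cost function reduces to a
single hypothetical run_cost (for instance the power cost of the task)
that is paid only when the task runs, and that delta is the worth assigned
to the task's result.

```c
/* Cost of the decision for one optional task.
 * execute  : 0 to skip the task, 1 to run it
 * run_cost : assumed cost of running it (hypothetical, e.g. energy)
 * delta    : worth assigned to the result, in the same unit */
static double decision_cost(int execute, double run_cost, double delta)
{
    return execute * run_cost - delta * execute;
}

/* Run the task only if doing so gives a strictly lower cost
 * than skipping it, i.e. only if delta exceeds run_cost. */
static int should_execute(double run_cost, double delta)
{
    return decision_cost(1, run_cost, delta)
         < decision_cost(0, run_cost, delta);
}
```

Under these assumptions the task runs exactly when its worth is greater
than its cost, so quantifying delta remains the hard part.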

> - have a scheduling "window": make starpu_submit block when 'n' tasks
> have already been submitted, to avoid filling memory with myriads of
> tasks when the computation is huge (expected to last months).
>
> Could that be solved in practice by simplifying the representation of
> recurring
> tasks, and/or by moving the long-term queue to a database (which would also
> help with failure recovery)?

Representing recurring tasks in a general enough way that covers all
schemes is quite complex. In the case of StarPU, the idea can be that
callbacks of tasks submit the next tasks, a continuation style actually.

That's however not the easiest way to express tasks; it's way simpler to
just run the for loop and let it block as needed.
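
A minimal sketch of such a submission window, assuming POSIX threads. It
is not StarPU code and all names are hypothetical: the submitting loop
would call window_acquire() before each submission, and each task's
completion callback would call window_release().

```c
#include <pthread.h>
#include <stddef.h>

/* Bounds the number of submitted-but-unfinished tasks. */
struct submit_window {
    pthread_mutex_t lock;
    pthread_cond_t  has_room;
    unsigned        in_flight;  /* tasks submitted and not yet finished */
    unsigned        limit;      /* the 'n' of the window                */
};

static void window_init(struct submit_window *w, unsigned limit)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->has_room, NULL);
    w->in_flight = 0;
    w->limit = limit;
}

/* Called by the submitter: blocks while the window is full. */
static void window_acquire(struct submit_window *w)
{
    pthread_mutex_lock(&w->lock);
    while (w->in_flight >= w->limit)
        pthread_cond_wait(&w->has_room, &w->lock);
    w->in_flight++;
    pthread_mutex_unlock(&w->lock);
}

/* Called when a task completes: frees one slot and wakes a submitter. */
static void window_release(struct submit_window *w)
{
    pthread_mutex_lock(&w->lock);
    w->in_flight--;
    pthread_cond_signal(&w->has_room);
    pthread_mutex_unlock(&w->lock);
}
```

The application keeps its plain for loop; only the memory held by pending
tasks is bounded, at the price of the submitting thread sleeping whenever
the window is full.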

Distributed queues of tasks are however an idea that can help, yes. We
are considering extending StarPU in that direction, at least to make it
schedule tasks over MPI (instead of inside MPI nodes only as is done
now).

Samuel
