- From: Anders Sjögren <anders.sjogren@chalmers.se>
- To: Samuel Thibault <samuel.thibault@ens-lyon.org>
- Cc: "starpu-devel@lists.gforge.inria.fr" <starpu-devel@lists.gforge.inria.fr>
- Subject: Re: [Starpu-devel] StarPU for distributed computing
- Date: Mon, 17 Sep 2012 08:28:56 +0000
Hi Samuel.
Thanks for the reply!
On Sep 14, 2012, at 4:25 PM, Samuel Thibault <samuel.thibault@ens-lyon.org>
wrote:
> Hello,
>
> Anders Sjögren, le Fri 14 Sep 2012 11:05:18 +0200, a écrit :
>> Is there any ongoing work on taking StarPU to the next level, forming a
>> runtime
>> scheduler of distributed heterogenous resources?
>
> Yes. What we currently already have is interaction between StarPU and
> MPI communications. It is still up to the application to distribute the
> work over the machines, and StarPU will only handle the scheduling
> within the machines. This is documented on
>
> http://runtime.bordeaux.inria.fr/StarPU/starpu.html#StarPU-MPI-support
Thanks for the info. I will look into that.
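To make sure I understand that model, here is roughly how I picture a small StarPU-MPI program after a first read of that page (C++ over the C API; the function names are my reading of the documentation and examples, so they may well be slightly off or version-dependent):

    #include <starpu.h>
    #include <starpu_mpi.h>
    #include <cstdint>
    #include <vector>

    /* Toy kernel: scale one block of a distributed vector on a CPU core.
       A CUDA/OpenCL implementation could be registered next to it. */
    static void scale_cpu(void *buffers[], void * /*cl_arg*/)
    {
        double *v = reinterpret_cast<double *>(STARPU_VECTOR_GET_PTR(buffers[0]));
        unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
        for (unsigned i = 0; i < n; i++)
            v[i] *= 2.0;
    }

    static struct starpu_codelet scale_cl;   /* fields filled in below */

    int main(int argc, char **argv)
    {
        starpu_init(NULL);
        starpu_mpi_init(&argc, &argv, 1);    /* 1 = let StarPU initialize MPI */
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        scale_cl.cpu_funcs[0] = scale_cpu;
        scale_cl.nbuffers = 1;
        scale_cl.modes[0] = STARPU_RW;

        /* The distribution stays the application's job: one block of a
           (hypothetical) global vector per rank. */
        const unsigned block = 1024;
        std::vector<double> mine(block, 1.0);
        std::vector<starpu_data_handle_t> handles(size);
        for (int b = 0; b < size; b++) {
            if (b == rank)   /* block owned locally */
                starpu_vector_data_register(&handles[b], 0,
                    (uintptr_t)mine.data(), block, sizeof(double));
            else             /* remote block, no local copy yet */
                starpu_vector_data_register(&handles[b], -1, 0, block, sizeof(double));
            starpu_data_set_rank(handles[b], b);   /* owner rank */
            starpu_data_set_tag(handles[b], b);    /* MPI tag used for transfers */
        }

        /* Each task is executed on the rank that owns its data, and within
           that rank StarPU schedules it on a CPU core or an accelerator;
           the MPI transfers for non-owned data are handled automatically. */
        for (int b = 0; b < size; b++)
            starpu_mpi_insert_task(MPI_COMM_WORLD, &scale_cl,
                                   STARPU_RW, handles[b], 0);

        starpu_task_wait_for_all();
        for (int b = 0; b < size; b++)
            starpu_data_unregister(handles[b]);
        starpu_mpi_shutdown();
        starpu_shutdown();
        return 0;
    }

If that picture is roughly right, the piece that stays manual is exactly the owner-rank assignment, which is what my first idea below is about.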
Two (possibly naïve) ideas below; please bear with me if it's nonsense:
1: A unified approach to distributed/heterogeneous computing in the core of
StarPU
To me it seems that heterogeneous computing is a special case of a general
computing/memory topology, and that it should be possible to extend StarPU to
work on a grid of heterogeneous nodes without changing the API too much. It's
"just" that computation and memory transfers need to be split among the CPUs
and GPUs of different nodes, instead of solely between the CPUs and GPUs of
one and the same node.
There is ongoing work on handling complex topologies among compute nodes, but
most or all of it does not include accelerators. See e.g.:
* http://charm.cs.illinois.edu/talks/TopoSPSS09.pptx
* http://web.eecs.utk.edu/~dongarra/CCGSC-2012/talk17-hoefler.pdf (from
Clusters, Clouds, and Data for Scientific Computing (CCDSC), 2012).
A unified approach should improve performance (since one runtime has complete
knowledge of everything). It should also decrease programming complexity,
since one wouldn't need to know both MPI and StarPU.
Here, a low-latency messaging framework such as ZeroMQ might be useful? On
the other hand, some GPU devices on different nodes can speak directly via
InfiniBand RDMA, so that should also be taken into account. Or maybe just
use MPI as the messaging framework inside a distributed StarPU framework?
2: Integrating StarPU as an engine of an Algorithmic Skeleton library
Algorithmic skeletons have been proposed to describe inherently parallel
patterns in a natural way, separating the description of the problem from the
implementation. Here, it seems to me that the library could be a fairly thin
wrapper around StarPU: it would create the appropriate StarPU tasks and then
let StarPU handle the data movement and the scheduling of the computation.
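To make this concrete, here is a hypothetical sketch of what such a thin wrapper could look like for a simple in-place map skeleton (all the names, skel::map and friends, are invented for illustration, and the StarPU calls are as I understand them from the documentation):

    #include <starpu.h>
    #include <cstdint>
    #include <vector>

    namespace skel {

    typedef double (*unary_fn)(double);

    /* One map task: apply the user's function to every element of its block. */
    static void map_cpu(void *buffers[], void *cl_arg)
    {
        double *v = reinterpret_cast<double *>(STARPU_VECTOR_GET_PTR(buffers[0]));
        unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
        unary_fn f;
        starpu_codelet_unpack_args(cl_arg, &f);
        for (unsigned i = 0; i < n; i++)
            v[i] = f(v[i]);
    }

    static struct starpu_codelet map_cl;   /* fields filled in by map() */

    /* The skeleton: chop the array into blocks, register each block with
       StarPU and submit one task per block; StarPU decides where every
       task runs and moves the data accordingly. */
    inline void map(double *data, unsigned n, unary_fn f, unsigned block = 4096)
    {
        map_cl.cpu_funcs[0] = map_cpu;   /* a GPU implementation would be added here */
        map_cl.nbuffers = 1;
        map_cl.modes[0] = STARPU_RW;

        unsigned nblocks = (n + block - 1) / block;
        std::vector<starpu_data_handle_t> h(nblocks);
        for (unsigned b = 0; b < nblocks; b++) {
            unsigned len = (b + 1 == nblocks) ? n - b * block : block;
            starpu_vector_data_register(&h[b], 0,
                (uintptr_t)(data + b * block), len, sizeof(double));
            starpu_insert_task(&map_cl,
                               STARPU_RW, h[b],
                               STARPU_VALUE, &f, sizeof(f),
                               0);
        }
        starpu_task_wait_for_all();      /* synchronous, Muesli-style */
        for (unsigned b = 0; b < nblocks; b++)
            starpu_data_unregister(h[b]);
    }

    } // namespace skel

Usage (mirroring the Muesli example quoted further down) would then just be skel::map(buffer, n, &square). A real wrapper would of course register the data once instead of per call, and offer asynchronous variants so that several skeleton invocations can overlap.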
Most or all algorithmic skeleton libraries today do not have a good runtime
scheduler. They also typically don't handle both distributed and
heterogeneous computing. Furthermore, they typically take either a
task-centric or a data-centric approach (i.e. either sending data to
predefined nodes or sending tasks to the nodes that hold the data), whereas
it seems StarPU could make this choice more dynamically. I think this could
be a really important contribution to the field.
>> Could you give some advice on how StarPU is typically used in distributed
>> applications, when a high-level view such as algorithmic skeletons is
>> preferred?
>> It seems one would need to combine StarPU with a library with capabilities
>> for
>> distributed computing (e.g. Muesli from the group in Münster).
>
> For instance, yes. The difficulty usually lies in letting StarPU handle
> the actual execution of the task, so that it can happen on both CPUs and
> GPUs. Synchronization between computation and communication can also be
> hard.
>
> In the example given by the Muesli website:
>
> double square(double x) {
>     return x * x;
> }
>
> int main(int argc, char** argv) {
>     ...
>     msl::DistributedSparseMatrix<double> dsm(n, m, r, c, 0);
>     ...
>     dsm.mapInPlace(&square);
> }
>
> You would need to find a way to register the actual Muesli DSM buffers
> to StarPU, and the square function would be where you submit the task
> to StarPU and wait for its termination. I don't know the asynchronous
> capabilities of Muesli, but ideally it would permit to make several
> calls at a time, thus submitting several tasks in advance, so that
> StarPU can actually achieve task parallelism.
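If I read that correctly, the glue would be roughly along these lines (hypothetical code; how to reach a raw per-process block of the DSM is Muesli-internal, so the pointer and length are just assumed to be handed to us):

    #include <starpu.h>
    #include <cstdint>

    /* The per-block kernel corresponding to square(), run wherever StarPU
       decides; a CUDA version could be registered alongside it. */
    static void square_cpu(void *buffers[], void * /*cl_arg*/)
    {
        double *v = reinterpret_cast<double *>(STARPU_VECTOR_GET_PTR(buffers[0]));
        unsigned n = STARPU_VECTOR_GET_NX(buffers[0]);
        for (unsigned i = 0; i < n; i++)
            v[i] = v[i] * v[i];
    }

    static struct starpu_codelet square_cl;

    void init_square_codelet()
    {
        square_cl.cpu_funcs[0] = square_cpu;
        square_cl.nbuffers = 1;
        square_cl.modes[0] = STARPU_RW;
    }

    /* Called once per local block of the distributed matrix (pointer and
       length assumed to come from Muesli).  It returns without waiting, so
       several blocks can be in flight at once; the caller unregisters the
       handle after starpu_task_wait_for_all(). */
    starpu_data_handle_t submit_square_on_block(double *block, unsigned len)
    {
        starpu_data_handle_t h;
        starpu_vector_data_register(&h, 0, (uintptr_t)block, len, sizeof(double));
        starpu_insert_task(&square_cl, STARPU_RW, h, 0);
        return h;
    }

And if Muesli lets mapInPlace (or an asynchronous variant of it) hand over several blocks before waiting, that should give StarPU exactly the task parallelism you mention.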
Thanks for these ideas!
Best regards
Anders