starpu-devel - Re: [Starpu-devel] Performance with SOCL on multiple devices

Subject: Developers list for StarPU

From: Malcolm Roberts <malcolm.i.w.roberts@gmail.com>
To: Samuel Thibault <samuel.thibault@inria.fr>, starpu-devel@lists.gforge.inria.fr, "helluy@math.unistra.fr" <helluy@math.unistra.fr>, Bruno Weber <bruno.weber@axessim.fr>
Subject: Re: [Starpu-devel] Performance with SOCL on multiple devices
Date: Fri, 8 Jan 2016 16:26:13 +0100
List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hello, Samuel.

Thanks for taking a look at the code! How were you able to detect that multiple GPUs were being used?

I agree that the macrocell interface code needs to be optimised and run in parallel. I'm currently working on that, as well as reducing the amount of memory that is transferred.

However, I still find it strange that SOCL is unable to parallelize the simulation. In particular, using SOCL (with either one or several GPUs) is twice as slow as using just one GPU with OpenCL directly.

The test case that I'm considering is

./schnaps -G ../geo/sphereincube.msh -n 3

(NB: the .msh file must first be generated from the .geo file using gmsh.) This geometry gives about 3000 different buffers arranged in 3D. The domain could be divided into subdomains so as to minimize the number of transfers between GPUs; we can do this in pre-processing, using gmsh for example. With SOCL or StarPU, the runtime would need to work out the data distribution itself, which might take either a large number of iterations or some pre-processing that would be very specific to our problem.

For example, with a 1D geometry, using one GPU (A), we could imagine having the macrocells all in a line as in

AAAAAAAA

Using two GPUs (A and B) the optimal parallelization would be

AAAABBBB

which would mean that just one cell needs to be transferred in each direction. The worst-case scenario would be

ABABABAB

which would require a total transfer of all the data. Similar cases exist for 2D and 3D grids.
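
To make the comparison concrete, here is a rough sketch that counts the macrocell interfaces crossing a device boundary for the two layouts above. It assumes a structured toy grid rather than the real unstructured mesh, and the names `cut_faces`, `blocks` and `interleaved` are made up for illustration; they are not part of schnaps, SOCL or StarPU. Each cut interface stands for one exchange in each direction.

```python
from itertools import product


def cut_faces(shape, owner):
    """Count faces shared by neighbouring macrocells on different devices.

    shape -- (nx, ny, nz) dimensions of a structured macrocell grid (toy assumption)
    owner -- function (i, j, k) -> device label
    """
    nx, ny, nz = shape
    cuts = 0
    for i, j, k in product(range(nx), range(ny), range(nz)):
        here = owner(i, j, k)
        # Look only at the +x, +y, +z neighbours so each face is counted once.
        for di, dj, dk in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            ni, nj, nk = i + di, j + dj, k + dk
            if ni < nx and nj < ny and nk < nz and owner(ni, nj, nk) != here:
                cuts += 1
    return cuts


def blocks(nx, devices="AB"):
    """Contiguous slabs along x, i.e. AAAABBBB in 1D."""
    return lambda i, j, k: devices[i * len(devices) // nx]


def interleaved(devices="AB"):
    """Round-robin assignment, i.e. ABABABAB in 1D and a checkerboard in 3D."""
    return lambda i, j, k: devices[(i + j + k) % len(devices)]


line = (8, 1, 1)  # the 1D example with 8 macrocells
cube = (8, 8, 8)  # a hypothetical 3D analogue

line_block = cut_faces(line, blocks(8))
line_inter = cut_faces(line, interleaved())
cube_block = cut_faces(cube, blocks(8))
cube_inter = cut_faces(cube, interleaved())

print("1D:", line_block, "vs", line_inter)
print("3D:", cube_block, "vs", cube_inter)
```

In the 1D case this gives 1 cut interface for AAAABBBB and 7 for ABABABAB. In the 8x8x8 toy cube, the slab layout cuts only the 64 faces of one plane, while the checkerboard cuts every interior face.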

Do you think that StarPU (either directly or via SOCL) will be able to get close to AAAABBBB?

Thanks again and have a good weekend,

~Malcolm

On 06/01/2016 17:16, Samuel Thibault wrote:

Samuel Thibault, on Wed 06 Jan 2016 17:03:42 +0100, wrote:

With these changes, I do see kernels running on various GPUs. It does
not seem faster, but that is probably due to data transfers.

And also serialized kernels

You will probably want to use FxT to dump traces and read them with
Vite.

It notably seems that the DGMacroCellInterface kernels are completely
serialized, thus awfully bad performance due to no parallelism and
data transfers :) DGFlux does get parallelized, on the other hand. The
duration is however quite tiny (60µs), so it's really not efficient
compared with the runtime overhead.

Samuel

--
http://malcolmiwroberts.com

