Subject: Developers list for StarPU
List archives
- From: Malcolm Roberts <malcolm.i.w.roberts@gmail.com>
- To: Samuel Thibault <samuel.thibault@inria.fr>, starpu-devel@lists.gforge.inria.fr, "helluy@math.unistra.fr" <helluy@math.unistra.fr>, Bruno Weber <bruno.weber@axessim.fr>
- Subject: Re: [Starpu-devel] Performance with SOCL on multiple devices
- Date: Fri, 8 Jan 2016 16:26:13 +0100
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Hello, Samuel.
Thanks for taking a look at the code! How were you able to detect that multiple GPUs were being used?
I agree that the macrocell interface code needs to be optimised and run in parallel. I'm currently working on that, as well as reducing the amount of memory that is transferred.
However, I still find it strange that SOCL is unable to parallelize the simulation. In particular, using SOCL (with either one or several GPUs) is twice as slow as using just one GPU with OpenCL directly.
The test case that I'm considering is
./schnaps -G ../geo/sphereincube.msh -n 3
(NB: one must construct the .msh from the .geo using gmsh before running this). This mesh gives about 3000 different buffers arranged in a 3D geometry. The domain could be divided into subdomains chosen to minimize the number of transfers between GPUs. We can do this in pre-processing using gmsh, for example. With SOCL or StarPU, StarPU would need to divide the data itself, which might either take a large number of iterations or require some pre-processing which would be very specific to our problem.
For example, with a 1D geometry, using one GPU (A), we could imagine having the macrocells all in a line as in
AAAAAAAA
Using two GPUs (A and B) the optimal parallelization would be
AAAABBBB
which would mean that just one cell needs to be transferred in each direction. The worst-case scenario would be
ABABABAB
which would require a total transfer of all the data. Similar cases exist for 2D and 3D grids.
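To make the comparison concrete, here is a minimal sketch (not part of schnaps, SOCL or StarPU) that counts the interfaces between neighbouring macrocells placed on different devices in the 1D case. The function names and the string encoding of the layout are assumptions made for illustration only; each counted interface corresponds to one cell being exchanged in each direction.

```python
def count_cross_device_interfaces(assignment: str) -> int:
    """Count neighbouring macrocell pairs that sit on different devices.

    `assignment` is a hypothetical encoding of a 1D layout: one character
    per macrocell, naming the device that owns it (e.g. "AAAABBBB").
    """
    return sum(1 for left, right in zip(assignment, assignment[1:]) if left != right)


def block_partition(n_cells: int, devices: str) -> str:
    """Give each device one contiguous run of cells (the AAAABBBB layout)."""
    return "".join(devices[i * len(devices) // n_cells] for i in range(n_cells))


def round_robin_partition(n_cells: int, devices: str) -> str:
    """Alternate devices cell by cell (the ABABABAB worst case)."""
    return "".join(devices[i % len(devices)] for i in range(n_cells))


if __name__ == "__main__":
    for layout in ("AAAAAAAA", block_partition(8, "AB"), round_robin_partition(8, "AB")):
        print(layout, count_cross_device_interfaces(layout))
```

With 8 cells and two devices, the contiguous layout has a single cross-device interface, while the alternating layout has one at every cell boundary.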
Do you think that StarPU (either directly or via SOCL) will be able to get close to AAAABBBB?
Thanks again and have a good weekend,
~Malcolm
On 06/01/2016 17:16, Samuel Thibault wrote:
> Samuel Thibault, on Wed 06 Jan 2016 17:03:42 +0100, wrote:
>> With these changes, I do see kernels running on various GPUs. It does
>> not seem faster, but that is probably due to data transfers.
>
> And also serialized kernels
>
>> You will probably want to use FxT to dump traces and read them with
>> Vite.
>
> It notably seems that the DGMacroCellInterface kernels are completely
> serialized, thus awfully bad performance due to no parallelism and
> data transfers :) DGFlux does get parallelized, on the other hand. The
> duration is however quite tiny (60µs), so it's really not efficient
> compared with the runtime overhead.
>
> Samuel
--
http://malcolmiwroberts.com
- [Starpu-devel] Performance with SOCL on multiple devices, Malcolm Roberts, 06/01/2016
- Re: [Starpu-devel] Performance with SOCL on multiple devices, Samuel Pitoiset, 06/01/2016
- Re: [Starpu-devel] Performance with SOCL on multiple devices, Samuel Thibault, 06/01/2016
- Re: [Starpu-devel] Performance with SOCL on multiple devices, Malcolm Roberts, 06/01/2016
- Re: [Starpu-devel] Performance with SOCL on multiple devices, Samuel Thibault, 06/01/2016
- Re: [Starpu-devel] Performance with SOCL on multiple devices, Samuel Thibault, 06/01/2016
- Re: [Starpu-devel] Performance with SOCL on multiple devices, Malcolm Roberts, 08/01/2016
- Re: [Starpu-devel] Performance with SOCL on multiple devices, Samuel Thibault, 08/01/2016
- Re: [Starpu-devel] Performance with SOCL on multiple devices, Malcolm Roberts, 13/01/2016
- Re: [Starpu-devel] Performance with SOCL on multiple devices, Malcolm Roberts, 08/01/2016
- Re: [Starpu-devel] Performance with SOCL on multiple devices, Samuel Thibault, 06/01/2016
- Re: [Starpu-devel] Performance with SOCL on multiple devices, Samuel Thibault, 06/01/2016
- Re: [Starpu-devel] Performance with SOCL on multiple devices, Malcolm Roberts, 06/01/2016