
starpu-devel - Re: [Starpu-devel] Performance with SOCL on multiple devices

List topic: Developers list for StarPU

  • From: Samuel Pitoiset <samuel.pitoiset@inria.fr>
  • To: Malcolm Roberts <malcolm.i.w.roberts@gmail.com>, starpu-devel@lists.gforge.inria.fr
  • Cc: "helluy@math.unistra.fr" <helluy@math.unistra.fr>, Bruno Weber <bruno.weber@axessim.fr>
  • Subject: Re: [Starpu-devel] Performance with SOCL on multiple devices
  • Date: Wed, 6 Jan 2016 11:44:41 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hi,

SOCL is known to be an experimental feature, and it doesn't seem to be maintained or widely used. Still, it should not reduce your performance that much.

Did you try to run your Galerkin solver with only one GPU?

It would be good to know if your issue is related to using SOCL with multiple devices or if it's a major bottleneck even with a single compute device.

Thanks.

On 01/06/2016 11:22 AM, Malcolm Roberts wrote:
Hello.

Thanks to your help, I managed to compile and install StarPU on the
machines that I plan on using. I get some strange segfaults on some
machines with certain tests in my project (which is schnaps,
http://schnaps.gforge.inria.fr/ ). Importantly, it works fine on
atlas4 in our local cluster, which has 4 K80 GPUs. However, the code
runs twice as slowly with SOCL as it does with plain OpenCL.

The code is a discontinuous Galerkin solver using a second- or
fourth-order Runge-Kutta time-stepper. The field is divided into
macrocells (the number of macrocells can be chosen freely), and each
macrocell has its own OpenCL buffer. My understanding is that SOCL
should distribute the buffers and associated tasks between OpenCL
devices and take care of memory allocation and transfers. However, it
seems to run on only one GPU, and takes twice as long as OpenCL while
doing so. I am using the default settings, though, so perhaps there is
something to be tuned here. Is there some way to improve this
performance?

Best,

~Malcolm
