
starpu-devel - [Starpu-devel] Performance with SOCL on multiple devices

List topic: Developers list for StarPU

  • From: Malcolm Roberts <malcolm.i.w.roberts@gmail.com>
  • To: starpu-devel@lists.gforge.inria.fr
  • Cc: "helluy@math.unistra.fr" <helluy@math.unistra.fr>, Bruno Weber <bruno.weber@axessim.fr>
  • Subject: [Starpu-devel] Performance with SOCL on multiple devices
  • Date: Wed, 6 Jan 2016 11:22:46 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hello.

Thanks to your help, I managed to compile and install StarPU on the machines that I plan to use. Certain tests in my project (schnaps, http://schnaps.gforge.inria.fr/ ) produce strange segfaults on some of those machines. More importantly, everything works fine on atlas4 in our local cluster, which has 4 K80 GPUs. However, the code is twice as slow with SOCL as it is when using OpenCL directly.

The code is a discontinuous Galerkin solver using a second- or fourth-order Runge-Kutta time-stepper. The field is divided into macrocells (the number of macrocells is ours to choose), and each macrocell has its own OpenCL buffer. My understanding is that SOCL should distribute the buffers and their associated tasks across the OpenCL devices and take care of memory allocation and transfers. Instead, it seems to run on just one GPU, and it takes twice as long as OpenCL while doing so. I am using the default settings, though, so perhaps something can be tuned here. Is there some way to improve this performance?
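To make the layout concrete, here is a minimal sketch in plain C of the structure described above: one buffer per macrocell and one time-step task per macrocell. It is not taken from schnaps. The `macrocell` struct, the function names, and the toy right-hand side dw/dt = -w are all hypothetical stand-ins, ordinary heap arrays replace the OpenCL buffers, and the interface fluxes that couple neighbouring macrocells in a real discontinuous Galerkin solver are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one macrocell. It owns its own field buffer,
   mirroring the "one OpenCL buffer per macrocell" layout. */
typedef struct {
    size_t n;     /* number of degrees of freedom in this macrocell */
    double *w;    /* field values (a cl_mem buffer in an OpenCL code) */
    double *tmp;  /* scratch storage for the intermediate RK stage */
} macrocell;

/* Toy right-hand side dw/dt = -w, standing in for the DG residual. */
static double rhs(double w) { return -w; }

/* Allocate a macrocell and set every value to w0. Returns 0 on success. */
int macrocell_init(macrocell *m, size_t n, double w0)
{
    assert(m != NULL && n > 0);
    m->n = n;
    m->w = malloc(n * sizeof *m->w);
    m->tmp = malloc(n * sizeof *m->tmp);
    if (m->w == NULL || m->tmp == NULL) {
        free(m->w);
        free(m->tmp);
        m->w = m->tmp = NULL;
        return -1;
    }
    for (size_t i = 0; i < n; ++i)
        m->w[i] = w0;
    return 0;
}

/* Release the buffers owned by a macrocell. */
void macrocell_free(macrocell *m)
{
    free(m->w);
    free(m->tmp);
    m->w = m->tmp = NULL;
}

/* One second-order (midpoint) Runge-Kutta step on a single macrocell. */
void rk2_step(macrocell *m, double dt)
{
    for (size_t i = 0; i < m->n; ++i)      /* stage 1: half step */
        m->tmp[i] = m->w[i] + 0.5 * dt * rhs(m->w[i]);
    for (size_t i = 0; i < m->n; ++i)      /* stage 2: full step */
        m->w[i] += dt * rhs(m->tmp[i]);
}

/* Advance every macrocell by one time step. Each iteration touches only
   its own buffers, so each one is an independent unit of work. */
void step_all(macrocell *cells, size_t ncells, double dt)
{
    for (size_t c = 0; c < ncells; ++c)
        rk2_step(&cells[c], dt);
}
```

In this toy version each iteration of the loop in `step_all` reads and writes only its own macrocell's buffers, which is the kind of per-buffer work one would hope a runtime could place on different devices.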

Best,

~Malcolm

--
http://malcolmiwroberts.com




