starpu-devel - Re: [Starpu-devel] Worker Binding Problem

  • From: Berenger Bramas <berenger.bramas@inria.fr>
  • To: Samuel Thibault <samuel.thibault@inria.fr>
  • Cc: starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] Worker Binding Problem
  • Date: Wed, 9 Sep 2015 13:39:27 +0200 (CEST)
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

I checked, and it looks like OpenMP also uses the system default values.
Only OpenMP thread 0 shows a small difference (a slightly smaller stack and
no guard page), presumably because it is the process's main thread.

- Here is the output for 3 threads:
===========================================
Test with OpenMP:
[0] stackaddr 0x7fff6a08d000
[0] stacksize 8380416
[0] guard_size 0
[1] stackaddr 0x7faa06236000
[1] stacksize 8392704
[1] guard_size 4096
[2] stackaddr 0x7faa05a35000
[2] stacksize 8392704
[2] guard_size 4096
[2]Done = 0.81137s
[0]Done = 0.812685s
[1]Done = 0.827188s

Test with StarPU:
[0] stackaddr 0x7faa05234000
[0] stacksize 8392704
[0] guard_size 4096
[2] stackaddr 0x7faa04232000
[2] stacksize 8392704
[2] guard_size 4096
[1] stackaddr 0x7faa04a33000
[1] stacksize 8392704
[1] guard_size 4096
[0]Done = 0.810964s
[2]Done = 0.827862s
[1]Done = 1.61425s
===========================================
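For reference, values of this kind can be read on Linux/glibc roughly as in the sketch below. This is not the test program used in this thread: `struct stack_info`, `get_stack_info` and `print_stack_info` are hypothetical names, and the sketch assumes glibc's `pthread_getattr_np` extension is available to obtain the attributes of the running thread before calling `pthread_attr_getstack`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* pthread_getattr_np is a GNU extension */
#endif
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Explicit prototype in case _GNU_SOURCE was defined too late for <pthread.h>. */
int pthread_getattr_np(pthread_t thread, pthread_attr_t *attr);

/* Hypothetical container for the three values printed per thread. */
struct stack_info {
    void  *stackaddr;  /* lowest address of the stack */
    size_t stacksize;  /* size of the stack in bytes */
    size_t guard_size; /* size of the guard area in bytes */
};

/* Query the stack of the calling thread; returns 0 on success. */
int get_stack_info(struct stack_info *info)
{
    pthread_attr_t attr;
    int err = pthread_getattr_np(pthread_self(), &attr);
    if (err != 0)
        return err;
    err = pthread_attr_getstack(&attr, &info->stackaddr, &info->stacksize);
    if (err == 0)
        err = pthread_attr_getguardsize(&attr, &info->guard_size);
    pthread_attr_destroy(&attr);
    return err;
}

/* Print the values in the same layout as the logs in this message.
 * id would be the OpenMP thread number or the StarPU worker id. */
void print_stack_info(int id)
{
    struct stack_info info;
    if (get_stack_info(&info) != 0)
        return;
    printf("[%d] stackaddr %p\n", id, info.stackaddr);
    printf("[%d] stacksize %zu\n", id, info.stacksize);
    printf("[%d] guard_size %zu\n", id, info.guard_size);
}
```

Calling `print_stack_info(id)` at the start of each OpenMP thread body and of each StarPU codelet would give one triple of lines per thread, which makes the two runtimes directly comparable.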


If I call alloca(16384) first and then use a second alloca to store my counter, I get:
===========================================
Test with OpenMP:
[1] stackaddr 0x7f9cf64f0000
[1] stacksize 8392704
[1] guard_size 4096
[0] stackaddr 0x7ffe53726000
[0] stacksize 8384512
[0] guard_size 0
[2] stackaddr 0x7f9cf5cef000
[2] stacksize 8392704
[2] guard_size 4096
[1]Done = 0.721168s
[0]Done = 0.721216s
[2]Done = 0.722121s

Test with StarPU:
[0] stackaddr 0x7f9cf54ee000
[0] stacksize 8392704
[0] guard_size 4096
[starpu][starpu_task_wait_for_all] Waiting for tasks submitted to context 0
[2] stackaddr 0x7f9cf44ec000
[2] stacksize 8392704
[2] guard_size 4096
[1] stackaddr 0x7f9cf4ced000
[1] stacksize 8392704
[1] guard_size 4096
[0]Done = 0.743813s
[2]Done = 0.744685s
[1]Done = 1.52117s
===========================================

This improves the performance of all the threads, but one thread is still
slow with StarPU: worker 1 takes about twice as long as the others
(1.52 s versus 0.74 s).
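The two-step alloca pattern, combined with the per-worker variation suggested in the quoted message below, might look like the following sketch. It is only an illustration: `count_on_padded_stack` is a hypothetical helper, the 256-byte step per worker is an arbitrary assumption, and the loop is a stand-in for the real counter workload, which is not shown in this thread.

```c
#include <alloca.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of StarPU or OpenMP.
 * Pads the stack, allocates a counter below the padding, increments it
 * `iterations` times and returns its final value. If counter_addr is not
 * NULL, the address of the counter is stored there for inspection. */
long count_on_padded_stack(int worker_id, long iterations, uintptr_t *counter_addr)
{
    /* 16384 bytes of padding as in the message, plus an assumed
     * 256 bytes per worker so that workers do not share the same offset. */
    size_t padding = 16384 + (size_t)worker_id * 256;
    volatile char *pad = (volatile char *)alloca(padding);
    pad[0] = 0; /* touch the padding so the compiler keeps the allocation */

    /* Second alloca: the counter itself, now further down the stack. */
    volatile long *counter = (volatile long *)alloca(sizeof *counter);
    *counter = 0;
    for (long i = 0; i < iterations; ++i)
        ++*counter;

    if (counter_addr != NULL)
        *counter_addr = (uintptr_t)counter;
    return *counter;
}
```

Printing the address returned through `counter_addr` for each worker shows where the counter actually lives, which is more informative than the stack base address alone.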

If I also raise the stack size limit to about 700 MB (ulimit -s 699999) and keep the alloca calls, I get:
===========================================
Test with OpenMP:
[2] stackaddr 0x7f3b441f8000
[2] stacksize 716800000
[2] guard_size 4096
[1] stackaddr 0x7f3b6ed90000
[1] stacksize 716800000
[1] guard_size 4096
[0] stackaddr 0x7ffec0149000
[0] stacksize 716787712
[0] guard_size 0
[2]Done = 0.721143s
[1]Done = 0.72139s
[0]Done = 0.754094s

Test with StarPU:
[0] stackaddr 0x7f3b09467000
[0] stacksize 716800000
[0] guard_size 4096
[2] stackaddr 0x7f3aad468000
[2] stacksize 716800000
[2] guard_size 4096
[1] stackaddr 0x7f3ade8cf000
[1] stacksize 716800000
[1] guard_size 4096
[2]Done = 0.750809s
[0]Done = 0.75761s
[1]Done = 1.51807s
===========================================

I also tried allocating the counter on the heap and requesting local
allocation (numactl -l), but the results are similar.

So I still cannot figure it out. I also suspected the binding or the
stack, but both look OK.


Bérenger Bramas

HiePACS Project

Tel (05 24 57) 40 76
INRIA BORDEAUX Sud Ouest


----- Original message -----
| From: "Samuel Thibault" <samuel.thibault@inria.fr>
| To: "Berenger Bramas" <berenger.bramas@inria.fr>
| Cc: starpu-devel@lists.gforge.inria.fr
| Sent: Wednesday, 9 September 2015 12:00:59
| Subject: Re: [Starpu-devel] Worker Binding Problem
|
| Berenger Bramas wrote on Wed 09 Sep 2015 11:42:01 +0200:
| > With two threads it is OK:
| > Test with OpenMP:
| > [0]Done = 0.826635s
| > [1]Done = 0.828059s
| > Starpu:
| > [0]Done = 0.813087s
| > [1]Done = 0.826654s
| >
| > With three it starts to be strange:
| > Test with OpenMP:
| > [0]Done = 0.825707s
| > [2]Done = 0.826624s
| > [1]Done = 0.827749s
| > Starpu:
| > [2]Done = 0.826262s
| > [0]Done = 0.826653s
| > [1]Done = 1.64255s
|
| That's really odd indeed, there's no fundamental reason why threads
| started by OpenMP and threads started by StarPU should behave
| differently. I suspect it could be related to the stack allocation,
| which StarPU currently just leaves to the OS. Could you check with
| pthread_attr_getstack what the addresses and sizes look like in both
| OpenMP and StarPU on your target machine? You could also try calling
| alloca(16384) before calling alloca again to allocate the variable you
| are working on, to make sure it really gets allocated locally. There
| may also be cache associativity conflicts, which is why the addresses
| should be checked too: the default pthread stack size may happen to
| always cause conflicts, while OpenMP perhaps uses smaller stacks by
| default. In that case, making the size passed to alloca vary per
| worker instead of a fixed 16384 may avoid the conflict.
|
| Samuel
|
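To act on the suggestion of checking addresses for cache conflicts, one rough approach is to compute which cache set each address would map to. The sketch below is only an approximation under stated assumptions: `cache_set_index` is a hypothetical helper, the 64-byte line size and 64 sets (a 32 KiB, 8-way L1 data cache) are assumed values that must be replaced by those of the target machine, and larger caches are usually indexed by physical rather than virtual address, so virtual addresses only tell part of the story.

```c
#include <stdint.h>

/* Set index of an address in a set-associative cache, assuming the
 * simple mapping (address / line_size) modulo num_sets.
 * Both parameters are machine dependent; 64 and 64 are assumptions. */
unsigned cache_set_index(uint64_t addr, unsigned line_size, unsigned num_sets)
{
    return (unsigned)((addr / line_size) % num_sets);
}
```

With these assumed parameters, any two page-aligned addresses fall in the same set, and so do two addresses at the same offset from page-aligned stack bases, including an offset of 16384 bytes, which is a multiple of the 4096-byte page size. An offset that differs per worker by a few cache lines changes the set index, which is the effect the per-worker alloca size is meant to achieve.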



