Subject: Developers list for StarPU
- From: Berenger Bramas <berenger.bramas@inria.fr>
- To: Samuel Thibault <samuel.thibault@inria.fr>
- Cc: starpu-devel@lists.gforge.inria.fr
- Subject: Re: [Starpu-devel] Worker Binding Problem
- Date: Wed, 9 Sep 2015 14:32:55 +0200 (CEST)
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
OK, it looks like the problem does not come from StarPU.
In the test file, I do something like:
==========================
openmp_test();
starpu_test();
==========================
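The test program itself is not shown in the thread. A minimal sketch of what such a test might look like, assuming each thread simply times a loop that increments a counter on its own stack. The names `spin_kernel` and `run_threads` are hypothetical, and plain pthreads stand in for the StarPU CPU workers (no StarPU API is used):

```c
#define _POSIX_C_SOURCE 200112L
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define MAX_THREADS 64

/* Wall-clock time in seconds, from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Kernel run by every thread: increment a counter that lives on the
 * thread's own stack. volatile keeps the compiler from removing the loop. */
static double spin_kernel(long iterations)
{
    volatile long counter = 0;
    double start = now_seconds();
    for (long i = 0; i < iterations; i++)
        counter++;
    return now_seconds() - start;
}

struct job {
    int id;
    long iterations;
    double elapsed;
};

static void *worker_main(void *arg)
{
    struct job *job = arg;
    job->elapsed = spin_kernel(job->iterations);
    printf("[%d]Done = %gs\n", job->id, job->elapsed);
    return NULL;
}

/* Hypothetical stand-in for starpu_test(): one pthread per "worker".
 * Returns 0 on success, -1 on bad arguments or thread-creation failure. */
static int run_threads(int nthreads, long iterations, double *elapsed)
{
    pthread_t tids[MAX_THREADS];
    struct job jobs[MAX_THREADS];
    int created = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    for (; created < nthreads; created++) {
        jobs[created] = (struct job){ .id = created, .iterations = iterations };
        if (pthread_create(&tids[created], NULL, worker_main, &jobs[created]) != 0)
            break;
    }
    /* Always join whatever was started, even after a failure. */
    for (int i = 0; i < created; i++) {
        pthread_join(tids[i], NULL);
        elapsed[i] = jobs[i].elapsed;
    }
    return created == nthreads ? 0 : -1;
}
```

Calling `run_threads(3, 100000000L, t)` prints one `[i]Done = ...s` line per thread. On an otherwise idle machine with one core per thread, the three times would be expected to be close, which is the baseline the StarPU runs below deviate from.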
If I comment out the OpenMP test, the StarPU test is perfect...
More precisely, every additional OpenMP thread (excluding the master) that is
created and bound makes the StarPU threads created later slower.
If I do (5 OMP BIND 3 STARPU):
======================================================
OMP_DYNAMIC=false OMP_NUM_THREADS=5 OMP_PROC_BIND=TRUE STARPU_NCPUS=3 numactl -l ./testStarPUOpenMPv2.exe
Test with OpenMP:
[1]Done = 0.721296s
[2]Done = 0.721693s
[4]Done = 0.721859s
[3]Done = 0.721989s
[0]Done = 0.72314s
Starpu:
[0]Done = 0.722181s
[1]Done = 1.44202s
[2]Done = 1.45352s
======================================================
Now the same, but with the OpenMP binding removed (5 OMP NO-BIND 3 STARPU):
======================================================
OMP_DYNAMIC=false OMP_NUM_THREADS=5 OMP_PROC_BIND=FALSE STARPU_NCPUS=3 numactl -l ./testStarPUOpenMPv2.exe
OpenMP:
[2]Done = 0.721822s
[3]Done = 0.722332s
[4]Done = 0.72412s
[0]Done = 0.729278s
[1]Done = 0.72942s
Starpu:
[2]Done = 0.721866s
[1]Done = 0.722248s
[0]Done = 0.729274s
======================================================
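To check whether the OpenMP and StarPU threads actually end up on overlapping cores, each thread can print its affinity mask and the CPU it is running on. A sketch using the glibc extensions `pthread_setaffinity_np`, `pthread_getaffinity_np` and `sched_getcpu`; the helper names are hypothetical, and `bind_self_to_cpu` only approximates what OMP_PROC_BIND or StarPU worker binding does:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to one CPU. This only approximates
 * OMP_PROC_BIND=TRUE or StarPU worker binding; neither runtime's
 * placement policy is modelled here. Returns 0 on success. */
static int bind_self_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Print the CPU the calling thread runs on and the CPUs it may use.
 * Returns the number of allowed CPUs, or -1 on error. */
static int report_binding(const char *tag, int id)
{
    cpu_set_t set;
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return -1;
    printf("%s[%d] on CPU %d, allowed:", tag, id, sched_getcpu());
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            printf(" %d", cpu);
    printf("\n");
    return CPU_COUNT(&set);
}
```

Calling `report_binding("omp", omp_get_thread_num())` inside the OpenMP region and `report_binding("starpu", starpu_worker_get_id())` inside a codelet would show whether two busy threads are allowed on, or running on, the same core.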
And if I use only one OpenMP thread (just the master) and bind it (1 OMP BIND
3 STARPU):
======================================================
OMP_DYNAMIC=false OMP_NUM_THREADS=1 OMP_PROC_BIND=FALSE STARPU_NCPUS=3 numactl -l ./testStarPUOpenMPv2.exe
Test with OpenMP:
[0]Done = 0.726867s
Test with StarPU:
[2]Done = 0.721073s
[1]Done = 0.722398s
[0]Done = 0.725718s
======================================================
I set OMP_DYNAMIC=false to ensure that the OpenMP threads do not spin while
waiting (even though I am outside the OpenMP parallel section when I run the
StarPU test, so I suppose the GCC implementation puts the threads to sleep
anyway).
So I really do not understand this behavior.
But my real application has something similar: an OpenMP precomputation
stage followed by a StarPU execution
(and I suppose I have the same problem there, although I need to check in
detail).
Bérenger Bramas
HiePACS Project
Tel (05 24 57) 40 76
INRIA BORDEAUX Sud Ouest
----- Original Message -----
| From: "Berenger Bramas" <berenger.bramas@inria.fr>
| To: "Samuel Thibault" <samuel.thibault@inria.fr>
| Cc: starpu-devel@lists.gforge.inria.fr
| Sent: Wednesday, September 9, 2015 13:39:27
| Subject: Re: [Starpu-devel] Worker Binding Problem
|
| I checked, and it looks like OpenMP also uses the default system values;
| only OpenMP thread 0 differs slightly.
|
| - Here is the output for 3 threads:
| ===========================================
| Test with OpenMP:
| [0] stackaddr 0x7fff6a08d000
| [0] stacksize 8380416
| [0] guard_size 0
| [1] stackaddr 0x7faa06236000
| [1] stacksize 8392704
| [1] guard_size 4096
| [2] stackaddr 0x7faa05a35000
| [2] stacksize 8392704
| [2] guard_size 4096
| [2]Done = 0.81137s
| [0]Done = 0.812685s
| [1]Done = 0.827188s
|
| Test with StarPU:
| [0] stackaddr 0x7faa05234000
| [0] stacksize 8392704
| [0] guard_size 4096
| [2] stackaddr 0x7faa04232000
| [2] stacksize 8392704
| [2] guard_size 4096
| [1] stackaddr 0x7faa04a33000
| [1] stacksize 8392704
| [1] guard_size 4096
| [0]Done = 0.810964s
| [2]Done = 0.827862s
| [1]Done = 1.61425s
| ===========================================
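Stack figures like the ones above can be obtained along the following lines. This sketch relies on `pthread_getattr_np`, a GNU extension that describes the running thread, followed by the POSIX calls `pthread_attr_getstack` and `pthread_attr_getguardsize`; `print_stack_info` is a hypothetical helper name, not the code used in the test:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>

/* Print the calling thread's stack base, size and guard size in the same
 * format as the output above. Returns 0 on success, -1 on error. */
static int print_stack_info(int id, void **addr_out, size_t *size_out)
{
    pthread_attr_t attr;
    void *stackaddr = NULL;
    size_t stacksize = 0, guardsize = 0;
    int err;

    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return -1;
    err = pthread_attr_getstack(&attr, &stackaddr, &stacksize) != 0 ||
          pthread_attr_getguardsize(&attr, &guardsize) != 0;
    pthread_attr_destroy(&attr);   /* getattr_np initializes attr; release it */
    if (err)
        return -1;

    printf("[%d] stackaddr %p\n", id, stackaddr);
    printf("[%d] stacksize %zu\n", id, stacksize);
    printf("[%d] guard_size %zu\n", id, guardsize);
    if (addr_out)
        *addr_out = stackaddr;
    if (size_out)
        *size_out = stacksize;
    return 0;
}
```

The OpenMP master is the process's initial thread, whose stack is set up by the kernel rather than allocated by the thread library, which is likely why thread 0 reports a slightly different size and no guard page.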
|
|
| If I call alloca(16384) and then alloca again to store my counter, I get:
| ===========================================
| Test with OpenMP:
| [1] stackaddr 0x7f9cf64f0000
| [1] stacksize 8392704
| [1] guard_size 4096
| [0] stackaddr 0x7ffe53726000
| [0] stacksize 8384512
| [0] guard_size 0
| [2] stackaddr 0x7f9cf5cef000
| [2] stacksize 8392704
| [2] guard_size 4096
| [1]Done = 0.721168s
| [0]Done = 0.721216s
| [2]Done = 0.722121s
|
| Test with StarPU:
| [0] stackaddr 0x7f9cf54ee000
| [0] stacksize 8392704
| [0] guard_size 4096
| [starpu][starpu_task_wait_for_all] Waiting for tasks submitted to context 0
| [2] stackaddr 0x7f9cf44ec000
| [2] stacksize 8392704
| [2] guard_size 4096
| [1] stackaddr 0x7f9cf4ced000
| [1] stacksize 8392704
| [1] guard_size 4096
| [0]Done = 0.743813s
| [2]Done = 0.744685s
| [1]Done = 1.52117s
| ===========================================
|
| This improves the performance of all the threads, but there is still one
| thread that is slow with StarPU.
|
| If I change the stack size to ~700 MB (ulimit -s 699999) and use alloca:
| ===========================================
| Test with OpenMP:
| [2] stackaddr 0x7f3b441f8000
| [2] stacksize 716800000
| [2] guard_size 4096
| [1] stackaddr 0x7f3b6ed90000
| [1] stacksize 716800000
| [1] guard_size 4096
| [0] stackaddr 0x7ffec0149000
| [0] stacksize 716787712
| [0] guard_size 0
| [2]Done = 0.721143s
| [1]Done = 0.72139s
| [0]Done = 0.754094s
|
| Test with StarPU:
| [0] stackaddr 0x7f3b09467000
| [0] stacksize 716800000
| [0] guard_size 4096
| [2] stackaddr 0x7f3aad468000
| [2] stacksize 716800000
| [2] guard_size 4096
| [1] stackaddr 0x7f3ade8cf000
| [1] stacksize 716800000
| [1] guard_size 4096
| [2]Done = 0.750809s
| [0]Done = 0.75761s
| [1]Done = 1.51807s
| ===========================================
|
| I also tried allocating the counter on the heap and requesting local
| allocation (numactl -l), but the results are similar.
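The heap variant mentioned above could look like the following sketch, in which each thread allocates its counter on a page of its own and writes it first, so that the local allocation policy requested by `numactl -l` can apply to that page. The helper name and the one-page allocation are assumptions, not the code used in the test:

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Time `iterations` increments of a counter that lives on its own heap
 * page. Stores the final count in *count_out and returns the elapsed
 * seconds, or -1.0 if the allocation fails. */
static double heap_counter_kernel(long iterations, long *count_out)
{
    long page = sysconf(_SC_PAGESIZE);
    void *mem = NULL;
    struct timespec t0, t1;

    if (page <= 0 || posix_memalign(&mem, (size_t)page, (size_t)page) != 0)
        return -1.0;

    /* The calling thread writes the page first. Under a local policy
     * (numactl -l), a freshly mapped page is placed on this thread's NUMA
     * node; a page recycled by malloc may already live elsewhere. */
    volatile long *counter = mem;
    *counter = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations; i++)
        (*counter)++;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *count_out = *counter;
    free(mem);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}
```

Giving each counter a whole page also rules out two threads sharing a cache line, so if the slowdown persists with this layout, false sharing of the counters is unlikely to be the cause.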
|
| So I still cannot figure it out. I also thought it might come from
| the binding or the stack, but both look OK.
|
|
| ----- Original Message -----
| | From: "Samuel Thibault" <samuel.thibault@inria.fr>
| | To: "Berenger Bramas" <berenger.bramas@inria.fr>
| | Cc: starpu-devel@lists.gforge.inria.fr
| | Sent: Wednesday, September 9, 2015 12:00:59
| | Subject: Re: [Starpu-devel] Worker Binding Problem
| |
| | Berenger Bramas, on Wed 09 Sep 2015 11:42:01 +0200, wrote:
| | > With two threads it is OK:
| | > Test with OpenMP:
| | > [0]Done = 0.826635s
| | > [1]Done = 0.828059s
| | > Starpu:
| | > [0]Done = 0.813087s
| | > [1]Done = 0.826654s
| | >
| | > With three it starts to get strange:
| | > Test with OpenMP:
| | > [0]Done = 0.825707s
| | > [2]Done = 0.826624s
| | > [1]Done = 0.827749s
| | > Starpu:
| | > [2]Done = 0.826262s
| | > [0]Done = 0.826653s
| | > [1]Done = 1.64255s
| |
| | That's really odd indeed; there is no fundamental reason why threads
| | started by OpenMP and threads started by StarPU should behave
| | differently. I suspect it could be related to the stack allocation,
| | which StarPU currently just leaves to the OS. Could you check with
| | pthread_attr_getstack what the stack addresses and sizes look like in
| | both OpenMP and StarPU on your target machine? You could also try
| | calling alloca(16384) before calling alloca again to allocate the
| | variable you are working on, to make sure it gets allocated really
| | locally. There may also be cache associativity conflicts, which is why
| | the addresses should be checked too: the default pthread stack size may
| | happen to always bring conflicts, while OpenMP perhaps uses smaller
| | stacks by default. In that case, making the size passed to
| | alloca(16384) vary per worker may avoid the conflict.
| |
| | Samuel
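The alloca suggestion above, including the per-worker variation of the padding, can be sketched as follows. `padded_spin` and the step of 256 bytes per worker are hypothetical choices; the idea is only that each worker's counter lands at a different offset within its stack, and hence possibly in a different cache set:

```c
#include <alloca.h>
#include <stddef.h>
#include <stdint.h>

#define PAD_BASE 16384  /* padding size suggested in the mail */
#define PAD_STEP 256    /* hypothetical extra offset per worker */

/* Reserve padding on the stack first, then allocate the working counter
 * with a second alloca (on a downward-growing stack such as x86-64's, it
 * ends up below the padding). Returns the final count and, if addr_out
 * is non-NULL, the counter's address. */
static long padded_spin(int worker_id, long iterations, uintptr_t *addr_out)
{
    volatile char *pad = alloca(PAD_BASE + (size_t)worker_id * PAD_STEP);
    pad[0] = 0;  /* touch the padding so it is not optimized away */

    volatile long *counter = alloca(sizeof *counter);
    *counter = 0;
    for (long i = 0; i < iterations; i++)
        (*counter)++;

    if (addr_out != NULL)
        *addr_out = (uintptr_t)counter;
    return *counter;
}
```

Each worker would call `padded_spin(starpu_worker_get_id(), ...)` from its codelet; with `PAD_STEP` set to 0 this reduces to the plain alloca(16384) variant that Bérenger reports trying above.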
| |
| _______________________________________________
| Starpu-devel mailing list
| Starpu-devel@lists.gforge.inria.fr
| http://lists.gforge.inria.fr/mailman/listinfo/starpu-devel
|