Subject: Developers list for StarPU
List archives
- From: Usman Dastgeer <usman.dastgeer@liu.se>
- To: "starpu-devel@lists.gforge.inria.fr" <starpu-devel@lists.gforge.inria.fr>
- Subject: [Starpu-devel] OpenMP timing with StarPU pheft
- Date: Thu, 1 Mar 2012 18:30:26 +0100
- Accept-language: en-US
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Hello,
I am trying StarPU's OpenMP support with the pheft scheduler on the provided vector_scale example. I have a problem with the scaling behavior: it shows really bad scaling from 1 to 8 OpenMP threads on our server (fermi).
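For context, here is a minimal sketch of the kind of OpenMP kernel and parallel codelet I am using; it follows the usual StarPU vector_scal example rather than the attached file verbatim, so the STARPU_FORKJOIN type, the field values, and how the factor is passed via cl_arg are assumptions, not necessarily exactly what my code does:

#include <limits.h>
#include <starpu.h>
#include <omp.h>

/* OpenMP kernel: the whole loop runs inside one parallel (fork-join) task. */
void scal_cpu_func(void *buffers[], void *_args)
{
    float factor = *(float *)_args;
    struct starpu_vector_interface *vector = buffers[0];
    unsigned n = STARPU_VECTOR_GET_NX(vector);
    float *val = (float *)STARPU_VECTOR_GET_PTR(vector);

    /* pheft may run this task on a combined worker; use its size
       as the OpenMP team size. */
    omp_set_num_threads(starpu_combined_worker_get_size());

    unsigned i;
#pragma omp parallel for
    for (i = 0; i < n; i++)
        val[i] *= factor;
}

/* History-based model; the symbol matches the one displayed below. */
static struct starpu_perfmodel scal_model =
{
    .type = STARPU_HISTORY_BASED,
    .symbol = "vector_scale_parallel",
};

static struct starpu_codelet scal_cl =
{
    .where = STARPU_CPU,
    .type = STARPU_FORKJOIN,            /* let pheft build combined workers */
    .max_parallelism = INT_MAX,
    .cpu_funcs = {scal_cpu_func, NULL},
    .nbuffers = 1,
    .modes = {STARPU_RW},
    .model = &scal_model,
};

The program is run with STARPU_SCHED=pheft so that the parallel scheduler is used.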
The modified vector_scale.c is attached to this email. The execution is done 100 times in a "for" loop, and after execution I get:
starpu_perfmodel_display -s vector_scale_parallel.fermi
note: loading history from vector_scale_parallel instead of vector_scale_parallel.fermi
performance model for cpu_impl_0
# hash        size        mean            dev             n
6530e077      8192000     2.689401e+05    3.754193e+01    30
performance model for cpu_2_impl_0
# hash        size        mean            dev             n
6530e077      8192000     2.703561e+05    1.484666e+03    10
performance model for cpu_3_impl_0
# hash        size        mean            dev             n
6530e077      8192000     2.705295e+05    2.117665e+03    10
performance model for cpu_4_impl_0
# hash        size        mean            dev             n
6530e077      8192000     2.708724e+05    3.354376e+03    10
performance model for cpu_5_impl_0
# hash        size        mean            dev             n
6530e077      8192000     2.712075e+05    4.227176e+03    10
performance model for cpu_6_impl_0
# hash        size        mean            dev             n
6530e077      8192000     2.726255e+05    4.926117e+03    10
performance model for cpu_7_impl_0
# hash        size        mean            dev             n
6530e077      8192000     2.743637e+05    4.627217e+03    10
performance model for cpu_8_impl_0
# hash        size        mean            dev             n
6530e077      8192000     2.727368e+05    1.467062e+03    10
As you can see above, using 1 CPU appears to be better than using 8 CPUs; at least that is what the model shows, but it shouldn't be the case. To investigate, I added some custom timers in "void scal_cpu_func(void *buffers[], void *_args)", using
...
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time1);  /* CPU time consumed by the whole process */
clock_gettime(CLOCK_THREAD_CPUTIME_ID, &rtime1);  /* CPU time consumed by the calling thread */
... // see attached vector_scale.c for full source code
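For reference, the average times in the table below are obtained by differencing a start and an end sample of each clock; a minimal helper along these lines (my sketch, the attached file may do it slightly differently):

#include <time.h>

/* Seconds elapsed between two clock_gettime() samples. */
static double ts_diff_sec(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) * 1e-9;
}

/* Around the kernel body:
   clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time1);
   clock_gettime(CLOCK_THREAD_CPUTIME_ID, &rtime1);
   ... scaling loop ...
   clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time2);
   clock_gettime(CLOCK_THREAD_CPUTIME_ID, &rtime2);
   process_seconds = ts_diff_sec(&time1, &time2);
   thread_seconds  = ts_diff_sec(&rtime1, &rtime2);  */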
The execution times I measured do show this scaling, almost linearly, for both thread and process time. Please also see the execution trace attached to this email (exec_output_fermi.out). Here is a summary of the timings from my custom timers:
------------------------------------------------------------------------------------------
                 | Count | AvgExecTime (..._PROCESS_...)* | AvgExecTime (..._THREAD_...)*
1 OpenMP thread  | 30    | 2.00655                        | 0.26845
8 OpenMP threads | 10    | 0.272621                       | 0.0341865
------------------------------------------------------------------------------------------
* Timings are in seconds.
------------------------------------------------------------------------------------------
Please note that both the trace timings and the perf_model timings come from the same execution, so they can be compared directly.
In short, my custom timers show almost linear scaling, but the timings calculated by StarPU do not show any scaling. Where is the problem?
--
Usman.
PS: Please note that the same application (same problem size) scales on another platform I tried, so could you tell me where the problem is?
Attachment: exec_output_fermi.out
Attachment: vector_scal.c
- [Starpu-devel] OpenMP timing with StarPU pheft, Usman Dastgeer, 01/03/2012
- Re: [Starpu-devel] OpenMP timing with StarPU pheft, Usman Dastgeer, 01/03/2012
- <Possible follow-up(s)>
- Re: [Starpu-devel] OpenMP timing with StarPU pheft, Samuel Thibault, 02/03/2012
- Re: [Starpu-devel] OpenMP timing with StarPU pheft, Usman Dastgeer, 02/03/2012
Archives managed by MHonArc 2.6.19+.