starpu-devel - [Starpu-devel] OpenMP timing with StarPU pheft

Subject: Developers list for StarPU

  • From: Usman Dastgeer <usman.dastgeer@liu.se>
  • To: "starpu-devel@lists.gforge.inria.fr" <starpu-devel@lists.gforge.inria.fr>
  • Subject: [Starpu-devel] OpenMP timing with StarPU pheft
  • Date: Thu, 1 Mar 2012 18:30:26 +0100
  • Accept-language: en-US
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hello,

I am trying out StarPU's OpenMP support with the pheft scheduler on the provided vector_scale example. The scaling looks very poor when going from 1 to 8 OpenMP threads on our server (fermi).

The modified vector_scale.c is attached to this email. The task is executed 100 times in a "for" loop, and after the run I get:

starpu_perfmodel_display -s vector_scale_parallel.fermi
note: loading history from vector_scale_parallel instead of vector_scale_parallel.fermi
performance model for cpu_impl_0
# hash size mean dev n
6530e077 8192000         2.689401e+05   3.754193e+01   30
performance model for cpu_2_impl_0
# hash size mean dev n
6530e077 8192000         2.703561e+05   1.484666e+03   10
performance model for cpu_3_impl_0
# hash size mean dev n
6530e077 8192000         2.705295e+05   2.117665e+03   10
performance model for cpu_4_impl_0
# hash size mean dev n
6530e077 8192000         2.708724e+05   3.354376e+03   10
performance model for cpu_5_impl_0
# hash size mean dev n
6530e077 8192000         2.712075e+05   4.227176e+03   10
performance model for cpu_6_impl_0
# hash size mean dev n
6530e077 8192000         2.726255e+05   4.926117e+03   10
performance model for cpu_7_impl_0
# hash size mean dev n
6530e077 8192000         2.743637e+05   4.627217e+03   10
performance model for cpu_8_impl_0
# hash size mean dev n
6530e077 8192000         2.727368e+05   1.467062e+03   10

As you can see above, the performance model reports that using 1 CPU is slightly better than using 8 CPUs, which should not be the case. To investigate, I added some custom timers in "void scal_cpu_func(void *buffers[], void *_args)", using
...
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time1); 
clock_gettime(CLOCK_THREAD_CPUTIME_ID, &rtime1);  
... // see attached vector_scale.c for full source code
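
For reference, here is a minimal sketch in C of how such timers can be read around a region. It assumes a POSIX system; the names read_clock, clock_sample, sample_clocks and clock_diff are hypothetical and are not taken from the attached file. CLOCK_PROCESS_CPUTIME_ID counts CPU time consumed by all threads of the process and CLOCK_THREAD_CPUTIME_ID counts CPU time of the calling thread only, so neither is elapsed wall-clock time. The sketch also samples CLOCK_MONOTONIC so that the three values can be put side by side.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Read the given clock and return its value in seconds. */
static double read_clock(clockid_t id)
{
    struct timespec ts;
    int rc = clock_gettime(id, &ts);
    assert(rc == 0);   /* clock_gettime returns 0 on success */
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper type: one reading of each clock, in seconds. */
struct clock_sample {
    double wall;     /* CLOCK_MONOTONIC: elapsed real time */
    double process;  /* CLOCK_PROCESS_CPUTIME_ID: CPU time of all threads */
    double thread;   /* CLOCK_THREAD_CPUTIME_ID: CPU time of the calling thread */
};

/* Take a reading of all three clocks. */
static struct clock_sample sample_clocks(void)
{
    struct clock_sample s;
    s.wall    = read_clock(CLOCK_MONOTONIC);
    s.process = read_clock(CLOCK_PROCESS_CPUTIME_ID);
    s.thread  = read_clock(CLOCK_THREAD_CPUTIME_ID);
    return s;
}

/* Difference between two readings (end - start), in seconds. */
static struct clock_sample clock_diff(struct clock_sample start,
                                      struct clock_sample end)
{
    struct clock_sample d;
    d.wall    = end.wall    - start.wall;
    d.process = end.process - start.process;
    d.thread  = end.thread  - start.thread;
    return d;
}
```

To use it, call sample_clocks() once before and once after the kernel body, then pass both readings to clock_diff().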

The execution times I get from these timers, for both thread and process time, do show the expected scaling, which is almost linear. Please see the execution trace attached to this email (exec_output_fermi.out). Here I summarize the timings from the custom timers:
------------------------------------------------------------------------------------------
                 | Count | AvgExecTime (..._PROCESS_...)* | AvgExecTime (..._THREAD_...)*
1 OpenMP thread  |   30  | 2.00655                        | 0.26845
8 OpenMP threads |   10  | 0.272621                       | 0.0341865
------------------------------------------------------------------------------------------
* Timings are in seconds.

Please note that the trace timings and the perf_model timings come from the same execution, so they can be compared directly.
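
To make the comparison concrete, the speedup from 1 to 8 threads can be computed from the means reported in this message. This is a small sketch in C; the speedup and print_speedups helpers are hypothetical, and the constants are copied from the perf_model output (cpu_impl_0 and cpu_8_impl_0) and from the custom timer summary.

```c
#include <stdio.h>

/* Speedup of an n-thread run relative to the 1-thread run. */
static double speedup(double t1, double tn)
{
    return t1 / tn;
}

/* Print the 1 -> 8 thread speedup for each timing source. */
static void print_speedups(void)
{
    /* StarPU perf_model means for cpu_impl_0 and cpu_8_impl_0 */
    printf("StarPU perf_model : %.2f\n", speedup(2.689401e+05, 2.727368e+05));
    /* Custom timer, CLOCK_PROCESS_CPUTIME_ID, in seconds */
    printf("process CPU time  : %.2f\n", speedup(2.00655, 0.272621));
    /* Custom timer, CLOCK_THREAD_CPUTIME_ID, in seconds */
    printf("thread CPU time   : %.2f\n", speedup(0.26845, 0.0341865));
}
```

With these numbers the custom timers give a speedup of roughly 7.4 (process) and 7.9 (thread), while the perf_model means give a ratio just under 1.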


In short, my custom timers show almost linear scaling, but the timings calculated by StarPU do not show any scaling. Where is the problem?


--
Usman.



PS: Please note that the same application (same problem size) does scale on another platform I tried, so can you tell me where the problem is?

Attachment: exec_output_fermi.out
Description: exec_output_fermi.out

Attachment: vector_scal.c
Description: vector_scal.c



