List: Developers list for StarPU
- From: Usman Dastgeer <usman.dastgeer@liu.se>
- To: "starpu-devel@lists.gforge.inria.fr" <starpu-devel@lists.gforge.inria.fr>
- Subject: Re: [Starpu-devel] OpenMP timing with StarPU pheft
- Date: Fri, 2 Mar 2012 13:56:41 +0100
- Accept-language: en-US
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Thanks, you are right. The problem was that hwloc was not installed on the
machines where I got poor scaling. Now it works correctly. Perhaps you
could state this dependency clearly in the documentation.
On Mar 2, 2012, at 1:30 PM, Samuel Thibault wrote:
> Usman Dastgeer wrote on Thu 01 Mar 2012 18:52:20 +0100:
>> I am trying StarPU's OpenMP support with pheft on the provided
>> vector_scale example. I have a problem with its scaling behavior: it
>> scales really badly from 1 to 8 OpenMP threads on our server (fermi).
>
> But AIUI you said in private mails that the exact same example works
> fine on another machine?
>
>> starpu_perfmodel_display -s vector_scale_parallel.fermi
>> note: loading history from vector_scale_parallel instead of
>> vector_scale_parallel.fermi
>> performance model for cpu_impl_0
>> # hash size mean dev n
>> 6530e077 8192000 2.689401e+05 3.754193e+01 30
>> performance model for cpu_2_impl_0
>> # hash size mean dev n
>> 6530e077 8192000 2.703561e+05 1.484666e+03 10
>> performance model for cpu_3_impl_0
>> # hash size mean dev n
>> 6530e077 8192000 2.705295e+05 2.117665e+03 10
>> performance model for cpu_4_impl_0
>> # hash size mean dev n
>> 6530e077 8192000 2.708724e+05 3.354376e+03 10
>> performance model for cpu_5_impl_0
>> # hash size mean dev n
>> 6530e077 8192000 2.712075e+05 4.227176e+03 10
>> performance model for cpu_6_impl_0
>> # hash size mean dev n
>> 6530e077 8192000 2.726255e+05 4.926117e+03 10
>> performance model for cpu_7_impl_0
>> # hash size mean dev n
>> 6530e077 8192000 2.743637e+05 4.627217e+03 10
>> performance model for cpu_8_impl_0
>> # hash size mean dev n
>> 6530e077 8192000 2.727368e+05 1.467062e+03 10
>>
>> As you can see above, using 1 CPU appears to be better than using 8 CPUs;
>> at least that is what the model shows, but it shouldn't be the case.
>
> So there must be something fishy. Could you check with top that the
> program is really making use of more than 1 core?
>
>> To investigate I have added some custom timers in "void
>> scal_cpu_func(void *buffers[], void *_args)", using
>> ...
>> clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time1);
>> clock_gettime(CLOCK_THREAD_CPUTIME_ID, &rtime1);
>> ... // see attached vector_scale.c for full source code
>
> That only measures CPU time, not wall-clock time: CLOCK_THREAD_CPUTIME_ID
> returns how much CPU time the thread has spent. But if there are binding
> issues (which I really believe is the problem here), the threads do not
> look slowed down by that measure: each just gets a portion of the CPU
> time of the core shared by all the threads, and overall the time is the
> same.
>
> Could you check in _starpu_bind_thread_on_cpus whether the hwloc binding
> function really succeeds? Actually, do you have hwloc support enabled?
> That'd explain everything: we don't currently have code for rebinding
> FORKJOIN parallel tasks without hwloc. Actually, using combined workers
> without hwloc support is generally a bad idea, since combined workers
> are then built without any topology information and thus turn out good
> or bad depending on OS numbering.
>
> Samuel
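
Samuel's point about CPU time versus wall-clock time can be illustrated with a small sketch. It is not taken from the attached vector_scale.c; the helper names elapsed_sec and measure_region are hypothetical, and the 100 ms sleep is only an assumed stand-in for time a thread spends descheduled (for example when several threads end up sharing one core). CLOCK_MONOTONIC keeps advancing during that time, while CLOCK_THREAD_CPUTIME_ID does not.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Seconds elapsed between two samples of the same clock. */
static double elapsed_sec(const struct timespec *start,
                          const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Hypothetical helper: times one region with both a wall-clock and a
 * per-thread CPU-time clock and returns the two durations in seconds. */
static void measure_region(double *wall_sec, double *cpu_sec)
{
    struct timespec wall0, wall1, cpu0, cpu1;

    clock_gettime(CLOCK_MONOTONIC, &wall0);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu0);

    /* Some real computation: counted by both clocks. */
    volatile double acc = 0.0;
    for (long i = 0; i < 2000000; i++)
        acc += (double)i * 0.5;

    /* Assumption: the sleep stands in for time the thread is not running.
     * It adds to wall-clock time but not to the thread's CPU time. */
    struct timespec pause = { 0, 100000000L }; /* 100 ms */
    nanosleep(&pause, NULL);

    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu1);
    clock_gettime(CLOCK_MONOTONIC, &wall1);

    *wall_sec = elapsed_sec(&wall0, &wall1);
    *cpu_sec = elapsed_sec(&cpu0, &cpu1);
}
```

Calling measure_region(&wall, &cpu) and printing both values should show a wall-clock duration noticeably larger than the CPU time. To judge scaling in a codelet, the wall-clock figure is the one to compare across thread counts.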