Subject: Developers list for StarPU
List archives
- From: ASUDE ASUDE <ajitsdeshpande@gmail.com>
- To: Nathalie Furmento <nathalie.furmento@labri.fr>, starpu-devel@lists.gforge.inria.fr
- Subject: Re: [Starpu-devel] Query about profiling data
- Date: Thu, 1 Dec 2011 22:15:11 +0000
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Hello Nathalie,
Thanks for the reply.
But the numbers below still don't add up, and I am not able to interpret them correctly. How should these execution times be read?
Executing on CPU 0
Avg. delay : 41.16 us
Avg. length : 2613.87 us
Worker CUDA 0 (GeForce GTX 470):
total time : 2.95 ms
exec time : 0.00 ms (0.00 %)
blocked time : 0.00 ms (0.00 %)
Worker CPU 0:
total time : 2.96 ms
exec time : 5.44 ms (183.80 %)
blocked time : 0.00 ms (0.00 %)
Executing on CPU 0
Avg. delay : 37.06 us
Avg. length : 2736.63 us
Worker CUDA 0 (GeForce GTX 470):
total time : 2.86 ms
exec time : 0.00 ms (0.00 %)
blocked time : 0.00 ms (0.00 %)
Worker CPU 0:
total time : 2.87 ms
exec time : 5.52 ms (192.56 %)
blocked time : 0.00 ms (0.00 %)
Executing =CUDA 0 (GeForce GTX 470)
Avg. delay : 647.63 us
Avg. length : 378.75 us
Worker CUDA 0 (GeForce GTX 470):
total time : 1.15 ms
exec time : 0.84 ms (72.70 %)
blocked time : 0.00 ms (0.00 %)
Worker CPU 0:
total time : 1.16 ms
exec time : 0.00 ms (0.00 %)
blocked time : 0.00 ms (0.00 %)
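As a quick sanity check on the figures above, the percentage printed next to each exec time appears to be the exec time divided by the total time of that worker. The sketch below recomputes it from the rounded values in the output. The definition is an assumption inferred from the numbers, and the function name `exec_ratio_percent` is made up for illustration; small differences from the printed percentages are expected because the inputs are already rounded.

```python
def exec_ratio_percent(exec_ms, total_ms):
    """Exec time as a percentage of total time (assumed definition)."""
    if total_ms == 0:
        return 0.0
    return 100.0 * exec_ms / total_ms


# (worker and run, total time in ms, exec time in ms, percentage printed)
runs = [
    ("CPU 0, first run", 2.96, 5.44, 183.80),
    ("CPU 0, second run", 2.87, 5.52, 192.56),
    ("CUDA 0, third run", 1.15, 0.84, 72.70),
]

for name, total_ms, exec_ms, printed in runs:
    pct = exec_ratio_percent(exec_ms, total_ms)
    print(f"{name}: recomputed {pct:.2f} %, printed {printed:.2f} %")
```

A value above 100 % only says that the accumulated exec time is larger than the measured total time for that worker, which is the part of the output being asked about.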
On Thu, Dec 1, 2011 at 6:46 PM, Nathalie Furmento <nathalie.furmento@labri.fr> wrote:
Hello,
Yes.
On 01/12/2011 19:16, ASUDE ASUDE wrote:
Hello,
I have a question about interpreting the profiling data for tasks and workers which I get from starpu.
My setup shows
StarPU has found :
1 CPU cores
CPU 0
1 CUDA devices
CUDA 0 (GeForce GTX 470)
on starpu_machine_display.
To explain a little about my application: when I execute it without the profiling code added, I get the messages below printed on my console:
Executing on CPU 0
Executing =CUDA 0 (GeForce GTX 470)
Executing =CUDA 0 (GeForce GTX 470)
I have a for loop that calls, 3 times, the function which creates and submits a StarPU task, hence the 3 messages above.
Sometimes I get the following instead:
Executing =CUDA 0 (GeForce GTX 470)
Executing on CPU 0
Executing on CPU 0
Does that mean StarPU assigns my 3 tasks to CPU 0, CUDA 0, CUDA 0 in the first execution and
to CUDA 0, CPU 0, CPU 0 in the second execution?
You see all the workers here: the one that is executing your task, which has a non-zero exec time, and the other one.
Then I added the profiling code, following the profiling example at StarPU/examples/profiling/profiling.c.
Now I get the times below printed on the console:
Executing on CPU 0
Avg. delay : 41.16 us
Avg. length : 2613.87 us
Worker CUDA 0 (GeForce GTX 470):
total time : 2.95 ms
exec time : 0.00 ms (0.00 %)
blocked time : 0.00 ms (0.00 %)
Worker CPU 0:
total time : 2.96 ms
exec time : 5.44 ms (183.80 %)
blocked time : 0.00 ms (0.00 %)
Executing on CPU 0
Avg. delay : 37.06 us
Avg. length : 2736.63 us
Worker CUDA 0 (GeForce GTX 470):
total time : 2.86 ms
exec time : 0.00 ms (0.00 %)
blocked time : 0.00 ms (0.00 %)
Worker CPU 0:
total time : 2.87 ms
exec time : 5.52 ms (192.56 %)
blocked time : 0.00 ms (0.00 %)
Executing =CUDA 0 (GeForce GTX 470)
Avg. delay : 647.63 us
Avg. length : 378.75 us
Worker CUDA 0 (GeForce GTX 470):
total time : 1.15 ms
exec time : 0.84 ms (72.70 %)
blocked time : 0.00 ms (0.00 %)
Worker CPU 0:
total time : 1.16 ms
exec time : 0.00 ms (0.00 %)
blocked time : 0.00 ms (0.00 %)
My question for the case above is: if it says the first task is "Executing on CPU 0", why do I get a non-zero execution time for "Worker CUDA 0",
and the other way round, i.e. when it says "Executing =CUDA 0", why do I see a "Worker CPU 0"?
Another question: in this profiling case the execution times don't add up, so how do I make sense of the times displayed above?
I am slightly confused about the terminology: task, worker?
As explained in the glossary,
http://runtime.bordeaux.inria.fr/StarPU/starpu.html#Glossary
A task represents a scheduled execution of a codelet on some data handles.
A worker executes tasks. There is typically one per CPU computation core and one per accelerator (for which a whole CPU core is dedicated).
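The task/worker distinction in the glossary can be pictured with a toy model. This is plain Python, not StarPU code, and the `Worker` class and `run_task` helper are hypothetical. Each task runs on exactly one worker, yet a per-worker report still lists every worker, so the idle one shows an exec time of zero.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    exec_us: float = 0.0  # accumulated time spent executing tasks
    tasks_run: int = 0


def run_task(worker, length_us):
    """Account one task of the given length to the worker that executed it."""
    worker.exec_us += length_us
    worker.tasks_run += 1


workers = [Worker("CPU 0"), Worker("CUDA 0 (GeForce GTX 470)")]

# One task, executed by the CUDA worker only.
run_task(workers[1], 378.75)

# The report iterates over all workers, busy or not.
for w in workers:
    print(f"Worker {w.name}: {w.tasks_run} task(s), "
          f"exec time {w.exec_us / 1000:.2f} ms")
```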
A different, unrelated question: while reading the handbook, I see this kind of info.
How do I get this kind of online performance data for my StarPU app?
StarPU has found :
3 CUDA devices
CUDA 0 (Tesla C2050 02:00.0)
CUDA 1 (Tesla C2050 03:00.0)
CUDA 2 (Tesla C2050 84:00.0)
from      to RAM        to CUDA 0     to CUDA 1     to CUDA 2
RAM       0.000000      5176.530428   5176.492994   5191.710722
CUDA 0    4523.732446   0.000000      2414.074751   2417.379201
CUDA 1    4523.718152   2414.078822   0.000000      2417.375119
CUDA 2    4534.229519   2417.069025   2417.060863   0.000000
What are these numbers? Is "RAM to CUDA 0" a data transfer time in milliseconds, or something else?
How do I obtain this kind of bus-related info (RAM to CUDA 0, etc.)? What code changes are needed?
The numbers represent the bandwidths between the different workers.
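Reading the table as bandwidths, a rough transfer-time estimate is the data size divided by the bandwidth. The sketch below assumes the values are in MB/s with 1 MB = 10^6 bytes (the unit is not stated in this thread) and ignores latency; `transfer_time_ms` is a hypothetical helper, not part of StarPU.

```python
# A few entries copied from the table above, keyed by (source, destination).
# Unit assumed to be MB/s.
BANDWIDTH = {
    ("RAM", "CUDA 0"): 5176.530428,
    ("CUDA 0", "RAM"): 4523.732446,
    ("CUDA 0", "CUDA 1"): 2414.074751,
}


def transfer_time_ms(src, dst, size_bytes):
    """Estimated transfer time in ms, ignoring latency (assumes MB/s)."""
    size_mb = size_bytes / 1e6
    return 1000.0 * size_mb / BANDWIDTH[(src, dst)]


size_bytes = 64 * 1000 * 1000  # a 64 MB buffer, chosen for illustration
for src, dst in BANDWIDTH:
    t = transfer_time_ms(src, dst, size_bytes)
    print(f"{src} -> {dst}: about {t:.2f} ms")
```

Under these assumptions, the lower device-to-device figures translate into roughly twice the transfer time of a RAM-to-device copy of the same size.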
Regards,
Nathalie