
starpu-devel - Re: [Starpu-devel] Query about profiling data

Subject: Developers list for StarPU

  • From: ASUDE ASUDE <ajitsdeshpande@gmail.com>
  • To: Nathalie Furmento <nathalie.furmento@labri.fr>, starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] Query about profiling data
  • Date: Thu, 1 Dec 2011 22:15:11 +0000
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hello Nathalie,
Thanks for the reply.
But the numbers below still don't add up, and I am not able to interpret them correctly. How should these execution times be interpreted?

Executing on CPU 0
Avg. delay : 41.16 us
Avg. length : 2613.87 us
Worker CUDA 0 (GeForce GTX 470):
        total time : 2.95 ms
        exec time  : 0.00 ms (0.00 %)
        blocked time  : 0.00 ms (0.00 %)
Worker CPU 0:
        total time : 2.96 ms
        exec time  : 5.44 ms (183.80 %)
        blocked time  : 0.00 ms (0.00 %)

Executing on CPU 0
Avg. delay : 37.06 us
Avg. length : 2736.63 us
Worker CUDA 0 (GeForce GTX 470):
        total time : 2.86 ms
        exec time  : 0.00 ms (0.00 %)
        blocked time  : 0.00 ms (0.00 %)
Worker CPU 0:
        total time : 2.87 ms
        exec time  : 5.52 ms (192.56 %)
        blocked time  : 0.00 ms (0.00 %)

Executing =CUDA 0 (GeForce GTX 470)
Avg. delay : 647.63 us
Avg. length : 378.75 us
Worker CUDA 0 (GeForce GTX 470):
        total time : 1.15 ms
        exec time  : 0.84 ms (72.70 %)
        blocked time  : 0.00 ms (0.00 %)
Worker CPU 0:
        total time : 1.16 ms
        exec time  : 0.00 ms (0.00 %)
        blocked time  : 0.00 ms (0.00 %)


On Thu, Dec 1, 2011 at 6:46 PM, Nathalie Furmento <nathalie.furmento@labri.fr> wrote:
Hello,


On 01/12/2011 19:16, ASUDE ASUDE wrote:
Hello,
I have a question about interpreting the profiling data for tasks and workers that I get from StarPU.
Running starpu_machine_display on my setup shows:
StarPU has found :
        1 CPU cores
                CPU 0
        1 CUDA devices
                CUDA 0 (GeForce GTX 470)

To explain a little about my application: when I run it without the profiling code, the following messages are printed on my console:
Executing on CPU 0
Executing =CUDA 0 (GeForce GTX 470)
Executing =CUDA 0 (GeForce GTX 470)

I have a for loop that calls, three times, the function which creates and submits a StarPU task, hence the three messages above.
Sometimes I get the following instead:
Executing =CUDA 0 (GeForce GTX 470)
Executing on CPU 0
Executing on CPU 0

Does that mean StarPU assigns my 3 tasks to the CPU 0, CUDA 0, and CUDA 0 devices in the first run, and to the
CUDA 0, CPU 0, and CPU 0 devices in the second run?

Yes.


Then I added the profiling code, following the profiling example at StarPU/examples/profiling/profiling.c

Now I get the following times printed on the console:

Executing on CPU 0
Avg. delay : 41.16 us
Avg. length : 2613.87 us
Worker CUDA 0 (GeForce GTX 470):
        total time : 2.95 ms
        exec time  : 0.00 ms (0.00 %)
        blocked time  : 0.00 ms (0.00 %)
Worker CPU 0:
        total time : 2.96 ms
        exec time  : 5.44 ms (183.80 %)
        blocked time  : 0.00 ms (0.00 %)

Executing on CPU 0
Avg. delay : 37.06 us
Avg. length : 2736.63 us
Worker CUDA 0 (GeForce GTX 470):
        total time : 2.86 ms
        exec time  : 0.00 ms (0.00 %)
        blocked time  : 0.00 ms (0.00 %)
Worker CPU 0:
        total time : 2.87 ms
        exec time  : 5.52 ms (192.56 %)
        blocked time  : 0.00 ms (0.00 %)

Executing =CUDA 0 (GeForce GTX 470)
Avg. delay : 647.63 us
Avg. length : 378.75 us
Worker CUDA 0 (GeForce GTX 470):
        total time : 1.15 ms
        exec time  : 0.84 ms (72.70 %)
        blocked time  : 0.00 ms (0.00 %)
Worker CPU 0:
        total time : 1.16 ms
        exec time  : 0.00 ms (0.00 %)
        blocked time  : 0.00 ms (0.00 %)


My question about the case above: if it says the first task is executing on CPU 0, why do I get a non-zero execution time for "Worker CUDA 0",
and the other way round, i.e. when it says Executing =CUDA 0, why do I see a Worker CPU 0?

You see all the workers here: the one that is executing your task, which has a non-zero exec time, and the other one.
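
The percentage printed next to each exec time looks like the exec time divided by that worker's total time. That is an assumption inferred from the figures in the reports above, not something stated in this thread. A minimal Python sketch that redoes the arithmetic (the helper name exec_ratio is hypothetical) could look like this:

```python
def exec_ratio(exec_ms, total_ms):
    """Share of a worker's total time spent executing tasks, in percent.

    Assumption: the report computes 100 * exec time / total time.
    """
    return 100.0 * exec_ms / total_ms


# (worker, total time in ms, exec time in ms), copied from the three reports above.
reports = [
    ("CPU 0", 2.96, 5.44),
    ("CPU 0", 2.87, 5.52),
    ("CUDA 0", 1.15, 0.84),
]

for worker, total_ms, exec_ms in reports:
    ratio = exec_ratio(exec_ms, total_ms)
    # A ratio above 100 % means the reported exec time exceeds the total time.
    flag = " (exec time exceeds total time)" if ratio > 100.0 else ""
    print(f"{worker}: {ratio:.2f} %{flag}")
```

The results land close to the printed percentages but not exactly on them, presumably because the millisecond values in the report are already rounded to two decimals. The two CPU 0 reports come out well above 100 %, which is the part of the output that does not add up.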


Another question: in this profiling case the execution times don't add up. How should I make sense of the times displayed above?

I am also slightly confused about the terminology: task, worker?

As explained in the glossary,
http://runtime.bordeaux.inria.fr/StarPU/starpu.html#Glossary

A task represents a scheduled execution of a codelet on some data handles.
A worker executes tasks. There is typically one per CPU computation core and one per accelerator (for which a whole CPU core is dedicated).

A different, unrelated question: while reading the handbook, I came across this kind of information.
How do I get this kind of online performance data for my StarPU app?

StarPU has found :
             3 CUDA devices
                     CUDA 0 (Tesla C2050 02:00.0)
                     CUDA 1 (Tesla C2050 03:00.0)
                     CUDA 2 (Tesla C2050 84:00.0)
     from    to RAM          to CUDA 0       to CUDA 1       to CUDA 2
     RAM     0.000000        5176.530428     5176.492994     5191.710722
     CUDA 0  4523.732446     0.000000        2414.074751     2417.379201
     CUDA 1  4523.718152     2414.078822     0.000000        2417.375119
     CUDA 2  4534.229519     2417.069025     2417.060863     0.000000
    
What are these numbers? Is "RAM to CUDA 0" a data transfer time in milliseconds, or something else?
How do I obtain this kind of bus-related information (RAM to CUDA 0, etc.)? What code changes are needed?

The numbers represent the bandwidths between the different workers.
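
For readers who want to work with these figures, here is a minimal Python sketch that loads the matrix quoted above into a lookup table keyed by (source, destination). The helper name parse_bandwidths is hypothetical, and the sketch assumes columns are separated by at least two spaces, as in the quoted output. The thread does not state the units, so the values are kept as plain numbers.

```python
import re

# Bandwidth matrix copied from the output quoted above.
TABLE = """\
from    to RAM          to CUDA 0       to CUDA 1       to CUDA 2
RAM     0.000000        5176.530428     5176.492994     5191.710722
CUDA 0  4523.732446     0.000000        2414.074751     2417.379201
CUDA 1  4523.718152     2414.078822     0.000000        2417.375119
CUDA 2  4534.229519     2417.069025     2417.060863     0.000000
"""


def parse_bandwidths(text):
    """Return {(source, destination): bandwidth} from the printed matrix.

    Assumption: cells are separated by two or more spaces, so that node
    names containing a single space ("CUDA 0") stay in one piece.
    """
    rows = [re.split(r"\s{2,}", line.strip())
            for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    # Header cells look like "to RAM"; drop the leading "to ".
    destinations = [cell[3:] for cell in header[1:]]
    table = {}
    for cells in body:
        source = cells[0]
        for destination, value in zip(destinations, cells[1:]):
            table[(source, destination)] = float(value)
    return table


bandwidths = parse_bandwidths(TABLE)
print(bandwidths[("RAM", "CUDA 0")])
print(bandwidths[("CUDA 0", "RAM")])
```

Each row is a source and each column a destination, so the two lookups at the end read the RAM to CUDA 0 and CUDA 0 to RAM entries respectively.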

Regards,

Nathalie




