Subject: Developers list for StarPU
- From: Samuel Thibault <samuel.thibault@inria.fr>
- To: XAVIER LACOSTE <xavier.lacoste@eviden.com>
- Cc: "starpu-devel@inria.fr" <starpu-devel@inria.fr>
- Subject: Re: [starpu-devel] Late data transfers
- Date: Thu, 22 Aug 2024 15:13:16 +0200
- Organization: I am not organized
Hello,
It would be useful to provide the Paje trace: it contains much more
information than what is displayed by default.
XAVIER LACOSTE, on Wed. 07 Aug 2024 13:23:57 +0000, wrote:
> On the attached trace of an LU factorization,
Which implementation is this? Chameleon?
> GETRF (black) and TRSMs (blue)
> are executed on CPU except for the first ones in row and column and GEMMs are
> all batched on the GPU (except the first one which is on the GPU but not
> batched).
> Here are some remarks I can't explain on the trace:
>
> • The TRSMs on host wait until the block is sent to GPU to start (while they
> could start just after it).
> • data transfers from host to device do not start immediately after GETRF
I'm not sure what exactly you are referring to. Do you mean the red piece
just after the black GETRF? I don't know what that is, but it is
surprising indeed. That being said, I see a lot of task submissions at
the bottom right there, and the number of submitted tasks only grows
there, so it looks like you have some spurious synchronization in your
submission thread?
> nor after each TRSM.
It also seems that task submissions show up at the bottom there, so
probably more spurious synchronization?
> • data transfer from device to host, before the next GETRF, does not occur
> immediately after the tiny GEMM on the GPU (just after the TRSMs on the
> GPU)
Again, this looks to me like spurious synchronization in the submission
loop.
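A toy model of the effect described above, written in Python rather than
against the StarPU API: it only illustrates how a blocking call in the
submission thread delays every task submitted after it, even tasks whose
dependencies completed long before. The function simulate, the task
names, the durations and the submit_cost parameter are all hypothetical,
and the model assumes unlimited workers.

```python
def simulate(tasks, sync_after=(), submit_cost=1.0):
    """Return the start time of each task in a toy submission model.

    tasks: list of (name, duration, deps) in submission order, where deps
           holds indices of earlier tasks.
    sync_after: indices after which the submission thread blocks until
                every task submitted so far has completed.
    """
    clock = 0.0          # current time of the submission thread
    start, end = [], []
    for i, (name, duration, deps) in enumerate(tasks):
        clock += submit_cost
        # A task starts once it is submitted and its dependencies are done.
        ready = max([clock] + [end[d] for d in deps])
        start.append(ready)
        end.append(ready + duration)
        if i in sync_after:
            # Spurious synchronization: the submission thread stalls here.
            clock = max(clock, max(end))
    return {name: s for (name, _, _), s in zip(tasks, start)}


# Hypothetical workload: a transfer ("H2D") that only depends on GETRF.
TASKS = [("GETRF", 10.0, []), ("TRSM", 10.0, [0]), ("H2D", 2.0, [0])]
free = simulate(TASKS)                      # no blocking call
blocked = simulate(TASKS, sync_after={1})   # block after submitting TRSM
print(free["H2D"], blocked["H2D"])
```

In this sketch the transfer starts at t=11 (right after GETRF) when
submission never blocks, and at t=22 when the submission thread waits
after the TRSM, because the transfer is simply not submitted any earlier.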
> I use 2 streams on the CUDA device because there are small idle times between
> kernels....
Are there really? How do you see that? StarPU already pipelines kernel
submission, by submitting kernel number n+1 while kernel n is running.
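A toy model of why such pipelining hides launch overhead, again in
Python and not StarPU code: gpu_idle_time and its launch, run and depth
parameters are hypothetical, and the model assumes a single stream with
identical kernels. depth is the number of kernels allowed in flight
(queued or running) at once.

```python
def gpu_idle_time(n_kernels, launch, run, depth):
    """Total GPU idle time between consecutive kernels in a toy model."""
    cpu = 0.0     # time at which the worker thread is free to launch
    ends = []     # completion time of each kernel
    idle = 0.0
    for k in range(n_kernels):
        # Kernel k may only be launched once kernel k - depth has completed.
        if k >= depth:
            cpu = max(cpu, ends[k - depth])
        cpu += launch                        # CPU-side launch cost
        gpu_free = ends[-1] if ends else cpu
        begin = max(cpu, gpu_free)           # kernel starts when both are ready
        if ends:
            idle += begin - ends[-1]         # gap since the previous kernel
        ends.append(begin + run)
    return idle


no_pipeline = gpu_idle_time(4, launch=1.0, run=5.0, depth=1)
pipelined = gpu_idle_time(4, launch=1.0, run=5.0, depth=2)
print(no_pipeline, pipelined)
```

With these made-up numbers, launching only after the previous kernel
completes leaves one launch delay of idle time between each pair of
kernels (3.0 in total), while keeping one kernel queued ahead removes the
gaps entirely as long as the launch cost is shorter than the kernel.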
Samuel