List subject: Developers list for StarPU
- From: Xavier Lacoste <xavier.lacoste@inria.fr>
- To: starpu-devel@lists.gforge.inria.fr
- Subject: Re: [Starpu-devel] Performance decreasing by adding empty tasks
- Date: Wed, 15 Feb 2012 11:50:59 +0100
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Hello,
I have some news about this.
I'm now working with StarPU 5070.
When I don't use my small tasks, I get the good performance I used to get
(on "audi", a fairly large case: 180s with 5070 (with FxT), and the best I
ever had was 135s (without FxT)).
I'm trying to avoid these small CPU tasks that prepare data for the GPU: I
replaced them with a new GPU kernel and added a read-only buffer of
2*blocknbr integers to all my SPARSE_GEMM tasks.
I tested my GPU kernel externally and it seems to perform as well as the
previous one.
In PaStiX, adding this fairly large read-only buffer to all my StarPU GEMM
tasks increases the decomposition time considerably, even when StarPU decides
not to use the CUDA kernel.
Kernel 1, only CPU, no special buffer, no stride tasks
audi_llt_5070_NCPUS8_NCUDA3_heft:----- sopalin time 189.970112
audi_llt_5070_NCPUS8_NCUDA3_heft:----- sopalin time 188.608316
Kernel 2, CPU/CUDA, special buffer added but StarPU runs nothing on CUDA (the
tasks may be too small, but that's not the issue); the CPU kernel doesn't use
the special buffer:
audi_llt_5070_NCPUS8_NCUDA3_heft_sparse_blocktab:----- sopalin time 628.092450
audi_llt_5070_NCPUS8_NCUDA3_heft_sparse_blocktab:----- sopalin time 754.126243
Kernel 2, NCUDAS=0
audi_llt_5070_NCPUS8_NCUDA0_heft_sparse_blocktab:----- sopalin time 558.798516
audi_llt_5070_NCPUS8_NCUDA0_heft_sparse_blocktab:----- sopalin time 799.072369
Kernel 1 but with the special buffer added :
audi_llt_5070_NCPUS8_NCUDA0_heft_blocktabunused:----- sopalin time 652.212924
audi_llt_5070_NCPUS8_NCUDA0_heft_blocktabunused:----- sopalin time 499.506077
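To put a number on the timings above, here is a minimal sketch that averages the runs of one configuration and expresses a variant as a ratio of the baseline. The function names mean_time and slowdown are hypothetical helpers, not part of PaStiX or StarPU, and with only two runs per configuration the result is a rough indication at best.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n timings, in seconds. */
double mean_time(const double *t, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / (double)n;
}

/* Slowdown of a variant relative to a baseline, as a ratio of mean times.
 * A value of 1.0 means no change; larger values mean the variant is slower. */
double slowdown(const double *base, size_t nbase,
                const double *variant, size_t nvariant)
{
    return mean_time(variant, nvariant) / mean_time(base, nbase);
}
```

Fed with the two "Kernel 1" runs as the baseline and the two "blocktabunused" runs as the variant, slowdown returns a ratio of about 3, although the two "blocktabunused" runs differ a lot from each other.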
I thought that, since this buffer is read-only, each CPU thread would access
the original buffer, and a copy would be made once on the GPU and never moved
afterwards.
Thus it wouldn't cost much time.
I don't see any obvious problem with the way I add this special buffer...
Here are my modifications:
I added 1 to gemm_codelet.nbuffers, with the STARPU_R access mode, and:
{
  int iterblock;
  MALLOC_INTERN(blocktab, SYMB_BLOKNBR*2, int);
  for (iterblock = 0; iterblock < SYMB_BLOKNBR; iterblock++)
    {
      blocktab[2*iterblock]   = SYMB_FROWNUM(iterblock);
      blocktab[2*iterblock+1] = SYMB_LROWNUM(iterblock);
    }
  starpu_vector_data_register(&blocktab_handle, 0, (uintptr_t)blocktab,
                              SYMB_BLOKNBR,
                              2*sizeof(int));
}
….
// for all GEMM task
task_gemm->handles[handle_idx++] = blocktab_handle;
….
starpu_data_unregister(blocktab_handle);
memFree_null(blocktab);
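For readers unfamiliar with the layout, here is a minimal standalone sketch of the interleaved table built above, without any StarPU or PaStiX dependency. The block_rows_t struct and both function names are hypothetical stand-ins for what SYMB_FROWNUM and SYMB_LROWNUM provide, and the row count assumes that the last row index is inclusive. Each (first row, last row) pair occupies 2*sizeof(int) bytes, which is why the vector is registered with SYMB_BLOKNBR elements of that size.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the symbolic structure: first and last row
 * of one block, as SYMB_FROWNUM / SYMB_LROWNUM would return them. */
typedef struct {
    int frownum;
    int lrownum;
} block_rows_t;

/* Build the interleaved table: entry 2*i holds the first row of block i,
 * entry 2*i+1 holds its last row. Returns NULL if blocknbr <= 0 or if the
 * allocation fails. The caller owns the result and must free() it. */
int *build_blocktab(const block_rows_t *blocks, int blocknbr)
{
    if (blocknbr <= 0)
        return NULL;
    int *blocktab = malloc((size_t)blocknbr * 2 * sizeof(int));
    if (blocktab == NULL)
        return NULL;
    for (int i = 0; i < blocknbr; i++) {
        /* Sanity check: a block spans at least one row. */
        assert(blocks[i].frownum <= blocks[i].lrownum);
        blocktab[2 * i]     = blocks[i].frownum;
        blocktab[2 * i + 1] = blocks[i].lrownum;
    }
    return blocktab;
}

/* Number of rows of block i, read back from the interleaved table
 * (assumes the last row index is inclusive). */
int blocktab_rownbr(const int *blocktab, int i)
{
    return blocktab[2 * i + 1] - blocktab[2 * i] + 1;
}
```

A kernel receiving the table only needs the block index to recover both bounds, so no per-task offset array has to be computed beforehand.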
I hope this new information will be useful,
XL.
On 8 Feb 2012, at 15:07, Xavier Lacoste wrote:
> Hello,
>
> I'm working on a StarPU based sparse factorisation.
>
> I have 2 types of tasks :
> - (1) Diagonal factorisation and update of the off-diagonal blocks of
> the column block.
> - (2) Multiply the transpose of a block by that block and all the blocks
> below it in the column block, and update the column block facing the
> current block.
>
> In order to use GPUs, I added a new SPARSE_GEMM operation that performs
> these products at once and updates the facing column block with the correct
> offsets.
>
> To use this kernel, I need to compute the offsets.
> Thus I added a CPU task which computes these offsets (this can't be done on
> GPUs).
> There is one handle for each offset array, and each task (2) depends on one
> of these offset-building tasks.
> The offset-building tasks don't depend on any task.
>
> Simply adding these tasks results in a loss of performance (I perform 2
> consecutive factorizations in my test):
> shipsec_0.9.2_heft_8CPU_0CUDA:----- sopalin time 11.219737
> shipsec_0.9.2_heft_8CPU_0CUDA:----- sopalin time 6.038163
>
> shipsec_0.9.2_heft_8CPU_0CUDA_SPARSE:----- sopalin time 34.779351
> shipsec_0.9.2_heft_8CPU_0CUDA_SPARSE:----- sopalin time 22.334903
>
> This happens even if I use an empty task that does nothing:
> void
> starpu_init_strides(void * buffers[], void * _args)
> {
> }
> shipsec_0.9.2_heft_8CPU_0CUDA_SPARSE_EMPTY:----- sopalin time 29.753910
> shipsec_0.9.2_heft_8CPU_0CUDA_SPARSE_EMPTY:----- sopalin time 20.411781
>
> I don't understand how I can lose so much time by adding these tasks.
>
> I'm using version 0.9.2 of StarPU because I noticed a regression with the
> trunk version:
> shipsec_r5550_heft_8CPU_0CUDA:----- sopalin time 33.981822
> shipsec_r5550_heft_8CPU_0CUDA:----- sopalin time 33.579352
>
> Do you have any clue ?
>
> I also noticed on all my Gantt diagrams (even the best ones) that I have
> one isolated task, then a large amount of "blocked" time, and then my tasks
> (see attached Gantts).
>
> I hope my explanations are understandable,
>
> Have a nice day,
>
> XL.
>
>
> <shipsec_0.9.2_heft_8CPU_0CUDA.svg><shipsec_0.9.2_heft_8CPU_0CUDA_SPARSE_EMPTY.svg>