
starpu-devel - Re: [Starpu-devel] [LU factorisation: gdb debug output]

List subject: Developers list for StarPU




  • From: Maxim Abalenkov <maxim.abalenkov@gmail.com>
  • To: Samuel Thibault <samuel.thibault@inria.fr>, Olivier Aumage <olivier.aumage@inria.fr>
  • Cc: starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] [LU factorisation: gdb debug output]
  • Date: Mon, 5 Feb 2018 12:16:54 +0000
  • Authentication-results: mail3-smtp-sop.national.inria.fr; spf=None smtp.pra=maxim.abalenkov@gmail.com; spf=Pass smtp.mailfrom=maxim.abalenkov@gmail.com; spf=None smtp.helo=postmaster@mail-wr0-f170.google.com
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Dear all,

I’m on a mission to apply the SPMD capability of StarPU (http://starpu.gforge.inria.fr/doc/html/TasksInStarPU.html#ParallelTasks) to the panel factorisation stage of the LU algorithm. Please see the attached figure for an example of my scenario.

The matrix is viewed as a set of tiles (rectangular or square matrix blocks). A column of tiles is called a panel.

In the first stage of the LU algorithm I would like to take a panel, find the pivots, swap the necessary rows, and scale and update the underlying matrix elements. To track the dependencies I created tile descriptors that store the access mode and the handle of each tile. Essentially, the tile descriptors are used to “lock” the entire panel; all operations inside the panel are parallelised manually, using a custom barrier and auxiliary arrays that store the maximum values and their indices. To assign a particular part of the work to each thread processing the panel factorisation, I use ranks: depending on its rank, each thread gets its own portion of the data. Inside the panel, the threads are synchronised manually and wait for each other at the custom barrier.
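A minimal sketch of the kind of custom barrier described above, written with POSIX threads. The names `barrier_t`, `barrier_wait()` and `run_ranks()` are hypothetical; the real `plasma_barrier_t` is not shown in this thread. The sketch illustrates two properties such a barrier relies on: every thread must operate on the same barrier object (a shared pointer, not a per-thread copy), and the participant count given at initialisation must equal the number of threads that actually reach the barrier. If the count is larger, the threads that do arrive block forever. In StarPU, the number of threads executing an SPMD task can be queried with `starpu_combined_worker_get_size()`, and it may be smaller than `max_parallelism`, so a barrier sized from a fixed constant is one thing worth checking when threads hang.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_RANKS 64

/* Hypothetical counter-based barrier. All ranks must share ONE instance,
 * and `count` must equal the number of threads that call barrier_wait();
 * if fewer threads arrive, the ones that did arrive wait forever. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int      count;    /* number of participating threads */
    int      arrived;  /* threads that reached the barrier in this round */
    unsigned round;    /* generation counter: makes the barrier reusable
                          and guards against spurious wake-ups */
} barrier_t;

static void barrier_init(barrier_t *b, int count) {
    assert(count > 0);
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->cond, NULL);
    b->count = count;
    b->arrived = 0;
    b->round = 0;
}

static void barrier_destroy(barrier_t *b) {
    pthread_mutex_destroy(&b->lock);
    pthread_cond_destroy(&b->cond);
}

static void barrier_wait(barrier_t *b) {
    pthread_mutex_lock(&b->lock);
    unsigned my_round = b->round;
    if (++b->arrived == b->count) {
        /* Last thread in: reset for the next round and release everyone. */
        b->arrived = 0;
        b->round++;
        pthread_cond_broadcast(&b->cond);
    } else {
        while (my_round == b->round)
            pthread_cond_wait(&b->cond, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

typedef struct {
    int rank;
    int nthreads;
    barrier_t *barrier;  /* shared: every rank points at the same barrier */
    int *done;           /* shared array, one flag per rank */
    int *all_seen;       /* written by rank 0 only */
} rank_arg_t;

static void *rank_main(void *p) {
    rank_arg_t *a = p;
    a->done[a->rank] = 1;      /* phase 1: rank-local work */
    barrier_wait(a->barrier);  /* nobody continues until all ranks arrive */
    if (a->rank == 0) {        /* phase 2: master inspects every result */
        *a->all_seen = 1;
        for (int r = 0; r < a->nthreads; r++)
            if (!a->done[r]) *a->all_seen = 0;
    }
    return NULL;
}

/* Starts nthreads ranks sharing one barrier; returns 1 if rank 0 saw the
 * phase-1 result of every rank after passing the barrier. */
static int run_ranks(int nthreads) {
    assert(nthreads > 0 && nthreads <= MAX_RANKS);
    barrier_t barrier;
    barrier_init(&barrier, nthreads);  /* count == threads actually started */
    int done[MAX_RANKS] = {0};
    int all_seen = 0;
    pthread_t tid[MAX_RANKS];
    rank_arg_t args[MAX_RANKS];
    for (int r = 0; r < nthreads; r++) {
        args[r] = (rank_arg_t){ r, nthreads, &barrier, done, &all_seen };
        pthread_create(&tid[r], NULL, rank_main, &args[r]);
    }
    for (int r = 0; r < nthreads; r++)
        pthread_join(tid[r], NULL);
    barrier_destroy(&barrier);
    return all_seen;
}
```

The mutex and the generation counter make the barrier safe to reuse for every pivot step; a barrier that only counts arrivals without resetting would work once and then hang on the next column.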

Please refer to the attached figure. A panel consisting of five tiles is passed to the StarPU task. Imagine we have three threads processing the panel. To find the first pivot, we assign the first column of each tile to a thread in round-robin fashion (ranks 0, 1, 2, 0, 1). Once each thread has found the maximum in its tiles, the master thread (rank 0) selects the global maximum. I would like to use the SPMD capability of StarPU to process the panel, with a custom barrier inside.
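The round-robin pivot search can be sketched in plain C as follows. This is an illustrative assumption, not the code from the message: the tiles hold real doubles in column-major order with `mb` rows each (the original uses `plasma_complex64_t`, where a complex magnitude would replace the absolute value), and `tile_owner()`, `local_max()` and `global_max()` are hypothetical helpers. Each rank scans column 0 of the tiles it owns and records a local maximum; after the barrier, rank 0 reduces the per-rank results.

```c
#include <assert.h>

/* Rank that owns tile t when tiles are dealt out round-robin:
 * with 3 threads and 5 tiles this gives ranks 0, 1, 2, 0, 1. */
static int tile_owner(int t, int nthreads) {
    assert(nthreads > 0);
    return t % nthreads;
}

/* Per-rank step: scan the first column of every tile this rank owns and
 * record the largest absolute value and its row index within the panel. */
static void local_max(double *const *tiles, int ntiles, int mb,
                      int rank, int nthreads,
                      double *max_val, int *max_idx) {
    *max_val = -1.0;  /* below any absolute value: "nothing found yet" */
    *max_idx = -1;
    for (int t = 0; t < ntiles; t++) {
        if (tile_owner(t, nthreads) != rank)
            continue;
        for (int i = 0; i < mb; i++) {
            double x = tiles[t][i];   /* element (i, 0) of tile t */
            double v = x < 0 ? -x : x;
            if (v > *max_val) {
                *max_val = v;
                *max_idx = t * mb + i;  /* row index within the panel */
            }
        }
    }
}

/* Master step (rank 0, after the barrier): pick the global pivot row.
 * Ties keep the lowest rank, so the result is deterministic. */
static int global_max(const double *max_val, const int *max_idx,
                      int nthreads) {
    int best = 0;
    for (int r = 1; r < nthreads; r++)
        if (max_val[r] > max_val[best])
            best = r;
    return max_idx[best];
}
```

In an SPMD task, the `max_val` and `max_idx` arrays indexed by rank would be the shared auxiliary arrays, written by each rank before the barrier and read by rank 0 after it.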

Please consider the C code below. It compiles and runs, but the threads wait indefinitely at the first barrier. My questions are:

1) Am I passing the barrier structure correctly, so that it is “shared” amongst all the threads and each thread “knows” the status of the others? To achieve this, I pass the barrier structure by reference (as a pointer).
2) Could it be the tile descriptors that “block” the execution of the threads inside the panel? Maybe the threads with ranks 1 and 2 cannot proceed because all the tiles are locked by rank 0? If so, should I conclude that “locking” the tiles the way I do is incorrect?
3) Is there a way to pass a variable to the codelet to set the “max_parallelism” value instead of hard-coding it?

4) If I may, I would like to make a general comment. I like StarPU very much, and I think you have invested a great deal of time and effort into it. Thank you. But in my experience as a user, its weakest point is passing values to StarPU when inserting a task: there is no type checking of the variables. The same applies to “starpu_codelet_unpack_args()” when you want to obtain the values “on the other side”. Sometimes it becomes a nightmare and a trial-and-error exercise. If type checks could be enforced there, it would make users’ lives much easier.
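One way to get some compiler type checking for task arguments is to bundle all scalar and pointer arguments into a single struct, fill it with designated initialisers, and pass and unpack it as one value. In StarPU terms this would mean a single `STARPU_VALUE, &args, sizeof(args)` pair in `starpu_task_insert()` and a single `starpu_codelet_unpack_args(cl_arg, &args)` in the kernel; this is a suggested pattern, not something confirmed in this thread. The sketch below is self-contained: the StarPU calls are replaced by hypothetical `pack_args()`/`unpack_args()` helpers based on `memcpy()`, and `struct zgetrf_args` is a simplified, hypothetical subset of the zgetrf arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bundle of the task's scalar/pointer arguments.
 * Field types are checked by the compiler when the struct is filled in. */
struct zgetrf_args {
    volatile int *max_idx;
    int ib, mtpf, k;
    volatile int *info;
    void *barrier;  /* stands in for plasma_barrier_t * */
};

/* Stand-in for packing one STARPU_VALUE argument: copy the struct's bytes
 * into an opaque buffer. Returns the number of bytes written. */
static size_t pack_args(unsigned char *buf, size_t cap,
                        const struct zgetrf_args *a) {
    assert(cap >= sizeof *a);
    memcpy(buf, a, sizeof *a);
    return sizeof *a;
}

/* Stand-in for unpacking on the kernel side: both sides use the same
 * struct type, so the argument order cannot drift out of sync. */
static void unpack_args(const unsigned char *buf, struct zgetrf_args *a) {
    memcpy(a, buf, sizeof *a);
}
```

Because both sides name the same struct type, passing an incompatible type for a field (for example an `int` where a pointer is expected) is typically diagnosed by the compiler, and reordering fields no longer silently misreads the packed bytes.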

// StarPU LU panel factorisation function
/******************************************************************************/
void core_zgetrf(plasma_desc_t A, plasma_complex64_t **pnl, int *piv,
                 volatile int *max_idx, volatile plasma_complex64_t *max_val,
                 int ib, int rank, int mtpf, volatile int *info,
                 plasma_barrier_t *barrier)
{
    // Panel factorisation body omitted from this message.
}

/******************************************************************************/
// StarPU ZGETRF SPMD CPU kernel
static void core_starpu_cpu_zgetrf_spmd(void *desc[], void *cl_arg) {

    plasma_desc_t A;
    int ib, mtpf, k, *piv;
    volatile int *max_idx, *info;
    volatile plasma_complex64_t *max_val;
    plasma_barrier_t *barrier;

    // Unpack scalar arguments
    starpu_codelet_unpack_args(cl_arg, &A, &max_idx, &max_val, &ib, &mtpf,
                               &k, &info, &barrier);

    int rank = starpu_combined_worker_get_rank();

    // Array of pointers to subdiagonal tiles in panel k (incl. diagonal tile k)
    plasma_complex64_t **pnlK =
        (plasma_complex64_t**) malloc((size_t)A.mt * sizeof(plasma_complex64_t*));
    assert(pnlK != NULL);

    printf("Panel: %d\n", k);

    // Unpack tile data
    for (int i = 0; i < A.mt; i++) {
        pnlK[i] = (plasma_complex64_t *) STARPU_MATRIX_GET_PTR(desc[i]);
    }

    // Unpack pivots vector
    piv = (int *) STARPU_VECTOR_GET_PTR(desc[A.mt]);

    // Call computation kernel
    core_zgetrf(A, pnlK, &piv[k*A.mb], max_idx, max_val,
                ib, rank, mtpf, info, barrier);

    // Deallocate container panel
    free(pnlK);
}

/******************************************************************************/
// StarPU SPMD codelet
static struct starpu_codelet core_starpu_codelet_zgetrf_spmd =
{
    .type            = STARPU_SPMD,
    .max_parallelism = 2,
    .cpu_funcs       = { core_starpu_cpu_zgetrf_spmd },
    .cpu_funcs_name  = { "zgetrf_spmd" },
    .nbuffers        = STARPU_VARIABLE_NBUFFERS,
};

/******************************************************************************/
// StarPU task inserter
void core_starpu_zgetrf_spmd(plasma_desc_t A, starpu_data_handle_t hPiv,
                             volatile int *max_idx, volatile plasma_complex64_t *max_val,
                             int ib, int mtpf, int k,
                             volatile int *info, plasma_barrier_t *barrier) {

    // Pointer to first (top) tile in panel k
    struct starpu_data_descr *pk = &(A.tile_desc[k*(A.mt+k+1)]);

    // Set access modes for subdiagonal tiles in panel k (incl. diagonal tile k)
    for (int i = 0; i < A.mt; i++) {
        (pk+i)->mode = STARPU_RW;
    }

    int retval = starpu_task_insert(
        &core_starpu_codelet_zgetrf_spmd,
        STARPU_VALUE,               &A,         sizeof(plasma_desc_t),
        STARPU_DATA_MODE_ARRAY,      pk,        A.mt,
        STARPU_RW,                   hPiv,
        STARPU_VALUE,               &max_idx,   sizeof(volatile int*),
        STARPU_VALUE,               &max_val,   sizeof(volatile plasma_complex64_t*),
        STARPU_VALUE,               &ib,        sizeof(int),
        STARPU_VALUE,               &mtpf,      sizeof(int),
        STARPU_VALUE,               &k,         sizeof(int),
        STARPU_VALUE,               &info,      sizeof(volatile int*),
        STARPU_VALUE,               &barrier,   sizeof(plasma_barrier_t*),
        STARPU_NAME,                "zgetrf",
        0);

    STARPU_CHECK_RETURN_VALUE(retval, "core_starpu_zgetrf: starpu_task_insert() failed");
}

Best wishes,
Maxim


Maxim Abalenkov \\ maxim.abalenkov@gmail.com
+44 7 486 486 505 \\ http://mabalenk.gitlab.io

On 24 Jan 2018, at 17:52, Maxim Abalenkov <maxim.abalenkov@gmail.com> wrote:

Hello Samuel,

Thank you very much! Yes, in this particular use case “STARPU_NONE” would come in handy and make the source code much more “elegant”.

Best wishes,
Maxim


On 24 Jan 2018, at 17:47, Samuel Thibault <samuel.thibault@inria.fr> wrote:

Hello,

Maxim Abalenkov, on Mon. 15 Jan. 2018 18:04:48 +0000, wrote:
I have a very simple question. What is the overhead of using the STARPU_NONE
access mode for some handles in the STARPU_DATA_MODE_ARRAY?

It is not implemented, we hadn't thought it could be useful. I have now
added it to the TODO list (but that list is very long and doesn't tend
to progress quickly).

The overhead would be quite small: StarPU would just write it down in
the array of data to fetch, and just not process that element. Of course
the theoretical complexity will be O(number of data).

In order to avoid using complicated offsets in my computation routines,
I would like to pass them a column of matrix tiles while setting the
“unused” tiles to “STARPU_NONE”.

I see.

Samuel





