
starpu-devel - Re: [Starpu-devel] [BUG] tests/datawizard/reclaim.c: hangs forever in task_wait_for_all when using OpenCL.

Subject: Developers list for StarPU

  • From: Nathalie Furmento <nathalie.furmento@labri.fr>
  • To: Cyril Roelandt <cyril.roelandt@inria.fr>
  • Cc: starpu-devel@lists.gforge.inria.fr, ludovic.stordeur@inria.fr
  • Subject: Re: [Starpu-devel] [BUG] tests/datawizard/reclaim.c: hangs forever in task_wait_for_all when using OpenCL.
  • Date: Fri, 09 Mar 2012 09:36:38 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

All,

Hannibal was acting weirdly last night. The buildbot profiles took much longer than normal to run and failed with segfaults in libcudart.

I am rebooting the machine, hoping that will reset the system to a proper state. You should then re-run your program.

Cheers,

Nathalie

On 09/03/2012 03:29, Cyril Roelandt wrote:
Hello,

I am currently trying to write OpenCL versions of our CUDA codelets for all the programs found in examples/ and tests/.

It is going well so far, but I have just stumbled upon the weirdest bug in tests/datawizard/reclaim.c. I applied the following patch:

--- tests/datawizard/reclaim.c (revision 6089)
+++ tests/datawizard/reclaim.c (working copy)
@@ -34,7 +34,7 @@
 # define BLOCK_SIZE (64*1024*1024)
 #endif
 
-static unsigned ntasks = 1000;
+static unsigned ntasks = 1;
 
 #ifdef STARPU_HAVE_HWLOC
 static uint64_t get_total_memory_size(void)
@@ -54,9 +54,9 @@
 
 static struct starpu_codelet dummy_cl =
 {
-	.where = STARPU_CPU|STARPU_CUDA,
 	.cpu_funcs = {dummy_func, NULL},
 	.cuda_funcs = {dummy_func, NULL},
+	.opencl_funcs = {dummy_func, NULL},
 	.nbuffers = 3,
 	.modes = {STARPU_RW, STARPU_R, STARPU_R}
 };

This patch is quite simple. It is worth noting that "dummy_func" is an empty function: it could not be any dummier. I have changed the value of ntasks because otherwise, the test takes quite a lot of time to run.

When running this test on a CPU, everything works as expected:

$ STARPU_NCPUS=1 STARPU_NCUDA=0 STARPU_NOPENCL=0 time -p ./tests/datawizard/reclaim
...
real 2.46
user 0.02
sys 1.98

With CUDA, things start to take an awful lot of time:

$ STARPU_NCPUS=0 STARPU_NCUDA=1 STARPU_NOPENCL=0 time -p ./tests/datawizard/reclaim
...
real 46.55
user 33.84
sys 15.13

With OpenCL, the program just hangs forever: it is stuck in starpu_task_wait_for_all().
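
Since the OpenCL run never returns, a bounded run makes the hang show up as a failure instead of blocking the shell. A minimal sketch, assuming GNU coreutils `timeout` is available; the helper name run_bounded is hypothetical and not part of StarPU:

```shell
# Sketch: run a command under a time limit and report how it ended.
# Assumes GNU coreutils `timeout`, which exits with status 124 when
# the limit is reached. run_bounded is a hypothetical helper name.
run_bounded() {
    limit="$1"
    shift
    status=0
    # Capture the exit status without aborting under `set -e`.
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "TIMEOUT after ${limit}s: $*"
    else
        echo "exit status $status: $*"
    fi
    return "$status"
}
```

For instance, `run_bounded 300 env STARPU_NCPUS=0 STARPU_NCUDA=0 STARPU_NOPENCL=1 ./tests/datawizard/reclaim` would report a timeout if starpu_task_wait_for_all() has not returned after 300 seconds. The OpenCL-only variable settings are assumed by analogy with the CPU and CUDA commands above, and the 300-second limit is arbitrary.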

I've run all these commands on Hannibal.

This reminds me of the bug Ludovic is currently struggling with: starpu_task_wait_for_all() never returns, and he uses OpenCL kernels. Ludovic, IIRC you started to debug this issue, so could you tell us more about it?

Cyril.






