starpu-devel - [Starpu-devel] pthread_rwlock_t-related failures on Darwin.

List subject: Developers list for StarPU




  • From: Cyril Roelandt <cyril.roelandt@inria.fr>
  • To: "starpu-devel@lists.gforge.inria.fr" <starpu-devel@lists.gforge.inria.fr>
  • Subject: [Starpu-devel] pthread_rwlock_t-related failures on Darwin.
  • Date: Fri, 31 Aug 2012 23:23:26 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hey!

We've been seeing a lot of failures on Darwin lately, all located in src/core/progress_hook.c:

unsigned _starpu_execute_registered_progression_hooks(void)
{
    /* If there is no hook registered, we short-cut loop. */
    _STARPU_PTHREAD_RWLOCK_RDLOCK(&progression_hook_rwlock);
    int no_hook = (active_hook_cnt == 0);
    _STARPU_PTHREAD_RWLOCK_UNLOCK(&progression_hook_rwlock);
    ...
}

The following errors happen at random:

- pthread_rwlock_rdlock fails with EBUSY (which its man page does not list as a possible error) or with EINVAL.
- pthread_rwlock_unlock fails with EINVAL.

Note that most of the time, there is no error at all.

Taking a closer look at src/core/progress_hook.c, we may notice that:
- progression_hook_rwlock is a static variable initialized with PTHREAD_RWLOCK_INITIALIZER.
- _STARPU_PTHREAD_RWLOCK_WRLOCK is called on this rwlock in starpu_progression_hook_register() and in starpu_progression_hook_deregister(), but these two functions are never called in the tests (except in the MPI tests, and I disabled MPI).
- _STARPU_PTHREAD_RWLOCK_RDLOCK is only called on this rwlock in _starpu_execute_registered_progression_hooks().


I tried initializing progression_hook_rwlock dynamically with this patch:

--- a/trunk/src/core/progress_hook.c
+++ b/trunk/src/core/progress_hook.c
@@ -29,7 +29,9 @@ struct progression_hook
 };
 
 /* protect the hook table */
-static pthread_rwlock_t progression_hook_rwlock = PTHREAD_RWLOCK_INITIALIZER;
+static pthread_rwlock_t progression_hook_rwlock;
+static int init = 0;
+static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
 
 static struct progression_hook hooks[NMAXHOOKS] = {{NULL, NULL, 0}};
 static int active_hook_cnt = 0;
@@ -76,6 +78,14 @@ void starpu_progression_hook_deregister(int hook_id)
 
 unsigned _starpu_execute_registered_progression_hooks(void)
 {
+    _STARPU_PTHREAD_MUTEX_LOCK(&init_lock);
+    if (init == 0)
+    {
+        _STARPU_PTHREAD_RWLOCK_INIT(&progression_hook_rwlock, NULL);
+        init = 1;
+    }
+    _STARPU_PTHREAD_MUTEX_UNLOCK(&init_lock);
+
     /* If there is no hook registered, we short-cut loop. */
     _STARPU_PTHREAD_RWLOCK_RDLOCK(&progression_hook_rwlock);
     int no_hook = (active_hook_cnt == 0);



OK, the implementation sucks: we should not take "init_lock" every time we enter _starpu_execute_registered_progression_hooks(), but this was the quickest way for me to make sure progression_hook_rwlock is initialized only once. Ideally, we should initialize progression_hook_rwlock from starpu_init().

The thing is, after applying this patch, I can run "make check" on a Darwin machine several times and never see any problem with the calls to _STARPU_PTHREAD_RWLOCK_RDLOCK/_STARPU_PTHREAD_RWLOCK_UNLOCK. When I run "make check" without the patch, at least one test fails.

Does anyone understand what's going on here?

Cyril.




