List: Developers list for StarPU
- From: Cyril Roelandt <cyril.roelandt@inria.fr>
- To: "starpu-devel@lists.gforge.inria.fr" <starpu-devel@lists.gforge.inria.fr>
- Subject: [Starpu-devel] pthread_rwlock_t-related failures on Darwin.
- Date: Fri, 31 Aug 2012 23:23:26 +0200
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Hey!
We've been seeing a lot of failures on Darwin lately, all located in src/core/progress_hook.c:
unsigned _starpu_execute_registered_progression_hooks(void)
{
/* If there is no hook registered, we short-cut loop. */
_STARPU_PTHREAD_RWLOCK_RDLOCK(&progression_hook_rwlock);
int no_hook = (active_hook_cnt == 0);
_STARPU_PTHREAD_RWLOCK_UNLOCK(&progression_hook_rwlock);
...
}
The following errors occur intermittently:
- pthread_rwlock_rdlock fails with EBUSY (which the man page does not list as a possible error for this function) or EINVAL.
- pthread_rwlock_unlock fails with EINVAL.
Note that most of the time, there is no error at all.
Taking a closer look at src/core/progress_hook.c, we may notice that:
- progression_hook_rwlock is a static variable initialized with PTHREAD_RWLOCK_INITIALIZER.
- _STARPU_PTHREAD_RWLOCK_WRLOCK is called on this rwlock in starpu_progression_hook_register() and in starpu_progression_hook_deregister(), but these two functions are never called in the tests (except for the MPI tests, but I disabled MPI).
- _STARPU_PTHREAD_RWLOCK_RDLOCK is only called on this rwlock in _starpu_execute_registered_progression_hooks().
I tried to dynamically initialize progression_hook_rwlock, with this patch:
--- a/trunk/src/core/progress_hook.c
+++ b/trunk/src/core/progress_hook.c
@@ -29,7 +29,9 @@ struct progression_hook
};
/* protect the hook table */
-static pthread_rwlock_t progression_hook_rwlock = PTHREAD_RWLOCK_INITIALIZER;
+static pthread_rwlock_t progression_hook_rwlock;
+static int init = 0;
+static pthread_mutex_t init_lock;
static struct progression_hook hooks[NMAXHOOKS] = {{NULL, NULL, 0}};
static int active_hook_cnt = 0;
@@ -76,6 +78,14 @@ void starpu_progression_hook_deregister(int hook_id)
unsigned _starpu_execute_registered_progression_hooks(void)
{
+ _STARPU_PTHREAD_MUTEX_LOCK(&init_lock);
+ if (init == 0)
+ {
+ _STARPU_PTHREAD_RWLOCK_INIT(&progression_hook_rwlock, NULL);
+ init = 1;
+ }
+ _STARPU_PTHREAD_MUTEX_UNLOCK(&init_lock);
+
/* If there is no hook registered, we short-cut loop. */
_STARPU_PTHREAD_RWLOCK_RDLOCK(&progression_hook_rwlock);
int no_hook = (active_hook_cnt == 0);
OK, the implementation sucks: we should not take "init_lock" every time we enter _starpu_execute_registered_progression_hooks(), but this was the quickest way for me to make sure progression_hook_rwlock is initialized only once. Ideally, we should initialize progression_hook_rwlock from starpu_init().
The thing is, after applying my patch, I can run "make check" on a Darwin machine several times and never see any problem with the calls to _STARPU_PTHREAD_RWLOCK_RDLOCK/_STARPU_PTHREAD_RWLOCK_UNLOCK. When running "make check" without the patch, at least one test fails.
Does anyone understand what's going on here?
Cyril.