
starpu-devel - Re: [Starpu-devel] Error handling in StarPU.

List subject: Developers list for StarPU


  • From: Samuel Thibault <samuel.thibault@ens-lyon.org>
  • To: Cyril Roelandt <cyril.roelandt@inria.fr>
  • Cc: starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] Error handling in StarPU.
  • Date: Mon, 26 Mar 2012 18:59:10 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Cyril Roelandt, on Mon 26 Mar 2012 12:22:46 +0200, wrote:
> We tend to use a lot of assertions in StarPU, and I think we may want to be
> more flexible. Some errors could probably be handled by users, rather than
> crash their programs. For instance, I believe that starpu_data_acquire()
> should return -EINVAL when the handle provided by the user is NULL or
> already partitioned.
>
> We may encounter more critical errors, requiring us to crash.

I believe the example above is already quite a critical error.

> - should we try to let users handle more errors themselves (and yes, they
> __must__ check the return values of the functions they use) ?

I'd say there are about three kinds of errors:

- errors that the application can cope with by itself, such as
ENODEV. We should let the user handle them.
- errors that the application cannot cope with anyway: if StarPU
entered an unthinkable state, the application cannot do much about
it except bail out.
- obvious errors that are not obvious to beginners.

Making users clutter their code with error checking will make them
really unhappy. They will soon just use -Wno-unused-result and be done
with it; not really what we want, because they will soon report bugs
about code which makes errors of the 3rd category. Also, just returning
an error code (EINVAL) does not tell the user what is being done wrong.

The example of acquiring partitioned data is very clear to me:
it cannot work in any such situation, and it's clearly a beginner
programming error, so to help them StarPU should at the very minimum
print a warning that explains what is wrong (which is the point
of STARPU_ASSERT_MSG). After that, whether it crashes or returns an
error, I don't really care. But making sure that an explanatory
warning is printed is really very important to help the user.
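
To make the "explain first, then fail" idea concrete, here is a minimal
sketch. Everything in it is hypothetical: struct my_handle, MY_CHECK_MSG
and my_data_acquire() are made-up stand-ins, not StarPU's handle type,
not the definition of STARPU_ASSERT_MSG, and not starpu_data_acquire().
It only shows a check that prints what is wrong before returning
-EINVAL.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a data handle; not StarPU's real structure. */
struct my_handle {
    unsigned nchildren; /* non-zero once the data has been partitioned */
};

/* Print an explanation on stderr, then return `err` from the enclosing
 * function. Illustrative only: not how STARPU_ASSERT_MSG is defined. */
#define MY_CHECK_MSG(cond, err, ...)               \
    do {                                           \
        if (!(cond)) {                             \
            fprintf(stderr, "%s: ", __func__);     \
            fprintf(stderr, __VA_ARGS__);          \
            fputc('\n', stderr);                   \
            return (err);                          \
        }                                          \
    } while (0)

/* Hypothetical acquire function: rejects NULL and partitioned handles,
 * telling the user why in both cases. */
int my_data_acquire(struct my_handle *handle)
{
    MY_CHECK_MSG(handle != NULL, -EINVAL, "handle is NULL");
    MY_CHECK_MSG(handle->nchildren == 0, -EINVAL,
                 "handle is partitioned into %u pieces; "
                 "unpartition it before acquiring it",
                 handle->nchildren);
    return 0;
}
```

Replacing the `return (err);` with abort() would give the crashing
variant; the printed explanation is the part that matters either way.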

Instead of systematically crashing, we can let the application register
a hook that gets called instead of the crash (and crash if the hook
returns).
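
A rough sketch of what such a hook could look like, under the
assumption of a made-up API: my_set_fatal_hook(), my_fatal() and
my_run_guarded() are hypothetical names, and nothing here is an
existing StarPU interface. The library side calls the hook and aborts
if the hook returns; the application side shown here chooses to escape
with longjmp(), which is only one possible policy.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* --- library side (hypothetical) --- */
typedef void (*my_fatal_hook_t)(const char *message);
static my_fatal_hook_t my_fatal_hook = NULL;

void my_set_fatal_hook(my_fatal_hook_t hook)
{
    my_fatal_hook = hook;
}

/* Explain the problem, give the application a chance to react,
 * and crash if the hook is absent or returns. */
void my_fatal(const char *message)
{
    fprintf(stderr, "fatal error: %s\n", message);
    if (my_fatal_hook != NULL)
        my_fatal_hook(message);
    abort();
}

/* --- application side (hypothetical) --- */
static jmp_buf my_recovery_point;

/* Never returns to the library: jumps back to the recovery point. */
void my_app_hook(const char *message)
{
    (void)message;
    longjmp(my_recovery_point, 1);
}

/* Returns 0 if work() completed, 1 if it hit a fatal error. */
int my_run_guarded(void (*work)(void))
{
    my_set_fatal_hook(my_app_hook);
    if (setjmp(my_recovery_point) != 0)
        return 1;
    work();
    return 0;
}

void my_good_work(void) { }
void my_bad_work(void) { my_fatal("acquiring a partitioned handle"); }
```

With this, my_run_guarded(my_bad_work) reports the failure to its
caller instead of killing the process. After such an escape the library
may well be in an inconsistent state, so a real hook would more likely
save application data and exit cleanly.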

> - should we use STARPU_WARN_UNUSED_RESULT (defined for clang and gcc users)
> more often ?

For the first kind of error, and with care, because it makes users angry.
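
For reference, a macro of this kind is typically built on the
warn_unused_result attribute that gcc and clang provide. The sketch
below uses hypothetical names (MY_WARN_UNUSED_RESULT, my_init()) and is
not StarPU's actual definition; my_init() merely imitates a function
whose -ENODEV result the application can reasonably handle.

```c
#include <errno.h>

/* Expands to the attribute on gcc/clang, to nothing elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define MY_WARN_UNUSED_RESULT __attribute__((warn_unused_result))
#else
#define MY_WARN_UNUSED_RESULT
#endif

/* Hypothetical init function: fails with -ENODEV when no device is
 * available, an error of the first kind that the caller can cope with. */
MY_WARN_UNUSED_RESULT int my_init(int ndevices)
{
    return ndevices > 0 ? 0 : -ENODEV;
}
```

A caller that writes `my_init(n);` and drops the result gets a compiler
warning, unless the build uses -Wno-unused-result.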

> - when should we use STARPU_ABORT() and STARPU_ASSERT() ?

Their respective meanings have changed several times already, so it's
hard to tell which is which nowadays... except that STARPU_ABORT()
always aborts, and STARPU_ASSERT() aborts on a given condition and is
disabled by --enable-fast. Aborting can mean just calling the hook
mentioned above, to be nice to the application. As for whether a test
should be done or not, I'd say it depends on whether the test is a bit
expensive, since it'll be disabled with --enable-fast.
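
As an illustration of that trade-off, here is a small sketch with
hypothetical names throughout: MY_FAST stands in for whatever a "fast"
configure option would define, and MY_ASSERT / MY_ABORT are not the
StarPU macros. The O(n) sortedness test is affordable precisely because
it disappears from fast builds.

```c
#include <stdio.h>
#include <stdlib.h>

/* Always aborts, whatever the build mode. */
#define MY_ABORT()                                                  \
    do {                                                            \
        fprintf(stderr, "%s:%d: abort\n", __FILE__, __LINE__);      \
        abort();                                                    \
    } while (0)

/* Aborts when cond is false; compiled out when MY_FAST is defined. */
#ifdef MY_FAST
#define MY_ASSERT(cond) ((void)0)
#else
#define MY_ASSERT(cond)                                             \
    do {                                                            \
        if (!(cond)) {                                              \
            fprintf(stderr, "%s:%d: assertion `%s' failed\n",       \
                    __FILE__, __LINE__, #cond);                     \
            abort();                                                \
        }                                                           \
    } while (0)
#endif

/* O(n) check: too costly to keep in an optimized build. */
int my_is_sorted(const int *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (v[i - 1] > v[i])
            return 0;
    return 1;
}

/* O(log n) search; returns the index of key, or -1 if absent. */
int my_binary_search(const int *v, size_t n, int key)
{
    if (v == NULL)
        MY_ABORT();                /* cheap test, always performed */
    MY_ASSERT(my_is_sorted(v, n)); /* expensive test, debug builds only */

    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (v[mid] < key)
            lo = mid + 1;
        else if (v[mid] > key)
            hi = mid;
        else
            return (int)mid;
    }
    return -1;
}
```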

Samuel




