Subject: Developers list for StarPU
List archives
- From: "SYLVAND, Guillaume" <guillaume.sylvand@airbus.com>
- To: "starpu-devel@lists.gforge.inria.fr" <starpu-devel@lists.gforge.inria.fr>
- Subject: [Starpu-devel] 2 small patches
- Date: Tue, 12 Jul 2016 09:14:46 +0000
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Hi,
Would it be possible to integrate these two small patches?
- The first one extends the list of error codes returned by MPI and
prints the numeric value of unknown error codes;
- The second modifies the watchdog function to avoid waiting for a
full timeout at the end of the computation (especially useful if, e.g.,
the timeout is 1 minute and the computation lasts 5 seconds).
Let me know if the format of these files is not adequate ;-)
Regards,
GS
--
Guillaume SYLVAND
Airbus Group Innovations / INRIA Bordeaux Sud-Ouest
Std : +33 (0)5 35 00 26 27
GSM : +33 (0)6 72 87 47 36
Fax : +33 (0)5 61 16 88 05
--------------------------
Postal Address:
Inria Bordeaux – Sud-Ouest
HiePACS Team
200 avenue de la vieille tour
33405 Talence Cedex France
From 75c0c9c306048d5fa0255eef8236263a80e9657b Mon Sep 17 00:00:00 2001
From: SYLVAND Guillaume <guillaume.sylvand@airbus.com>
Date: Wed, 25 May 2016 14:32:15 -0500
Subject: [PATCH] Adapt the watchdog function to avoid waiting too long at the
end of the computation
---
src/core/task.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/src/core/task.c b/src/core/task.c
index cb9e62e..2b3e8fe 100644
--- a/src/core/task.c
+++ b/src/core/task.c
@@ -1175,6 +1175,7 @@ static void *watchdog_func(void *arg)
#else
timeout = ((float) atoll(timeout_env)) / 1000000;
#endif
+
struct _starpu_machine_config *config = (struct _starpu_machine_config *)_starpu_get_machine_config();
STARPU_PTHREAD_MUTEX_LOCK(&config->submitted_mutex);
@@ -1184,7 +1185,18 @@ static void *watchdog_func(void *arg)
config->watchdog_ok = 0;
STARPU_PTHREAD_MUTEX_UNLOCK(&config->submitted_mutex);
- starpu_sleep(timeout);
+ // If we do a sleep(timeout), we might have to wait too long at the end of the computation.
+ //starpu_sleep(timeout);
+ // To avoid that, we do several sleep() of 1s (and check after each if starpu is still running)
+ float t;
+ for (t=timeout ; t>1. ; t--) {
+ starpu_sleep(1.);
+ if (!_starpu_machine_is_running())
+ return NULL;
+ }
+ // and one final sleep (of less than 1 s) with the rest (if needed)
+ if (t>0.)
+ starpu_sleep(t);
STARPU_PTHREAD_MUTEX_LOCK(&config->submitted_mutex);
if (!config->watchdog_ok && last_nsubmitted
--
2.6.2
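
The idea behind the watchdog patch above (split one long sleep into short slices and stop as soon as the runtime is no longer running) can be sketched outside StarPU as follows. This is only an illustration under stated assumptions: `sleep_seconds`, `sleep_unless_stopped`, `demo_running` and `demo_is_running` are hypothetical names that stand in for `starpu_sleep()` and `_starpu_machine_is_running()`, and POSIX `nanosleep()` is assumed to be available.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Sleep for a (possibly fractional) number of seconds with POSIX nanosleep(). */
static void sleep_seconds(double seconds)
{
	struct timespec ts;

	if (seconds <= 0.0)
		return;
	ts.tv_sec = (time_t) seconds;
	ts.tv_nsec = (long) ((seconds - (double) ts.tv_sec) * 1e9);
	if (ts.tv_nsec > 999999999L)	/* guard against floating-point rounding */
		ts.tv_nsec = 999999999L;
	nanosleep(&ts, NULL);
}

/* Sleep up to `timeout` seconds in slices of at most `slice` seconds.
 * After each full slice, still_running() is polled; the function returns
 * false immediately if it reports that the runtime has stopped.
 * Returns true if the whole timeout elapsed. */
static bool sleep_unless_stopped(double timeout, double slice,
				 bool (*still_running)(void))
{
	double remaining = timeout;

	assert(slice > 0.0);
	while (remaining > slice)
	{
		sleep_seconds(slice);
		remaining -= slice;
		if (!still_running())
			return false;	/* nothing left to watch: stop early */
	}
	/* Final partial slice, shorter than `slice`, if anything is left. */
	if (remaining > 0.0)
		sleep_seconds(remaining);
	return true;
}

/* Hypothetical stand-in for the runtime's "is it still running?" query. */
static volatile bool demo_running = true;

static bool demo_is_running(void)
{
	return demo_running;
}
```

With a 60 s timeout and a 1 s slice, a caller that clears `demo_running` makes `sleep_unless_stopped(60.0, 1.0, demo_is_running)` return within about one second instead of a full minute. The slice length is a trade-off: a shorter slice reacts faster but wakes the thread more often.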
From 3894006d62e8bac89cc6081c2f79af208971fb1a Mon Sep 17 00:00:00 2001
From: Jerome Robert <jeromerobert@gmx.com>
Date: Wed, 25 May 2016 08:34:52 +0200
Subject: [PATCH 1/2] MPI: Replace "UNKNOWN_MPI_CODE" by a meaningful integer
---
mpi/src/starpu_mpi_private.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mpi/src/starpu_mpi_private.c b/mpi/src/starpu_mpi_private.c
index 9bf12a5..a396d7d 100644
--- a/mpi/src/starpu_mpi_private.c
+++ b/mpi/src/starpu_mpi_private.c
@@ -45,6 +45,7 @@ void starpu_mpi_set_communication_tag(int tag)
char *_starpu_mpi_get_mpi_code(int code)
{
+ char * r;
switch (code)
{
case MPI_SUCCESS: return "MPI_SUCCESS";
@@ -210,6 +211,9 @@ char *_starpu_mpi_get_mpi_code(int code)
#if defined(MPI_ERR_LASTCODE) && MPI_ERR_LASTCODE != MPI_SUCCESS
case MPI_ERR_LASTCODE: return "MPI_ERR_LASTCODE";
#endif
- default: return "UNKNOWN_MPI_CODE";
+ default:
+ r = malloc(12);
+ snprintf(r, 12, "%d", code);
+ return r;
}
}
--
2.6.2
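
Note that the `default:` branch in the patch above returns a `malloc()`ed string while every other branch returns a string literal, so a caller cannot tell whether the result must be freed, and the allocation is never checked. One possible alternative, shown here only as a sketch and not as what the patch does, is to let the caller provide the buffer used for unknown codes. The names `demo_code_name`, `DEMO_SUCCESS` and `DEMO_ERR_BUFFER` are hypothetical stand-ins for `_starpu_mpi_get_mpi_code()` and the real MPI constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical codes standing in for MPI_SUCCESS, MPI_ERR_BUFFER, ... */
enum { DEMO_SUCCESS = 0, DEMO_ERR_BUFFER = 1 };

/* Return a printable name for `code`.
 * Known codes yield a string literal. Unknown codes are formatted as a
 * decimal integer into the caller-provided buffer, which is then returned,
 * so nothing is allocated and nothing needs to be freed. */
static const char *demo_code_name(int code, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);
	switch (code)
	{
	case DEMO_SUCCESS:    return "DEMO_SUCCESS";
	case DEMO_ERR_BUFFER: return "DEMO_ERR_BUFFER";
	default:
		/* snprintf() always NUL-terminates when buflen > 0. */
		snprintf(buf, buflen, "%d", code);
		return buf;
	}
}
```

A 12-byte buffer, the size used in the patch, is enough for any 32-bit `int`: at most 10 digits, an optional sign, and the terminating NUL.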
From 8f80c14185af49f7f987790f8f4960c8f439babc Mon Sep 17 00:00:00 2001
From: SYLVAND Guillaume <guillaume.sylvand@airbus.com>
Date: Wed, 25 May 2016 08:59:26 -0500
Subject: [PATCH 2/2] starpu_mpi_get_mpi_code: Add some error codes found in
mpi.h (platform MPI 9.1)
---
mpi/src/starpu_mpi_private.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/mpi/src/starpu_mpi_private.c b/mpi/src/starpu_mpi_private.c
index a396d7d..dd70c30 100644
--- a/mpi/src/starpu_mpi_private.c
+++ b/mpi/src/starpu_mpi_private.c
@@ -207,6 +207,18 @@ char *_starpu_mpi_get_mpi_code(int code)
#endif
#ifdef MPI_ERR_WIN
case MPI_ERR_WIN: return "MPI_ERR_WIN";
+#endif
+#ifdef MPI_ERR_EXITED
+ case MPI_ERR_EXITED: return "MPI_ERR_EXITED";
+#endif
+#ifdef MPI_ERR_CONNECT
+ case MPI_ERR_CONNECT: return "MPI_ERR_CONNECT";
+#endif
+#ifdef MPI_ERR_PROC_FAILED
+ case MPI_ERR_PROC_FAILED: return "MPI_ERR_PROC_FAILED";
+#endif
+#ifdef MPI_ERR_REVOKED
+ case MPI_ERR_REVOKED: return "MPI_ERR_REVOKED";
#endif
#if defined(MPI_ERR_LASTCODE) && MPI_ERR_LASTCODE != MPI_SUCCESS
case MPI_ERR_LASTCODE: return "MPI_ERR_LASTCODE";
--
2.6.2