starpu-devel - [Starpu-devel] Mixing starpu_task_insert() and starpu_mpi_task_insert()

Subject: Developers list for StarPU

  • From: Mirko Myllykoski <mirkom@cs.umu.se>
  • To: starpu-devel@lists.gforge.inria.fr
  • Subject: [Starpu-devel] Mixing starpu_task_insert() and starpu_mpi_task_insert()
  • Date: Tue, 21 Aug 2018 15:31:39 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hi,

Are there any dangers in mixing the starpu_task_insert() and
starpu_mpi_task_insert() functions in the same code? I assumed that
this is OK as long as a node that inserts a task using
starpu_task_insert() owns the related data handles (or at least
has a valid local copy of the data).

I have developed a StarPU-based code that supports distributed
memory through StarPU-MPI. I mainly use the
starpu_mpi_task_insert() function, gather operations, and the
starpu_mpi_get_data_on_all_nodes_detached() function.

I am also using the starpu_task_insert() function in some places.
As I mentioned above, I assumed that this is ok in certain
situations. However, I have recently encountered a problem where one
node is left with unused early data:

=============================== BEGIN ===============================

[starpu][_starpu_mpi_early_data_check_termination] Unexpected message with comm 94690090821856 source 2 tag 251
[starpu][_starpu_mpi_early_data_check_termination] Unexpected message with comm 94690090821856 source 2 tag 252
[starpu][_starpu_mpi_early_data_check_termination] Unexpected message with comm 94690090821856 source 2 tag 256
...
[starpu][_starpu_mpi_early_data_check_termination] Unexpected message with comm 94690090821856 source 1 tag 58
/home/mirkom/.starpu_install/starpu_1.2.4_debug/lib/libstarpumpi-1.2.so.3(_starpu_mpi_early_data_check_termination+0x11a)[0x7fe791e01fe4]
/home/mirkom/.starpu_install/starpu_1.2.4_debug/lib/libstarpumpi-1.2.so.3(+0x13d68)[0x7fe791de7d68]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7fe791bbc6db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fe78ea1488f]

[starpu][_starpu_mpi_early_data_check_termination][assert failure] Number of unexpected received messages left is not 0 (but 2), did you forget to post a receive corresponding to a send?

nlafet-test: starpu_mpi_early_data.c:52: _starpu_mpi_early_data_check_termination: Assertion `_starpu_mpi_early_data_handle_hashmap_count == 0' failed.

=============================== END ===============================
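
For readers unfamiliar with this assertion: the log suggests that the runtime counts messages that arrived without a matching receive and expects that count to be zero at shutdown. The following is a toy model of that bookkeeping in plain C, not StarPU's implementation. Every name in it (early_table, early_arrive, early_claim, early_check_termination) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: NOT StarPU code. All names here are hypothetical. */
#define MAX_EARLY 64

struct early_msg {
    int source; /* sending rank */
    long tag;   /* message tag */
    int used;   /* 1 while the message is waiting for a receive */
};

struct early_table {
    struct early_msg slots[MAX_EARLY];
    int count; /* messages that arrived but were never claimed */
};

/* A message arrives before any matching receive has been posted. */
static int early_arrive(struct early_table *t, int source, long tag)
{
    assert(t != NULL);
    for (int i = 0; i < MAX_EARLY; i++) {
        if (!t->slots[i].used) {
            t->slots[i].source = source;
            t->slots[i].tag = tag;
            t->slots[i].used = 1;
            t->count++;
            return 0;
        }
    }
    return -1; /* table full */
}

/* A receive is posted: claim the matching early message, if there is one. */
static int early_claim(struct early_table *t, int source, long tag)
{
    assert(t != NULL);
    for (int i = 0; i < MAX_EARLY; i++) {
        struct early_msg *m = &t->slots[i];
        if (m->used && m->source == source && m->tag == tag) {
            m->used = 0;
            t->count--;
            return 1; /* matched */
        }
    }
    return 0; /* nothing to claim */
}

/* Shutdown check: list the leftovers and return how many remain. */
static int early_check_termination(const struct early_table *t)
{
    assert(t != NULL);
    for (int i = 0; i < MAX_EARLY; i++) {
        if (t->slots[i].used)
            printf("unclaimed message: source %d tag %ld\n",
                   t->slots[i].source, t->slots[i].tag);
    }
    return t->count;
}
```

In this model, a zero-initialized table that sees three arrivals and only one matching claim ends with two unclaimed entries, which is the kind of state the assertion above complains about.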

Disabling or periodically flushing the MPI cache fixes the problem.
Other nodes finish without problems.

The problem seems to be somehow related to the starpu_task_insert()
function. My code has a special copy codelet that copies a
section of a (tiled/blocked) matrix. The related tasks are inserted
using the starpu_mpi_task_insert() function. The data is copied to
data handles that are owned by rank X. I insert the copy tasks on all
nodes. Node/rank X then inserts a set of tasks using the
starpu_task_insert() function. This process involves some additional
data handles that are local to node/rank X (the
starpu_mpi_data_register() function is not used for them). I then
insert a set of copy tasks on all nodes that copy the data back using
the starpu_mpi_task_insert() function. In the end, node X is left
with "unexpected messages".

In summary:
Initial: Matrix A is distributed among all nodes
Step 1. All nodes: "MPI insert" a set of copy tasks:
B (rank X, RW) <- part of A (any rank, R)
Step 2. Node X: Insert a set of compute tasks:
part of B (RW) <- C_1 (R), ...
Step 3. All nodes: "MPI insert" a set of copy tasks:
part of A (any rank, RW) <- B (rank X, R)
Outcome: Node X has "unexpected messages" that are part of A.
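
To make the expected message directions in the three steps concrete, here is a small plain-C sketch that counts them. It is only a toy model and makes no StarPU calls. It assumes that a task inserted with starpu_mpi_task_insert() runs on the rank that owns the written handle, that one message is needed per tile of A owned by another rank, and that the communication cache is ignored. The names step_transfers and expected_transfers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model with hypothetical names; it makes no StarPU calls. */
struct step_transfers {
    int step1_into_x;   /* tiles of A sent to rank X for the copy into B */
    int step3_out_of_x; /* copies of B sent from rank X to owners of A   */
    int step3_into_x;   /* messages that step 3 should deliver to rank X */
};

/* a_owner[i] is the rank that owns tile i of A; rank_x owns B. */
static struct step_transfers expected_transfers(const int *a_owner,
                                                int n_tiles, int rank_x)
{
    struct step_transfers t = { 0, 0, 0 };

    assert(a_owner != NULL || n_tiles == 0);
    for (int i = 0; i < n_tiles; i++) {
        if (a_owner[i] == rank_x)
            continue; /* both handles live on X: no message needed */
        t.step1_into_x++;   /* step 1 runs on X, which reads remote A[i] */
        t.step3_out_of_x++; /* step 3 runs on A[i]'s owner, which reads B */
    }
    /* Step 2 is purely local to X, and in step 3 X only sends data,
     * so step3_into_x stays at zero under these assumptions. */
    return t;
}
```

Under these assumptions, step 3 delivers nothing to rank X, which is why messages "that are part of A" arriving at X are surprising.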

I do not understand how modifying B locally on node X could cause
the other nodes to send excess data to node X.

Best Regards,
Mirko Myllykoski



