Subject: Developers list for StarPU
- From: Xavier Lacoste <xl64100@gmail.com>
- To: starpu-devel@lists.gforge.inria.fr
- Subject: [Starpu-devel] Assert : Number of copy requests left is not zero
- Date: Tue, 4 Nov 2014 10:05:38 +0100
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Hello,
I made some changes to my application to improve MPI scalability.
I moved the insertion of the tasks involving incoming MPI communication from
the beginning of the submission loop to just before the task that needs the
data (i.e. incoming GEMM(remote, ColumnBlock_i) tasks are inserted just before
POTRF(ColumnBlock_i)). To be more efficient, I insert all updates from
GEMM(remote_i, local) at the same time and skip them before POTRF if they
have already been inserted.
I also flush the memory cache as soon as I know I can (i.e. flush(remote_i)
once all GEMM(remote_i, local) tasks from remote_i are inserted, and
flush(local_i) after GEMM(local_i, remote)).
Thus the algorithm is:
Fan-out :
For all local column block c1
** For all incoming GEMM on c1 from c2
**** If (not seen(c2))
****** For all updates from c2 on local column block c3
******** insert(GEMM, c2, c3)
****** flush(c2) /* c2 won't be sent anymore to "me" */
****** seen(c2) = true
** insert(XXTRF_TRSM, c1)
** for all updates from c1 on c4
**** insert(GEMM, c1, c4)
**** flush(c1) /* flush whether c4 is local or not */
wait_for_all()
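To make the fan-out submission order above easier to check, here is a minimal Python sketch that only records the order of insert/flush calls; it does not call StarPU. The function name `submit_fanout`, the dictionary arguments and the tuple-based trace are all hypothetical, introduced purely for illustration, and flush(c1) is kept inside the c4 loop exactly as written in the pseudocode.

```python
def submit_fanout(local_blocks, incoming, remote_updates, outgoing):
    """Record the submission order of the fan-out loop (no StarPU calls).

    All arguments are hypothetical stand-ins for the application's data:
      local_blocks   : local column blocks, in factorization order
      incoming[c1]   : remote blocks c2 that send a GEMM update to c1
      remote_updates : remote block c2 -> local blocks c3 it updates
      outgoing[c1]   : blocks c4 (local or remote) updated by c1
    """
    trace = []
    seen = set()
    for c1 in local_blocks:
        for c2 in incoming.get(c1, []):
            if c2 not in seen:
                # Insert every update coming from c2 in one go.
                for c3 in remote_updates.get(c2, []):
                    trace.append(("insert", "GEMM", c2, c3))
                # c2 will not be received again on this node.
                trace.append(("flush", c2))
                seen.add(c2)
        trace.append(("insert", "XXTRF_TRSM", c1))
        for c4 in outgoing.get(c1, []):
            trace.append(("insert", "GEMM", c1, c4))
            # Mirrors the pseudocode: flush whether c4 is local or not.
            trace.append(("flush", c1))
    trace.append(("wait_for_all",))
    return trace


# Tiny made-up example: remote block R updates local blocks A and B.
example_fanout_trace = submit_fanout(
    local_blocks=["A", "B"],
    incoming={"A": ["R"], "B": ["R"]},
    remote_updates={"R": ["A", "B"]},
    outgoing={"A": ["B"], "B": []},
)
for event in example_fanout_trace:
    print(event)
```

In this toy case R is flushed exactly once, right after both of its updates have been inserted, even though it appears in the incoming list of both A and B.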
Fan-in :
For all local column block c1
** For all incoming fanin on c1 from f2_1
**** insert(ADD, f2_1, c1)
**** flush(f2_1)
** insert(XXTRF_TRSM, c1)
** for all updates from c1 on c4
**** insert(GEMM, c1, c4)
**** if (c4 is fanin buffer)
****** contrib_count(c4) --;
****** if (contrib_count(c4) == 0)
******** insert(ADD, c4, target(c4))
******** flush(c4)
wait_for_all()
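The fan-in variant can be sketched the same way. Again, this is only a trace of the submission order under assumed inputs, not StarPU code: `submit_fanin`, `fanin_target` and `contrib_count` as Python dictionaries are hypothetical names chosen to match the pseudocode.

```python
def submit_fanin(local_blocks, incoming_fanin, outgoing, fanin_target, contrib_count):
    """Record the submission order of the fan-in loop (no StarPU calls).

    All arguments are hypothetical stand-ins for the application's data:
      incoming_fanin[c1] : fan-in buffers received from other nodes for c1
      outgoing[c1]       : blocks or fan-in buffers c4 updated by c1
      fanin_target[c4]   : remote block that fan-in buffer c4 is added to
      contrib_count[c4]  : number of local GEMMs contributing to buffer c4
    """
    trace = []
    remaining = dict(contrib_count)  # do not modify the caller's counters
    for c1 in local_blocks:
        for f in incoming_fanin.get(c1, []):
            trace.append(("insert", "ADD", f, c1))
            trace.append(("flush", f))
        trace.append(("insert", "XXTRF_TRSM", c1))
        for c4 in outgoing.get(c1, []):
            trace.append(("insert", "GEMM", c1, c4))
            if c4 in fanin_target:
                remaining[c4] -= 1
                if remaining[c4] == 0:
                    # Last local contribution: send the buffer, then flush it.
                    trace.append(("insert", "ADD", c4, fanin_target[c4]))
                    trace.append(("flush", c4))
    trace.append(("wait_for_all",))
    return trace


# Tiny made-up example: A and B both contribute to fan-in buffer F_out.
example_fanin_trace = submit_fanin(
    local_blocks=["A", "B"],
    incoming_fanin={"A": ["F_in"]},
    outgoing={"A": ["B", "F_out"], "B": ["F_out"]},
    fanin_target={"F_out": "remote_C"},
    contrib_count={"F_out": 2},
)
for event in example_fanin_trace:
    print(event)
```

Here the ADD of F_out onto its target is inserted only after the second (last) contributing GEMM, and the buffer is flushed immediately afterwards.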
The good news is the performance obtained:
For example, on a given test case on 16 nodes, I go from 298 GFLOPS to
402.94 GFLOPS when submitting the incoming GEMMs at the start, and to
400.62 GFLOPS when inserting them just before POTRF (on the Avakas cluster).
So flushing as soon as possible, rather than flushing everything at once,
seems to be very effective (I don't think any of my other modifications
could explain this performance improvement).
Cases that previously failed because of memory consumption now succeed.
Moving the insertion of the receiving tasks does not seem to have an effect;
maybe it will once I have a submission window (I have a version that inserts
only ready tasks from inside tasks, but it is broken right now).
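For reference, the relative gains implied by the three figures quoted above can be worked out with a few lines of Python. The variable and function names are hypothetical; the only inputs are the three GFLOPS values from this message.

```python
# GFLOPS figures quoted in this message (16 nodes, same test case).
baseline = 298.0        # before the changes
at_start = 402.94       # incoming GEMMs submitted at the start
before_potrf = 400.62   # incoming GEMMs inserted just before POTRF


def relative_gain(new, old):
    """Relative change of `new` with respect to `old`."""
    return (new - old) / old


gain_at_start = relative_gain(at_start, baseline)
gain_before_potrf = relative_gain(before_potrf, baseline)
gap_between_variants = relative_gain(at_start, before_potrf)

print(f"at start     : +{gain_at_start:.1%}")
print(f"before POTRF : +{gain_before_potrf:.1%}")
print(f"gap          : {gap_between_variants:.1%}")
```

Both variants are roughly a third faster than the baseline, while the two insertion points differ by well under one percent, which is consistent with the gain coming from the early flushes rather than from where the receiving tasks are inserted.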
The bad news is that sometimes, when the number of nodes gets large, I get
failures, with both fan-in and fan-out, whether I insert the incoming tasks
before XXTRF or at the beginning.
Insert at start, fanout, fanin : https://lut.im/43OAmwHu/NUg17aGf
Insert before XXTRF, fanout, fanin : https://lut.im/17JdWQN0/OEVOqMxz
It can be an assertion failure (marked "assert" in the results pictures):
[starpu][_starpu_mpi_early_data_check_termination][assert failure] Number of
copy requests left is not zero
dsimple: ../../../mpi/src/starpu_mpi_early_data.c:45:
_starpu_mpi_early_data_check_termination: Assertion
`_starpu_mpi_early_data_handle_hashmap_count == 0' failed.
Do you have any idea what could cause this assert?
A deadlock can also happen (marked "deadlock" in the results pictures); I
don't know whether the two are linked.
(Results marked "-" have not been run yet or were not launched: for the
10 million test case I don't run on small numbers of nodes.)
Regards,
XL.