starpu-devel - Re: [starpu-devel] [StarPU for stencil-based code: logic, starpu_shutdown error]

List topic: Developers list for StarPU

  • From: Maxim Abalenkov <maxim.abalenkov@gmail.com>
  • To: starpu-devel@inria.fr
  • Subject: Re: [starpu-devel] [StarPU for stencil-based code: logic, starpu_shutdown error]
  • Date: Mon, 2 Dec 2024 11:35:02 +0000

Dear all,

How are you? I hope all is well with you. I understand that your resources are limited and that, like many of us, you are working under pressure. But if possible, would you please find half an hour to look at my code and advise me on what I am doing wrong? Thank you for your help and have a great day ahead!

Best wishes,
Maxim

Maxim Abalenkov \\ maxim.abalenkov@gmail.com
+44 7 486 486 505 \\ www.maxim.abalenkov.uk

On 5 Nov 2024, at 17:22, Maxim Abalenkov <maxim.abalenkov@gmail.com> wrote:

Dear all,

I hope all is well with you. Let me please share the link to a GitHub repository with my code:


The wrapper routine Engine::profile_run calls the six phases of computation at each iteration of the time loop:


1) Intercept phase:


2) Overland depth phase:


3) Infiltration phase:


4) Diffusive routing phase:


5) Outlet + Storm phase  // currently sequential:


If possible, can you please spend 15 minutes to review my StarPU code and provide insight on what I could do to improve parallel performance? Thank you very much for all your help so far! Have a wonderful evening ahead!

Best wishes,
Maxim

Maxim Abalenkov \\ maxim.abalenkov@gmail.com
+44 7 486 486 505 \\ www.maxim.abalenkov.uk

On 5 Nov 2024, at 12:33, Maxim Abalenkov <maxim.abalenkov@gmail.com> wrote:

Dear all,

Let me please describe my finite-difference code in more detail.

There are four computationally expensive phases of the code. Each of them is formulated as a C function. These functions are called at every iteration of a time loop.

for t in 0..t_max {
    comp_intercept_starpu
    comp_overland_depth_starpu
    comp_infiltration_starpu
    comp_diffusive_routing_starpu
    comp_outlet_starpu
    comp_storm_starpu
}

The code uses multiple two-dimensional arrays with data values. These arrays (as is common in C) are flattened into one dimension. A data array is an m x n matrix, where m is the number of rows and n is the number of columns. When flattened, an m x n matrix becomes a vector of length m * n. To divide the data into blocks I split my arrays along the first (row) dimension only. The second (column) dimension remains constant and is equal to n. Please see the attached sketch for details on data arrangement in arrays.
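As a rough illustration of the layout described above, the following C sketch computes which rows belong to each block of a flattened row-major array. The names row_block, row_block_range and flat_index are hypothetical and do not come from the code under discussion. The rule for spreading leftover rows is an assumption, and it is not claimed to match how StarPU's block filter distributes a remainder.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range of rows [row_begin, row_end) owned by one block. */
struct row_block {
    size_t row_begin;
    size_t row_end;
};

/* Split m rows into nblocks contiguous blocks. The first (m % nblocks)
 * blocks receive one extra row, so block sizes differ by at most one.
 * This remainder rule is an assumption made for illustration. */
struct row_block row_block_range(size_t m, size_t nblocks, size_t b)
{
    assert(nblocks > 0 && b < nblocks);
    size_t base = m / nblocks;
    size_t extra = m % nblocks;
    struct row_block r;
    r.row_begin = b * base + (b < extra ? b : extra);
    r.row_end = r.row_begin + base + (b < extra ? 1 : 0);
    return r;
}

/* Offset of element (i, j) in a row-major array with n columns,
 * i.e. an m x n matrix flattened to a vector of length m * n. */
size_t flat_index(size_t i, size_t j, size_t n)
{
    assert(j < n);
    return i * n + j;
}
```

Because only the row dimension is split, every block is one contiguous slice of the flattened vector: block b starts at offset flat_index(row_begin, 0, n) and holds (row_end - row_begin) * n elements.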

The first three routines (intercept, overland_depth and infiltration) can benefit from applying StarPU’s block filter and submitting tasks on the resulting blocks. This situation is depicted in the leftmost figure of the sketch.

The diffusive routing routine is the most complex and needs to omit the last row and the last column from its calculation. To make it work I created my own data structure called a matrix block. I manually divide the data arrays into these matrix blocks, register them with StarPU and submit tasks. This access pattern is illustrated in the middle figure of the sketch.
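The actual matrix block structure is not shown in this thread, so the following is only a guess at what such a descriptor could look like. The names matrix_block and make_interior_block are hypothetical. The sketch assumes a row-major m x n array and clips each row block so that the last row and the last column are never part of the updated region.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor of one row block of a row-major m x n array. */
struct matrix_block {
    double *ptr;   /* first element of the block */
    size_t nrows;  /* rows the kernel updates */
    size_t ncols;  /* columns the kernel updates */
    size_t ld;     /* leading dimension: elements between consecutive rows */
};

/* Describe rows [row_begin, row_end) of array a, excluding row m-1
 * and column n-1 from the region to be updated. */
struct matrix_block make_interior_block(double *a, size_t m, size_t n,
                                        size_t row_begin, size_t row_end)
{
    assert(m > 0 && n > 0 && row_begin <= row_end && row_end <= m);
    /* Clip the block so that the last row is never touched. */
    size_t last = row_end < m - 1 ? row_end : m - 1;
    struct matrix_block blk;
    blk.ptr = a + row_begin * n;
    blk.nrows = last > row_begin ? last - row_begin : 0;
    blk.ncols = n - 1;  /* skip the last column */
    blk.ld = n;         /* rows are still n elements apart in memory */
    return blk;
}
```

A descriptor of this shape lines up with the arguments of starpu_matrix_data_register, which takes a pointer, a leading dimension ld, the contiguous extent nx (here ncols) and the number of rows ny (here nrows). Whether the kernel also needs read access to neighbouring rows outside its own block depends on the stencil and is not covered by this sketch.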

Finally, there are two low-effort routines: outlet and storm. Their data access pattern is shown in the rightmost figure of the sketch. They are probably not worth parallelising. But my final goal is to enable distributed-memory parallelism via StarPU-MPI. It may be beneficial to register these data points with StarPU and compute all outlet and storm values on a single MPI rank. I hope this makes sense.

May I please ask a few more questions:

1) Is it normal to have multiple data views registered with StarPU: ‘automatic’ blocks created with a StarPU vector filter, matrix blocks manually registered with StarPU, and finally a handful of data points manually registered with StarPU? Will these multiple views conflict with each other and degrade the overall parallel performance?

2) Do I understand correctly that it is best for StarPU performance to create and register all of these data partitions before the main loop over time and to unregister them after the main loop?

3) What else can I try to improve performance of my code? Currently, everything apart from ‘outlet’ and ‘storm’ routines is ported to StarPU. I’m using starpu_pause and starpu_resume before and after calling the ‘outlet’ and ‘storm’ routines.

I will share my code in a public GitHub repository and send you a link in the very near future. Thank you very much for your attention and help so far! Have a great day ahead!

Best wishes,
Maxim

<data_arrangement_access_patterns.jpg>

Maxim Abalenkov \\ maxim.abalenkov@gmail.com
+44 7 486 486 505 \\ www.maxim.abalenkov.uk

On 5 Nov 2024, at 09:46, Maxim Abalenkov <maxim.abalenkov@gmail.com> wrote:

Dear Samuel, Philippe et al.,

How are you? I hope all is well with you. I re-compiled StarPU and ran the ‘tasks_size_overhead.sh’ script. Please find its output attached. Does it mean that on my system task sizes of 512 elements and above will be beneficial? In my StarPU experiments I have been using tasks of size 488. However, the performance I see is far from optimal. Thank you for your help and have a great day ahead!

Best wishes,
Maxim

<tasks_size_overhead.pdf>

Maxim Abalenkov \\ maxim.abalenkov@gmail.com
+44 7 486 486 505 \\ www.maxim.abalenkov.uk

On 4 Nov 2024, at 14:03, Maxim Abalenkov <maxim.abalenkov@gmail.com> wrote:

Dear Samuel, Philippe et al.,

How are you? I hope all is well with you. Thank you very much for your suggestions. I tried:

a) using starpu_pause, starpu_resume around the sequential part of the code
b) using starpu_data_acquire, starpu_data_release around the sequential part of the code
c) examining the task size overhead on my system with the ‘tasks_size_overhead.sh’ script

find -name tasks_size_overhead.sh
./tests/microbenchs/tasks_size_overhead.sh
./starpupy/benchmark/tasks_size_overhead.sh
./build/tests/microbenchs/tasks_size_overhead.sh
./build/starpupy/benchmark/tasks_size_overhead.sh

Neither a) nor b) brought any performance gain. I run my StarPU code with

  STARPU_NCUDA=0 STARPU_NOPENCL=0 STARPU_NCPU=8 STARPU_SCHED=dmda python ./app.py

The Python script ‘app.py’ is a simple wrapper that pre-processes the input data and launches the main C + StarPU executable.

I also failed to run the shell script from c). It crashes because it is unable to find the executable ‘tasks_size_overhead’. I couldn’t find it with a ‘find’ command either. Do I need to configure and compile StarPU with some extra options to build the executable ‘tasks_size_overhead’?

I’m using StarPU 1.4.7 on macOS Sonoma 14.7.1 with an M2 CPU.

I will share my source code in a public GitHub repository for you to see in the very near future. In the meantime I’m going to convert the final phase of my finite-difference code to StarPU and test performance again. Thank you for your help and have a wonderful day ahead!

Best wishes,
Maxim

Maxim Abalenkov \\ maxim.abalenkov@gmail.com
+44 7 486 486 505 \\ www.maxim.abalenkov.uk

On 11 Oct 2024, at 16:13, Philippe SWARTVAGHER <philippe.swartvagher@inria.fr> wrote:



On 11 Oct 2024 at 14:43, Samuel Thibault wrote:
No, that should not really be a problem. But how large are your tasks?
See the checklist item about task sizes:
https://files.inria.fr/starpu/doc/html/CheckListWhenPerformanceAreNotThere.html#CheckTaskSize
That being said, you will probably want to put starpu_pause() /
starpu_resume() around your not-yet-starpufied code, otherwise StarPU
monopolizes all CPU cores, which can thus degrade the performance of the
non-starpufied code.

If your non-StarPU code accesses data managed by StarPU, you should also call

starpu_data_acquire(data_handle, STARPU_R /* or STARPU_RW */)

before your non-StarPU code reads (or writes) the StarPU data. This makes sure the tasks which manipulate data_handle are finished. Then, to tell StarPU you are finished working with this memory and it can start other tasks using this handle, you have to call

starpu_data_release(data_handle)

Have a look, for instance, at this example: https://gitlab.inria.fr/starpu/starpu/-/blob/f3e318b326666c5b279680d5a57fc0468e8c1876/examples/filters/fvector_pick_variable.c

Or, instead, you can just call

starpu_task_wait_for_all()

but depending on what your application does, you may not need to wait for all tasks to finish to start reading/writing one piece of data.

Not sure if this will solve your problems, but it was missing from your description of how your program uses StarPU.

--
Philippe SWARTVAGHER
Assistant Professor @ ENSEIRB-MATMECA
Research team Topal @ Inria Bordeaux-Sud Ouest






