Subject: Developers list for StarPU
Re: [starpu-devel] [StarPU for stencil-based code: logic, starpu_shutdown error]
- From: Maxim Abalenkov <maxim.abalenkov@gmail.com>
- To: starpu-devel@inria.fr
- Subject: Re: [starpu-devel] [StarPU for stencil-based code: logic, starpu_shutdown error]
- Date: Tue, 5 Nov 2024 12:33:02 +0000
Dear all,
Let me please describe my finite-difference code in more detail.
There are four computationally expensive phases in the code. Each of them is formulated as a C function. These functions are called at every iteration of a time loop.
for t in 0..t_max {
    comp_intercept_starpu
    comp_overland_depth_starpu
    comp_infiltration_starpu
    comp_diffusive_routing_starpu
    comp_outlet_starpu
    comp_storm_starpu
}
The code uses multiple two-dimensional arrays of data values. These arrays (as is common in C) are flattened into one dimension. A data array is an m x n matrix, where m is the number of rows and n is the number of columns. When flattened, an m x n matrix becomes a vector of length m * n. To divide the data into blocks I split my arrays along the first (row) dimension only. The second (column) dimension remains constant and is equal to n. Please see the attached sketch for details on the data arrangement in the arrays.
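As a rough illustration of the row-wise split described above, the sketch below computes which rows, and which range of the flattened array, belong to a given block. The names row_block and get_row_block are hypothetical and are not part of StarPU or of the original code. How the remainder rows are distributed is an assumption, and a StarPU filter may split the data differently.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous row block of a flattened, row-major m x n array.
 * Hypothetical type, for illustration only. */
struct row_block {
    size_t first_row; /* index of the first row in the block */
    size_t nrows;     /* number of rows in the block */
    size_t offset;    /* index of the block's first element in the flat array */
    size_t length;    /* number of elements in the block (nrows * n) */
};

/* Split m rows into nblocks nearly equal blocks and describe block b.
 * Assumption: the first (m % nblocks) blocks receive one extra row.
 * The column dimension n is never split. */
struct row_block get_row_block(size_t m, size_t n, size_t nblocks, size_t b)
{
    assert(nblocks > 0 && nblocks <= m && b < nblocks);

    size_t base = m / nblocks;  /* rows every block gets */
    size_t extra = m % nblocks; /* leftover rows, one per leading block */

    struct row_block blk;
    blk.nrows = base + (b < extra ? 1 : 0);
    blk.first_row = b * base + (b < extra ? b : extra);
    blk.offset = blk.first_row * n;
    blk.length = blk.nrows * n;
    return blk;
}
```

For example, with m = 10, n = 4 and three blocks, block 0 covers rows 0 to 3 and blocks 1 and 2 cover three rows each; together the blocks cover all m * n = 40 elements without overlap.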
The first three routines (intercept, overland_depth and infiltration) can benefit from applying StarPU’s block filter and submitting tasks on these blocks. This situation is depicted in the leftmost figure on the sketch.
The diffusive routing routine is the most complex and must omit the last row and the last column from its calculation. To make it work I created my own data structure called a matrix block. I manually divide the data arrays into these matrix blocks, register them with StarPU and submit tasks. This access pattern is illustrated in the middle figure of the sketch.
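A minimal sketch of what such a matrix block might look like, assuming row-major storage. The struct matrix_block and the helpers make_interior_block and block_sum are hypothetical names used for illustration, not the actual data structure from the code. block_sum merely stands in for a kernel that visits every element of a block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a sub-matrix inside a flattened row-major array. */
struct matrix_block {
    double *ptr;  /* first element of the block */
    size_t nrows; /* rows in the block */
    size_t ncols; /* columns in the block */
    size_t ld;    /* leading dimension: row stride of the full array */
};

/* Build a block starting at first_row with up to nrows rows of an
 * m x n array, excluding the last row and the last column. */
struct matrix_block make_interior_block(double *a, size_t m, size_t n,
                                        size_t first_row, size_t nrows)
{
    assert(m > 1 && n > 1 && first_row < m - 1);

    size_t last = first_row + nrows; /* one past the requested rows */
    if (last > m - 1)
        last = m - 1;                /* drop the last row of the array */

    struct matrix_block blk = { a + first_row * n, last - first_row,
                                n - 1 /* drop the last column */, n };
    return blk;
}

/* Stand-in kernel: sum every element of the block, stepping by ld
 * between rows so the omitted last column is skipped. */
double block_sum(const struct matrix_block *blk)
{
    double s = 0.0;
    for (size_t i = 0; i < blk->nrows; i++)
        for (size_t j = 0; j < blk->ncols; j++)
            s += blk->ptr[i * blk->ld + j];
    return s;
}
```

Keeping the leading dimension separate from the column count is what lets a block leave out the last column while still pointing into the original flattened array.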
Finally, there are two low-effort routines: outlet and storm. Their data access pattern is shown in the rightmost figure of the sketch. They are probably not worth parallelising. However, my final goal is to enable distributed-memory parallelism via StarPU-MPI, so it may be beneficial to register these data points with StarPU and compute all outlet and storm values on a single MPI rank. I hope this makes sense.
May I please ask a few more questions:
1) Is it normal to have multiple data views registered with StarPU: ‘automatic’ blocks created with a StarPU vector filter, matrix blocks registered manually, and a handful of data points registered manually? Will these multiple views conflict with each other and degrade the overall parallel performance?
2) Do I understand correctly that it is best for StarPU performance to create and register all of these data partitions before the main loop over time and to unregister them after the main loop?
3) What else can I try to improve the performance of my code? Currently, everything apart from the ‘outlet’ and ‘storm’ routines is ported to StarPU. I call starpu_pause and starpu_resume before and after the ‘outlet’ and ‘storm’ routines.
I will share my code in a public GitHub repository and send you a link in the very near future. Thank you very much for your attention and help so far! Have a great day ahead!
—
Best wishes,
Maxim

Maxim Abalenkov \\ maxim.abalenkov@gmail.com
+44 7 486 486 505 \\ www.maxim.abalenkov.uk
On 5 Nov 2024, at 09:46, Maxim Abalenkov <maxim.abalenkov@gmail.com> wrote:
Dear Samuel, Philippe et al.,
<tasks_size_overhead.pdf>
How are you? I hope all is well with you. I re-compiled StarPU and ran the ‘tasks_size_overhead.sh’ script. Please find its output attached. Does it mean that, on my system, task sizes of 512 elements and above will be beneficial? In my StarPU experiments I have been using tasks of size 488. However, the performance I see is far from optimal. Thank you for your help and have a great day ahead!
--
Best wishes,
Maxim
Maxim Abalenkov \\ maxim.abalenkov@gmail.com
+44 7 486 486 505 \\ www.maxim.abalenkov.uk
On 4 Nov 2024, at 14:03, Maxim Abalenkov <maxim.abalenkov@gmail.com> wrote:
Dear Samuel, Philippe et al.,
How are you? I hope all is well with you. Thank you very much for your suggestions. I tried:
a) using starpu_pause, starpu_resume around the sequential part of the code
b) using starpu_data_acquire, starpu_data_release around the sequential part of the code
c) examining the task size overhead on my system with the ‘tasks_size_overhead.sh’ script
    find -name tasks_size_overhead.sh
    ./tests/microbenchs/tasks_size_overhead.sh
    ./starpupy/benchmark/tasks_size_overhead.sh
    ./build/tests/microbenchs/tasks_size_overhead.sh
    ./build/starpupy/benchmark/tasks_size_overhead.sh
Neither a) nor b) brought any performance gain. I run my StarPU code with
    STARPU_NCUDA=0 STARPU_NOPENCL=0 STARPU_NCPU=8 STARPU_SCHED=dmda python ./app.py
The Python script ‘app.py’ is a simple wrapper that pre-processes the input data and launches the main C + StarPU executable.
I also fail to run the shell script from c). It crashes because it is unable to find the executable ‘tasks_size_overhead’. I couldn’t find it with a ‘find’ command either. Do I need to configure and compile StarPU with some extra options to build the executable ‘tasks_size_overhead’?
I’m using StarPU 1.4.7 on macOS Sonoma 14.7.1 with an M2 CPU.
I will share my source code in a public GitHub repository for you to see in the very near future. In the meantime I’m going to convert the final phase of my finite-difference code to StarPU and test performance again. Thank you for your help and have a wonderful day ahead!
--
Best wishes,
Maxim
Maxim Abalenkov \\ maxim.abalenkov@gmail.com
+44 7 486 486 505 \\ www.maxim.abalenkov.uk
On 11 Oct 2024, at 16:13, Philippe SWARTVAGHER <philippe.swartvagher@inria.fr> wrote:
On 11/10/2024 at 14:43, Samuel Thibault wrote:
No, that should not really be a problem. But how large are your tasks?
See the checklist item about task sizes:
https://files.inria.fr/starpu/doc/html/CheckListWhenPerformanceAreNotThere.html#CheckTaskSize
That being said, you will probably want to put starpu_pause() /
starpu_resume() around your not-yet-starpufied code, otherwise StarPU
monopolizes all CPU cores, which can thus degrade the performance of the
non-starpufied code.
If your non-StarPU code accesses data managed by StarPU, you should also call
starpu_data_acquire(data_handle, STARPU_R /* or STARPU_RW */)
before your non-StarPU code reads (or writes) the StarPU data. This makes sure the tasks which manipulate data_handle are finished. Then, to tell StarPU you are finished working with this memory and it can start other tasks using this handle, you have to call
starpu_data_release(data_handle)
Have a look, for instance, at this example: https://gitlab.inria.fr/starpu/starpu/-/blob/f3e318b326666c5b279680d5a57fc0468e8c1876/examples/filters/fvector_pick_variable.c
Or, instead, you can just call
starpu_task_wait_for_all()
but depending on what your application does, you may not need to wait for all tasks to finish to start reading/writing one piece of data.
Not sure if this will solve your problems, but it was missing from your description of how your program uses StarPU.
--
Philippe SWARTVAGHER
Assistant Professor @ ENSEIRB-MATMECA
Research team Topal @ Inria Bordeaux-Sud Ouest