Subject: Developers list for StarPU
List archives
Re: [starpu-devel] [StarPU for stencil-based code: logic, starpu_shutdown error]
- From: Maxim Abalenkov <maxim.abalenkov@gmail.com>
- To: starpu-devel@inria.fr
- Subject: Re: [starpu-devel] [StarPU for stencil-based code: logic, starpu_shutdown error]
- Date: Tue, 5 Nov 2024 17:22:03 +0000
Dear all,
I hope all is well with you. Let me share the link to the GitHub repository containing my code:
The wrapper routine Engine::profile_run calls the six phases of computation at each iteration of the time loop:
1) Intercept phase:
2) Overland depth phase:
3) Infiltration phase:
4) Diffusive routing phase:
5) Outlet + Storm phase // currently sequential:
If possible, could you please spend 15 minutes reviewing my StarPU code and suggest what I could do to improve its parallel performance? Thank you very much for all your help so far! Have a wonderful evening!
—
Best wishes,
Maxim
Maxim Abalenkov \\ maxim.abalenkov@gmail.com
+44 7 486 486 505 \\ www.maxim.abalenkov.uk
On 5 Nov 2024, at 12:33, Maxim Abalenkov <maxim.abalenkov@gmail.com> wrote:

Dear all,

Let me describe my finite-difference code in more detail. There are four computationally expensive phases. Each of them is written as a C function, and these functions are called at every iteration of a time loop:

    for t in 0..t_max {
        comp_intercept_starpu
        comp_overland_depth_starpu
        comp_infiltration_starpu
        comp_diffusive_routing_starpu
        comp_outlet_starpu
        comp_storm_starpu
    }

The code uses multiple two-dimensional data arrays. As is common in C, these arrays are flattened into one dimension. A data array is an m x n matrix, where m is the number of rows and n is the number of columns; when flattened, it becomes a vector of length m * n. To divide the data into blocks, I split my arrays along the first (row) dimension only. The second (column) dimension remains constant and equal to n. Please see the attached sketch for details of the data arrangement.

The first three routines (intercept, overland_depth and infiltration) can benefit from applying StarPU's block filter and submitting tasks on the resulting blocks. This situation is depicted in the leftmost figure of the sketch.

The diffusive routing routine is the most complex one: it must omit the last row and the last column from its calculation. To make it work, I created my own data structure called a matrix block. I manually divide the data arrays into these matrix blocks, register them with StarPU and submit tasks on them. This access pattern is illustrated in the middle figure of the sketch.

Finally, there are two low-effort routines: outlet and storm. Their data access pattern is shown in the rightmost figure of the sketch. They are probably not worth parallelising. However, my final goal is to enable distributed-memory parallelism via StarPU-MPI, so it may be beneficial to register these data points with StarPU and compute all outlet and storm values on a single MPI rank.
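The two partitioning schemes described above come down to offset arithmetic on the flattened array. A minimal sketch with hypothetical names (not part of StarPU or of the author's code): row_block_get splits the m rows into nparts bands of whole rows, giving the first m % nparts bands one extra row (StarPU's own block filters may distribute the remainder differently), and interior_block_get describes one band of the (m - 1) x (n - 1) interior used by the diffusive routing phase, keeping n as the stride between rows.

```c
#include <stddef.h>

/* A contiguous band of whole rows in a flattened m x n row-major array
   (hypothetical helper type). */
typedef struct {
    size_t row0;   /* first row of the band */
    size_t nrows;  /* number of rows in the band */
    size_t offset; /* element offset of the band: row0 * n */
    size_t count;  /* number of elements in the band: nrows * n */
} row_block;

/* Split m rows into nparts bands; the first (m % nparts) bands get one
   extra row. Assumes 0 < nparts <= m and i < nparts. */
static row_block row_block_get(size_t m, size_t n, size_t nparts, size_t i)
{
    size_t base = m / nparts;
    size_t rem = m % nparts;
    row_block b;
    b.nrows = base + (i < rem ? 1 : 0);
    b.row0 = i * base + (i < rem ? i : rem);
    b.offset = b.row0 * n;
    b.count = b.nrows * n;
    return b;
}

/* A strided sub-matrix (hypothetical helper type): ny rows of nx used
   elements each, with ld elements between the starts of consecutive rows. */
typedef struct {
    size_t offset; /* element offset of the first entry */
    size_t ld;     /* leading dimension: distance between rows, here n */
    size_t nx;     /* elements used per row, here n - 1 */
    size_t ny;     /* rows in this block */
} matrix_block;

/* Band i of the interior that omits the last row and the last column:
   only the m - 1 interior rows are split, and n stays the row stride. */
static matrix_block interior_block_get(size_t m, size_t n, size_t nparts, size_t i)
{
    row_block r = row_block_get(m - 1, n, nparts, i);
    matrix_block b = { r.row0 * n, n, n - 1, r.nrows };
    return b;
}
```

The offset, ld, nx and ny fields of matrix_block line up with the ptr (base pointer plus offset), ld, nx and ny arguments of starpu_matrix_data_register(), where nx counts the contiguous elements of one row. For m = 10, n = 4 and three parts, the row bands start at rows 0, 4 and 7.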
I hope this makes sense. May I ask a few more questions:

1) Is it normal to have multiple data views registered with StarPU: ‘automatic’ blocks created with a StarPU vector filter, matrix blocks manually registered with StarPU, and finally a handful of data points manually registered with StarPU? Will these multiple views conflict with each other and degrade the overall parallel performance?

2) Do I understand correctly that, for best StarPU performance, all of these data partitions should be created and registered before the main time loop and unregistered after it?

3) What else can I try to improve the performance of my code? Currently, everything apart from the ‘outlet’ and ‘storm’ routines is ported to StarPU. I call starpu_pause and starpu_resume before and after the ‘outlet’ and ‘storm’ routines.

I will share my code in a public GitHub repository and send you a link in the very near future. Thank you very much for your attention and help so far! Have a great day ahead!

Best wishes,
Maxim

<data_arrangement_access_patterns.jpg>
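Question 1 turns on whether the different views actually alias the same elements of a flattened array. A minimal brute-force sketch, with hypothetical helper names (not StarPU API), describes a strided view by its element offset, leading dimension ld, row width nx (assumed nx <= ld) and row count ny, and counts how many elements of a contiguous row band it shares:

```c
#include <stdbool.h>
#include <stddef.h>

/* Does element index e of the flattened array belong to the strided view
   (offset, ld, nx, ny)? The view covers offset + r*ld + c for r < ny, c < nx.
   Assumes ld > 0 and nx <= ld. */
static bool view_contains(size_t offset, size_t ld, size_t nx, size_t ny, size_t e)
{
    if (e < offset)
        return false;
    size_t rel = e - offset;
    return rel / ld < ny && rel % ld < nx;
}

/* Number of elements shared by the contiguous range [off, off + count)
   and the strided view (v_off, ld, nx, ny), found by scanning the range. */
static size_t shared_elements(size_t off, size_t count,
                              size_t v_off, size_t ld, size_t nx, size_t ny)
{
    size_t shared = 0;
    for (size_t e = off; e < off + count; e++)
        if (view_contains(v_off, ld, nx, ny, e))
            shared++;
    return shared;
}
```

For example, with m = 10 and n = 4, the row band covering elements 16 to 27 and an interior view starting at element 24 with ld = 4, nx = 3, ny = 3 share three elements (24, 25, 26); element 27 is in the excluded last column.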
On 5 Nov 2024, at 09:46, Maxim Abalenkov <maxim.abalenkov@gmail.com> wrote:

Dear Samuel, Philippe et al.,

<tasks_size_overhead.pdf>

How are you? I hope all is well with you. I recompiled StarPU and ran the ‘tasks_size_overhead.sh’ script; please find its output attached. Does it mean that, on my system, tasks of 512 elements and above will be beneficial? In my StarPU experiments I have been using tasks of size 488. However, the performance I see is far from optimal. Thank you for your help and have a great day ahead!

Best wishes,
Maxim
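Whether blocks of 488 elements are large enough depends on how long each task runs. As far as I understand, tasks_size_overhead.sh plots speedup against task duration in microseconds rather than against an element count (worth checking the axis label of the attached plot). A minimal sketch, assuming a per-element cost measured on the target machine, converts between the two and picks a number of row blocks; all names are hypothetical:

```c
#include <stddef.h>

/* Estimated task duration in microseconds for a block of elems grid points,
   given an assumed measured cost per point in nanoseconds. */
static double task_duration_us(size_t elems, double ns_per_elem)
{
    return (double)elems * ns_per_elem / 1000.0;
}

/* Smallest number of grid points per task whose estimated duration reaches
   target_us (rounded up). */
static size_t min_elems_for(double target_us, double ns_per_elem)
{
    double e = target_us * 1000.0 / ns_per_elem;
    size_t k = (size_t)e;
    return (double)k < e ? k + 1 : k;
}

/* Largest number of whole-row blocks of an m x n grid such that every block
   still holds at least min_elems points. Returns at least 1. */
static size_t max_row_blocks(size_t m, size_t n, size_t min_elems)
{
    size_t min_rows = (min_elems + n - 1) / n; /* ceil(min_elems / n) */
    if (min_rows == 0)
        min_rows = 1;
    size_t parts = m / min_rows;
    return parts > 0 ? parts : 1;
}
```

For instance, at an illustrative 10 ns per grid point, a 488-point task lasts about 4.9 µs, and reaching an illustrative 100 µs target needs 10000 points per task, i.e. at most 50 row blocks for a 1000 x 500 grid.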
On 4 Nov 2024, at 14:03, Maxim Abalenkov <maxim.abalenkov@gmail.com> wrote:

Dear Samuel, Philippe et al.,

How are you? I hope all is well with you. Thank you very much for your suggestions. I tried:

a) using starpu_pause and starpu_resume around the sequential part of the code;
b) using starpu_data_acquire and starpu_data_release around the sequential part of the code;
c) examining the task size overhead on my system with the ‘tasks_size_overhead.sh’ script:

    find -name tasks_size_overhead.sh
    ./tests/microbenchs/tasks_size_overhead.sh
    ./starpupy/benchmark/tasks_size_overhead.sh
    ./build/tests/microbenchs/tasks_size_overhead.sh
    ./build/starpupy/benchmark/tasks_size_overhead.sh

Neither a) nor b) brought any performance gain. I run my StarPU code with

    STARPU_NCUDA=0 STARPU_NOPENCL=0 STARPU_NCPU=8 STARPU_SCHED=dmda python ./app.py

The Python script ‘app.py’ is a simple wrapper that pre-processes the input data and launches the main C + StarPU executable.

I also failed to run the shell script from c): it crashes because it cannot find the executable ‘tasks_size_overhead’, and I couldn't find it with ‘find’ either. Do I need to configure and compile StarPU with some extra options to build the ‘tasks_size_overhead’ executable?

I'm using StarPU 1.4.7 on macOS Sonoma 14.7.1 with an M2 CPU.

I will share my source code in a public GitHub repository for you to see in the very near future. In the meantime, I'm going to convert the final phase of my finite-difference code to StarPU and test performance again. Thank you for your help and have a wonderful day ahead!

Best wishes,
Maxim
On 11 Oct 2024, at 16:13, Philippe SWARTVAGHER <philippe.swartvagher@inria.fr> wrote:
On 11/10/2024 at 14:43, Samuel Thibault wrote:

No, that should not really be a problem. But how large are your tasks? See the checklist item about task sizes:
https://files.inria.fr/starpu/doc/html/CheckListWhenPerformanceAreNotThere.html#CheckTaskSize
That being said, you will probably want to put starpu_pause() /
starpu_resume() around your not-yet-starpufied code, otherwise StarPU
monopolizes all CPU cores, which can thus degrade the performance of the
non-starpufied code.
If your non-StarPU code accesses data managed by StarPU, you should also call
starpu_data_acquire(data_handle, STARPU_R /* or STARPU_RW */)
before your non-StarPU code reads (or writes) the StarPU data. This makes sure the tasks which manipulate data_handle are finished. Then, to tell StarPU you are finished working with this memory and it can start other tasks using this handle, you have to call
starpu_data_release(data_handle)
Have a look, for instance, at this example: https://gitlab.inria.fr/starpu/starpu/-/blob/f3e318b326666c5b279680d5a57fc0468e8c1876/examples/filters/fvector_pick_variable.c
Or, instead, you can just call
starpu_task_wait_for_all()
but depending on what your application does, you may not need to wait for all tasks to finish to start reading/writing one piece of data.
Not sure if this will solve your problems, but it was missing from your description of how your program uses StarPU.
--
Philippe SWARTVAGHER
Assistant Professor @ ENSEIRB-MATMECA
Research team Topal @ Inria Bordeaux-Sud Ouest