Subject: Developers list for StarPU
List archives
[Starpu-devel] Performance profiling of tile low rank cholesky (HiCMA) with StarPU on distributed memory systems
- From: Hatem Ltaief <hatem.ltaief@kaust.edu.sa>
- To: "starpu-devel@lists.gforge.inria.fr" <starpu-devel@lists.gforge.inria.fr>
- Cc: Kadir Akbudak <kadir.akbudak@kaust.edu.sa>, Aniello Esposito <esposito@cray.com>, Aleksandr Mikhalev <aleksandr.mikhalev@kaust.edu.sa>, "Sameh M. Abdulah" <sameh.abdulah@kaust.edu.sa>
- Subject: [Starpu-devel] Performance profiling of tile low rank cholesky (HiCMA) with StarPU on distributed memory systems
- Date: Thu, 26 Apr 2018 14:57:42 +0000
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Dear StarPU Team,
We have performed some strong scalability experiments with HiCMA on a Cray XC system (two-socket, 14-core Intel Broadwell nodes).
We essentially fixed the problem size and increased the number of nodes.
We set the number of worker threads per node to 27, leaving the one remaining core free for StarPU to handle communication.
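The thread count above follows from the node layout: two sockets of 14 cores give 28 cores, and reserving one for communication leaves 27 workers. A minimal sketch of that arithmetic, assuming the count is handed to StarPU through the STARPU_NCPU environment variable (the message does not say which mechanism was used, and the helper name starpu_worker_env is hypothetical):

```python
def starpu_worker_env(sockets: int, cores_per_socket: int, reserved: int = 1) -> dict:
    """Return environment settings for the StarPU CPU worker count.

    Assumption: the worker count is set with STARPU_NCPU; the original
    message does not state how the 27 threads were configured.
    """
    total_cores = sockets * cores_per_socket
    workers = total_cores - reserved  # keep `reserved` cores free for communication
    if workers < 1:
        raise ValueError("no cores left for worker threads")
    return {"STARPU_NCPU": str(workers)}


# Node layout from the message: 2 sockets x 14 cores, one core reserved.
env = starpu_worker_env(sockets=2, cores_per_socket=14)
print(env)
```

The same helper covers the later runs with fewer workers by raising reserved, for example reserved=12 for 16 workers.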
As shown in the attached pdf document, when we increase the number of nodes, we do not scale as expected.
The profiles indicate that most of the time is being spent in internal StarPU functions:
|| 42.6% | 79,366.5 | 851.5 | 1.1% | _starpu_get_worker_task
|| 2.2% | 4,111.3 | 114.7 | 2.8% | _starpu_handle_pending_node_data_requests
|| 1.9% | 3,523.9 | 145.1 | 4.1% | _starpu_cpu_worker
|| 1.9% | 3,507.2 | 130.8 | 3.7% | _starpu_cpu_driver_run_once
|| 1.9% | 3,469.6 | 128.4 | 3.7% | _starpu_may_pause
|| 1.7% | 3,180.7 | 220.3 | 6.7% | starpu_memchunk_tidy
|| 1.7% | 3,171.0 | 139.0 | 4.3% | _starpu_datawizard_progress
|| 1.1% | 2,113.8 | 94.2 | 4.4% | _starpu_handle_node_data_requests
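To put a number on "most of the time", the rows above can be totalled. This is only a sketch: it assumes the first column is each function's share of the profile samples (the column headers are not included in the message), and the helper name parse_profile is hypothetical.

```python
PROFILE = """\
|| 42.6% | 79,366.5 | 851.5 | 1.1% | _starpu_get_worker_task
|| 2.2% | 4,111.3 | 114.7 | 2.8% | _starpu_handle_pending_node_data_requests
|| 1.9% | 3,523.9 | 145.1 | 4.1% | _starpu_cpu_worker
|| 1.9% | 3,507.2 | 130.8 | 3.7% | _starpu_cpu_driver_run_once
|| 1.9% | 3,469.6 | 128.4 | 3.7% | _starpu_may_pause
|| 1.7% | 3,180.7 | 220.3 | 6.7% | starpu_memchunk_tidy
|| 1.7% | 3,171.0 | 139.0 | 4.3% | _starpu_datawizard_progress
|| 1.1% | 2,113.8 | 94.2 | 4.4% | _starpu_handle_node_data_requests
"""


def parse_profile(text):
    """Return (function_name, share_percent) pairs from the pasted rows.

    Assumption: the first column is the function's share of samples.
    """
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.lstrip("|").split("|")]
        if len(fields) != 5:
            continue  # skip anything that is not a five-column row
        rows.append((fields[4], float(fields[0].rstrip("%"))))
    return rows


rows = parse_profile(PROFILE)
total_share = round(sum(share for _, share in rows), 1)
top_name, top_share = max(rows, key=lambda r: r[1])
print(f"listed StarPU functions: {total_share}% (largest: {top_name}, {top_share}%)")
```

Under that reading, the eight listed functions account for a little over half of the profile, with _starpu_get_worker_task alone responsible for most of it.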
We then reduced the number of worker threads per node to 16 and 8, and our parallel efficiency recovered to a decent level!
Since this is strong scaling, work may not always be available, and we would expect idle threads to stay idle and not consume resources. From our experiments, however, they appear to be doing some internal StarPU-related work, which seems to hurt parallel performance.
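For reference, parallel efficiency in a strong-scaling run is usually taken as the measured speedup over a baseline divided by the ideal speedup from the added nodes. A minimal sketch of that definition follows; the timings are hypothetical placeholders, not values from the attached PDF, and the function name is illustrative.

```python
def strong_scaling_efficiency(base_nodes, base_time, nodes, time):
    """Parallel efficiency relative to a baseline run at fixed problem size."""
    speedup = base_time / time   # how much faster than the baseline run
    ideal = nodes / base_nodes   # speedup expected with perfect scaling
    return speedup / ideal


# Hypothetical timings (seconds), NOT measurements from the experiments:
# 4x the nodes but only 2x faster gives 50% efficiency.
eff = strong_scaling_efficiency(base_nodes=16, base_time=100.0, nodes=64, time=50.0)
print(f"parallel efficiency: {eff:.0%}")
```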
Is this expected behavior? Have you observed something similar on your side? Any advice?
Thanks,
Hatem
Attachment:
hicma-starpu-profiling.pdf
- [Starpu-devel] Performance profiling of tile low rank cholesky (HiCMA) with StarPU on distributed memory systems, Hatem Ltaief, 26/04/2018
- Re: [Starpu-devel] Performance profiling of tile low rank cholesky (HiCMA) with StarPU on distributed memory systems, Samuel Thibault, 26/04/2018
- Re: [Starpu-devel] Performance profiling of tile low rank cholesky (HiCMA) with StarPU on distributed memory systems, Hatem Ltaief, 26/04/2018
- Re: [Starpu-devel] Performance profiling of tile low rank cholesky (HiCMA) with StarPU on distributed memory systems, Samuel Thibault, 26/04/2018
- Re: [Starpu-devel] Performance profiling of tile low rank cholesky (HiCMA) with StarPU on distributed memory systems, Hatem Ltaief, 26/04/2018
- Re: [Starpu-devel] Performance profiling of tile low rank cholesky (HiCMA) with StarPU on distributed memory systems, Sameh Abdulah, 26/04/2018
- Re: [Starpu-devel] Performance profiling of tile low rank cholesky (HiCMA) with StarPU on distributed memory systems, Samuel Thibault, 26/04/2018
- Re: [Starpu-devel] Performance profiling of tile low rank cholesky (HiCMA) with StarPU on distributed memory systems, Samuel Thibault, 26/04/2018
- Re: [Starpu-devel] Performance profiling of tile low rank cholesky (HiCMA) with StarPU on distributed memory systems, Hatem Ltaief, 26/04/2018
- Re: [Starpu-devel] Performance profiling of tile low rank cholesky (HiCMA) with StarPU on distributed memory systems, Samuel Thibault, 26/04/2018
Archives managed by MHonArc 2.6.19+.