Subject: Developers list for StarPU
List archives
- From: Amani Alonazi <amani.alonazi@kaust.edu.sa>
- To: Samuel Thibault <samuel.thibault@inria.fr>, Amani Alonazi <amani.alonazi@kaust.edu.sa>, starpu-devel@lists.gforge.inria.fr, Hatem Ltaief <Hatem.Ltaief@kaust.edu.sa>
- Subject: Re: [Starpu-devel] Control data movement to/from device
- Date: Wed, 17 Apr 2019 12:24:38 +0300
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Is StarPU trying to keep the same data buffer coherent across all available memory spaces?
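To make the coherence question concrete: StarPU's data management is described as an MSI-style protocol, where a buffer may have several valid read-only replicas but only one valid copy after a write. The toy model below is a simplified sketch under that assumption (the `Handle` class, its methods and the `State` names are hypothetical, not StarPU's API, and it ignores which replica serves as the transfer source). It counts transfers to show why one buffer can cross the bus several times once it is read on several GPUs and then written on one of them.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"  # no valid copy on this node
    SHARED = "S"   # valid read-only replica; other nodes may also hold one
    OWNER = "M"    # the only valid copy (after a write)


class Handle:
    """Toy model of one data buffer replicated across memory nodes (hypothetical, not StarPU's API)."""

    def __init__(self, nodes, home):
        self.state = {n: State.INVALID for n in nodes}
        self.state[home] = State.OWNER
        self.transfers = 0  # number of copies brought into some node

    def read(self, node):
        """Read access: fetch a replica if needed; existing valid copies stay valid."""
        if self.state[node] is State.INVALID:
            self.transfers += 1
            for n, s in self.state.items():
                if s is State.OWNER:
                    self.state[n] = State.SHARED  # owner is downgraded, not invalidated
            self.state[node] = State.SHARED

    def write(self, node):
        """Read-write access: fetch if needed, then invalidate every other replica."""
        if self.state[node] is State.INVALID:
            self.transfers += 1
        for n in self.state:
            self.state[n] = State.INVALID
        self.state[node] = State.OWNER
```

For example, reading a handle on `CUDA 0` and `CUDA 1` costs two transfers and leaves three valid replicas; a subsequent write on `CUDA 0` invalidates the others, so the next read on `CUDA 1` needs a third transfer.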
On Wed, Apr 17, 2019 at 11:55 AM Amani Alonazi <amani.alonazi@kaust.edu.sa> wrote:
The problem is that using 4 GPUs (4N workload) takes longer than using 1 GPU (N workload). In theory, the application only needs 600 GB to be moved to the GPU and then back to CPU memory, but the data movement is very large with 4 GPUs:

1 GPU -> Data transfer stats:
NUMA 0 -> CUDA 0 660.2479 GB 686.1818 MB/s (transfers : 507 - avg 1333.5185 MB)
CUDA 0 -> NUMA 0 655.9976 GB 681.7644 MB/s (transfers : 502 - avg 1338.1305 MB)
Disk 0 -> NUMA 0 0.0000 GB 0.0000 MB/s (transfers : 0 - avg nan MB)
NUMA 0 -> Disk 0 0.0000 GB 0.0000 MB/s (transfers : 0 - avg nan MB)

4 GPUs -> Data transfer stats:
NUMA 0 -> CUDA 0 2870.3994 GB 1808.5446 MB/s (transfers : 2780 - avg 1057.2982 MB)
CUDA 0 -> NUMA 0 2353.4680 GB 1482.8431 MB/s (transfers : 1893 - avg 1273.0857 MB)
NUMA 0 -> CUDA 1 2453.7686 GB 1546.0391 MB/s (transfers : 2366 - avg 1061.9861 MB)
CUDA 1 -> NUMA 0 2117.4614 GB 1334.1430 MB/s (transfers : 1696 - avg 1278.4673 MB)
CUDA 0 -> CUDA 1 566.6113 GB 357.0032 MB/s (transfers : 501 - avg 1158.1038 MB)
CUDA 1 -> CUDA 0 614.7939 GB 387.3615 MB/s (transfers : 553 - avg 1138.4248 MB)
NUMA 0 -> CUDA 2 2538.7312 GB 1599.5713 MB/s (transfers : 2506 - avg 1037.3746 MB)
CUDA 2 -> NUMA 0 2063.7825 GB 1300.3216 MB/s (transfers : 1654 - avg 1277.6985 MB)
CUDA 0 -> CUDA 2 587.0763 GB 369.8975 MB/s (transfers : 517 - avg 1162.7971 MB)
CUDA 2 -> CUDA 0 603.6446 GB 380.3367 MB/s (transfers : 554 - avg 1115.7618 MB)
CUDA 1 -> CUDA 2 463.2246 GB 291.8626 MB/s (transfers : 399 - avg 1188.8271 MB)
CUDA 2 -> CUDA 1 610.2127 GB 384.4750 MB/s (transfers : 551 - avg 1134.0432 MB)
NUMA 0 -> CUDA 3 2419.3684 GB 1524.3646 MB/s (transfers : 2377 - avg 1042.2521 MB)
CUDA 3 -> NUMA 0 2139.2100 GB 1347.8460 MB/s (transfers : 1714 - avg 1278.0344 MB)
CUDA 0 -> CUDA 3 525.6738 GB 331.2098 MB/s (transfers : 436 - avg 1234.6099 MB)
CUDA 3 -> CUDA 0 541.2795 GB 341.0425 MB/s (transfers : 467 - avg 1186.8742 MB)
CUDA 1 -> CUDA 3 471.3506 GB 296.9826 MB/s (transfers : 403 - avg 1197.6751 MB)
CUDA 3 -> CUDA 1 448.6032 GB 282.6502 MB/s (transfers : 372 - avg 1234.8648 MB)
CUDA 2 -> CUDA 3 414.4608 GB 261.1382 MB/s (transfers : 366 - avg 1159.5844 MB)
CUDA 3 -> CUDA 2 421.8076 GB 265.7671 MB/s (transfers : 353 - avg 1223.6007 MB)
Disk 0 -> NUMA 0 0.0000 GB 0.0000 MB/s (transfers : 0 - avg nan MB)
NUMA 0 -> Disk 0 0.0000 GB 0.0000 MB/s (transfers : 0 - avg nan MB)

Do you know how I can fix this?

On Tue, Apr 16, 2019 at 4:45 PM Samuel Thibault <samuel.thibault@inria.fr> wrote:
Amani Alonazi, on Tue. 16 April 2019 16:19:12 +0300, wrote:
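The statistics above are easier to compare once totalled per direction. The sketch below assumes the line format shown in this thread (`SRC n -> DST m <GB> GB ...`); the `summarize` function is a hypothetical helper, not a StarPU tool. It sums the transferred gigabytes per class of link (host to device, device to host, device to device).

```python
import re
from collections import defaultdict

# Matches lines such as:
# "NUMA 0 -> CUDA 0 660.2479 GB 686.1818 MB/s (transfers : 507 - avg 1333.5185 MB)"
LINE = re.compile(r"^(NUMA|CUDA|Disk) (\d+) -> (NUMA|CUDA|Disk) (\d+) ([\d.]+) GB")


def summarize(stats_text):
    """Sum transferred GB per link class, e.g. 'NUMA->CUDA', 'CUDA->CUDA'."""
    totals = defaultdict(float)
    for line in stats_text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip headers and anything that is not a transfer line
        src, dst, gb = m.group(1), m.group(3), float(m.group(5))
        totals[f"{src}->{dst}"] += gb
    return dict(totals)
```

Applied to the 4-GPU statistics above, the host-to-device (`NUMA->CUDA`) lines sum to about 10282 GB and the device-to-host lines to about 8674 GB, plus about 6269 GB of GPU-to-GPU traffic. The 1-GPU run moves about 660 GB host-to-device, so the 4-GPU run transfers roughly 15.6 times that volume for 4 times the workload.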
> OK, I found something. Setting the following variable helps with the data movement.
> STARPU_SCHED_BETA=200
That's quite a strong value. Are the bandwidth values reported by
starpu_machine_display meaningful for the disk being considered?
(You will need the current 1.3 branch, because it seems nobody thought
about including disks in its output, so I just added it.)
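Why the disk bandwidth matters: transfer-time estimates are derived from the measured bus characteristics, so a wrong bandwidth skews every decision that depends on them. The sketch below assumes a simplified link model (fixed latency plus size divided by bandwidth); `transfer_time_s`, `cheapest_source` and all figures are hypothetical, not StarPU's implementation or measurements from this thread. It shows how an overestimated disk bandwidth would make the disk look like the cheapest source.

```python
def transfer_time_s(size_mb, bandwidth_mb_s, latency_s=0.0):
    """Simplified link model (assumed): fixed latency plus size divided by bandwidth."""
    if bandwidth_mb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + size_mb / bandwidth_mb_s


def cheapest_source(size_mb, links):
    """Return the name of the source with the smallest estimated transfer time.

    `links` is a list of (name, bandwidth in MB/s, latency in s) tuples.
    """
    return min(links, key=lambda link: transfer_time_s(size_mb, link[1], link[2]))[0]


# Hypothetical figures, not measurements from this thread.
size = 1000.0  # MB
print(cheapest_source(size, [("Disk 0", 500.0, 0.0), ("NUMA 0", 10000.0, 0.0)]))    # NUMA 0
print(cheapest_source(size, [("Disk 0", 50000.0, 0.0), ("NUMA 0", 10000.0, 0.0)]))  # Disk 0
```

With a plausible disk bandwidth the host memory wins (0.1 s versus 2.0 s); with a disk bandwidth mis-measured a hundred times too high, the disk wins (0.02 s versus 0.1 s), which is why the reported values are worth checking.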
> Is there any other way to enforce locality while using lws/dmdar?
lws does not actually care about data; it just happens to try to keep
tasks together. For dmdar, this is indeed the proper way to make locality
stronger.
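The effect of `STARPU_SCHED_BETA` with dmdar can be illustrated with a simplified cost function. StarPU's documentation describes alpha as the coefficient applied to the estimated computation time and beta as the coefficient applied to the estimated data transfer time. The sketch below assumes the scheduler just adds the two weighted terms and picks the cheapest worker, ignoring any other terms the real heuristic may use; `Candidate`, `pick_worker` and the timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    worker: str
    expected_end_s: float  # estimated completion time of the computation on this worker
    transfer_s: float      # estimated time to fetch the data this worker does not hold yet


def pick_worker(candidates, alpha=1.0, beta=1.0):
    """Pick the worker minimizing alpha * compute term + beta * transfer term (simplified)."""
    best = min(candidates, key=lambda c: alpha * c.expected_end_s + beta * c.transfer_s)
    return best.worker


candidates = [
    Candidate("CUDA 0", expected_end_s=5.0, transfer_s=0.0),  # busy, but already holds the data
    Candidate("CUDA 1", expected_end_s=2.0, transfer_s=0.5),  # idle, but must fetch the data
]
print(pick_worker(candidates))            # CUDA 1: 2.0 + 0.5 < 5.0
print(pick_worker(candidates, beta=200))  # CUDA 0: 2.0 + 100.0 > 5.0
```

With the default weight the idle GPU wins and the data moves; with a beta of 200, as in the setting discussed above, the transfer term dominates and the task stays on the GPU that already holds its data. The scheduler itself is selected with the `STARPU_SCHED` environment variable (e.g. `STARPU_SCHED=dmdar`).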
Now, if even dmdar does not manage to achieve good locality, this is a
test case we would be interested in, to improve our scheduling heuristics.
Samuel
--
Archives managed by MHonArc 2.6.19+.