
starpu-devel - Re: [Starpu-devel] Control data movement to/from device

Subject: Developers list for StarPU

List archives

Re: [Starpu-devel] Control data movement to/from device


  • From: Amani Alonazi <amani.alonazi@kaust.edu.sa>
  • To: Samuel Thibault <samuel.thibault@inria.fr>, Amani Alonazi <amani.alonazi@kaust.edu.sa>, starpu-devel@lists.gforge.inria.fr, Hatem Ltaief <Hatem.Ltaief@kaust.edu.sa>
  • Subject: Re: [Starpu-devel] Control data movement to/from device
  • Date: Wed, 17 Apr 2019 12:24:38 +0300
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Is StarPU trying to keep the same data buffer coherent across all available memory spaces?
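For reference: by default StarPU does keep every registered data handle coherent across the memory nodes that hold a copy of it. When a buffer will not be reused, the application can say so explicitly so that device copies can be dropped. A minimal sketch, with illustrative names that are not from this thread, using starpu_data_wont_use() and starpu_data_invalidate_submit():

#include <stdint.h>
#include <starpu.h>

#define N 1024

int main(void)
{
    float vec[N];
    starpu_data_handle_t handle;

    if (starpu_init(NULL) != 0)
        return 1;

    starpu_vector_data_register(&handle, STARPU_MAIN_RAM,
                                (uintptr_t)vec, N, sizeof(vec[0]));

    /* ... submit tasks that read and write 'handle' here ... */

    /* Hint that the data will not be needed again soon: StarPU may then
     * evict the device copies instead of keeping every replica around. */
    starpu_data_wont_use(handle);

    /* Or drop all copies once the last submitted task has completed: */
    /* starpu_data_invalidate_submit(handle); */

    starpu_data_unregister(handle);
    starpu_shutdown();
    return 0;
}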

On Wed, Apr 17, 2019 at 11:55 AM Amani Alonazi <amani.alonazi@kaust.edu.sa> wrote:
The problem is that using 4 GPUs (a 4N workload) takes longer than using 1 GPU (an N workload). In theory the application only requires about 600 GB to be moved to the GPU and then back to CPU memory. The data movement is much larger in the 4-GPU case:
1 gpu -> Data transfer stats:
        NUMA 0 -> CUDA 0        660.2479 GB     686.1818 MB/s   (transfers : 507 - avg 1333.5185 MB)
        CUDA 0 -> NUMA 0        655.9976 GB     681.7644 MB/s   (transfers : 502 - avg 1338.1305 MB)
        Disk 0 -> NUMA 0        0.0000 GB       0.0000 MB/s     (transfers : 0 - avg nan MB)
        NUMA 0 -> Disk 0        0.0000 GB       0.0000 MB/s     (transfers : 0 - avg nan MB)
4 gpus -> Data transfer stats:
        NUMA 0 -> CUDA 0        2870.3994 GB    1808.5446 MB/s  (transfers : 2780 - avg 1057.2982 MB)
        CUDA 0 -> NUMA 0        2353.4680 GB    1482.8431 MB/s  (transfers : 1893 - avg 1273.0857 MB)
        NUMA 0 -> CUDA 1        2453.7686 GB    1546.0391 MB/s  (transfers : 2366 - avg 1061.9861 MB)
        CUDA 1 -> NUMA 0        2117.4614 GB    1334.1430 MB/s  (transfers : 1696 - avg 1278.4673 MB)
        CUDA 0 -> CUDA 1        566.6113 GB     357.0032 MB/s   (transfers : 501 - avg 1158.1038 MB)
        CUDA 1 -> CUDA 0        614.7939 GB     387.3615 MB/s   (transfers : 553 - avg 1138.4248 MB)
        NUMA 0 -> CUDA 2        2538.7312 GB    1599.5713 MB/s  (transfers : 2506 - avg 1037.3746 MB)
        CUDA 2 -> NUMA 0        2063.7825 GB    1300.3216 MB/s  (transfers : 1654 - avg 1277.6985 MB)
        CUDA 0 -> CUDA 2        587.0763 GB     369.8975 MB/s   (transfers : 517 - avg 1162.7971 MB)
        CUDA 2 -> CUDA 0        603.6446 GB     380.3367 MB/s   (transfers : 554 - avg 1115.7618 MB)
        CUDA 1 -> CUDA 2        463.2246 GB     291.8626 MB/s   (transfers : 399 - avg 1188.8271 MB)
        CUDA 2 -> CUDA 1        610.2127 GB     384.4750 MB/s   (transfers : 551 - avg 1134.0432 MB)
        NUMA 0 -> CUDA 3        2419.3684 GB    1524.3646 MB/s  (transfers : 2377 - avg 1042.2521 MB)
        CUDA 3 -> NUMA 0        2139.2100 GB    1347.8460 MB/s  (transfers : 1714 - avg 1278.0344 MB)
        CUDA 0 -> CUDA 3        525.6738 GB     331.2098 MB/s   (transfers : 436 - avg 1234.6099 MB)
        CUDA 3 -> CUDA 0        541.2795 GB     341.0425 MB/s   (transfers : 467 - avg 1186.8742 MB)
        CUDA 1 -> CUDA 3        471.3506 GB     296.9826 MB/s   (transfers : 403 - avg 1197.6751 MB)
        CUDA 3 -> CUDA 1        448.6032 GB     282.6502 MB/s   (transfers : 372 - avg 1234.8648 MB)
        CUDA 2 -> CUDA 3        414.4608 GB     261.1382 MB/s   (transfers : 366 - avg 1159.5844 MB)
        CUDA 3 -> CUDA 2        421.8076 GB     265.7671 MB/s   (transfers : 353 - avg 1223.6007 MB)
        Disk 0 -> NUMA 0        0.0000 GB       0.0000 MB/s     (transfers : 0 - avg nan MB)
        NUMA 0 -> Disk 0        0.0000 GB       0.0000 MB/s     (transfers : 0 - avg nan MB)

Do you know how I can fix this?
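Side note: transfer statistics like the ones above are what StarPU prints at shutdown when the application is run with STARPU_BUS_STATS=1. One way to steer a buffer toward a particular GPU before its tasks run is an explicit asynchronous prefetch; a minimal sketch below, with an illustrative helper name, assuming the handle has already been registered:

#include <starpu.h>

/* Illustrative helper: prefetch 'handle' to the memory node of the first
 * CUDA worker, so that tasks scheduled there find the data resident. */
static void prefetch_to_first_cuda_worker(starpu_data_handle_t handle)
{
    int workerids[STARPU_NMAXWORKERS];
    unsigned n = starpu_worker_get_ids_by_type(STARPU_CUDA_WORKER,
                                               workerids, STARPU_NMAXWORKERS);
    if (n == 0)
        return; /* no CUDA worker available */

    unsigned node = starpu_worker_get_memory_node(workerids[0]);

    /* Asynchronous prefetch: the transfer overlaps with other work. */
    starpu_data_prefetch_on_node(handle, node, 1);
}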


On Tue, Apr 16, 2019 at 4:45 PM Samuel Thibault <samuel.thibault@inria.fr> wrote:
Amani Alonazi, on Tue, 16 April 2019 16:19:12 +0300, wrote:
> Ok, I got something. Setting the following variable helps with the data movement.
> STARPU_SCHED_BETA=200

That's quite a strong value. Are the bandwidth values reported by
starpu_machine_display meaningful for the disk being considered?
(you will need the current 1.3 branch, because it seems nobody thought
about including disks in its output, so I added it just now).

> Is there any other way to enforce locality while using lws/dmdar?

lws does not actually care about data; it just happens to try to keep
tasks together. For dmdar, this is indeed the proper way to make locality
stronger.

Now, if even dmdar does not manage to get good locality, this is a
testcase we would be interested in for improving our scheduling heuristics.

Samuel
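
For completeness, the scheduler and beta value discussed above can also be set programmatically before starpu_init(), instead of exporting the variables in the shell. A minimal sketch, reusing the value 200 quoted in this thread (it is not necessarily right for other machines):

#include <stdlib.h>
#include <starpu.h>

int main(void)
{
    /* Equivalent to exporting STARPU_SCHED=dmdar and STARPU_SCHED_BETA=200;
     * must be done before starpu_init(), which reads these variables. */
    setenv("STARPU_SCHED", "dmdar", 1);
    setenv("STARPU_SCHED_BETA", "200", 1);

    if (starpu_init(NULL) != 0)
        return 1;

    /* ... register data and submit tasks as usual ... */

    starpu_shutdown();
    return 0;
}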



