Subject: Developers list for StarPU
List archives
- From: David <dstrelak@cnb.csic.es>
- To: starpu-devel@inria.fr
- Subject: [starpu-devel] Suspicious check in the cuda memory allocation routine
- Date: Wed, 12 May 2021 13:04:43 +0200
Dear Sir or Madam,
We are currently testing StarPU v1.3.7.
We have noticed that when StarPU tries to allocate memory on a CUDA device, it performs the following check:
    if (status == cudaSuccess && cuda_mem_free < (size*2))
    {
        addr = 0;
    }
(see https://gitlab.inria.fr/starpu/starpu/-/blame/starpu-1.3.7/src/drivers/cuda/driver_cuda.c#L1757)
Notice that StarPU checks whether twice as much memory as is being requested is currently available.
In our use case, we tried to allocate a block of roughly 2 GB on a card with 5.9 GB of memory, but less than 4 GB was available at that time, so the condition evaluated to true and no memory was allocated.
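To make the arithmetic concrete, here is a minimal standalone sketch of the condition quoted above. The helper name `refused_by_check`, the `GiB` constant, and the byte counts are illustrative assumptions and not StarPU code. The report only states "roughly 2 GB" requested and "less than 4 GB" free, so 3 GiB is used here as a stand-in for the free amount.

```cpp
#include <cstdint>

// Hypothetical helper (not part of StarPU): mirrors the quoted condition
// `cuda_mem_free < (size*2)`. Returns true when the allocation would be
// refused, i.e. when less than twice the requested size is free.
constexpr bool refused_by_check(std::uint64_t free_bytes, std::uint64_t requested_bytes)
{
    return free_bytes < requested_bytes * 2;
}

// Illustrative unit; the original report gives only approximate sizes.
constexpr std::uint64_t GiB = std::uint64_t{1} << 30;

// A 2 GiB request with 3 GiB free is refused, although the block would fit.
static_assert(refused_by_check(3 * GiB, 2 * GiB), "request should be refused");

// With exactly twice the requested size free, the request is not refused.
static_assert(!refused_by_check(4 * GiB, 2 * GiB), "request should pass");
```

Under these assumed numbers, any 2 GiB request is refused as long as less than 4 GiB is free, which matches the behaviour described above.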
We then saw the following message:
[starpu][_starpu_memory_reclaim_generic]
Not enough memory left on node CUDA 0. Your application data set
seems too huge to fit on the device, StarPU will cope by trying to
purge 4313 MiB out. This message will not be printed again for
further purges
The application then got stuck (it kept polling) in the memory allocation routine (our implementation of starpu_data_interface_ops.allocate_data_on_node, included at the end of this email for reference).
Is the '2' in 'cuda_mem_free < (size*2)' there on purpose? We did not find anything related to this in the documentation.
Thanks for the answer.
Kind regards,
David Strelak
------------------------
template<typename T>
static starpu_ssize_t payload_allocate_data_on_node(void *data_interface, unsigned node)
{
    auto *interface = reinterpret_cast<umpalumpa::data::Payload<T> *>(data_interface);
    starpu_ssize_t requested_memory = interface->dataInfo.bytes;
    void *data = reinterpret_cast<void *>(starpu_malloc_on_node(node, requested_memory));
    if (nullptr == data) return -ENOMEM;
    /* update the data properly in consequence */
    interface->data = data;
    return requested_memory;
}
- [starpu-devel] Suspicious check in the cuda memory allocation routine, David, 12/05/2021
- Re: [starpu-devel] Suspicious check in the cuda memory allocation routine, Samuel Thibault, 15/05/2021