starpu-devel - Re: [Starpu-devel] MPI tags limitation with Starpu

  • From: Mathieu Faverge <mathieu.faverge@inria.fr>
  • To: Sameh Abdulah <sameh.abdulah@kaust.edu.sa>, Nathalie Furmento <nathalie.furmento@labri.fr>
  • Cc: starpu-devel@lists.gforge.inria.fr, morse-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] MPI tags limitation with Starpu
  • Date: Mon, 20 Feb 2017 11:28:58 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hello Sameh,

This is a limitation in StarPU on the number of MPI tags available to describe the data, which we guard against in Chameleon. There are multiple solutions to your problem.

First, as you discovered, the tag used to identify each piece of data is split into two segments: some of the bits identify the descriptor, and the rest identify the tile within that descriptor. The default is a 31-bit tag with 24 bits reserved for the tiles of each descriptor, based on the fact that MPI implementations usually give us at least 31 bits to tag messages.
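
To make the split concrete, here is a minimal, self-contained illustration of the arithmetic, assuming the layout described above (descriptor id in the high bits, tile id in the low bits); the variable names are mine, not StarPU's or Chameleon's:

    /* Illustration only: how a tag could be split between a descriptor id
     * and a tile id under the default 31/24 layout described above.
     * The real encoding lives inside Chameleon/StarPU; names are made up. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const int user_tag_width = 31;  /* total bits usable for an MPI tag */
        const int user_tag_sep   = 24;  /* low bits reserved for the tile   */

        int64_t max_tiles_per_desc = 1LL << user_tag_sep;                    /* 16777216 */
        int64_t max_descriptors    = 1LL << (user_tag_width - user_tag_sep); /* 128      */

        /* A tag for tile t of descriptor d would then look like: */
        int64_t d = 3, t = 42;
        int64_t tag = (d << user_tag_sep) | t;

        printf("tiles/descriptor = %lld, descriptors = %lld, example tag = %lld\n",
               (long long)max_tiles_per_desc, (long long)max_descriptors,
               (long long)tag);
        return 0;
    }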

You can adjust this split by calling:

    RUNTIME_user_tag_size(int user_tag_width, int user_tag_sep);

Coming back to the defaults cited previously: user_tag_width is 31, the maximum generally provided by MPI implementations, and user_tag_sep is 24, which defines the number of bits used to index the tiles of each descriptor. You can lower user_tag_sep to better fit the number of tiles you actually have per descriptor.

So in your case, RUNTIME_user_tag_size(31, 19); should work: you will have 2^19 = 524288 possible tiles per descriptor (more than the 390625 tiles in your configuration), and the remaining 31 - 19 = 12 bits give you 4096 possible descriptors, more than the 50 you need.
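
For reference, a sketch of how that call could be placed in your program; MORSE_user_tag_size / RUNTIME_user_tag_size is the entry point discussed in this thread, but the surrounding initialization calls and their arguments are assumptions on my part and need to be adapted:

    /* Sketch: request a 31/19 tag split before any descriptor is created.
     * The MORSE_Init/MORSE_Finalize arguments are assumed; adapt as needed. */
    #include <morse.h>

    int main(int argc, char **argv)
    {
        MORSE_Init(7, 0);   /* e.g. 7 CPU workers per node, 0 GPUs (assumed) */

        /* 200000 / 320 = 625 tiles per dimension, so 625 * 625 = 390625 tiles.
         * 2^19 = 524288 >= 390625 tiles per descriptor,
         * and 31 - 19 = 12 bits leave 2^12 = 4096 descriptors. */
        MORSE_user_tag_size(31, 19);

        /* ... create the descriptors and run the factorizations ... */

        MORSE_Finalize();
        return 0;
    }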

Be careful: if your MPI implementation allows fewer than 31 bits, both values will be decreased simultaneously. This is something we should probably change, and we should add a warning message so that you can see when it happens.

I also want to point out another solution to avoid this problem: if your factorizations are applied one after the other, you can reuse the same descriptor every time. You can reinitialize the data with the MORSE_zbuild_Tile family of functions and call your factorization multiple times on the same descriptor, as sketched below.
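
As an illustration of that reuse pattern, here is a hedged sketch; the MORSE_Desc_Create / MORSE_zbuild_Tile / MORSE_zpotrf_Tile signatures are recalled from memory and the tile-fill callback is hypothetical, so adapt it to your Chameleon version:

    /* Sketch of the descriptor-reuse pattern suggested above: one descriptor,
     * rebuilt and refactorized at every iteration. */
    #include <morse.h>

    extern void my_tile_builder();  /* user-defined tile-fill callback (hypothetical) */

    void run_factorizations(int N, int NB, int P, int Q, int niter, void *user_data)
    {
        MORSE_desc_t *descA = NULL;

        /* Create the descriptor once and reuse it for every iteration. */
        MORSE_Desc_Create(&descA, NULL, MorseComplexDouble,
                          NB, NB, NB * NB, N, N, 0, 0, N, N, P, Q);

        for (int it = 0; it < niter; it++) {
            /* Reinitialize the matrix in place (MORSE_zbuild_Tile family). */
            MORSE_zbuild_Tile(MorseLower, descA, user_data, (void *)my_tile_builder);

            /* Cholesky factorization on the same descriptor. */
            MORSE_zpotrf_Tile(MorseLower, descA);
        }

        MORSE_Desc_Destroy(&descA);
    }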

Best,
Mathieu

On 20/02/2017 at 09:45, Sameh Abdulah wrote:
Thank you for your reply. Actually, I am using StarPU 1.2. By the way, both messages come from the runtime_descriptor.c file, which is part of Chameleon, under the StarPU control directory.

--Sameh

On Mon, Feb 20, 2017 at 11:39 AM, Nathalie Furmento <nathalie.furmento@labri.fr> wrote:
Dear Sameh,

Could you please let us know which version of StarPU you are using? Version 1.2 has a new internal MPI communication system in which all the MPI messages sent and received by StarPU itself use only 2 different tags. You should consider using this version if you are not already doing so.

I am not sure, however, where the error message

"Number of descriptor available in MPI mode out of stock"

comes from. I am adding Mathieu as a recipient to get help from his side.

Cheers,

Nathalie


On 19/02/2017 07:21, Sameh Abdulah wrote:
Dear Team

I am working with Chameleon and StarPU to perform Cholesky factorizations of large matrices on distributed systems. I need to run the Cholesky factorization for several iterations (>50) on 200K matrices. My configuration is as follows:

N: 200K
Tile size: 320
Nodes: 144 (with 7 cores each)

I got this error: "Too many tiles in the descriptor for MPI tags". When I checked, I found that I need to change the default values of tag_width = 31 and tag_sep = 24 in runtime_descriptor.c using the function MORSE_user_tag_size(int tag_width, int tag_sep). I do not know what the correct values are to handle my case. I played with different numbers but I am still getting the same error, and when certain values let me avoid it, I get another error instead: "Number of descriptor available in MPI mode out of stock".

Could you please let me know the best configuration for my case, and, if I need to use a matrix larger than 200K, how I can adjust these values?


Thank you.

--Sameh Abdulah
Postdoc, ECRC, KAUST.






--
Mathieu Faverge
Maître de conférences / Associate Professor
Institut Polytechnique de Bordeaux - ENSEIRB-Matmeca
INRIA Bordeaux - Sud-Ouest, HiePACS Team
200 avenue de la Vieille Tour
33405 Talence Cedex
Phone: (+33) 5 24 57 40 73


