starpu-devel - [Starpu-devel] CPU core binding mask and worker creation.

Subject: Developers list for StarPU

  • From: Mirko Myllykoski <mirkom@cs.umu.se>
  • To: Starpu Devel <starpu-devel@lists.gforge.inria.fr>
  • Subject: [Starpu-devel] CPU core binding mask and worker creation.
  • Date: Wed, 24 Oct 2018 09:45:41 +0200
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hi,

I have noticed that StarPU does not take the core/PU binding mask into account when it creates its worker threads. That is, if I launch my StarPU-based application with a binding mask set (for example, hwloc-bind 0x00000001 ./my_application ...), I still get as many worker threads as there are CPU cores in the machine, and StarPU attempts to bind some of those workers to cores/PUs that are not in the mask. In practice, the binding mask is usually set by a batch system.

So far I have worked around the problem by setting the STARPU_WORKERS_NOBIND environment variable to 1 and setting the correct number of workers manually.
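For reference, the full workaround invocation looks like this (STARPU_NCPU is StarPU's variable for fixing the number of CPU workers; the value 1 here matches the single-PU mask from the example above):

```shell
# Disable StarPU's own binding and create exactly one CPU worker,
# instead of one per machine core.
STARPU_WORKERS_NOBIND=1 STARPU_NCPU=1 \
    hwloc-bind 0x00000001 -- ./my_application
```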

I looked into the StarPU source code (1.2.6) and realized that I could fix the problem as follows:

1) Modify src/core/topology.c::_starpu_initialize_workers_bindid so that only those cores that contain enough PUs inside the PU binding mask (at least STARPU_NTHREADS_PER_CORE of them) are added to topology->workers_bindid.

2) Modify src/core/topology.c::_starpu_init_machine_config so that only those cores that contain enough PUs inside the PU binding mask are counted in avail_cpus.

I have attached the patch to this email.

The modification seems to do the job (I also tested it on a hyperthreaded CPU), but the overall worker creation and binding process is quite complex, so I am not 100% sure about my changes.

You are welcome to use the patch if you find it useful.

Best Regards,
Mirko Myllykoski
--- standard/src/core/topology.c	2018-09-21 11:18:04.000000000 +0200
+++ patched/src/core/topology.c	2018-10-23 20:53:19.059239797 +0200
@@ -672,6 +672,48 @@
 		int nhyperthreads = topology->nhwpus / topology->nhwcpus;
 		STARPU_ASSERT_MSG(nth_per_core > 0 && nth_per_core <= nhyperthreads , "Incorrect number of hyperthreads");
 
+#ifdef STARPU_HAVE_HWLOC
+		hwloc_cpuset_t master = hwloc_bitmap_alloc();
+		hwloc_get_cpubind(topology->hwtopology, master, HWLOC_CPUBIND_THREAD);
+		int j, avail;
+
+		i = 0; /* Core number currently assigned */
+		while (nbindids < STARPU_NMAXWORKERS)
+		{
+			/* Count how many PUs the core contains */ 
+			avail = 0;
+			for (j = 0; j < nhyperthreads; j++)
+			{
+				if (hwloc_bitmap_isset(master, (i*nhyperthreads+j) % topology->nhwpus))
+				{
+					avail++;
+				}
+			}
+
+			/* Skip the core if it does not contain enough PUs */
+			if (avail < nth_per_core)
+			{
+				i++;
+				continue;
+			}
+
+			/* Map workers to the available PUs */
+			k = 0;
+			for (j = 0; k < nth_per_core && nbindids < STARPU_NMAXWORKERS; j++)
+			{
+				if (hwloc_bitmap_isset(master, (i*nhyperthreads+j) % topology->nhwpus))
+				{
+					topology->workers_bindid[nbindids] = (unsigned)((i*nhyperthreads+j) % topology->nhwpus);
+					nbindids++;
+					k++;
+				}
+			}
+
+			i++;
+		}
+
+		hwloc_bitmap_free(master);
+#else
 		i = 0; /* PU number currently assigned */
 		k = 0; /* Number of threads already put on the current core */
 		while(nbindids < STARPU_NMAXWORKERS)
@@ -692,6 +734,7 @@
 			k++;
 			i++;
 		}
+#endif
 	}
 
 	for (i = 0; i < STARPU_MAXCPUS;i++)
@@ -1212,10 +1255,42 @@
 			unsigned already_busy_cpus = mic_busy_cpus + topology->ncudagpus
 				+ topology->nopenclgpus + topology->nsccdevices;
 
+			int nth_per_core = starpu_get_env_number_default("STARPU_NTHREADS_PER_CORE", 1);
+
+#ifdef STARPU_HAVE_HWLOC
+			hwloc_cpuset_t master = hwloc_bitmap_alloc();
+			hwloc_get_cpubind(topology->hwtopology, master, HWLOC_CPUBIND_THREAD);
+
+			unsigned i, j, avail;
+			unsigned nhyperthreads = topology->nhwpus / topology->nhwcpus;
+
+			/* Count how many cores contain enough PUs */
+			long avail_cpus = 0;
+			for (i = 0; i < topology->nhwcpus; i++)
+			{
+				avail = 0;
+				for (j = 0; j < nhyperthreads; j++)
+				{
+					if (hwloc_bitmap_isset(master, i * nhyperthreads + j))
+					{
+						avail++;
+					}
+				}
+
+				if (avail >= nth_per_core)
+				{
+					avail_cpus++;
+				}
+			}
+
+			avail_cpus -= (long) already_busy_cpus;
+
+			hwloc_bitmap_free(master);
+#else
 			long avail_cpus = (long) topology->nhwcpus - (long) already_busy_cpus;
+#endif
 			if (avail_cpus < 0)
 				avail_cpus = 0;
-			int nth_per_core = starpu_get_env_number_default("STARPU_NTHREADS_PER_CORE", 1);
 			avail_cpus *= nth_per_core;
 			ncpu = STARPU_MIN(avail_cpus, STARPU_MAXCPUS);
 		}



Archives managed by MHonArc 2.6.19+.
