Subject: Developers list for StarPU
List archives
- From: Berenger Bramas <berenger.bramas@inria.fr>
- To: Samuel Thibault <samuel.thibault@inria.fr>
- Cc: starpu-devel@lists.gforge.inria.fr
- Subject: Re: [Starpu-devel] Worker Binding Problem
- Date: Wed, 9 Sep 2015 11:42:01 +0200 (CEST)
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Thanks for your answer.
Yes, my CPU has two cores (and four hardware threads with hyperthreading).
What I am trying to do is give my application the same behavior, from the
user's point of view, whether it uses OpenMP or StarPU.
I was expecting the first threads to be bound to distinct cores, and only then
to the SMT siblings (if SMT is enabled),
considering that using SMT may not be better, but it may not be worse either.
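The placement order expected here (one thread per core first, SMT siblings only once every core is used) can be written as a simple index mapping. This is only an illustrative sketch: `Placement`, `spreadCoresFirst` and `compactPlacement` are hypothetical helpers, and they return (core, hardware thread) pairs rather than OS CPU numbers, because the numbering of SMT siblings depends on the machine (a real implementation would query the topology, for example through hwloc).

```cpp
#include <cassert>

// Hypothetical result type: which core, and which hardware thread on it.
struct Placement {
    int core; // physical core index
    int smt;  // hardware thread index within that core (0 = first)
};

// "Cores first": consecutive threads go to different cores; the second
// hardware thread of a core is only used once every core has one thread.
Placement spreadCoresFirst(int threadIdx, int nCores, int threadsPerCore){
    assert(nCores > 0 && threadsPerCore > 0);
    assert(threadIdx >= 0 && threadIdx < nCores * threadsPerCore);
    return Placement{threadIdx % nCores, threadIdx / nCores};
}

// "Compact", for contrast: fill all hardware threads of a core first.
Placement compactPlacement(int threadIdx, int nCores, int threadsPerCore){
    assert(nCores > 0 && threadsPerCore > 0);
    assert(threadIdx >= 0 && threadIdx < nCores * threadsPerCore);
    return Placement{threadIdx / threadsPerCore, threadIdx % threadsPerCore};
}
```

On the two-core laptop (2 cores × 2 hardware threads), `spreadCoresFirst` puts threads 0 and 1 on different cores and only thread 2 on the sibling of core 0, whereas `compactPlacement` puts threads 0 and 1 on the same core.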
In my case, with a highly optimized AVX kernel (as a reminder, every thread
gets the same amount of work):
sequentially, the thread takes 0.757221 s;
using 4 threads with SMT, 1.3 s;
but using 4 threads without SMT, 1.5 s.
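Since each thread does the same work as the sequential run, these timings are easiest to compare as throughput: n threads finishing in t_par seconds deliver n·t_seq/t_par times the sequential throughput. A small helper makes the arithmetic explicit; the name `throughputGain` is hypothetical.

```cpp
#include <cassert>

// Throughput relative to the sequential run, assuming every one of the
// nThreads threads does the same amount of work as the sequential thread:
// nThreads units of work in tParallel seconds vs. 1 unit in tSequential.
double throughputGain(int nThreads, double tSequential, double tParallel){
    assert(nThreads > 0 && tSequential > 0.0 && tParallel > 0.0);
    return nThreads * tSequential / tParallel;
}
```

With the numbers above, `throughputGain(4, 0.757221, 1.3)` is about 2.33 and `throughputGain(4, 0.757221, 1.5)` is about 2.02, so on this kernel SMT gave roughly 15% more throughput (1.5 / 1.3 ≈ 1.15).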
So that is fine: in my case, if the user asks for more threads than there are
cores because they want to use SMT, I will do the binding manually.
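For threads that the application creates itself, one way to do such a manual binding on Linux is the glibc extension `pthread_setaffinity_np`. This is a minimal Linux-only sketch under that assumption; `firstAllowedCpu` and `startPinnedThread` are hypothetical helpers, and the new thread waits on a flag so that its body only runs once its affinity has been set.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <pthread.h>
#include <sched.h>
#include <thread>

// Linux/glibc only: first logical CPU the calling thread may run on, or -1.
int firstAllowedCpu(){
    cpu_set_t set;
    CPU_ZERO(&set);
    if(sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for(int cpu = 0; cpu < CPU_SETSIZE; ++cpu){
        if(CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}

// Starts a thread that runs `body` only after being pinned to `cpu`
// (0 <= cpu < CPU_SETSIZE). *err receives 0 on success, or the error code
// returned by pthread_setaffinity_np, in which case the body runs unpinned.
std::thread startPinnedThread(int cpu, std::function<void()> body, int* err){
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    // Shared so the flag outlives this function while the thread still runs.
    auto ready = std::make_shared<std::atomic<bool>>(false);
    std::thread t([ready, body]{
        while(!ready->load()) std::this_thread::yield(); // wait until pinned
        body();
    });
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    *err = pthread_setaffinity_np(t.native_handle(), sizeof(set), &set);
    ready->store(true); // release the thread
    return t;
}
```

StarPU creates its own worker threads, so this pattern only covers threads managed directly by the application.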
However, I still have some trouble on a machine with two 12-core (dodeca-core)
Haswell Intel® Xeon® E5-2680 processors (hyperthreading disabled; note that I
updated my hwloc to 1.11).
I attach to this email a very simple test case that just performs a spin
loop (using a volatile variable to generate some memory traffic).
The OpenMP code is as follows:
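As a third point of comparison that depends on neither OpenMP nor StarPU, the same spin-loop benchmark can be written with plain `std::thread`. This is a minimal sketch assuming no thread binding at all; `spinWork` and `timeSpinPerThread` are hypothetical names, the start barrier is a simple spin on an atomic counter, and the iteration count is a parameter (the attached test uses 100000000).

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Same spin loop as the attached test; every access goes through the
// volatile variable, so it is re-read and re-written at each step.
void spinWork(std::size_t nb){
    volatile std::size_t idx = 0;
    while(idx != nb){
        idx = idx + 1;
        idx = idx - 1;
        idx = idx + 1;
    }
}

// Runs spinWork(nb) on numThreads unbound std::threads that start together
// and returns each thread's elapsed time in seconds (index = thread id).
std::vector<double> timeSpinPerThread(int numThreads, std::size_t nb){
    assert(numThreads > 0);
    std::vector<double> seconds(numThreads, 0.0);
    std::atomic<int> arrived(0);
    std::vector<std::thread> threads;
    for(int id = 0; id < numThreads; ++id){
        threads.emplace_back([&, id]{
            // Spin barrier: wait until every thread has arrived
            arrived.fetch_add(1);
            while(arrived.load() < numThreads) std::this_thread::yield();
            const auto start = std::chrono::steady_clock::now();
            spinWork(nb);
            const auto end = std::chrono::steady_clock::now();
            seconds[id] = std::chrono::duration<double>(end - start).count();
        });
    }
    for(auto& t : threads) t.join();
    return seconds;
}
```

Printing the vector returned by `timeSpinPerThread(numThreads, 100000000)` gives one time per thread, comparable to the OpenMP and StarPU lines in the results below.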
===========================================================
// Execute computation with openmp
#pragma omp parallel num_threads(numThreads)
{
    // Wait all
    #pragma omp barrier
    // Compute
    const double startTime = omp_get_wtime();
    work();
    const double endTime = omp_get_wtime();
    // Print time
    #pragma omp critical(PRINT)
    {
        std::cout << "[" << omp_get_thread_num() << "]Done " << " = "
                  << (endTime-startTime) << "s" << std::endl;
    }
}
===========================================================
Compilation is done with:
$ g++ -Wall -fopenmp -std=c++11 -mavx testStarPUOpenMPv2.cpp -o testStarPUOpenMPv2.exe -lgomp \
    -I/projets/scalfmm/Starpu/StarPU/installwithfxtverbose/include/starpu/1.3/ \
    -L/projets/scalfmm/Starpu/StarPU/installwithfxtverbose/lib/ -lstarpu-1.3
The number of threads is chosen with the OpenMP environment variable
OMP_NUM_THREADS:
$ OMP_NUM_THREADS=24 OMP_PROC_BIND=TRUE ./testStarPUOpenMPv2.exe
From the results below, we can see that some workers fail to finish their work
within the expected time.
With two threads it is OK:
Test with OpenMP:
[0]Done = 0.826635s
[1]Done = 0.828059s
Starpu:
[0]Done = 0.813087s
[1]Done = 0.826654s
With three threads, it starts to look strange:
Test with OpenMP:
[0]Done = 0.825707s
[2]Done = 0.826624s
[1]Done = 0.827749s
Starpu:
[2]Done = 0.826262s
[0]Done = 0.826653s
[1]Done = 1.64255s
With 8 threads:
Test with OpenMP:
[0]Done = 0.819424s
[6]Done = 0.826602s
[4]Done = 0.826648s
[5]Done = 0.826813s
[3]Done = 0.826964s
[1]Done = 0.827056s
[2]Done = 0.827338s
[7]Done = 0.827339s
Starpu:
[5]Done = 0.825804s
[4]Done = 0.82598s
[7]Done = 0.826542s
[6]Done = 0.826605s
[0]Done = 0.826869s
[2]Done = 1.6363s
[1]Done = 1.64211s
[3]Done = 1.64718s
Finally, with 24 threads:
Test with OpenMP:
[0]Done = 0.814564s
[23]Done = 0.826475s
[9]Done = 0.826668s
[22]Done = 0.826724s
[16]Done = 0.826892s
[18]Done = 0.82697s
[7]Done = 0.826996s
[17]Done = 0.827005s
[12]Done = 0.827061s
[20]Done = 0.827099s
[13]Done = 0.827134s
[1]Done = 0.827237s
[15]Done = 0.82724s
[11]Done = 0.827262s
[21]Done = 0.82729s
[19]Done = 0.827358s
[6]Done = 0.827403s
[4]Done = 0.827468s
[10]Done = 0.827473s
[8]Done = 0.827505s
[14]Done = 0.827493s
[3]Done = 0.827588s
[5]Done = 0.827671s
[2]Done = 0.829008s
StarPU:
[0]Done = 0.82736s
[17]Done = 1.61875s
[19]Done = 1.6194s
[4]Done = 1.63148s
[23]Done = 1.63316s
[22]Done = 1.63799s
[18]Done = 1.63861s
[6]Done = 1.63878s
[3]Done = 1.63924s
[11]Done = 1.63968s
[15]Done = 1.64206s
[7]Done = 1.64275s
[12]Done = 1.64313s
[5]Done = 1.64386s
[13]Done = 1.64457s
[21]Done = 1.64462s
[20]Done = 1.64464s
[1]Done = 1.64517s
[8]Done = 1.64626s
[10]Done = 1.64649s
[14]Done = 1.64715s
[2]Done = 1.64786s
[9]Done = 1.65221s
[16]Done = 1.64785s
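The pattern in these results can be quantified by counting the workers whose time exceeds some multiple of the fastest one. The helper `countSlowWorkers` and the 1.5× threshold are illustrative choices, not part of the test case.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Number of per-worker times that exceed `factor` times the fastest time.
int countSlowWorkers(const std::vector<double>& seconds, double factor){
    assert(!seconds.empty());
    const double fastest = *std::min_element(seconds.begin(), seconds.end());
    return (int)std::count_if(seconds.begin(), seconds.end(),
                              [&](double s){ return s > factor * fastest; });
}
```

With the 8-thread timings above, `countSlowWorkers` returns 0 for OpenMP and 3 for StarPU (workers 2, 1 and 3); with 24 threads it returns 23 for StarPU. The slow workers take almost exactly twice as long as the fast ones (1.64718 / 0.825804 ≈ 1.99), which would be consistent with two busy threads time-sharing one core, although the timings alone do not say which threads those are.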
The binding is correct (workers are bound to logical CPUs 0 to 23).
So I do not know what I am doing wrong, but OpenMP does the job as expected and
StarPU does not.
I must have forgotten something in my code, but I see the same behavior with
my real application.
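To check from inside each task that a worker is really pinned, the calling thread's affinity mask can be read with `sched_getaffinity`. This is a minimal Linux-only sketch; `allowedCpusOfCallingThread` is a hypothetical helper. Calling it, together with `sched_getcpu()`, inside the `func` lambda of the test and printing the result next to `starpu_worker_get_id()` would show whether each worker has a single, distinct CPU.

```cpp
#include <cassert>
#include <sched.h>
#include <vector>

// Linux/glibc only: logical CPUs the calling thread is allowed to run on.
// A thread bound to one CPU gets exactly one entry; an empty result means
// sched_getaffinity failed.
std::vector<int> allowedCpusOfCallingThread(){
    std::vector<int> cpus;
    cpu_set_t set;
    CPU_ZERO(&set);
    if(sched_getaffinity(0, sizeof(set), &set) == 0){ // 0 = calling thread
        for(int cpu = 0; cpu < CPU_SETSIZE; ++cpu){
            if(CPU_ISSET(cpu, &set)) cpus.push_back(cpu);
        }
    }
    return cpus;
}
```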
Thanks.
Bérenger Bramas
HiePACS Project
Tel (05 24 57) 40 76
INRIA BORDEAUX Sud Ouest
----- Original message -----
| From: "Samuel Thibault" <samuel.thibault@inria.fr>
| To: "Berenger Bramas" <berenger.bramas@inria.fr>
| Cc: starpu-devel@lists.gforge.inria.fr
| Sent: Wednesday, 9 September 2015 10:13:04
| Subject: Re: [Starpu-devel] Worker Binding Problem
|
| Berenger Bramas wrote on Wed 09 Sep 2015 10:08:06 +0200:
| > My processor is a: Intel® Core™ i7-4610M CPU @ 3.00GHz × 4
|
| This CPU has only two cores, doesn't it? (I don't think anybody has
| worked on distributing workers on the threads of the same cores) Or
| do you mean you have 4 sockets and thus 8 cores? Do you pass any
| environment variable to the program in the failing case?
|
| Samuel
|
//
// Compilation:
// g++ -Wall -fopenmp -std=c++11 -mavx testStarPUOpenMPv2.cpp -o testStarPUOpenMPv2.exe -lgomp -I/projets/scalfmm/Starpu/StarPU/installwithfxtverbose/include/starpu/1.3/ -L/projets/scalfmm/Starpu/StarPU/installwithfxtverbose/lib/ -lstarpu-1.3
//
// Examples:
// OMP_NUM_THREADS=4 ./testStarPUOpenMPv2.exe
// OMP_NUM_THREADS=3 OMP_PROC_BIND=TRUE numactl -l ./testStarPUOpenMPv2.exe

#include <cassert>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <iostream>
#include <memory>
#include <omp.h>
#include <unistd.h>
#include <starpu.h>

// Abort with a message if a call fails (kept outside assert() so that the
// call is still executed when NDEBUG is defined)
static void checkOrDie(int ret, const char* what){
    if(ret != 0){
        std::cerr << what << " failed with code " << ret << std::endl;
        std::abort();
    }
}

// CPU implementation of the codelet: unpack the std::function pointer and call it
static void BindToFunc(void* /*buffers*/[], void* cl_arg){
    void* ptr;
    starpu_codelet_unpack_args(cl_arg, &ptr);
    std::function<void(void)>* func = (std::function<void(void)>*) ptr;
    (*func)();
}

// Spin loop; the volatile forces a memory access at every step
void work(){
    size_t Nb = 100000000;
    volatile size_t idx = 0;
    while(idx != Nb){
        idx += 1;
        idx -= 1;
        idx += 1;
    }
}

void runs(){
    const int numThreads = omp_get_max_threads();

    std::cout << "Test with OpenMP:" << std::endl;
    // Execute computation with openmp
    #pragma omp parallel num_threads(numThreads)
    {
        // Wait all
        #pragma omp barrier
        // Compute
        const double startTime = omp_get_wtime();
        work();
        const double endTime = omp_get_wtime();
        // Print time
        #pragma omp critical(PRINT)
        {
            std::cout << "[" << omp_get_thread_num() << "]Done " << " = " << (endTime-startTime) << "s" << std::endl;
        }
    }

    std::cout << "Test with StarPU:" << std::endl;
    {
        // Init starpu
        struct starpu_conf conf;
        checkOrDie(starpu_conf_init(&conf), "starpu_conf_init");
        // Use the same number of threads
        conf.ncpus = numThreads;
        checkOrDie(starpu_init(&conf), "starpu_init");

        // We need a barrier and a mutex
        starpu_pthread_barrier_t barr;
        checkOrDie(starpu_pthread_barrier_init(&barr, NULL, numThreads), "starpu_pthread_barrier_init");
        starpu_pthread_mutex_t printMutex;
        starpu_pthread_mutex_init(&printMutex, NULL);

        // Core part
        std::function<void(void)> func = [&](){
            // Barrier between workers
            int ret = starpu_pthread_barrier_wait(&barr);
            assert(ret == 0 || ret == PTHREAD_BARRIER_SERIAL_THREAD);
            (void)ret; // unused when NDEBUG is defined
            // Compute
            const double startTime = omp_get_wtime();
            work();
            const double endTime = omp_get_wtime();
            // Print res
            starpu_pthread_mutex_lock(&printMutex);
            {
                std::cout << "[" << starpu_worker_get_id() << "]Done " << " = " << (endTime-startTime) << "s" << std::endl;
            }
            starpu_pthread_mutex_unlock(&printMutex);
        };

        // Create a codelet to call the functional
        starpu_codelet perWorkerCodelet;
        memset(&perWorkerCodelet, 0, sizeof(perWorkerCodelet));
        perWorkerCodelet.cpu_funcs[0] = BindToFunc;
        perWorkerCodelet.where |= STARPU_CPU;
        perWorkerCodelet.nbuffers = 0;
        perWorkerCodelet.name = "perWorkerCodelet";

        // Insert one task per worker
        for(int idxThread = 0 ; idxThread < numThreads ; ++idxThread){
            struct starpu_task* const task = starpu_task_create();
            task->cl = &perWorkerCodelet;
            // Store args values
            void* funcptr = (void*)&func;
            starpu_codelet_pack_args((void**)&task->cl_arg, &task->cl_arg_size,
                                     STARPU_VALUE, &funcptr, sizeof(void*), 0);
            // Let StarPU free() the buffer allocated by starpu_codelet_pack_args
            task->cl_arg_free = 1;
            // This task is only for one worker
            task->execute_on_a_specific_worker = 1;
            task->workerid = idxThread;
            checkOrDie(starpu_task_submit(task), "starpu_task_submit");
        }

        // Wait all
        starpu_task_wait_for_all();
        starpu_shutdown();

        // Dealloc
        starpu_pthread_mutex_destroy(&printMutex);
        starpu_pthread_barrier_destroy(&barr);
    }
}

/////////////////////////////////////////////////////////////////////////
/// Main
/////////////////////////////////////////////////////////////////////////

int main(int argc, char* argv[]){
    runs();
    return 0;
}