- From: "Vergel, Julio" <Julio.Vergel@aeroflex.com>
- To: "starpu-devel@lists.gforge.inria.fr" <starpu-devel@lists.gforge.inria.fr>
- Cc: "jvergel@hotmail.com" <jvergel@hotmail.com>
- Subject: [Starpu-devel] StarPU questions
- Date: Fri, 29 Aug 2014 18:38:10 +0000
- Accept-language: en-US
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
Hi,
First, I would like to tell you that you have done a pretty amazing job!
I have two questions that I hope you can help me with (I have attached the main C code in case you need it).
1) I was trying to run the following tasks:
a) taskGain[0]
b) taskAddConstant which has to wait for taskGain[0] completion
c) taskGain[1] which has to wait for taskAddConstant completion
d) taskGain[2] which has to wait for taskAddConstant completion
taskGain[0]--->taskAddConstant|--------->taskGain[1]
                              |--------->taskGain[2]
When I build the following code (around line 307 in the attached file):
starpu_task_submit( taskGain[0] );
starpu_task_submit( taskAddConstant );
starpu_task_submit( taskGain[2] );
starpu_task_submit( taskGain[1] );
everything works OK. However, if I change the order in which the tasks are submitted:
starpu_task_submit( taskGain[2] );
starpu_task_submit( taskGain[0] );
starpu_task_submit( taskAddConstant );
starpu_task_submit( taskGain[1] );
the program hangs when I run it.
I thought it was because of the task dependencies, so I then tried (line 285 in the attached file):
struct starpu_task *depending[2] = { taskGain[0], taskAddConstant };
starpu_task_declare_deps_array( taskGain[2], 2, depending );
starpu_task_submit( taskGain[2] );
starpu_task_submit( taskGain[0] );
starpu_task_submit( taskAddConstant );
starpu_task_submit( taskGain[1] );
but I got the same result: the program hangs when running.
What am I doing wrong?
The tasks are configured in lines 200, 220, 246, and 266 of the attached file.
Is this the correct way to configure tasks in StarPU? I want to be able to submit tasks out of order and have StarPU schedule them according to their dependencies. What is the correct way to do it?
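For reference, here is a minimal sketch of how out-of-order submission is usually set up (this is my assumption of the intended pattern, not code from the attached file): declare every explicit dependency before the first starpu_task_submit() call, then submit in any order. Note that StarPU also infers implicit data dependencies from the submission order of tasks sharing a data handle, which may be why submitting taskGain[2] first deadlocks against an explicit dependency on taskAddConstant.

```c
/* Hedged sketch, not a complete program: the task variables are the
 * ones from the attached file, and StarPU is assumed initialized.
 * All explicit dependencies are declared up front. */
struct starpu_task *depsOnGain0[1] = { taskGain[0] };
struct starpu_task *depsOnAdd[1]   = { taskAddConstant };

starpu_task_declare_deps_array(taskAddConstant, 1, depsOnGain0); /* b waits for a */
starpu_task_declare_deps_array(taskGain[1],     1, depsOnAdd);   /* c waits for b */
starpu_task_declare_deps_array(taskGain[2],     1, depsOnAdd);   /* d waits for b */

/* Submission order is now free; StarPU holds each task until its
 * dependencies are satisfied. */
starpu_task_submit(taskGain[2]);
starpu_task_submit(taskGain[0]);
starpu_task_submit(taskAddConstant);
starpu_task_submit(taskGain[1]);
```

If the explicit graph is meant to be the only ordering, the implicit data dependencies on shared handles can be disabled per handle with starpu_data_set_sequential_consistency_flag(handle, 0); otherwise the order in which tasks touching the same handle are submitted still contributes dependencies of its own.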
2) My second question: I noticed that if I increase the value of NX (line 17) to a couple of million, the program crashes immediately. Does StarPU have memory restrictions? How can I set up StarPU to support bigger buffers?
Thanks in advance,
Julio Vergel
#include <iostream>
#include <stdio.h>
#include <starpu.h>
#include <assert.h>
#include <unistd.h>

#define STARPU_USE_OPENCL
#define ProfilingEnable

/*
 * This example demonstrates how to use StarPU to scale an array by a factor.
 * It shows how to manipulate data with StarPU's data management library.
 *  1- how to declare a piece of data to StarPU (starpu_vector_data_register)
 *  2- how to describe which data are accessed by a task (task->handles[0])
 *  3- how a kernel can manipulate the data (buffers[0].vector.ptr)
 */
#include <starpu.h>

#define NX  256000
#define NX2 256000

extern void scal_cpu_func(void *buffers[], void *_args);
extern void scal_sse_func(void *buffers[], void *_args);
extern void scal_cuda_func(void *buffers[], void *_args);
extern void scal_opencl_func(void *buffers[], void *_args);
extern void addConstantOpenclFunction(void *buffers[], void *_args);
extern void gainOpenclFunction(void *buffers[], void *_args);

#if 0
static struct starpu_codelet clGain =
{
    .where = STARPU_CPU | STARPU_CUDA | STARPU_OPENCL,
    /* CPU implementation of the codelet */
    .cpu_funcs = { scal_cpu_func, scal_sse_func, NULL },
#ifdef STARPU_USE_CUDA
    /* CUDA implementation of the codelet */
    .cuda_funcs = { scal_cuda_func, NULL },
#endif
#ifdef STARPU_USE_OPENCL
    /* OpenCL implementation of the codelet */
    .opencl_funcs = { scal_opencl_func, NULL },
#endif
    .nbuffers = 1,
    .modes = { STARPU_RW }
};
#else
static struct starpu_codelet clGain;
static struct starpu_codelet clGainStarPU;
static struct starpu_codelet clAddConstant;
static struct starpu_perfmodel mult_perf_model;
#endif

#ifdef STARPU_USE_OPENCL
struct starpu_opencl_program programGainStarPU;
struct starpu_opencl_program programGain;
struct starpu_opencl_program programAddConstant;
#endif

void InitializeGainStarPUProgram()
{
    clGainStarPU.where = STARPU_CPU | STARPU_CUDA | STARPU_OPENCL;
    clGainStarPU.nbuffers = 2;
    clGainStarPU.modes[0] = STARPU_RW;
    clGainStarPU.modes[1] = STARPU_RW;
    clGainStarPU.cpu_funcs[0] = (starpu_cpu_func_t) scal_cpu_func;
    clGainStarPU.cpu_funcs[1] = (starpu_cpu_func_t) scal_sse_func;
    clGainStarPU.cpu_funcs[2] = NULL;
    clGainStarPU.opencl_funcs[0] = (starpu_opencl_func_t) gainOpenclFunction;
    clGainStarPU.opencl_funcs[1] = NULL;
    clGainStarPU.model = &mult_perf_model;
}

#if 0
void InitializeGain1Program()
{
    clGain.where = STARPU_CPU | STARPU_CUDA | STARPU_OPENCL;
    clGain.nbuffers = 2;
    clGain.modes[0] = STARPU_RW;
    clGain.modes[1] = STARPU_RW;
    clGain.cpu_funcs[0] = (starpu_cpu_func_t) scal_cpu_func;
    clGain.cpu_funcs[1] = (starpu_cpu_func_t) scal_sse_func;
    clGain.cpu_funcs[2] = NULL;
    clGain.opencl_funcs[0] = (starpu_opencl_func_t) gainOpenclFunction;
    clGain.opencl_funcs[1] = NULL;
    clGain.model = &mult_perf_model;
}
#endif

void InitializeAddConstantProgram()
{
    clAddConstant.where = STARPU_OPENCL;
    clAddConstant.nbuffers = 2;
    clAddConstant.modes[0] = STARPU_RW;
    clAddConstant.modes[1] = STARPU_RW;
    clAddConstant.opencl_funcs[0] = (starpu_opencl_func_t) addConstantOpenclFunction;
    clAddConstant.opencl_funcs[1] = NULL;
    clAddConstant.model = &mult_perf_model;
}

int main(int argc, char **argv)
{
    /* We consider a vector of float that is initialized just as any other C data */
    float vector[NX];
    float bufferGainIn[NX];
    float bufferGainOut[NX];
    float bufferAddConstantOut[NX];
    float bufferGain2Out[NX2];
    float bufferGainParallel[NX];
    unsigned i;

    mult_perf_model.type = STARPU_HISTORY_BASED; //STARPU_NL_REGRESSION_BASED; //STARPU_REGRESSION_BASED;
    mult_perf_model.symbol = "mult_perf_model";

    InitializeGainStarPUProgram();
    //InitializeGain1Program();
    InitializeAddConstantProgram();

    for (i = 0; i < NX; i++)
    {
        vector[i] = 1.0f;
        bufferGainIn[i] = 2.0f;
        bufferGainOut[i] = 0.0f;
        bufferAddConstantOut[i] = 0.0f;
        bufferGainParallel[i] = 1;
    }
    for (i = 0; i < NX2; i++)
    {
        bufferGain2Out[i] = 10.0f;
    }

    fprintf(stderr, "BEFORE: First element was %f\n", vector[0]);

    /* Initialize StarPU with default configuration */
    starpu_init(NULL);

#ifdef STARPU_USE_OPENCL
    starpu_opencl_load_opencl_from_file("/space/WorkSpaceNGMP/DSP_StarPU/Research/Examples/Basic/vector_scal_opencl_kernel.cl", &programGainStarPU, NULL);
    starpu_opencl_load_opencl_from_file("/space/WorkSpaceNGMP/DSP_StarPU/Research/Examples/Basic/GainOpenDSP.cl", &programGain, NULL);
    starpu_opencl_load_opencl_from_file("/space/WorkSpaceNGMP/DSP_StarPU/Research/Examples/Basic/AddConstantOpenDSP.cl", &programAddConstant, NULL);
#endif

    /* Tell StarPU to associate the "vector" vector with the "vector_handle"
     * identifier. When a task needs to access a piece of data, it should
     * refer to the handle that is associated to it.
     * In the case of the "vector" data interface:
     * - the first argument of the registration method is a pointer to the
     *   handle that should describe the data
     * - the second argument is the memory node where the data (i.e. "vector")
     *   resides initially: 0 stands for an address in main memory, as
     *   opposed to an address on a GPU for instance.
     * - the third argument is the address of the vector in RAM
     * - the fourth argument is the number of elements in the vector
     * - the fifth argument is the size of each element.
     */
    starpu_data_handle_t vectorGainHandle[3];
    starpu_vector_data_register(&vectorGainHandle[0], 0, (uintptr_t)bufferGainIn, NX, sizeof(bufferGainIn[0]));
    starpu_vector_data_register(&vectorGainHandle[1], 0, (uintptr_t)bufferGainOut, NX, sizeof(bufferGainOut[0]));

    starpu_data_handle_t vectorAddConstantHandle;
    starpu_vector_data_register(&vectorAddConstantHandle, 0, (uintptr_t)bufferAddConstantOut, NX, sizeof(bufferAddConstantOut[0]));

    starpu_data_handle_t vectorGain2Handle;
    starpu_vector_data_register(&vectorGain2Handle, 0, (uintptr_t)bufferGain2Out, NX, sizeof(bufferGain2Out[0]));

    starpu_data_handle_t vectorGainParallelHandle;
    starpu_vector_data_register(&vectorGainParallelHandle, 0, (uintptr_t)bufferGainParallel, NX, sizeof(bufferGainParallel[0]));

    float factor = 3;

    /* create a synchronous task: any call to starpu_task_submit will block
     * until it is terminated */
    struct starpu_task *taskGain[3];
    taskGain[0] = starpu_task_create();
    taskGain[0]->synchronous = 0;
#ifdef ProfilingEnable
    /* Enable profiling */
    starpu_profiling_status_set(STARPU_PROFILING_ENABLE);
    taskGain[0]->destroy = 0;
#endif
    taskGain[0]->cl = &clGainStarPU;
    taskGain[0]->tag_id = 0;
    taskGain[0]->handles[0] = vectorGainHandle[0];
    taskGain[0]->handles[1] = vectorGainHandle[1];

    void *arg_buffer;
    size_t arg_buffer_size;
    starpu_codelet_pack_args(&arg_buffer, &arg_buffer_size,
                             // STARPU_VALUE, &lengthNX, sizeof(lengthNX),
                             //STARPU_DATA_ARRAY, &vector_handle, 2,
                             STARPU_VALUE, &factor, sizeof(factor),
                             // STARPU_RW, vector_handle,
                             0);
    std::cout << "arg_buffer_size=" << arg_buffer_size << std::endl;
    taskGain[0]->cl_arg = arg_buffer;
    taskGain[0]->cl_arg_size = arg_buffer_size;

    float valueToAdd = 1000;
    struct starpu_task *taskAddConstant = starpu_task_create();
    taskAddConstant->synchronous = 0;
#ifdef ProfilingEnable
    taskAddConstant->destroy = 0;
#endif
    taskAddConstant->cl = &clAddConstant;
    taskAddConstant->tag_id = 2;
    /* the codelet manipulates one buffer in RW mode */
    taskAddConstant->handles[0] = vectorGainHandle[1];
    taskAddConstant->handles[1] = vectorAddConstantHandle;

    void *arg_bufferConstantAdd;
    size_t arg_bufferConstantAddSize;
    starpu_codelet_pack_args(&arg_bufferConstantAdd, &arg_bufferConstantAddSize,
                             // STARPU_VALUE, &lengthNX, sizeof(lengthNX),
                             //STARPU_DATA_ARRAY, &vector_handle, 2,
                             STARPU_VALUE, &valueToAdd, sizeof(valueToAdd),
                             // STARPU_RW, vector_handle,
                             0);
    std::cout << "arg_bufferConstantAddSize=" << arg_bufferConstantAddSize << std::endl;
    taskAddConstant->cl_arg = arg_bufferConstantAdd;
    taskAddConstant->cl_arg_size = arg_bufferConstantAddSize;

    factor = 5;
    taskGain[1] = starpu_task_create();
    taskGain[1]->tag_id = 1;
    taskGain[1]->synchronous = 0;
#ifdef ProfilingEnable
    taskGain[1]->destroy = 0;
#endif
    taskGain[1]->cl = &clGainStarPU;
    taskGain[1]->handles[0] = vectorAddConstantHandle;
    taskGain[1]->handles[1] = vectorGain2Handle;

    void *arg_buffer2;
    size_t arg_buffer_size2;
    starpu_codelet_pack_args(&arg_buffer2, &arg_buffer_size2,
                             STARPU_VALUE, &factor, sizeof(factor),
                             0);
    taskGain[1]->cl_arg = arg_buffer2;
    taskGain[1]->cl_arg_size = arg_buffer_size2;

    factor = 8;
    taskGain[2] = starpu_task_create();
    taskGain[2]->synchronous = 0;
#ifdef ProfilingEnable
    taskGain[2]->destroy = 0;
#endif
    taskGain[2]->cl = &clGainStarPU;
    taskGain[2]->tag_id = 3;
    taskGain[2]->handles[0] = vectorAddConstantHandle;
    taskGain[2]->handles[1] = vectorGainParallelHandle;

    void *argBufferParallel;
    size_t argBufferParallelSize;
    starpu_codelet_pack_args(&argBufferParallel, &argBufferParallelSize,
                             STARPU_VALUE, &factor, sizeof(factor),
                             0);
    taskGain[2]->cl_arg = argBufferParallel;
    taskGain[2]->cl_arg_size = argBufferParallelSize;

#if 1
    struct starpu_task *depending[2] = { taskGain[0], taskAddConstant };
    // starpu_task_declare_deps_array( taskGain[1], 1, depending );
    starpu_task_declare_deps_array( taskGain[2], 2, depending );
    //struct starpu_task *dependingOnGain0[1] = { taskGain[0] };
    //starpu_task_declare_deps_array( taskAddConstant, 1, &dependingOnGain0[0] );
#else
    starpu_tag_t arrayOfdeps[1] = { taskAddConstant->tag_id };
    starpu_tag_declare_deps( taskGain[2]->tag_id, 1, arrayOfdeps );
#endif

    /* execute the task on any eligible computational resource */
    // for (int i=0;i<100;i++)
    {
        //starpu_task_wait_for_all();
        starpu_task_submit( taskGain[0] );
        starpu_task_submit( taskAddConstant );
        starpu_task_submit( taskGain[2] );
        starpu_task_submit( taskGain[1] );
    }
    // starpu_task_wait( taskGain[2] );
    starpu_task_wait_for_all();

    std::cout << "Input=" << bufferGainIn[0] << " AddOut=" << vector[0] << " bufferGain2Out=" << bufferGain2Out[0] << std::endl;

    // struct starpu_task *depending[1];
    // depending[0] = taskAddConstant;
    // starpu_task_declare_deps_array( taskGain[0], 1, &depending[0] );
    // starpu_task_wait_for_all();
    // starpu_task_wait_for_all();

#ifdef ProfilingEnable
    static double delay = 0;
    static double length = 0;

    /* The task is finished, get profiling information */
    struct starpu_profiling_task_info *info = taskGain[0]->profiling_info;

    /* How much time did it take before the task started? */
    delay += starpu_timing_timespec_delay_us(&info->submit_time, &info->start_time);
    /* How long was the task execution? */
    length += starpu_timing_timespec_delay_us(&info->start_time, &info->end_time);

    /* We don't need the task structure anymore */
    starpu_task_destroy(taskGain[0]);
    starpu_task_destroy(taskAddConstant);
    starpu_task_destroy(taskGain[1]);

    char workername[128];
    starpu_worker_get_name(0, workername, 128);
    std::cout << "Worker :" << workername << std::endl;
    std::cout << "Avg. delay :" << delay << std::endl;
    std::cout << "length :" << length << std::endl;

    /* Display the occupancy of all workers during the test */
    unsigned worker;
    for (worker = 0; worker < starpu_worker_get_count(); worker++)
    {
        struct starpu_profiling_worker_info worker_info;
        int ret = starpu_profiling_worker_get_info(worker, &worker_info);
        STARPU_ASSERT(!ret);

        double total_time = starpu_timing_timespec_to_us(&worker_info.total_time);
        double executing_time = starpu_timing_timespec_to_us(&worker_info.executing_time);
        double sleeping_time = starpu_timing_timespec_to_us(&worker_info.sleeping_time);
        double overhead_time = total_time - executing_time - sleeping_time;

        float executing_ratio = 100.0 * executing_time / total_time;
        float sleeping_ratio = 100.0 * sleeping_time / total_time;
        float overhead_ratio = 100.0 - executing_ratio - sleeping_ratio;

        char workername[128];
        starpu_worker_get_name(worker, workername, 128);
        std::cout << "Worker " << workername << std::endl;
        std::cout << "\ttotal time : " << total_time * 1e-3 << " ms" << std::endl;
        std::cout << "\texec time : " << executing_time * 1e-3 << " ms (" << executing_ratio << " %)" << std::endl;
        std::cout << "\tblocked time : " << sleeping_time * 1e-3 << " ms (" << sleeping_ratio << " %)" << std::endl;
        std::cout << "\toverhead time : " << overhead_time * 1e-3 << " ms (" << overhead_ratio << " %)" << std::endl;
    }
#endif

    /* StarPU does not need to manipulate the array anymore so we can stop
     * monitoring it */
    starpu_data_unregister(vectorAddConstantHandle);
    starpu_data_unregister(vectorGainHandle[0]);
    starpu_data_unregister(vectorGainHandle[1]);
    starpu_data_unregister(vectorGain2Handle);
    starpu_data_unregister(vectorGainParallelHandle);

#ifdef STARPU_USE_OPENCL
    starpu_opencl_unload_opencl(&programGainStarPU);
#endif

    /* terminate StarPU, no task can be submitted after */
    starpu_shutdown();

    std::cout << "bufferGainParallel=" << bufferGainParallel[0] << " Input=" << bufferGainIn[0] << " Gain1Out=" << bufferGainOut[0] << " AddOut=" << bufferAddConstantOut[0] << " bufferGain2Out=" << bufferGain2Out[0] << std::endl;
    //fprintf(stderr, "AFTER First element is %f and %f vec2=%f and %f\n", bufferGainOut[0], bufferGainOut[NX-1], bufferGain2Out[0], bufferGain2Out[NX2-1]);

    return 0;
}