starpu-devel - Re: [Starpu-devel] StarPU SVN and OpenCL on CPU

Subject: Developers list for StarPU

  • From: Sylvain HENRY <sylvain.henry@inria.fr>
  • To: George Russell <george@codeplay.com>
  • Cc: starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] StarPU SVN and OpenCL on CPU
  • Date: Thu, 24 Feb 2011 11:33:53 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hi,

See the attached patch. I work with Git, though, so I don't know whether the patch will apply cleanly with other patch tools.
If you want to apply it with Git:

git svn clone --stdlayout svn://scm.gforge.inria.fr/svn/starpu
cd starpu
git apply blabla.patch

Cheers
Sylvain

On 23/02/2011 21:40, George Russell wrote:
Hi,

Great that it works ;-)

Could you post a patch against the StarPU SVN trunk for this? I am not terribly concerned about the "hackiness" of it at the moment: I simply want to see how to get OpenCL code running in StarPU on the CPU, since at present I don't have access to an OpenCL GPU in an environment where StarPU will build!

Cheers,
George

On 23/02/2011 20:32, Sylvain HENRY wrote:
Hi George,

I succeeded in using StarPU with the AMD Stream SDK on CPU but it requires a hack.

The problem is that StarPU uses asynchronous data transfers and polls (clGetEventInfo with CL_EVENT_COMMAND_EXECUTION_STATUS) to find out when a transfer has completed. But it seems that AMD Stream never performs the asynchronous data transfer (i.e. the memcpy); a blocking call is required for the transfer to actually happen.
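For reference, the execution status read by this kind of polling is one of CL_QUEUED, CL_SUBMITTED, CL_RUNNING or CL_COMPLETE, or a negative error code if the command terminated abnormally. The following is only a sketch of how a poller could tell these cases apart. The names poll_result and classify_event_status are hypothetical and not part of StarPU, and the constants are repeated from the OpenCL headers solely so that the snippet compiles without an SDK installed.

```c
#include <assert.h>

/* Normally provided by <CL/cl.h>; repeated here (same values as in the
 * OpenCL headers) only so the sketch builds without an OpenCL SDK. */
#ifndef CL_COMPLETE
#define CL_COMPLETE  0x0
#define CL_RUNNING   0x1
#define CL_SUBMITTED 0x2
#define CL_QUEUED    0x3
#endif

/* Hypothetical result type for one polling step. */
enum poll_result { POLL_PENDING, POLL_DONE, POLL_FAILED };

/* Interpret the value that clGetEventInfo() returns for
 * CL_EVENT_COMMAND_EXECUTION_STATUS. */
static enum poll_result classify_event_status(int event_status)
{
    if (event_status < 0)
        return POLL_FAILED;   /* negative: command terminated abnormally */
    if (event_status == CL_COMPLETE)
        return POLL_DONE;
    return POLL_PENDING;      /* CL_QUEUED, CL_SUBMITTED or CL_RUNNING */
}
```

A transfer that the runtime never starts would presumably keep reporting POLL_PENDING, so this does not work around the behaviour described above; it only keeps a failed command from being polled forever.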

A first simple solution would be to detect when the worker is an OpenCL CPU device and to somehow force the call to be blocking.
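A rough sketch of that first option, assuming the decision is taken from the device type reported by clGetDeviceInfo(CL_DEVICE_TYPE). The helper transfer_blocking_flag is a hypothetical name, not an existing StarPU function. Its result is the value one would pass as the blocking_write or blocking_read argument of clEnqueueWriteBuffer or clEnqueueReadBuffer. The constants are repeated from the OpenCL headers only so that the snippet compiles without an SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Normally provided by <CL/cl.h>; repeated here (same values as in the
 * OpenCL headers) only so the sketch builds without an OpenCL SDK. */
#ifndef CL_DEVICE_TYPE_CPU
#define CL_DEVICE_TYPE_CPU (1 << 1)
#define CL_DEVICE_TYPE_GPU (1 << 2)
#endif
#ifndef CL_TRUE
#define CL_FALSE 0
#define CL_TRUE  1
#endif

/* Hypothetical helper: force blocking transfers on OpenCL CPU devices and
 * keep them asynchronous everywhere else. device_type is the 64-bit
 * bitfield (cl_device_type) returned by clGetDeviceInfo(CL_DEVICE_TYPE). */
static unsigned int transfer_blocking_flag(uint64_t device_type)
{
    /* Test the CPU bit instead of comparing for equality, because the
     * device type is a bitfield and may carry other bits as well. */
    return (device_type & CL_DEVICE_TYPE_CPU) ? CL_TRUE : CL_FALSE;
}
```

With this variant the extra memcpy into the OpenCL buffer remains; only the completion problem goes away.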

The solution I used is different. Because I wanted to avoid the superfluous memcpy, I used an OpenCL buffer created with the flag CL_MEM_USE_HOST_PTR. Each time StarPU wants to transfer data from host memory to the OpenCL CPU device, it calls a function with the source address and the target buffer. This is where I free the buffer and replace it with a new one "mapping" the data.

This hack worked for me because I only had OpenCL devices (no CPU worker from a StarPU point of view, only OpenCL ones). With CPU workers, data may get corrupted as we are faking a copy.
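The corruption risk can be illustrated with a toy model in plain C. It uses no OpenCL or StarPU calls, and struct replica and its helpers are hypothetical names invented for this illustration. A replica made by a real copy keeps its own value, whereas a replica that merely aliases host memory, which is roughly what a CL_MEM_USE_HOST_PTR buffer does on a CPU device, silently follows every later write to the host data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the replica of one variable on a memory node. */
struct replica {
    void *data;  /* where this node reads the value from */
    int   owns;  /* 1 if data was allocated by replica_copy() */
};

/* Real transfer: the node gets its own private copy of the host data. */
static struct replica replica_copy(const void *host, size_t size)
{
    struct replica r = { malloc(size), 1 };
    assert(r.data != NULL);
    memcpy(r.data, host, size);
    return r;
}

/* "Faked" transfer: the node simply points at the host data. */
static struct replica replica_alias(void *host)
{
    struct replica r = { host, 0 };
    return r;
}

/* Free the private copy, if any; an alias owns nothing. */
static void replica_release(struct replica *r)
{
    if (r->owns)
        free(r->data);
    r->data = NULL;
}
```

If the host value is 1 when both replicas are created and a CPU worker then writes 2 to it, the copied replica still reads 1 while the aliased one reads 2, although no transfer was ever requested.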

The long term solution would be for StarPU to consider OpenCL CPU devices as CPU workers, that is, workers performing their computations in host memory (and using CPU cores... but this is another story).

Cheers
Sylvain


From acd3fc243196893e0523b1233ed8d5a182f1b90e Mon Sep 17 00:00:00 2001
From: Sylvain HENRY <sylvain.henry@inria.fr>
Date: Thu, 24 Feb 2011 11:10:01 +0100
Subject: [PATCH] OpenCL CPU devices patch

---
src/datawizard/copy_driver.c | 21 +++++---
src/datawizard/interfaces/variable_interface.c | 65 ++++++++++++++++++-----
src/drivers/opencl/driver_opencl.c | 29 ++++++++++-
src/drivers/opencl/driver_opencl.h | 6 ++
4 files changed, 97 insertions(+), 24 deletions(-)

diff --git a/src/datawizard/copy_driver.c b/src/datawizard/copy_driver.c
index 48309fe..f6fd3f7 100644
--- a/src/datawizard/copy_driver.c
+++ b/src/datawizard/copy_driver.c
@@ -290,10 +290,11 @@ void _starpu_driver_wait_request_completion(starpu_async_channel *async_channel
#ifdef STARPU_USE_OPENCL
case STARPU_OPENCL_RAM:
{
- if ((*async_channel).opencl_event == NULL) STARPU_ABORT();
- cl_int err = clWaitForEvents(1, &((*async_channel).opencl_event));
- if (err != CL_SUCCESS) STARPU_OPENCL_REPORT_ERROR(err);
- clReleaseEvent((*async_channel).opencl_event);
+ if ((*async_channel).opencl_event != NULL) {
+ cl_int err = clWaitForEvents(1, &((*async_channel).opencl_event));
+ if (err != CL_SUCCESS) STARPU_OPENCL_REPORT_ERROR(err);
+ clReleaseEvent((*async_channel).opencl_event);
+ }
}
break;
#endif
@@ -328,10 +329,14 @@ unsigned _starpu_driver_test_request_completion(starpu_async_channel *async_chan
{
cl_int event_status;
cl_event opencl_event = (*async_channel).opencl_event;
- if (opencl_event == NULL) STARPU_ABORT();
- cl_int err = clGetEventInfo(opencl_event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(event_status), &event_status, NULL);
- if (err != CL_SUCCESS) STARPU_OPENCL_REPORT_ERROR(err);
- success = (event_status == CL_COMPLETE);
+ if (opencl_event != NULL) {
+ cl_int err = clGetEventInfo(opencl_event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(event_status), &event_status, NULL);
+ if (err != CL_SUCCESS) STARPU_OPENCL_REPORT_ERROR(err);
+ success = (event_status == CL_COMPLETE);
+ }
+ else {
+ success = 1;
+ }
break;
}
#endif
diff --git a/src/datawizard/interfaces/variable_interface.c b/src/datawizard/interfaces/variable_interface.c
index 4a4fb0d..7cdedff 100644
--- a/src/datawizard/interfaces/variable_interface.c
+++ b/src/datawizard/interfaces/variable_interface.c
@@ -54,8 +54,8 @@ static const struct starpu_data_copy_methods variable_copy_data_methods_s = {
#ifdef STARPU_USE_OPENCL
.ram_to_opencl = copy_ram_to_opencl,
.opencl_to_ram = copy_opencl_to_ram,
- .opencl_to_opencl = copy_opencl_to_opencl,
- .ram_to_opencl_async = copy_ram_to_opencl_async,
+ .opencl_to_opencl = NULL, //copy_opencl_to_opencl,
+ .ram_to_opencl_async = copy_ram_to_opencl_async,
.opencl_to_ram_async = copy_opencl_to_ram_async,
#endif
.cuda_to_spu = NULL,
@@ -218,13 +218,13 @@ static ssize_t allocate_variable_buffer_on_node(void *interface_, uint32_t dst_n
#ifdef STARPU_USE_OPENCL
case STARPU_OPENCL_RAM:
{
- int ret;
- void *ptr;
- ret = _starpu_opencl_allocate_memory(&ptr, elemsize, CL_MEM_READ_WRITE);
- addr = (uintptr_t)ptr;
- if (ret) {
- fail = 1;
- }
+ int ret;
+ void *ptr;
+ ret = _starpu_opencl_allocate_memory(&ptr, elemsize, CL_MEM_READ_WRITE);
+ addr = (uintptr_t)ptr;
+ if (ret) {
+ fail = 1;
+ }
break;
}
#endif
@@ -257,9 +257,9 @@ static void free_variable_buffer_on_node(void *interface, uint32_t node)
break;
#endif
#ifdef STARPU_USE_OPENCL
- case STARPU_OPENCL_RAM:
- clReleaseMemObject((void*)STARPU_VARIABLE_GET_PTR(interface));
- break;
+ case STARPU_OPENCL_RAM:
+ clReleaseMemObject((void*)STARPU_VARIABLE_GET_PTR(interface));
+ break;
#endif
default:
assert(0);
@@ -350,11 +350,29 @@ static int copy_ram_to_opencl_async(void *src_interface, unsigned src_node __att
starpu_variable_interface_t *dst_variable = dst_interface;
int err,ret;

+ cl_device_type typ = _starpu_opencl_device_type();
+ cl_context ctx = _starpu_opencl_context();
+
+ if (typ == CL_DEVICE_TYPE_CPU) {
+ clReleaseMemObject((void*)(STARPU_VARIABLE_GET_PTR(dst_variable)));
+ dst_variable->ptr = (uintptr_t)clCreateBuffer(ctx, CL_MEM_USE_HOST_PTR | CL_MEM_READ_WRITE, src_variable->elemsize, (char*)src_variable->ptr, &err);
+ if (STARPU_UNLIKELY(err))
+ STARPU_OPENCL_REPORT_ERROR(err);
+
+ if (_event != NULL)
+ *(cl_event*)_event = NULL;
+
+ ret = 0;
+ }
+ else {
+
err = _starpu_opencl_copy_ram_to_opencl_async_sync((void*)src_variable->ptr, (cl_mem)dst_variable->ptr, src_variable->elemsize,
0, (cl_event*)_event, &ret);
if (STARPU_UNLIKELY(err))
STARPU_OPENCL_REPORT_ERROR(err);

+ }
+
STARPU_TRACE_DATA_COPY(src_node, dst_node, src_variable->elemsize);

return ret;
@@ -364,14 +382,33 @@ static int copy_opencl_to_ram_async(void *src_interface, unsigned src_node __att
{
starpu_variable_interface_t *src_variable = src_interface;
starpu_variable_interface_t *dst_variable = dst_interface;
- int err, ret;
+ int ret;
+
+ cl_device_type typ = _starpu_opencl_device_type();

- err = _starpu_opencl_copy_opencl_to_ram_async_sync((cl_mem)src_variable->ptr, (void*)dst_variable->ptr, src_variable->elemsize,
+ if (typ == CL_DEVICE_TYPE_CPU) {
+ cl_command_queue cq = _starpu_opencl_queue();
+ cl_int err;
+ void * ptr = clEnqueueMapBuffer(cq, (cl_mem)STARPU_VARIABLE_GET_PTR(src_variable), CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, src_variable->elemsize, 0, NULL, NULL, &err);
+ if (STARPU_UNLIKELY(err))
+ STARPU_OPENCL_REPORT_ERROR(err);
+ memcpy(dst_variable->ptr, ptr, src_variable->elemsize);
+ clEnqueueUnmapMemObject(cq, (cl_mem)src_variable->ptr, ptr, 0, NULL, NULL);
+ //Nothing to do
+ if (_event != NULL)
+ *(cl_event*)_event = NULL;
+ ret = 0;
+ }
+ else {
+ int err;
+
+ err = _starpu_opencl_copy_opencl_to_ram_async_sync((cl_mem)src_variable->ptr, (void*)dst_variable->ptr, src_variable->elemsize,
0, (cl_event*)_event, &ret);

if (STARPU_UNLIKELY(err))
STARPU_OPENCL_REPORT_ERROR(err);

+ }
STARPU_TRACE_DATA_COPY(src_node, dst_node, src_variable->elemsize);

return ret;
diff --git a/src/drivers/opencl/driver_opencl.c b/src/drivers/opencl/driver_opencl.c
index 3e7fac8..6a27d44 100644
--- a/src/drivers/opencl/driver_opencl.c
+++ b/src/drivers/opencl/driver_opencl.c
@@ -119,7 +119,10 @@ cl_int _starpu_opencl_init_context(int devid)

// Create a compute context
err = 0;
- contexts[devid] = clCreateContext(NULL, 1, &devices[devid], NULL, NULL, &err);
+ cl_platform_id platform;
+ clGetDeviceInfo(devices[devid], CL_DEVICE_PLATFORM, sizeof(cl_platform_id), &platform, NULL);
+ cl_context_properties props[] = {CL_CONTEXT_PLATFORM, (cl_context_properties)platform, 0};
+ contexts[devid] = clCreateContext(props, 1, &devices[devid], NULL, NULL, &err);
if (err != CL_SUCCESS) STARPU_OPENCL_REPORT_ERROR(err);

// Create queue for the given device
@@ -210,6 +213,28 @@ cl_int _starpu_opencl_copy_ram_to_opencl(void *ptr, cl_mem buffer, size_t size,
return CL_SUCCESS;
}

+cl_device_id _starpu_opencl_device(void) {
+ struct starpu_worker_s *worker = _starpu_get_local_worker_key();
+ return devices[worker->devid];
+}
+
+cl_context _starpu_opencl_context(void) {
+ struct starpu_worker_s *worker = _starpu_get_local_worker_key();
+ return contexts[worker->devid];
+}
+
+cl_command_queue _starpu_opencl_queue(void) {
+ struct starpu_worker_s *worker = _starpu_get_local_worker_key();
+ return queues[worker->devid];
+}
+
+cl_device_type _starpu_opencl_device_type(void) {
+ cl_device_id dev = _starpu_opencl_device();
+ cl_device_type typ;
+ clGetDeviceInfo(dev, CL_DEVICE_TYPE, sizeof(cl_device_type), &typ, NULL);
+ return typ;
+}
+
cl_int _starpu_opencl_copy_opencl_to_ram_async_sync(cl_mem buffer, void *ptr, size_t size, size_t offset, cl_event *event, int *ret)
{
cl_int err;
@@ -292,7 +317,7 @@ void _starpu_opencl_init(void)
if (!init_done) {
cl_platform_id platform_id[STARPU_OPENCL_PLATFORM_MAX];
cl_uint nb_platforms;
- cl_device_type device_type = CL_DEVICE_TYPE_GPU|CL_DEVICE_TYPE_ACCELERATOR;
+ cl_device_type device_type = CL_DEVICE_TYPE_GPU|CL_DEVICE_TYPE_ACCELERATOR | CL_DEVICE_TYPE_CPU;
cl_int err;
unsigned int i;

diff --git a/src/drivers/opencl/driver_opencl.h b/src/drivers/opencl/driver_opencl.h
index bbe6a89..44bdd9e 100644
--- a/src/drivers/opencl/driver_opencl.h
+++ b/src/drivers/opencl/driver_opencl.h
@@ -68,5 +68,11 @@ void _starpu_opencl_init(void);
extern
void *_starpu_opencl_worker(void *);

+
+cl_device_id _starpu_opencl_device(void);
+cl_context _starpu_opencl_context(void);
+cl_device_type _starpu_opencl_device_type(void);
+cl_command_queue _starpu_opencl_queue(void);
+
#endif // STARPU_USE_OPENCL
#endif // __DRIVER_OPENCL_H__
--
1.7.4.1



