starpu-devel - Re: [Starpu-devel] Error - No worker may execute this task

Objet : Developers list for StarPU

Archives de la liste

Re: [Starpu-devel] Error - No worker may execute this task

From: ASUDE ASUDE <ajitsdeshpande@gmail.com>
To: Nathalie Furmento <nathalie.furmento@labri.fr>, starpu-devel@lists.gforge.inria.fr
Subject: Re: [Starpu-devel] Error - No worker may execute this task
Date: Tue, 29 Nov 2011 14:53:50 +0000
List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel>
List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hello,
Sorry Pls find attached the mroe files files and attaching the earlier file test_starpu.c again . Pls refer this file.
Line 140 is actually a function call to - clEnqueueWriteImage()
I had tested it on standalone openCL executing on GPU card , it works fine. (Execution Log FYI shown below)
Loaded 'testin.png', [230 x 230] x 3Ch 16Bits
Executing on NVIDIA Corporation GeForce GTX 470...
Executing on NVIDIA Corporation GeForce GTX 470...
Executing on NVIDIA Corporation GeForce GTX 470...
Writing test.png [230 x 230] x 3Ch 16Bits

I am afraid I might not be able to share all code here.

Any pointers would help.

thank you.
-AD.

On Tue, Nov 29, 2011 at 2:30 PM, Nathalie Furmento <nathalie.furmento@labri.fr> wrote:

Did you update your application ?

#10 0x0000000000403cb0 in test_opencl_codelet (buffers=0x7efc28,
cl_arg=0x7fffffffe5f0) at test_starpu.c:140

Looking at the test_starpu.c file you sent earlier, the line 140 is

err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_tex);

Have you tried to execute your OpenCL code outside StarPU ? Does it work ? Maybe you should send us all the files of your application that we can test it ourselves.

Cheers,

Nathalie

On 29/11/2011 15:17, ASUDE ASUDE wrote:
Hello,
Did some debugging with gdb on my application executable and some interesting(but those which I could not explain) things were uncovered-
Built with -g enabled, no optimization enabled. And same configure and make steps as before. (--enable-opencl --disable-cuda)

When executed in standalone(w/o gdb) the segmentation fault occurs in a transient manner, i.e. it does not show up for every execution , but based on the configuration of devices on which, starpu schedules the task at runtime -
When it starpu schedules the tasks on all CPU devices, it does not crash, when it schedules on OpenCL device (GTX470), it always crashes.

Then did: gdb testapp
At gdb prompt gave run with necessary arguments needed for app -

Below is the gdb backtrace when it crashes,
(gdb) bt
#0 0x00007fffef33b380 in ?? () from /usr/lib/libcuda.so.1
#1 0x00007fffef292271 in ?? () from /usr/lib/libcuda.so.1
#2 0x00007fffef29c3b3 in ?? () from /usr/lib/libcuda.so.1
#3 0x00007fffef29c98c in ?? () from /usr/lib/libcuda.so.1
#4 0x00007fffef29412e in ?? () from /usr/lib/libcuda.so.1
#5 0x00007fffef33eb98 in ?? () from /usr/lib/libcuda.so.1
#6 0x00007fffef33f3b0 in ?? () from /usr/lib/libcuda.so.1
#7 0x00007fffef350f33 in ?? () from /usr/lib/libcuda.so.1
#8 0x00007fffef351cc3 in ?? () from /usr/lib/libcuda.so.1
#9 0x00007fffef34996c in ?? () from /usr/lib/libcuda.so.1
#10 0x0000000000403cb0 in test_opencl_codelet (buffers=0x7efc28,
    cl_arg=0x7fffffffe5f0) at test_starpu.c:140
#11 0x00007ffff7ba98bf in _starpu_opencl_execute_job (arg=0x7ffff7dbf220)
    at drivers/opencl/driver_opencl.c:554
#12 _starpu_opencl_worker (arg=0x7ffff7dbf220)
    at drivers/opencl/driver_opencl.c:457
#13 0x00007fffedbe7d8c in start_thread (arg=0x7fffe9744700)
    at pthread_create.c:304
#14 0x00007fffee2fe04d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#15 0x0000000000000000 in ?? ()

Like for e.g. At one point it executed just fine console output log shown as below:

1]
Loaded 'LFI.png', [230 x 230] x 3Ch 16Bits
Executing on CPU 0
Executing on CPU 0
Executing on CPU 0

2]
Loaded 'LFI.png', [230 x 230] x 3Ch 16Bits
Executing on OpenCL 0 (GeForce GTX 470)
Segmentation fault

3]
Loaded 'LFI.png', [230 x 230] x 3Ch 16Bits
Executing on CPU 0
Executing on OpenCL 0 (GeForce GTX 470)
Executing on OpenCL 0 (GeForce GTX 470)
Segmentation fault

4]
Loaded 'LFI.png', [230 x 230] x 3Ch 16Bits
Executing on CPU 0
Executing on OpenCL 0 (GeForce GTX 470)
Executing on CPU 0
Writing LFI_test.png [230 x 230] x 3Ch 16Bits
This did not crash strangely.

5]
Loaded 'LFI.png', [230 x 230] x 3Ch 16Bits
Executing on CPU 0
Executing on OpenCL 0 (GeForce GTX 470)
Executing on OpenCL 0 (GeForce GTX 470)
Writing LFI_test.png [230 x 230] x 3Ch 16Bits
This did not crash strangely.

Whenever it runs, the output image generated(its a image processing algorithm) is ok with main features present as they should be but with some bad background colors artifacts in the image.

Coud it be Some kind of Heisenbug at play related to Multitple thread/tasks race condition or so?

What next should be my course?

On Tue, Nov 29, 2011 at 12:41 PM, Samuel Thibault <samuel.thibault@ens-lyon.org> wrote:

ASUDE ASUDE, le Tue 29 Nov 2011 12:25:28 +0000, a écrit :

> After adding in code .where = STARPU_OPENCL | STARPU_CPU
> and configuring with --disable-cuda --enable-opencl, make and executing I get
> error:
> Executing on OpenCL 0 (GeForce GTX 470)
> Segmentation fault
>
> So it seems its task is getting scheduled to OpenCL device of GPU, but crashing
> for some other reason.

Yes.

> Would you suspect that my NVIDIA driver or something related to GPU card could
> be having problem?

Unlikely.

> Where would you point I start debugging to resolve this?

Getting a backtrace in gdb, to at least know where the segmentation
fault is.

Samuel
_______________________________________________
Starpu-devel mailing list
Starpu-devel@lists.gforge.inria.fr
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/starpu-devel

/*
 * 
 * 
 *
 * 
 * 
 *
 */


//
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <unistd.h>
#include <assert.h>

#ifdef __OPENCL__
#ifdef __Apple__ 
#include <OpenCL/OpenCL.h>
#else
#include <CL/opencl.h>
#endif
#endif

#ifdef __STARPU__
#include <starpu.h>
#include <starpu_profiling.h>
#ifdef __OPENCL__
#include <starpu_opencl.h>
#endif
#ifdef __CUDA__
#include <starpu_cuda.h>
#endif
#endif

#include <test.h>



typedef struct _test_arg {
	float depth;
	int xoffset;
	int yoffset;
	int verbose;
} test_arg;

static struct starpu_perfmodel_t test_model = {
        .type = STARPU_HISTORY_BASED,
        .symbol = "test"
};

static struct starpu_perfmodel_t test_power_model = {
        .type = STARPU_HISTORY_BASED,
        .symbol = "test_power"
};

void test_cpu_codelet(void *buffers[], void *cl_arg){

	int devid,id;

	unsigned width = STARPU_MATRIX_GET_NX(buffers[0]);
	unsigned height = STARPU_MATRIX_GET_NY(buffers[0]);
	unsigned ld = STARPU_MATRIX_GET_LD(buffers[0]);

	float *d_dst = (float *)STARPU_MATRIX_GET_PTR(buffers[0]);
	float *d_src = (float *)STARPU_MATRIX_GET_PTR(buffers[1]);

	id = starpu_worker_get_id();
	devid = starpu_worker_get_devid(id);

	test_arg * arg=cl_arg;

	if(arg->verbose) {
			char dst[128];
			starpu_worker_get_name(id , dst ,128);
			printf("Executing on %s\n", dst);
	}

	test_cpu(d_dst,d_src, ld, width, height, arg->depth, arg->xoffset, arg->yoffset, arg->verbose);
}




#ifdef __OPENCL__

struct starpu_opencl_program test_opencl_program;

void test_opencl_codelet(void *buffers[], void *cl_arg){

	cl_device_id device;
	cl_kernel kernel;
	cl_context context;
	cl_command_queue queue;
	cl_event event;
	int devid,id;
	cl_int err = 0;

	cl_platform_id platform;

	int width = STARPU_MATRIX_GET_NX(buffers[0]);
	int height = STARPU_MATRIX_GET_NY(buffers[0]);
	int ld = STARPU_MATRIX_GET_LD(buffers[0]);
	int size = width * height * STARPU_MATRIX_GET_ELEMSIZE(buffers[0]);

	cl_mem *d_dst = (cl_mem *)STARPU_MATRIX_GET_PTR(buffers[0]);
	cl_mem *d_src = (cl_mem *)STARPU_MATRIX_GET_PTR(buffers[1]);

	test_arg * arg=cl_arg;

	id = starpu_worker_get_id();
	devid = starpu_worker_get_devid(id);

	if(arg->verbose) {
		char dst[128];
		starpu_worker_get_name(id , dst ,128);
		printf("Executing on %s\n", dst);
	}

	err = starpu_opencl_load_kernel(&kernel, &queue, &test_opencl_program, "test_opencl_kernel", devid);
	if (err != CL_SUCCESS) STARPU_OPENCL_REPORT_ERROR(err);
	assert(err == CL_SUCCESS);

	starpu_opencl_get_device(devid, &device);
	starpu_opencl_get_context(devid, &context);
	starpu_opencl_get_queue(devid, &queue);


	// Allocate device resources for input lightfield image
	cl_image_format format = { CL_R, CL_UNSIGNED_INT32};
	cl_mem d_tex = clCreateImage2D(context, CL_MEM_READ_ONLY,
	                            &format, width, height, 0 ,(void *) d_src, &err);
	if (err != CL_SUCCESS) STARPU_OPENCL_REPORT_ERROR(err);
	assert(err == CL_SUCCESS);

	size_t origin[3] = {0,0,0};
	size_t region[3] = {width,height,1};
	err = clEnqueueWriteImage(queue, d_tex, CL_TRUE,  origin, region, width*sizeof(float), 0, (void *) d_src, 0, NULL, NULL);
	if (err != CL_SUCCESS) STARPU_OPENCL_REPORT_ERROR(err);
	assert(err == CL_SUCCESS);

	cl_sampler sampler = clCreateSampler(context, CL_TRUE, CL_ADDRESS_CLAMP, CL_FILTER_LINEAR, &err);
	assert(err == CL_SUCCESS);

	// Now setup the arguments to our test_kernel
	err  = clSetKernelArg(kernel,  0, sizeof(cl_mem), &d_dst);
	err |= clSetKernelArg(kernel,  1, sizeof(cl_mem), &d_tex);
	err |= clSetKernelArg(kernel,  2, sizeof(cl_sampler), &sampler);
	err |= clSetKernelArg(kernel,  3, sizeof(int), &ld);
	err |= clSetKernelArg(kernel,  4, sizeof(int), &width);
	err |= clSetKernelArg(kernel,  5, sizeof(int), &height);
	err |= clSetKernelArg(kernel,  6, sizeof(float), &arg->depth);
	err |= clSetKernelArg(kernel,  7, sizeof(int), &arg->xoffset);
	err |= clSetKernelArg(kernel,  8, sizeof(int), &arg->yoffset);
	assert(err == CL_SUCCESS);

	size_t localWorkSize[2] = {16,16};
	size_t globalWorkSize[2] ={width + (64 - width%64) , height + (64 - height%64) };
	err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL, globalWorkSize, localWorkSize, 0, NULL, &event);
	assert(err == CL_SUCCESS);

	clFinish(queue);
	starpu_opencl_collect_stats(event);
	clReleaseEvent(event);
	starpu_opencl_release_kernel(kernel);

}

#endif //__OPENCL__



#ifdef __CUDA__

void test_cuda_codelet(void *buffers[], void *cl_arg){

	int width = STARPU_MATRIX_GET_NX(buffers[0]);
	int height = STARPU_MATRIX_GET_NY(buffers[0]);
	int ld = STARPU_MATRIX_GET_LD(buffers[0]);
	int size = width * height * STARPU_MATRIX_GET_ELEMSIZE(buffers[0]);

	float *d_dst = (float *)STARPU_MATRIX_GET_PTR(buffers[0]);
	float *d_src = (float *)STARPU_MATRIX_GET_PTR(buffers[1]);

	int id = starpu_worker_get_id();
	int devid = starpu_worker_get_devid(id);

	test_arg * arg=cl_arg;

	if(arg->verbose) {
			char dst[128];
			starpu_worker_get_name(id , dst ,128);
			printf("Executing =%s\n", dst);
	}

	_test_cuda(d_dst, d_src, ld, width, height, arg->depth, arg->xoffset, arg->yoffset, 0);

}

#endif //__CUDA__



int test_starpu(float * h_dst, float * h_src, int ld, int width, int height,  float depth, int xoffset, int yoffset, int verbose)
{
	test_arg arg= {depth, xoffset, yoffset, verbose};

	starpu_codelet test_codelets =
		{

#if __OPENCL__ && __CUDA__
		// Note two GPU required for both OpenCL and CUDA.
		.where = STARPU_OPENCL | STARPU_CUDA | STARPU_CPU,
		.opencl_func = test_opencl_codelet,
		.cuda_func = test_cuda_codelet,
#elif __OPENCL__
		.where = STARPU_OPENCL,
		.opencl_func = test_opencl_codelet,
#elif __CUDA__
		.where = STARPU_CUDA | STARPU_CPU,
		.cuda_func = test_cuda_codelet,
#else
		.where = STARPU_CPU,
#endif
		.cpu_func = test_cpu_codelet,
		.nbuffers = 2,
	    .model = &test_model,
            .power_model = &test_power_model
    };

	starpu_init(NULL);

#ifdef __OPENCL__
	if(test_codelets.where & STARPU_OPENCL ){
		int err = starpu_opencl_load_opencl_from_file("test_opencl_kernel.cl", &test_opencl_program, NULL);
		assert(err == CL_SUCCESS);
	}
#endif

 	struct starpu_task *task = starpu_task_create();

	starpu_data_handle src_handle, dst_handle;
	starpu_matrix_data_register(&dst_handle, 0, (uintptr_t)h_dst, ld , width, height, sizeof(float));
	starpu_matrix_data_register(&src_handle, 0, (uintptr_t)h_src, ld , width, height, sizeof(float));

	task->buffers[0].handle = dst_handle; /* 1st parameter of the codelet */
	task->buffers[0].mode = STARPU_W;

	task->buffers[1].handle = src_handle; /* 2nd parameter of the codelet */
	task->buffers[1].mode = STARPU_R;

	task->cl = &test_codelets;
	task->cl_arg = &arg;				  /* Remaining parameters of the codelet */
	task->cl_arg_size = sizeof(test_arg);

	task->synchronous = 1;
	task->callback_func = NULL;

	/* submit the task to StarPU */
	int err = starpu_task_submit(task);

	if (err == -ENODEV) {
		printf("No worker may execute this task\n");
		return 1;
	}

	starpu_task_wait_for_all();

#ifdef __OPENCL__
	/* unload program */
	if(test_codelets.where & STARPU_OPENCL){
		starpu_opencl_unload_opencl( &test_opencl_program);
	}
#endif

	/* update the array in RAM */
	starpu_data_unregister(src_handle);
	starpu_data_unregister(dst_handle);
	starpu_shutdown();

	return 0;

}

Re: [Starpu-devel] Error - No worker may execute this task, (suite)

Archives gérées par MHonArc 2.6.19+.

Archives de la liste

Re: [Starpu-devel] Error - No worker may execute this task