Subject: Developers list for StarPU
List archives
- From: Nathalie Furmento <nathalie.furmento@labri.fr>
- To: starpu-announce@lists.gforge.inria.fr
- Cc: "starpu-devel@lists.gforge.inria.fr" <starpu-devel@lists.gforge.inria.fr>
- Subject: [Starpu-devel] StarPU v1.2.0 released
- Date: Thu, 25 Aug 2016 11:20:57 +0200
- List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
- List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>
All,
The StarPU team is really pleased to announce the v1.2.0 release!
This release notably brings out-of-core support, MIC Xeon Phi
support, OpenMP runtime support, and a new internal communication
system for MPI.
The tarball, along with the md5sum file and the OpenPGP signature, is
available on the StarPU web site
http://starpu.gforge.inria.fr/files/
or directly on the Gforge web site
https://gforge.inria.fr/frs/?group_id=1570
Nathalie, for the StarPU team.
---
StarPU 1.2.0 (svn revision 18521)
==============================================
New features:
* MIC Xeon Phi support
* SCC support
* New function starpu_sched_ctx_exec_parallel_code to execute
parallel code on the workers of the given scheduler context
* MPI:
- New internal communication system: a single tag is now used
for all communications, and a system of hashmaps storing
pending receives has been implemented on each node. Every
message is now coupled with an envelope, sent before the
corresponding data, which allows the receiver to allocate the
data correctly and to submit the receive matching the
envelope.
- New function
starpu_mpi_irecv_detached_sequential_consistency which
allows enabling or disabling sequential consistency for
the given data handle (sequential consistency will be
enabled or disabled based on both the value of the function
parameter and the sequential consistency setting defined
for the given data)
- New functions starpu_mpi_task_build() and
starpu_mpi_task_post_build()
- New flag STARPU_NODE_SELECTION_POLICY to specify a policy for
selecting a node to execute the codelet when several nodes
own data in W mode.
- New node selection policies can be registered/unregistered with
the functions starpu_mpi_node_selection_register_policy() and
starpu_mpi_node_selection_unregister_policy()
- New environment variable STARPU_MPI_COMM which enables
basic tracing of communications.
- New function starpu_mpi_init_comm() which allows specifying
an MPI communicator.
* New STARPU_COMMUTE flag which can be passed along with STARPU_W or
STARPU_RW to let StarPU commute write accesses.
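A minimal sketch of how the new flag might be used (the codelet and handle names are hypothetical, not from the release notes):

```c
#include <starpu.h>

/* Hypothetical codelet that accumulates into its single buffer. */
extern struct starpu_codelet accumulate_cl;

void submit_accumulations(starpu_data_handle_t sum_handle, int ntasks)
{
    int i;
    for (i = 0; i < ntasks; i++)
        /* STARPU_COMMUTE tells StarPU that these write accesses may be
         * reordered with respect to each other, removing the
         * serialization that plain STARPU_RW would impose. */
        starpu_task_insert(&accumulate_cl,
                           STARPU_RW | STARPU_COMMUTE, sum_handle,
                           0);
    starpu_task_wait_for_all();
}
```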
* Out-of-core support, through registration of disk areas as
additional memory nodes. It can be enabled programmatically or
through the STARPU_DISK_SWAP* environment variables.
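A sketch of the programmatic path, assuming the unistd-based disk driver shipped with StarPU; the path and size are arbitrary placeholders:

```c
#include <starpu.h>

/* Register a 200 MB area in /tmp as an additional (disk) memory node,
 * so StarPU can swap data there when main memory gets tight. */
int register_swap_area(void)
{
    int node = starpu_disk_register(&starpu_disk_unistd_ops,
                                    (void *) "/tmp",
                                    (starpu_ssize_t) 200 * 1024 * 1024);
    return node; /* memory node id, or a negative value on error */
}
```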
* Reclaiming is now periodically done before memory becomes full. This can
be controlled through the STARPU_*_AVAILABLE_MEM environment variables.
* New hierarchical schedulers which allow users to easily build
their own scheduler, either by coding each "box" they want
themselves, or by combining existing boxes provided by StarPU.
Hierarchical schedulers have very interesting scalability properties.
* Add STARPU_CUDA_ASYNC and STARPU_OPENCL_ASYNC flags to allow
asynchronous CUDA and OpenCL kernel execution.
* Add STARPU_CUDA_PIPELINE and STARPU_OPENCL_PIPELINE to specify how
many asynchronous tasks are submitted in advance on CUDA and
OpenCL devices. Setting the value to 0 forces a synchronous
execution of all tasks.
* Add CUDA concurrent kernel execution support through
the STARPU_NWORKER_PER_CUDA environment variable.
* Add CUDA and OpenCL kernel submission pipelining, to overlap costs
and allow concurrent kernel execution on Fermi cards.
* New locality work stealing scheduler (lws).
* Add STARPU_VARIABLE_NBUFFERS to be set in cl.nbuffers, and nbuffers
and modes fields to the task structure, which permit defining
codelets taking a variable number of data.
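A sketch of a codelet with a per-task buffer count (kernel name is hypothetical; the macro and current-task query are assumed from the StarPU API):

```c
#include <starpu.h>

void varbuf_kernel(void *buffers[], void *cl_arg)
{
    /* With STARPU_VARIABLE_NBUFFERS the actual count comes from the
     * task structure rather than from the codelet. */
    struct starpu_task *task = starpu_task_get_current();
    unsigned i, n = STARPU_TASK_GET_NBUFFERS(task);
    for (i = 0; i < n; i++)
    {
        /* process buffers[i] ... */
    }
    (void) cl_arg;
}

struct starpu_codelet varbuf_cl =
{
    .cpu_funcs = { varbuf_kernel },
    .nbuffers = STARPU_VARIABLE_NBUFFERS, /* buffer count set per task */
};
```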
* Add support for implementing OpenMP runtimes on top of StarPU
* New performance model format to better represent parallel tasks.
Used to provide estimations for the execution times of the
parallel tasks on scheduling contexts or combined workers.
* starpu_data_idle_prefetch_on_node and
starpu_idle_prefetch_task_input_on_node allow queueing prefetches
to be done only when the bus is idle.
* Make starpu_data_prefetch_on_node not forcibly flush data out, introduce
starpu_data_fetch_on_node for that.
* Add data access arbiters, to improve parallelism of concurrent data
accesses, notably with STARPU_COMMUTE.
* Anticipative writeback, to flush dirty data asynchronously before
the GPU device is full. Disabled by default. Use
STARPU_MINIMUM_CLEAN_BUFFERS and STARPU_TARGET_CLEAN_BUFFERS to
enable it.
* Add starpu_data_wont_use to advise that a piece of data will not be
used in the near future.
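A short sketch of the intended usage pattern (the handle is hypothetical and assumed already registered):

```c
#include <starpu.h>

/* After the last task touching 'handle' has been submitted, advise
 * StarPU that the data will not be needed again soon, so its buffers
 * can be written back and evicted when memory gets tight. */
void release_memory_pressure(starpu_data_handle_t handle)
{
    starpu_data_wont_use(handle);
}
```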
* Enable anticipative writeback by default.
* New scheduler 'dmdasd' that considers priority when deciding which
worker to schedule a task on
* Add the capability to define specific MPI datatypes for
StarPU user-defined interfaces.
* Add tasks.rec trace output to make scheduling analysis easier.
* Add Fortran 90 module and example using it
* New StarPU-MPI gdb debug functions
* Generate animated html trace of modular schedulers.
* Add asynchronous partition planning. It only supports coherency through
the home node of data for now.
* Add STARPU_MALLOC_SIMULATION_FOLDED flag to save memory when simulating.
* Include application threads in the trace.
* Add starpu_task_get_task_scheduled_succs to get successors of a task.
* Add graph inspection facility for schedulers.
* New STARPU_LOCALITY flag to mark data which should be taken into account
by schedulers for improving locality.
* Experimental support for data locality in ws and lws.
* Add a preliminary framework for native Fortran support for StarPU
Small features:
* Tasks can now have a name (via the field const char *name of
struct starpu_task)
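A sketch of naming a task through the new field (the codelet name is hypothetical):

```c
#include <starpu.h>

/* Naming a task makes it identifiable in traces and debugging output. */
extern struct starpu_codelet my_cl;

int submit_named_task(void)
{
    struct starpu_task *task = starpu_task_create();
    task->cl = &my_cl;
    task->name = "factorization_step"; /* shows up in traces */
    return starpu_task_submit(task);
}
```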
* New functions starpu_data_acquire_cb_sequential_consistency() and
starpu_data_acquire_on_node_cb_sequential_consistency() which allow
enabling or disabling sequential consistency
* New configure option --enable-fxt-lock which enables additional
trace events focused on locks behaviour during the execution
* Functions starpu_insert_task and starpu_mpi_insert_task have been
renamed to starpu_task_insert and starpu_mpi_task_insert. The old
names are kept to avoid breaking existing code.
* New configure option --enable-calibration-heuristic which allows
the user to set the maximum authorized deviation of the
history-based calibrator.
* Allow application to provide the task footprint itself.
* New function starpu_sched_ctx_display_workers() to display worker
information belonging to a given scheduler context
* The configure option --enable-verbose can be given as
--enable-verbose=extra to increase the verbosity
* Add codelet size, footprint and tag id in the paje trace.
* Add STARPU_TAG_ONLY, to specify a tag for traces without making StarPU
manage the tag.
* On Linux x86, spinlocks now block after a hundred tries. This avoids
typical 10ms pauses when the application thread tries to submit tasks.
* New function char *starpu_worker_get_type_as_string(enum
starpu_worker_archtype type)
* Improve static scheduling by adding support for specifying the task
execution order.
* Add starpu_worker_can_execute_task_impl and
starpu_worker_can_execute_task_first_impl to optimize getting the
working implementations
* Add STARPU_MALLOC_NORECLAIM flag to allocate without running a
reclaim if the node is out of memory.
* New flag STARPU_DATA_MODE_ARRAY for the starpu_task_insert function
family, to allow defining an array of data handles along with their
access modes.
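A sketch of passing handles and modes in one array argument (codelet and handles are hypothetical; the descriptor structure is assumed from the StarPU API):

```c
#include <starpu.h>

extern struct starpu_codelet cl;

void submit_with_descr_array(starpu_data_handle_t a,
                             starpu_data_handle_t b)
{
    /* Each descriptor pairs a handle with its access mode. */
    struct starpu_data_descr descrs[2] =
    {
        { .handle = a, .mode = STARPU_R  },
        { .handle = b, .mode = STARPU_RW },
    };
    starpu_task_insert(&cl,
                       STARPU_DATA_MODE_ARRAY, descrs, 2,
                       0);
}
```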
* New configure option --enable-new-check to enable new testcases
which are known to fail
* Add starpu_memory_allocate and _deallocate to let the application
declare its own allocations to the reclaiming engine.
* Add STARPU_SIMGRID_CUDA_MALLOC_COST and STARPU_SIMGRID_CUDA_QUEUE_COST
to disable CUDA cost simulation in simgrid mode.
* Add starpu_task_get_task_succs to get the list of children of a given
task.
* Add starpu_malloc_on_node_flags, starpu_free_on_node_flags, and
starpu_malloc_on_node_set_default_flags to control the allocation flags
used for allocations done by StarPU.
* Ranges can be provided in STARPU_WORKERS_CPUID
* Add starpu_fxt_autostart_profiling to be able to avoid autostart.
* Add arch_cost_function perfmodel function field.
* Add STARPU_TASK_BREAK_ON_SCHED, STARPU_TASK_BREAK_ON_PUSH, and
STARPU_TASK_BREAK_ON_POP environment variables to debug schedulers.
* Add starpu_sched_display tool.
* Add starpu_memory_pin and starpu_memory_unpin to pin memory allocated
by other means than starpu_malloc.
* Add STARPU_NOWHERE to create synchronization tasks with data.
* Document how to switch between different views of the same data.
* Add STARPU_NAME to specify a task name from a starpu_task_insert call.
* Add configure option to disable fortran --disable-fortran
* Add configure option to give path for smpirun executable --with-smpirun
* Add configure option to disable the build of tests --disable-build-tests
* Add starpu-all-tasks debugging support
* New function
void starpu_opencl_load_program_source_malloc(const char
*source_file_name, char **located_file_name, char **located_dir_name,
char **opencl_program_source)
which allocates the pointers located_file_name, located_dir_name
and opencl_program_source.
* Add submit_hook and do_schedule scheduler methods.
* Add starpu_sleep.
* Add starpu_task_list_ismember.
* Add _starpu_fifo_pop_this_task.
* Add STARPU_MAX_MEMORY_USE environment variable.
* Add starpu_worker_get_id_check().
* New function starpu_mpi_wait_for_all(MPI_Comm comm) which waits
until all StarPU tasks and communications for the given
communicator have completed.
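A sketch of the typical shutdown sequence this enables (assuming StarPU and StarPU-MPI have been initialized earlier):

```c
#include <starpu.h>
#include <starpu_mpi.h>

/* Barrier-like completion: drain every StarPU task and every
 * StarPU-MPI communication on the communicator before shutting down. */
void drain_and_shutdown(void)
{
    starpu_mpi_wait_for_all(MPI_COMM_WORLD);
    starpu_mpi_shutdown();
    starpu_shutdown();
}
```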
* New function starpu_codelet_unpack_args_and_copyleft() which
allows copying into a new buffer the values which have not been
unpacked by the current call
* Add STARPU_CODELET_SIMGRID_EXECUTE flag.
* Add STARPU_CL_ARGS flag to the starpu_task_insert() and
starpu_mpi_task_insert() functions
Changes:
* Data interfaces (variable, vector, matrix and block) now define
pack and unpack functions
* StarPU-MPI: Fix to be able to receive data which has not yet
been registered by the application (i.e. it did not call
starpu_data_set_tag(); data are received as raw memory)
* StarPU-MPI: Fix for being able to receive data with the same tag
from several nodes (see mpi/tests/gather.c)
* Remove the long-deprecated cost_model fields and task->buffers field.
* Fix complexity of implicit task/data dependency, from quadratic to
linear.
Small changes:
* Rename function starpu_trace_user_event() to
starpu_fxt_trace_user_event()
* "power" is renamed into "energy" wherever it applies, notably energy
consumption performance models
* Update starpu_task_build() to set starpu_task::cl_arg_free to 1 if
some arguments of type ::STARPU_VALUE are given.
* Simplify performance model loading API
* Better semantics for the environment variables STARPU_NMIC and
STARPU_NMICDEVS: STARPU_NMIC is now the number of devices, and
STARPU_NMICCORES the number of cores per device.