[Starpu-devel] StarPU v1.2.0rc1 released


  • From: Nathalie Furmento <nathalie.furmento@labri.fr>
  • To: starpu-announce@lists.gforge.inria.fr
  • Subject: [Starpu-devel] StarPU v1.2.0rc1 released
  • Date: Wed, 11 Mar 2015 10:30:00 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

All,

The StarPU team is pleased to announce the first release candidate of v1.2.0. This new release notably brings out-of-core support, MIC Xeon Phi support, OpenMP runtime support, and a new internal communication system for MPI.

The tarball, along with the md5sum file and the OpenPGP signature, is available on the Gforge web site

https://gforge.inria.fr/frs/?group_id=1570

or on the secondary web site

http://runtime.bordeaux.inria.fr/StarPU/files/

Nathalie, for the StarPU team.
---

StarPU 1.2.0rc1 (svn revision 14851)
==============================================

New features:
* MIC Xeon Phi support
* SCC support
* New function starpu_sched_ctx_exec_parallel_code to execute
parallel code on the workers of the given scheduler context
* MPI:
- New internal communication system: a single tag is now used for
all communications, and a system of hashmaps storing the pending
receives has been implemented on each node. Every message is now
coupled with an envelope, sent before the corresponding data, which
allows the receiver to allocate the data correctly and to post the
receive matching the envelope.
- New function starpu_mpi_irecv_detached_sequential_consistency
which allows enabling or disabling sequential consistency for the
given data handle (sequential consistency is enabled or disabled
based on both the function parameter and the sequential consistency
setting defined for the given data).
- New functions starpu_mpi_task_build() and
starpu_mpi_task_post_build()
- New flag STARPU_NODE_SELECTION_POLICY to specify a policy for
selecting a node to execute the codelet when several nodes own data
in W mode.
- New node selection policies can be registered and unregistered
with the functions starpu_mpi_node_selection_register_policy() and
starpu_mpi_node_selection_unregister_policy().
- New environment variable STARPU_MPI_COMM which enables basic
tracing of communications.
- New function starpu_mpi_init_comm() which allows specifying an MPI
communicator.

* New STARPU_COMMUTE flag which can be passed along with STARPU_W or
STARPU_RW to let StarPU commute write accesses (a sketch follows
this list).
* Out-of-core support, through the registration of disk areas as
additional memory nodes. It can be enabled programmatically or
through the STARPU_DISK_SWAP* environment variables (a sketch
follows this list).
* Reclaiming is now periodically done before memory becomes full.
This can be controlled through the STARPU_*_AVAILABLE_MEM
environment variables.
* New hierarchical schedulers which allow users to easily build
their own scheduler, by coding each "box" themselves, or by
combining existing boxes provided by StarPU. Hierarchical
schedulers have very interesting scalability properties.
* Add STARPU_CUDA_ASYNC and STARPU_OPENCL_ASYNC flags to allow
asynchronous CUDA and OpenCL kernel execution.
* Add STARPU_CUDA_PIPELINE and STARPU_OPENCL_PIPELINE to specify how
many asynchronous tasks are submitted in advance on CUDA and
OpenCL devices. Setting the value to 0 forces a synchronous
execution of all tasks.
* Add CUDA concurrent kernel execution support through
the STARPU_NWORKER_PER_CUDA environment variable.
* Add CUDA and OpenCL kernel submission pipelining, to overlap
submission costs and allow concurrent kernel execution on Fermi
cards.
* New locality work stealing scheduler (lws).
* Add STARPU_VARIABLE_NBUFFERS to be set in cl.nbuffers, and
nbuffers and modes fields to the task structure, which permit
defining codelets taking a variable number of data.
* Add support for implementing OpenMP runtimes on top of StarPU
* New performance model format to better represent parallel tasks.
Used to provide estimations for the execution times of the
parallel tasks on scheduling contexts or combined workers.
* starpu_data_idle_prefetch_on_node and
starpu_idle_prefetch_task_input_on_node allow queueing prefetches
to be done only when the bus is idle.
* Make starpu_data_prefetch_on_node no longer forcibly flush data
out; introduce starpu_data_fetch_on_node for that purpose.
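
Below is a minimal sketch of how the new STARPU_COMMUTE flag could be
combined with STARPU_RW in starpu_task_insert(); the accumulating
codelet and the registered variable are made up for the example, only
the access mode is the point being illustrated, and the sketch has not
been compiled against the release.

    #include <stdint.h>
    #include <starpu.h>

    /* Hypothetical CPU kernel: every task accumulates into the same
     * variable.  With STARPU_COMMUTE, StarPU is free to execute these
     * write accesses in any order instead of serializing them in
     * submission order. */
    static void accumulate_cpu(void *buffers[], void *cl_arg)
    {
        int *val = (int *) STARPU_VARIABLE_GET_PTR(buffers[0]);
        (void) cl_arg;
        (*val)++;
    }

    static struct starpu_codelet accumulate_cl =
    {
        .cpu_funcs = { accumulate_cpu },
        .nbuffers = 1,
        .modes = { STARPU_RW | STARPU_COMMUTE },
    };

    int main(void)
    {
        int value = 0, i;
        starpu_data_handle_t handle;

        if (starpu_init(NULL) != 0)
            return 1;

        starpu_variable_data_register(&handle, STARPU_MAIN_RAM,
                                      (uintptr_t) &value, sizeof(value));

        /* The commuting write accesses let StarPU reorder these tasks. */
        for (i = 0; i < 10; i++)
            starpu_task_insert(&accumulate_cl,
                               STARPU_RW | STARPU_COMMUTE, handle,
                               0);

        starpu_task_wait_for_all();
        starpu_data_unregister(handle);
        starpu_shutdown();
        return 0;
    }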
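
The following sketch shows one way the out-of-core support might be
enabled programmatically; the starpu_disk_register() call with the
unistd backend and a (ops, parameter, size) signature is an assumption
to check against the 1.2 documentation, and the same behaviour can
also be obtained without code changes through the STARPU_DISK_SWAP*
environment variables.

    #include <starpu.h>

    int main(void)
    {
        if (starpu_init(NULL) != 0)
            return 1;

        /* Register a directory as an additional (disk) memory node of
         * about 200 MB, so that StarPU can evict data to it under
         * memory pressure.  starpu_disk_unistd_ops and the
         * (ops, parameter, size) signature are assumptions to check
         * against the 1.2 documentation. */
        int disk_node = starpu_disk_register(&starpu_disk_unistd_ops,
                                             (void *) "/tmp/starpu-swap",
                                             200 * 1024 * 1024);
        if (disk_node < 0)
        {
            starpu_shutdown();
            return 1;
        }

        /* ... register data and submit tasks as usual; data can now be
         * swapped out to /tmp/starpu-swap when main memory fills up ... */

        starpu_shutdown();
        return 0;
    }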

Small features:
* Tasks can now have a name (via the field const char *name of
struct starpu_task; a sketch follows this list)
* New functions starpu_data_acquire_cb_sequential_consistency() and
starpu_data_acquire_on_node_cb_sequential_consistency() which allow
enabling or disabling sequential consistency
* New configure option --enable-fxt-lock which enables additional
trace events focused on lock behaviour during execution
* Functions starpu_insert_task and starpu_mpi_insert_task are
renamed to starpu_task_insert and starpu_mpi_task_insert. The old
names are kept to avoid breaking existing code.
* New configure option --enable-calibration-heuristic which allows
the user to set the maximum authorized deviation of the
history-based calibrator.
* Allow the application to provide the task footprint itself.
* New function starpu_sched_ctx_display_workers() to display worker
information belonging to a given scheduler context
* The configure option --enable-verbose can be given as
--enable-verbose=extra to increase the verbosity
* Add codelet size, footprint and tag id in the paje trace.
* Add STARPU_TAG_ONLY, to specify a tag for traces without making
StarPU manage the tag.
* On Linux x86, spinlocks now block after a hundred tries. This avoids
typical 10ms pauses when the application thread tries to submit
tasks.
* New function char *starpu_worker_get_type_as_string(enum
starpu_worker_archtype type)
* Improve static scheduling by adding support for specifying the task
execution order.
* Add starpu_worker_can_execute_task_impl and
starpu_worker_can_execute_task_first_impl to optimize the retrieval
of the working implementations
* Add STARPU_MALLOC_NORECLAIM flag to allocate without running a
reclaim if the node is out of memory.
* New flag STARPU_DATA_MODE_ARRAY for the starpu_task_insert family
of functions, to allow defining an array of data handles along with
their access modes (a sketch follows this list).
* New configure option --enable-new-check to enable new testcases
which are known to fail
* Add starpu_memory_allocate and _deallocate to let the application
declare its own allocations to the reclaiming engine.
* Add STARPU_SIMGRID_CUDA_MALLOC_COST and
STARPU_SIMGRID_CUDA_QUEUE_COST to disable the simulation of CUDA
costs in SimGrid mode.
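
The sketch below shows how the new STARPU_DATA_MODE_ARRAY flag might
be passed to starpu_task_insert(); the struct starpu_data_descr layout
and the flag/array/count calling convention are assumptions to check
against the 1.2 reference manual, and my_cl is a placeholder codelet.

    #include <starpu.h>

    /* Hypothetical codelet taking a variable number of buffers
     * (cl.nbuffers = STARPU_VARIABLE_NBUFFERS is the companion
     * feature listed in the new-features section). */
    extern struct starpu_codelet my_cl;

    void submit_on(starpu_data_handle_t a, starpu_data_handle_t b,
                   starpu_data_handle_t c)
    {
        /* Describe all the data accessed by the task in one array. */
        struct starpu_data_descr descrs[3] =
        {
            { .handle = a, .mode = STARPU_R  },
            { .handle = b, .mode = STARPU_R  },
            { .handle = c, .mode = STARPU_RW },
        };

        /* Assumed calling convention: flag, array pointer, element
         * count, then the usual 0 terminator. */
        starpu_task_insert(&my_cl,
                           STARPU_DATA_MODE_ARRAY, descrs, 3,
                           0);
    }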
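
A minimal sketch of the new name field of struct starpu_task, as
described in the list above; my_cl and my_handle are placeholders.

    #include <starpu.h>

    /* Placeholder codelet and data handle. */
    extern struct starpu_codelet my_cl;
    extern starpu_data_handle_t my_handle;

    void submit_named_task(void)
    {
        struct starpu_task *task = starpu_task_create();

        task->cl = &my_cl;
        task->handles[0] = my_handle;

        /* New in 1.2: a free-form name, handy in traces and debug
         * output. */
        task->name = "update_block";

        starpu_task_submit(task);
    }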

Changes:
* Data interfaces (variable, vector, matrix and block) now define
pack and unpack functions (a sketch follows this list)
* StarPU-MPI: Fix to allow receiving data which has not yet been
registered by the application (i.e. for which it did not call
starpu_data_set_tag(); the data is received as raw memory)
* StarPU-MPI: Fix for being able to receive data with the same tag
from several nodes (see mpi/tests/gather.c)
* Remove the long-deprecated cost_model fields and task->buffers
field.
* Fix the complexity of implicit task/data dependencies, from
quadratic to linear.
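
As a sketch of how the pack/unpack support could be used from
application code, the helper below serializes one handle into a
contiguous buffer and restores it into another; the
starpu_data_pack()/starpu_data_unpack() entry points and their exact
signatures are assumptions to check against the 1.2 documentation, and
they rely on the per-interface pack/unpack functions mentioned above.

    #include <starpu.h>

    /* Copy the content of src into dst (same data interface) through
     * a packed buffer.  Entry points and signatures are assumptions,
     * see the lead-in above. */
    void copy_through_buffer(starpu_data_handle_t src,
                             starpu_data_handle_t dst)
    {
        void *buffer = NULL;
        starpu_ssize_t size = 0;

        starpu_data_pack(src, &buffer, &size);  /* allocates and fills buffer */
        starpu_data_unpack(dst, buffer, size);  /* restores the data into dst */
    }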

Small changes:
* Rename function starpu_trace_user_event() as
starpu_fxt_trace_user_event()


