starpu-devel - Re: [Starpu-devel] StarPU+SimGrid: FetchingInput computation

Subject: Developers list for StarPU

List archives

Re: [Starpu-devel] StarPU+SimGrid: FetchingInput computation


  • From: Luka Stanisic <luka.stanisic@inria.fr>
  • To: Mirko Myllykoski <mirkom@cs.umu.se>
  • Cc: starpu-devel@lists.gforge.inria.fr
  • Subject: Re: [Starpu-devel] StarPU+SimGrid: FetchingInput computation
  • Date: Wed, 14 Dec 2016 11:38:20 +0100
  • List-archive: <http://lists.gforge.inria.fr/pipermail/starpu-devel/>
  • List-id: "Developers list. For discussion of new features, code changes, etc." <starpu-devel.lists.gforge.inria.fr>

Hi Mirko,

Strangely, the difference between Native and SimGrid does seem to come from FetchingInput on the workers and/or Allocating on the memory node (see the attached table with some stats). Other discrepancies in state durations are less relevant, or a product of the previous two. In fact, I am not even sure whether the longer Allocating is actually responsible for the correspondingly longer FetchingInput. I will inspect the code in more detail and get back to you ASAP.

As the durations of your tasks (process_window, left_update, right_update) are predicted quite accurately, I believe that once we solve this Allocating/FetchingInput issue, you will be able to get faithful simulation predictions for more cores and more complicated inputs. In the future, if your tasks are too complex and simple linear regression models aren't sufficient, you might want to switch to the more advanced multiple linear regression models, currently available only in StarPU trunk (see examples/mlr/mlr.c and the trunk documentation for more details); they will be part of the next StarPU release.
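To make the regression idea concrete, here is a minimal illustrative sketch in Python (not StarPU code; the chosen terms and sample data are invented for the example) of what a multiple linear regression performance model computes: the execution time is fitted by least squares as a linear combination of user-chosen terms built from several task parameters.

```python
# Illustrative sketch only: fit T(m, n) ~ c0 + c1*m*n + c2*n^2 from
# calibration samples by ordinary least squares (pure stdlib).

def fit_least_squares(rows, times):
    """Solve the normal equations (A^T A) c = A^T t by Gaussian elimination."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    att = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        att[col], att[piv] = att[piv], att[col]
        for r in range(col + 1, k):
            f = ata[r][col] / ata[col][col]
            for c in range(col, k):
                ata[r][c] -= f * ata[col][c]
            att[r] -= f * att[col]
    # Back substitution.
    coeffs = [0.0] * k
    for r in reversed(range(k)):
        coeffs[r] = (att[r] - sum(ata[r][c] * coeffs[c]
                                  for c in range(r + 1, k))) / ata[r][r]
    return coeffs

# Synthetic calibration samples drawn from T(m, n) = 5 + 2*m*n + 0.5*n^2.
samples = [(m, n) for m in range(1, 6) for n in range(1, 6)]
terms = [[1.0, m * n, n * n] for m, n in samples]          # chosen model terms
times = [5 + 2 * m * n + 0.5 * n * n for m, n in samples]  # "measured" durations
c0, c1, c2 = fit_least_squares(terms, times)
print(round(c0, 3), round(c1, 3), round(c2, 3))  # recovers 5.0 2.0 0.5
```

The sketch only shows the underlying least-squares step; in StarPU the terms are declared in the codelet's perfmodel and the fitting happens during calibration.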

Best,
Luka

On 14/12/2016 10:22, Mirko Myllykoski wrote:
Hi Luka,

Just to clarify my previous email. I do not doubt SimGrid's ability to accurately predict execution times in general. However, the numerical code I am developing has some features which I believe will make the simulation harder (large number of small tasks, complicated data dependencies, varying sensitivity to parameter value changes, etc).

Best Regards,
Mirko

On 2016-12-14 09:52, Mirko Myllykoski wrote:
Hi Luka,

Here are the two paje traces you requested:

https://dl.dropboxusercontent.com/u/1521774/paje.trace.tar.gz
https://dl.dropboxusercontent.com/u/1521774/paje-simgrid.trace.tar.gz

I must say that the way I am using the starpu_perfmodel::size_base
field is a bit unorthodox. That's why I ran the simulation twice, once
with the size_base field and once without it. My ultimate intention is
to autotune my code using an external black box software. However, the
execution time may vary from a few seconds to hours depending on the
input data and various parameters. I hope that SimGrid would help with
this problem by saving countless CPU hours.

Right now, I am trying to figure out whether this idea is feasible. I
realized that a linear regression model can predict the codelet
execution times quite accurately, provided that the input data does
not change too much and the parameters are kept constant. These
regression models cannot be used to autotune anything, but if SimGrid
fails to predict the total execution time even in this simple case,
then I can be quite sure that the overall idea does not work and I
should try something else instead.

Best Regards,
Mirko

On 2016-12-13 16:40, Luka Stanisic wrote:
Hi Mirko,

Indeed, I was wondering if your platform has any GPUs, but as you said
it is a simple 4-core machine. Adding more CPUs or GPUs in the future
shouldn't be a problem.

You are right, SimGrid shouldn't add any significant fetching time to
the simulation since everything is running in shared memory. However,
the appearance of the FetchingInput state in the traces is possible,
since StarPU passes through many parts of the code. Still, the
duration of FetchingInput should be negligible.

Could you please share two paje.trace traces (one for real execution
and one for SimGrid), so I can try to understand better what is
happening? If the traces are big (>100MB), it might be better to run
your application with smaller problem size (if possible).


Also, from what I have seen, you are using the STARPU_REGRESSION_BASED
or STARPU_NL_REGRESSION_BASED performance models for your codelets,
right? Is this something that you need for your application?
Personally, I have never tried to simulate applications using these
models, although I don't see any reason why it shouldn't work. The
starpu_perfmodel::size_base field is actually used by these models;
more information is available here:
http://starpu.gforge.inria.fr/doc/html/OnlinePerformanceTools.html


So my first guess is that you are somehow using the codelet perfmodels
and their size_base incorrectly (or there is an unknown bug in the
StarPU or StarPU+SimGrid code), which makes the simulation longer than
expected. In the traces this then manifests as long FetchingInput
states, although fetching inputs has nothing to do with the actual
problem.


Best regards,
Luka
On 13/12/2016 14:34, Mirko Myllykoski wrote:
Hi Luka,

and thank you for your reply.

I performed the same experiment twice, once with the size_base field included and once without it. I erased the samples directory before each experiment and gave it a few rounds to calibrate properly (STARPU_CALIBRATE=1). Here are the corresponding sample folders:

https://dl.dropboxusercontent.com/u/1521774/sampling_with_size_base.tar.gz
https://dl.dropboxusercontent.com/u/1521774/sampling_without_size_base.tar.gz

In this case, the error seems to be about 35%.

As I mentioned in my previous email, the code is shared memory only (at the moment). I performed the experiment on my local machine (quad-core i5) but my plan is to move on to a bigger machine (28 or 42 cores per node) and distributed memory once everything works.

I don't quite understand why SimGrid would add any fetching time to the simulation, since everything is running in shared memory.

Best Regards,
Mirko

On 2016-12-12 18:21, Luka Stanisic wrote:
Hello Mirko,

Indeed, a 50% prediction error is quite big, and it suggests that
something is probably not configured correctly. Could you please send
us a compressed version of your ".starpu/sampling" folder, the one
from which the simulation will read the performance models? This can
help us get a first idea of the machine and application you are trying
to simulate.

To answer your question, the fetching time is computed based on the
size of the data being transferred, the latency and bandwidth of the
link (in the machine.platform.xml file), and the possible contention
due to other transfers occurring in parallel.

Best regards,
Luka

On 07/12/2016 12:50, Mirko Myllykoski wrote:
Hi,

my name is Mirko Myllykoski and I work as a PostDoc researcher for the NLAFET project at Umeå University.

I am currently implementing a (shared memory) numerical software using StarPU and I am trying to simulate my code using SimGrid. However, I noticed that the simulated execution time is way off (about 50%). I checked the generated FxT traces using vite and it seems that SimGrid introduces too much fetching time (State: FetchingInput) to the simulation.

How is this fetching time being computed? My performance models include the starpu_perfmodel::size_base data field and I guess that information is somehow used to compute the fetch time.

Best Regards,
Mirko Myllykoski
_______________________________________________
Starpu-devel mailing list
Starpu-devel@lists.gforge.inria.fr
http://lists.gforge.inria.fr/mailman/listinfo/starpu-devel
"Value", "Events_paje_native.trace.csv", "Duration_paje_native.trace.csv", "Events_paje_simgrid.trace.csv", "Duration_paje_simgrid.trace.csv"
"Allocating", 4104, 6.368949, 4104, 1378.633334
"AllocatingReuse", 4104, 0.992813, 4104, 0
"Building task", 2052, 1.348218, 2052, 0
"Callback", 4104, 14.912101, 4104, 40.91
"Deinitializing", 4, 0.082353, 0, 0
"FetchingInput", 2052, 21.215959, 2052, 2613.367004
"Freeing", 4104, 24.679875, 4104, 0
"Idle", 2056, 1.979639, 2056, 0.04
"Initializing", 4, 0.335848, 4, 0
"left_update_pm", 438, 1637.44185, 438, 1556.650487
"Nothing", 12312, 2029.802094, 12312, 1309.524657
"Overhead", 6294, 135.424937, 6294, 64.223261
"process_window_pm", 134, 1064.191107, 134, 1069.998024
"PushingOutput", 2052, 13.358214, 2052, 0.13
"right_update_pm", 1480, 5184.519075, 1480, 5072.887809
"Scheduling", 8198, 19.398561, 8236, 40.068355
"Sleeping", 85, 232.934657, 41, 373.994408
"Submittings task", 11671, 40.554317, 11671, 158.841338
"Submitting task", 147, 9.05943, 147, 1.47
"Waiting task", 1424, 1989.405741, 1423, 2568.8857


