Implement and evaluate performance of asynchronous I/O

According to some NASA developers (FUN3D), an adjoint-based adaptive solver can benefit greatly from writing (primal) / reading (dual) data asynchronously.

The idea is to start writing data for time step n, and let the runtime system handle it while the simulation computes step n+1. In a sense it's similar to overlapping communication with computation.

Compute time step n
Write step n (async.)
Compute time step n+1
Wait until step n has been written
Write step n+1 (async.)
and so on...
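
The loop above could be sketched roughly as follows. This is only an illustration in Python using a background thread, not how our I/O layer would actually do it; `compute_step`, `write_step` and `run` are hypothetical placeholder names, and the "solver" just produces dummy bytes. Blocking file writes release Python's global interpreter lock, so the write of one step can proceed while the next step is being computed.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def compute_step(n):
    # Hypothetical stand-in for the solver: returns the data of time step n.
    return bytes([n % 256]) * 1024


def write_step(directory, n, data):
    # Ordinary blocking write of one time step; runs on the background thread.
    path = os.path.join(directory, f"step_{n:04d}.bin")
    with open(path, "wb") as f:
        f.write(data)
    return path


def run(directory, num_steps):
    written = []
    pending = None  # future of the write that is currently in flight
    with ThreadPoolExecutor(max_workers=1) as pool:
        for n in range(num_steps):
            # Computing step n overlaps with the write of step n-1.
            data = compute_step(n)
            if pending is not None:
                # Wait until step n-1 has been written.
                written.append(pending.result())
            # Start writing step n and return immediately.
            pending = pool.submit(write_step, directory, n, data)
        if pending is not None:
            # Drain the last outstanding write.
            written.append(pending.result())
    return written
```

Calling `run(some_directory, 10)` returns the paths of the written files in step order. In the real code the same pattern would more likely map onto non-blocking I/O calls of whatever I/O library is used, which is an assumption on my part.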

Of course, there are several parameters to tune here, e.g. the length of the overlap (how many time steps may be outstanding before syncing). Anyway, I think it's a neat idea that is worth trying out.
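
To make the overlap length concrete, here is a rough Python sketch in which the number of outstanding writes is a parameter. All names (`run_with_overlap`, `overlap`, `compute_step`, `write_step`) are made up for illustration, and the solver is again a dummy. With `overlap=1` it waits for the previous step before submitting the next one; larger values sync less often but keep more step buffers alive in memory.

```python
import os
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def compute_step(n):
    # Hypothetical stand-in for the solver: returns the data of time step n.
    return bytes([n % 256]) * 1024


def write_step(directory, n, data):
    # Ordinary blocking write of one time step; runs on the background thread.
    path = os.path.join(directory, f"step_{n:04d}.bin")
    with open(path, "wb") as f:
        f.write(data)
    return path


def run_with_overlap(directory, num_steps, overlap):
    # `overlap` = maximum number of writes allowed to be outstanding.
    if overlap < 1:
        raise ValueError("overlap must be at least 1")
    in_flight = deque()  # futures of outstanding writes, oldest first
    completed = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        for n in range(num_steps):
            data = compute_step(n)
            # Sync only when the allowed number of writes is already outstanding.
            while len(in_flight) >= overlap:
                completed.append(in_flight.popleft().result())
            in_flight.append(pool.submit(write_step, directory, n, data))
        # Drain whatever is still outstanding at the end of the run.
        while in_flight:
            completed.append(in_flight.popleft().result())
    return completed
```

Timing a run for a few values of `overlap` (1, 2, 4, ...) against a fully synchronous baseline would be one simple way to do the evaluation part.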

