Use the following commands to make a fresh clone of your repository:

```
git clone -b m3 git@gitlab.epfl.ch:lamp/student-repositories-s21/cs206-GASPAR.git m3
```

## Useful links

  * [A guide to the Scala parallel collections](https://docs.scala-lang.org/overviews/parallel-collections/overview.html)
  * [The API documentation of the Scala parallel collections](https://www.javadoc.io/doc/org.scala-lang.modules/scala-parallel-collections_2.13/latest/scala/collection/index.html)
  * [The API documentation of the Scala standard library](https://www.scala-lang.org/files/archive/api/2.13.4)
  * [The API documentation of the Java standard library](https://docs.oracle.com/en/java/javase/15/docs/api/index.html)

**If you have issues with the IDE, try [reimporting the
build](https://gitlab.epfl.ch/lamp/cs206/-/blob/master/labs/example-lab.md#ide-features-like-type-on-hover-or-go-to-definition-do-not-work),
if you still have problems, use `compile` in sbt instead.**

## Exercise

Given the following sequential implementation of a function that computes the sequence of rolling windowed averages, your task will be to complete and optimize a parallel version of this code.

```scala
/** Compute the rolling windowed mean of an array.
  *
  *  For an array `arr = Arr(x1, x2, x3, ..., x_n)` the result is
  *  `Arr(x1, (x1+x2)/2, (x1+x2+x3)/3, (x2+x3+x4)/3, ..., (x_{n-2}, x_{n-1}, x_n)/n)`
  */
def rollingWinMeanParallel(arr: Arr[Int]): Arr[Double] = {
  // Transform all numbers to averages of window of size 1
  val arr1 = arr.map(x => AvgWin(x :: Nil))
  // Compute the rolling windowed average by combining the windows together
  val arr2 = arr1.scan(AvgWin(Nil))((acc, x) => acc.pushAll(x))
  // Transform the windowed averages to Doubles
  arr2.map(win => win.toDouble)
  // Drop the extra initial element that was added by the scan
  arr3.tail
```

 This implementation has some issues:
 - It does not use parallelism
 - Creates two intermediate arrays by calling `map`
 - Creates an extra intermediate arrays by calling `tail`
 - Scan returns an extra element we do not need

 We want to parallelize and avoid the creation of the extra arrays.
 As we are calling a `scan` the natural operations we need are `upsweep` and `downsweep`.
 It is possible specialize those operations for our problem by letting those operations do the mapping.
 It is also possible to change those operations to not generate the first element.

We give you a version of `rollingWinMeanSequential` that partially implements the parallelization using `upsweep` and `downsweep`.

 Your tasks in the exercise will be to:
 - TASK 1: Implement the parallelization of `upsweep` and `downsweep`
 - TASK 2: Remove the calls to the `map`
 - TASK 3: Remove the call to `tail`

 You can get partial points for solving part of the tasks.
 The order of the tasks is a suggestion, you may do them in any order if that is simpler for you.

Look at the `Lib` trait to find the definitions of functions and classes you can use (or already used).
In this question we use a `Arr` array class instead of the normal `Array`. You may assume that this class has the same performance characteristics as the normal array. `Arr` provides only a limited set of operations.