Commit d6a93182 authored by Guillaume Martres
Add solution to the 2021 midterm

with 889 additions and 0 deletions
Use the following commands to make a fresh clone of your repository:
git clone -b m1 m1
## Useful links
* [A guide to the Scala parallel collections](
* [The API documentation of the Scala parallel collections](
* [The API documentation of the Scala standard library](
* [The API documentation of the Java standard library](
**If you have issues with the IDE, try [reimporting the
if you still have problems, use `compile` in sbt instead.**
## Exercise
Given the following sequential implementation of a function that computes the sequence of rolling averages, your task will be to complete and optimize a parallel version of this code.
/** Compute the rolling average of array.
 *
 *  For an array `arr = Arr(x1, x2, x3, ..., xn)` the result is
 *  `Arr(x1 / 1, (x1 + x2) / 2, (x1 + x2 + x3) / 3, ..., (x1 + x2 + x3 + ... + xn) / n)`
 */
def rollingAveragesSequential(arr: Array[Int]): Array[Double] =
  // Transform all numbers to fractions with denominator 1
  val arr1 = arr.map(x => Frac(x, 1))
  // Compute the rolling average keeping the sum of all elements in the numerator and the count of elements in the denominator.
  val arr2 = arr1.scan(Frac(0, 0))((acc, x) => Frac(acc.numerator + x.numerator, acc.denominator + x.denominator))
  // Transform fractions to Doubles
  val arr3 = arr2.map(frac => frac.toDouble)
  // Drop the extra initial element that was added by the scan
  arr3.tail
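To make the role of `tail` concrete, here is a small trace of the steps above on `Array(1, 2, 3, 4)`. It is a standalone sketch: it re-declares `Frac` as in the `Lib` trait, and the object `ScanTrace` and its `trace` helper are hypothetical names. The seed `Frac(0, 0)` survives the scan as the first element and converts to `0.0 / 0`, i.e. `NaN`, which is exactly the element that `tail` removes.

```scala
// Standalone copy of `Frac` from the `Lib` trait, so the sketch compiles on its own.
case class Frac(numerator: Int, denominator: Int) {
  def toDouble: Double = numerator.toDouble / denominator
}

object ScanTrace {
  /** Hypothetical helper: runs the steps of `rollingAveragesSequential` and returns
   *  the scanned fractions, their Double values, and the final result. */
  def trace(arr: Array[Int]): (Array[Frac], Array[Double], Array[Double]) = {
    val arr1 = arr.map(x => Frac(x, 1))
    val arr2 = arr1.scan(Frac(0, 0))((acc, x) =>
      Frac(acc.numerator + x.numerator, acc.denominator + x.denominator))
    val arr3 = arr2.map(frac => frac.toDouble)
    (arr2, arr3, arr3.tail)
  }

  def main(args: Array[String]): Unit = {
    val (scanned, doubles, result) = trace(Array(1, 2, 3, 4))
    println(scanned.mkString(", ")) // Frac(0,0), Frac(1,1), Frac(3,2), Frac(6,3), Frac(10,4)
    println(doubles.mkString(", ")) // NaN, 1.0, 1.5, 2.0, 2.5
    println(result.mkString(", "))  // 1.0, 1.5, 2.0, 2.5
  }
}
```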
This implementation has some issues:
- It does not use parallelism.
- It creates two intermediate arrays by calling `map`.
- It creates an extra intermediate array by calling `tail`.
- `scan` returns an extra element that we do not need.
We want to parallelize this code and avoid creating the extra arrays.
Since we are calling `scan`, the natural operations to use are `upsweep` and `downsweep`.
These operations can be specialized for our problem by letting them do the mapping themselves.
They can also be changed so that they do not generate the initial element.
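As a sequential sketch of what "letting the operations do the mapping" and "not generating the first element" can mean: one loop builds each `Frac(x, 1)` on the fly, converts the running sum to `Double` immediately, and writes the result for input `i` at index `i`. `FusedScan` and `rollingAveragesFused` are hypothetical names, plain `Array`s stand in for `Arr`, and this is only the single-threaded core; the exercise still asks for the parallel `upsweep`/`downsweep` split.

```scala
// Standalone copy of `Frac` from the `Lib` trait.
case class Frac(numerator: Int, denominator: Int) {
  def toDouble: Double = numerator.toDouble / denominator
}

object FusedScan {
  // Same combining function as `scanOp` in the exercise.
  def scanOp(acc: Frac, x: Frac): Frac =
    Frac(acc.numerator + x.numerator, acc.denominator + x.denominator)

  /** Hypothetical single-pass version: no intermediate arrays and no seed slot to drop. */
  def rollingAveragesFused(arr: Array[Int]): Array[Double] = {
    val out = new Array[Double](arr.length)
    var acc = Frac(0, 0)
    var i = 0
    while (i < arr.length) {
      acc = scanOp(acc, Frac(arr(i), 1)) // the Int => Frac "map" happens here
      out(i) = acc.toDouble              // the Frac => Double "map" happens here
      i += 1
    }
    out // out(i) holds the average of arr(0..i); the seed is never stored
  }
}
```

This loop is essentially what `downsweepSequential` does for a single leaf, with the seed passed in as `a0`.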
We give you a version of `rollingAveragesParallel` that partially implements the parallelization using `upsweep` and `downsweep`.
Your tasks in the exercise will be to:
- TASK 1: Implement the parallelization of `upsweep` and `downsweep`
- TASK 2: Remove the calls to `map`
- TASK 3: Remove the call to `tail`
You can get partial points for solving some of the tasks.
The order of the tasks is a suggestion; you may do them in any order if that is simpler for you.
Look at the `Lib` trait to find the definitions of functions and classes you can use (or already used).
In this question we use an `Arr` array class instead of the normal `Array`. You may assume that this class has the same performance characteristics as the normal array. `Arr` provides only a limited set of operations.
# General
# Dotty
# sbt
# datasets
// Student tasks (i.e. submit, packageSubmission)
course := "midterm"
assignment := "m1"
scalaVersion := "3.0.0-RC1"
scalacOptions ++= Seq("-language:implicitConversions", "-deprecation")
libraryDependencies += "org.scalameta" %% "munit" % "0.7.22"
val MUnitFramework = new TestFramework("munit.Framework")
testFrameworks += MUnitFramework
// Decode Scala names
testOptions += Tests.Argument(MUnitFramework, "-s")
testSuite := "m1.M1Suite"
package sbt // To access the private[sbt] compilerReporter key
package filteringReporterPlugin
import Keys._
import ch.epfl.lamp._
object FilteringReporterPlugin extends AutoPlugin {
  override lazy val projectSettings = Seq(
    // Turn off warning coming from scalameter that we cannot fix without changing scalameter
    compilerReporter in (Compile, compile) ~= { reporter => new FilteringReporter(reporter) }
  )
}

class FilteringReporter(reporter: xsbti.Reporter) extends xsbti.Reporter {

  def reset(): Unit = reporter.reset()
  def hasErrors: Boolean = reporter.hasErrors
  def hasWarnings: Boolean = reporter.hasWarnings
  def printSummary(): Unit = reporter.printSummary()
  def problems: Array[xsbti.Problem] = reporter.problems

  // Forward every problem except the known Scala 2 existential-type warning
  def log(problem: xsbti.Problem): Unit = {
    if (!problem.message.contains("An existential type that came from a Scala-2 classfile cannot be"))
      reporter.log(problem)
  }

  def comment(pos: xsbti.Position, msg: String): Unit =
    reporter.comment(pos, msg)

  override def toString = s"CollectingReporter($reporter)"
}
package ch.epfl.lamp
import sbt._
import sbt.Keys._
/**
 * Coursera uses two versions of each assignment. They both have the same assignment key and part id but have
 * different item ids.
 *
 * @param key Assignment key
 * @param partId Assignment partId
 * @param itemId Item id of the non-premium version
 * @param premiumItemId Item id of the premium version (`None` if the assignment is optional)
 */
case class CourseraId(key: String, partId: String, itemId: String, premiumItemId: Option[String])

/**
 * Settings shared by all assignments, reused in various tasks.
 */
object MOOCSettings extends AutoPlugin {

  override def requires = super.requires && filteringReporterPlugin.FilteringReporterPlugin

  object autoImport {
    val course = SettingKey[String]("course")
    val assignment = SettingKey[String]("assignment")
    val options = SettingKey[Map[String, Map[String, String]]]("options")
    val courseraId = settingKey[CourseraId]("Coursera-specific information identifying the assignment")
    val testSuite = settingKey[String]("Fully qualified name of the test suite of this assignment")
    // Convenient alias
    type CourseraId = ch.epfl.lamp.CourseraId
    val CourseraId = ch.epfl.lamp.CourseraId
  }

  import autoImport._

  override val globalSettings: Seq[Def.Setting[_]] = Seq(
    // supershell is verbose, buggy and useless.
    useSuperShell := false
  )

  override val projectSettings: Seq[Def.Setting[_]] = Seq(
    parallelExecution in Test := false,
    // Report test result after each test instead of waiting for every test to finish
    logBuffered in Test := false,
    name := s"${course.value}-${assignment.value}"
  )
}
package ch.epfl.lamp
import sbt._
import Keys._
// import scalaj.http._
import java.io.{File, FileInputStream, IOException}
import org.apache.commons.codec.binary.Base64
// import play.api.libs.json.{Json, JsObject, JsPath}
import scala.util.{Failure, Success, Try}
* Provides tasks for submitting the assignment
object StudentTasks extends AutoPlugin {
override def requires = super.requires && MOOCSettings
object autoImport {
val packageSourcesOnly = TaskKey[File]("packageSourcesOnly", "Package the sources of the project")
val packageBinWithoutResources = TaskKey[File]("packageBinWithoutResources", "Like packageBin, but without the resources")
val packageSubmissionZip = TaskKey[File]("packageSubmissionZip")
val packageSubmission = inputKey[Unit]("package solution as an archive file")
lazy val Grading = config("grading") extend(Runtime)
import autoImport._
import MOOCSettings.autoImport._
override lazy val projectSettings = Seq(
fork := true,
connectInput in run := true,
outputStrategy := Some(StdoutOutput),
) ++
packageSubmissionZipSettings ++
inConfig(Grading)(Defaults.testSettings ++ Seq(
unmanagedJars += file("grading-tests.jar"),
definedTests := (definedTests in Test).value,
internalDependencyClasspath := (internalDependencyClasspath in Test).value
/** **********************************************************
val packageSubmissionZipSettings = Seq(
packageSubmissionZip := {
val submission = crossTarget.value / ""
val sources = (packageSourcesOnly in Compile).value
val binaries = (packageBinWithoutResources in Compile).value -> "", binaries -> "binaries.jar"), submission, None)
artifactClassifier in packageSourcesOnly := Some("sources"),
artifact in (Compile, packageBinWithoutResources) ~= (art => art.withName( + "-without-resources"))
) ++
Defaults.packageTaskSettings(packageSourcesOnly, Defaults.sourceMappings) ++
Defaults.packageTaskSettings(packageBinWithoutResources, Def.task {
val relativePaths =
(unmanagedResources in Compile).value.flatMap(Path.relativeTo((unmanagedResourceDirectories in Compile).value)(_))
(mappings in (Compile, packageBin)).value.filterNot { case (_, path) => relativePaths.contains(path) }
  val maxSubmitFileSize = {
    val mb = 1024 * 1024
    10 * mb
  }

  /** Check that the jar exists, isn't empty, isn't crazy big, and can be read.
   *  If so, encode jar as base64 so we can send it to Coursera
   */
  def prepareJar(jar: File, s: TaskStreams): String = {
    val errPrefix = "Error submitting assignment jar: "
    val fileLength = jar.length()
    if (!jar.exists()) {
      s.log.error(errPrefix + "jar archive does not exist\n" + jar.getAbsolutePath)
      failSubmit()
    } else if (fileLength == 0L) {
      s.log.error(errPrefix + "jar archive is empty\n" + jar.getAbsolutePath)
      failSubmit()
    } else if (fileLength > maxSubmitFileSize) {
      s.log.error(errPrefix + "jar archive is too big. Allowed size: " +
        maxSubmitFileSize + " bytes, found " + fileLength + " bytes.\n" +
        jar.getAbsolutePath)
      failSubmit()
    } else {
      val bytes = new Array[Byte](fileLength.toInt)
      val sizeRead = try {
        val is = new FileInputStream(jar)
        val read = is.read(bytes)
        is.close()
        read
      } catch {
        case ex: IOException =>
          s.log.error(errPrefix + "failed to read sources jar archive\n" + ex.toString)
          failSubmit()
      }
      if (sizeRead != bytes.length) {
        s.log.error(errPrefix + "failed to read the sources jar archive, size read: " + sizeRead)
        failSubmit()
      } else encodeBase64(bytes)
    }
  }
/** Task to package solution to a given file path */
lazy val packageSubmissionSetting = packageSubmission := {
val args: Seq[String] = Def.spaceDelimited("[path]").parsed
val s: TaskStreams = streams.value // for logging
val jar = (packageSubmissionZip in Compile).value
val base64Jar = prepareJar(jar, s)
val path = args.headOption.getOrElse((baseDirectory.value / "submission.jar").absolutePath)
/** Task to submit a solution to coursera */
val submit = inputKey[Unit]("submit solution to Coursera")
lazy val submitSetting = submit := {
// Fail if scalafix linting does not pass.
val args: Seq[String] = Def.spaceDelimited("<arg>").parsed
val s: TaskStreams = streams.value // for logging
val jar = (packageSubmissionZip in Compile).value
val assignmentDetails =
courseraId.?.value.getOrElse(throw new MessageOnlyException("This assignment can not be submitted to Coursera because the `courseraId` setting is undefined"))
val assignmentKey = assignmentDetails.key
val courseName =
course.value match {
case "capstone" => "scala-capstone"
case "bigdata" => "scala-spark-big-data"
case other => other
val partId = assignmentDetails.partId
val itemId = assignmentDetails.itemId
val premiumItemId = assignmentDetails.premiumItemId
val (email, secret) = args match {
case email :: secret :: Nil =>
(email, secret)
case _ =>
val inputErr =
s"""|Invalid input to `submit`. The required syntax for `submit` is:
|submit <email-address> <submit-token>
|The submit token is NOT YOUR LOGIN PASSWORD.
|It can be obtained from the assignment page:
premiumItemId.fold("") { id =>
s"""or (for premium learners):
val base64Jar = prepareJar(jar, s)
val json =
| "assignmentKey":"$assignmentKey",
| "submitterEmail":"$email",
| "secret":"$secret",
| "parts":{
| "$partId":{
| "output":"$base64Jar"
| }
| }
def postSubmission[T](data: String): Try[HttpResponse[String]] = {
val http = Http("")
val hs = List(
("Cache-Control", "no-cache"),
("Content-Type", "application/json")
)"Connecting to Coursera...")
val response = Try(http.postData(data)
.option(HttpOptions.connTimeout(10000)) // scalaj default timeout is only 100ms, changing that to 10s
.asString) // kick off HTTP POST
val connectMsg =
s"""|Attempting to submit "${assignment.value}" assignment in "$courseName" course
|- email: $email
|- submit token: $secret""".stripMargin
def reportCourseraResponse(response: HttpResponse[String]): Unit = {
val code = response.code
val respBody = response.body
/* Sample JSON response from Coursera
"message": "Invalid email or token.",
"details": {
"learnerMessage": "Invalid email or token."
// Success, Coursera responds with 2xx HTTP status code
if (response.is2xx) {
val successfulSubmitMsg =
s"""|Successfully connected to Coursera. (Status $code)
|Assignment submitted successfully!
|You can see how you scored by going to:
premiumItemId.fold("") { id =>
s"""or (for premium learners):
|and clicking on "My Submission".""".stripMargin
// Failure, Coursera responds with 4xx HTTP status code (client-side failure)
else if (response.is4xx) {
val result = Try(Json.parse(respBody)).toOption
val learnerMsg = result match {
case Some(resp: JsObject) =>
(JsPath \ "details" \ "learnerMessage").read[String].reads(resp).get
case Some(x) => // shouldn't happen
"Could not parse Coursera's response:\n" + x
case None =>
"Could not parse Coursera's response:\n" + respBody
val failedSubmitMsg =
s"""|Submission failed.
|There was something wrong while attempting to submit.
|Coursera says:
|$learnerMsg (Status $code)""".stripMargin
// Failure, Coursera responds with 5xx HTTP status code (server-side failure)
else if (response.is5xx) {
val failedSubmitMsg =
s"""|Submission failed.
|Coursera seems to be unavailable at the moment (Status $code)
|Check and try again in a few minutes.
// Failure, Coursera responds with an unexpected status code
else {
val failedSubmitMsg =
s"""|Submission failed.
|Coursera replied with an unexpected code (Status $code)
// kick it all off, actually make request
postSubmission(json) match {
case Success(resp) => reportCourseraResponse(resp)
case Failure(e) =>
val failedConnectMsg =
s"""|Connection to Coursera failed.
|There was something wrong while attempting to connect to Coursera.
|Check your internet connection.
def failSubmit(): Nothing = {
sys.error("Submission failed")
* *****************
def encodeBase64(bytes: Array[Byte]): String =
new String(Base64.encodeBase64(bytes))
// Used for Coursera submission (StudentPlugin)
// libraryDependencies += "org.scalaj" %% "scalaj-http" % "2.4.2"
// libraryDependencies += "" %% "play-json" % "2.7.4"
// Used for Base64 (StudentPlugin)
libraryDependencies += "commons-codec" % "commons-codec" % "1.10"
// addSbtPlugin("org.scala-js" % "sbt-scalajs" % "0.6.28")
addSbtPlugin("ch.epfl.lamp" % "sbt-dotty" % "0.5.3")
package m1

trait Lib {

  /** If an array has `n` elements and `n < THRESHOLD`, then it should be processed sequentially */
  final val THRESHOLD: Int = 33

  /** Compute the two values in parallel
   *
   *  Note: Most tests just compute those two sequentially to make any bug simpler to debug
   */
  def parallel[T1, T2](op1: => T1, op2: => T2): (T1, T2)

  /** A limited array. It only contains the required operations for this exercise. */
  trait Arr[T] {
    /** Get the i-th element of the array (0-based) */
    def apply(i: Int): T
    /** Update the i-th element of the array with the given value (0-based) */
    def update(i: Int, x: T): Unit
    /** Number of elements in this array */
    def length: Int
    /** Create a copy of this array without the first element */
    def tail: Arr[T]
    /** Create a copy of this array by mapping all the elements with the given function */
    def map[U](f: T => U): Arr[U]
  }

  object Arr {
    /** Create an array with the given elements */
    def apply[T](xs: T*): Arr[T] = {
      val arr: Arr[T] = Arr.ofLength(xs.length)
      for i <- 0 until xs.length do arr(i) = xs(i)
      arr
    }

    /** Create an array with the given length. All elements are initialized to `null`. */
    def ofLength[T](n: Int): Arr[T] =
      newArrOfLength(n)
  }

  /** Create an array with the given length. All elements are initialized to `null`. */
  def newArrOfLength[T](n: Int): Arr[T]

  /** A fractional number representing `numerator/denominator` */
  case class Frac(numerator: Int, denominator: Int) {
    def toDouble: Double = numerator.toDouble / denominator
  }

  /** Tree result of an upsweep operation. Specialized for `Frac` results. */
  trait TreeRes { val res: Frac }

  /** Leaf result of an upsweep operation. Specialized for `Frac` results. */
  case class Leaf(from: Int, to: Int, res: Frac) extends TreeRes

  /** Tree node result of an upsweep operation. Specialized for `Frac` results. */
  case class Node(left: TreeRes, res: Frac, right: TreeRes) extends TreeRes
}
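The `upsweep`/`downsweep` scheme combines partial `Frac` results in tree order rather than strictly left to right. That is only valid because the `scanOp` defined in `M1` (componentwise addition of numerators and denominators) is associative and has `Frac(0, 0)` as its identity. The following standalone sketch checks this on sample data by comparing a left fold with a balanced split-and-combine reduction in the style of `upsweep`; `ScanOpLaws` and `splitReduce` are hypothetical names.

```scala
// Standalone copy of `Frac` from the `Lib` trait.
case class Frac(numerator: Int, denominator: Int) {
  def toDouble: Double = numerator.toDouble / denominator
}

object ScanOpLaws {
  // Same combining function as `scanOp` in `M1`.
  def scanOp(acc: Frac, x: Frac): Frac =
    Frac(acc.numerator + x.numerator, acc.denominator + x.denominator)

  /** Hypothetical helper: reduces the non-empty range xs(from until to) by splitting
   *  at `mid` and combining the halves, like `upsweep` combines `tL.res` and `tR.res`. */
  def splitReduce(xs: Array[Frac], from: Int, to: Int): Frac =
    if (to - from == 1) xs(from)
    else {
      val mid = from + (to - from) / 2
      scanOp(splitReduce(xs, from, mid), splitReduce(xs, mid, to))
    }

  def main(args: Array[String]): Unit = {
    val rnd = new scala.util.Random(42)
    val xs = Array.fill(100)(Frac(rnd.nextInt(1000), 1))
    val leftToRight = xs.foldLeft(Frac(0, 0))(scanOp) // sequential order, seeded with the identity
    val treeOrder = splitReduce(xs, 0, xs.length)     // balanced tree order
    println(leftToRight == treeOrder)                  // true: the grouping does not matter
  }
}
```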
package m1

trait M1 extends Lib {
  // Functions and classes of Lib can be used in here

  /** Compute the rolling average of array.
   *
   *  For an array `arr = Arr(x1, x2, x3, ..., xn)` the result is
   *  `Arr(x1 / 1, (x1 + x2) / 2, (x1 + x2 + x3) / 3, ..., (x1 + x2 + x3 + ... + xn) / n)`
   */
  def rollingAveragesParallel(arr: Arr[Int]): Arr[Double] = {
    if (arr.length == 0) return Arr.ofLength(0)

    val out: Arr[Double] = Arr.ofLength(arr.length)
    val tree = upsweep(arr, 0, arr.length)
    downsweep(arr, Frac(0, 0), tree, out)
    out
  }

  // No need to modify this
  def scanOp(acc: Frac, x: Frac) =
    Frac(acc.numerator + x.numerator, acc.denominator + x.denominator)

  // Builds the reduction tree; leaves map each Int to Frac(x, 1) on the fly (no `map` call)
  def upsweep(input: Arr[Int], from: Int, to: Int): TreeRes = {
    if (to - from < THRESHOLD)
      Leaf(from, to, reduceSequential(input, from + 1, to, Frac(input(from), 1)))
    else {
      val mid = from + (to - from)/2
      val (tL, tR) = parallel(
        upsweep(input, from, mid),
        upsweep(input, mid, to)
      )
      Node(tL, scanOp(tL.res, tR.res), tR)
    }
  }

  // Pushes prefix values down the tree; the right subtree starts from a0 combined with the left sum
  def downsweep(input: Arr[Int], a0: Frac, tree: TreeRes, output: Arr[Double]): Unit = {
    tree match {
      case Node(left, _, right) =>
        parallel(
          downsweep(input, a0, left, output),
          downsweep(input, scanOp(a0, left.res), right, output)
        )
      case Leaf(from, to, _) =>
        downsweepSequential(input, from, to, a0, output)
    }
  }

  // Writes output(i) directly as a Double, so no seed element is produced (no `tail` call)
  def downsweepSequential(input: Arr[Int], from: Int, to: Int, a0: Frac, output: Arr[Double]): Unit = {
    if (from < to) {
      var i = from
      var a = a0
      while (i < to) {
        a = scanOp(a, Frac(input(i), 1))
        output(i) = a.toDouble
        i = i + 1
      }
    }
  }

  def reduceSequential(input: Arr[Int], from: Int, to: Int, a0: Frac): Frac = {
    var a = a0
    var i = from
    while (i < to) {
      a = scanOp(a, Frac(input(i), 1))
      i = i + 1
    }
    a
  }
}
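For experimenting outside the grading harness, the following standalone sketch restates the same upsweep/downsweep scheme as `M1` over plain `Array`s, with a `parallel` built on `scala.concurrent.Future`. The object name `RollingAveragesStandalone` and this particular `parallel` (which evaluates the first argument on the global execution context and the second on the calling thread) are assumptions, not part of the exercise's `Lib`. Concurrent writes into `output` need no locking here because each `Leaf` covers a disjoint `[from, to)` range.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Standalone copies of the exercise's types; plain Scala arrays stand in for `Arr`.
case class Frac(numerator: Int, denominator: Int) {
  def toDouble: Double = numerator.toDouble / denominator
}
sealed trait TreeRes { val res: Frac }
case class Leaf(from: Int, to: Int, res: Frac) extends TreeRes
case class Node(left: TreeRes, res: Frac, right: TreeRes) extends TreeRes

object RollingAveragesStandalone {
  final val THRESHOLD = 33

  // Assumed Future-based `parallel`: `a` runs on the global pool, `b` on the current thread.
  def parallel[A, B](a: => A, b: => B): (A, B) = {
    val fa = Future(a)
    val vb = b
    (Await.result(fa, Duration.Inf), vb)
  }

  def scanOp(acc: Frac, x: Frac): Frac =
    Frac(acc.numerator + x.numerator, acc.denominator + x.denominator)

  def rollingAverages(arr: Array[Int]): Array[Double] = {
    val out = new Array[Double](arr.length)
    if (arr.nonEmpty) downsweep(arr, Frac(0, 0), upsweep(arr, 0, arr.length), out)
    out
  }

  def upsweep(input: Array[Int], from: Int, to: Int): TreeRes =
    if (to - from < THRESHOLD)
      Leaf(from, to, reduceSequential(input, from + 1, to, Frac(input(from), 1)))
    else {
      val mid = from + (to - from) / 2
      val (tL, tR) = parallel(upsweep(input, from, mid), upsweep(input, mid, to))
      Node(tL, scanOp(tL.res, tR.res), tR)
    }

  // Each leaf writes only output(from until to), so parallel branches never share a slot.
  def downsweep(input: Array[Int], a0: Frac, tree: TreeRes, output: Array[Double]): Unit =
    tree match {
      case Node(left, _, right) =>
        parallel(
          downsweep(input, a0, left, output),
          downsweep(input, scanOp(a0, left.res), right, output))
      case Leaf(from, to, _) =>
        var a = a0
        var i = from
        while (i < to) {
          a = scanOp(a, Frac(input(i), 1))
          output(i) = a.toDouble
          i += 1
        }
    }

  def reduceSequential(input: Array[Int], from: Int, to: Int, a0: Frac): Frac = {
    var a = a0
    var i = from
    while (i < to) {
      a = scanOp(a, Frac(input(i), 1))
      i += 1
    }
    a
  }
}
```

Running `rollingAverages` on `Array.tabulate(1000)(i => i)` should give `i / 2.0` at index `i`, matching the expectations in `largeTests`.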
package m1
class M1Suite extends munit.FunSuite {
test("Rolling average result test (5pts)") {
test("[TASK 1] Rolling average parallelism test (30pts)") {
test("[TASK 2] Rolling average no `map` test (35pts)") {
test("[TASK 3] Rolling average no `tail` test (30pts)") {
  object RollingAveragesBasicLogicTest extends M1 with LibImpl with RollingAveragesTest {
    def parallel[T1, T2](op1: => T1, op2: => T2): (T1, T2) = (op1, op2)
    def newArrFrom[T](arr: Array[AnyRef]): Arr[T] = new ArrImpl(arr)
  }
  object RollingAveragesCallsToParallel extends M1 with LibImpl with RollingAveragesTest {
    private var count = 0
    def parallel[T1, T2](op1: => T1, op2: => T2): (T1, T2) =
      count += 1
      (op1, op2)

    def newArrFrom[T](arr: Array[AnyRef]): Arr[T] = new ArrImpl(arr)

    def parallelismTest() = {
      assertParallelCount(Arr(), 0)
      assertParallelCount(Arr(1), 0)
      assertParallelCount(Arr(1, 2, 3, 4), 0)
      assertParallelCount(Arr(Array.tabulate(16)(identity): _*), 0)
      assertParallelCount(Arr(Array.tabulate(32)(identity): _*), 0)
      assertParallelCount(Arr(Array.tabulate(33)(identity): _*), 2)
      assertParallelCount(Arr(Array.tabulate(64)(identity): _*), 2)
      assertParallelCount(Arr(Array.tabulate(128)(identity): _*), 6)
      assertParallelCount(Arr(Array.tabulate(256)(identity): _*), 14)
      assertParallelCount(Arr(Array.tabulate(1000)(identity): _*), 62)
      assertParallelCount(Arr(Array.tabulate(1024)(identity): _*), 62)
    }

    def assertParallelCount(arr: Arr[Int], expected: Int): Unit = {
      try {
        count = 0
        rollingAveragesParallel(arr)
        assert(count == expected, {
          val extra = if (expected == 0) "" else s" ${expected/2} for the `upsweep` and ${expected/2} for the `downsweep`"
          s"\n$arr\n\nERROR: Expected $expected instead of $count calls to `parallel(...)` for an array of ${arr.length} elements. Current parallel threshold is $THRESHOLD.$extra"
        })
      } finally {
        count = 0
      }
    }
  }
  object RollingAveragesNoMap extends M1 with LibImpl with RollingAveragesTest {
    def parallel[T1, T2](op1: => T1, op2: => T2): (T1, T2) = (op1, op2)
    def newArrFrom[T](arr: Array[AnyRef]): Arr[T] = new ArrImpl[T](arr) {
      override def map[U](f: T => U): Arr[U] = throw Exception("Should not call")
    }
  }
  object RollingAveragesNoTail extends M1 with LibImpl with RollingAveragesTest {
    def parallel[T1, T2](op1: => T1, op2: => T2): (T1, T2) = (op1, op2)
    def newArrFrom[T](arr: Array[AnyRef]): Arr[T] = new ArrImpl[T](arr) {
      override def tail: Arr[T] = throw Exception("Should not call Arr.tail")
    }
  }
  object RollingAveragesParallel extends M1 with LibImpl with RollingAveragesTest {
    import scala.concurrent.duration._
    val TIMEOUT = Duration(10, SECONDS)
    def parallel[T1, T2](op1: => T1, op2: => T2): (T1, T2) = {
      import scala.concurrent._
      import scala.concurrent.ExecutionContext.Implicits.global
      Await.result(Future(op1).zip(Future(op2)), TIMEOUT) // FIXME not timing-out
    }
    def newArrFrom[T](arr: Array[AnyRef]): Arr[T] = new ArrImpl(arr)
  }
  trait LibImpl extends Lib {

    def newArrFrom[T](arr: Array[AnyRef]): Arr[T]

    def newArrOfLength[T](n: Int): Arr[T] =
      newArrFrom(new Array(n))

    class ArrImpl[T](val arr: Array[AnyRef]) extends Arr[T]:
      def apply(i: Int): T =
        arr(i).asInstanceOf[T]
      def update(i: Int, x: T): Unit =
        arr(i) = x.asInstanceOf[AnyRef]
      def length: Int =
        arr.length
      def map[U](f: T => U): Arr[U] =
        newArrFrom(arr.map(f.asInstanceOf[AnyRef => AnyRef]))
      def tail: Arr[T] =
        newArrFrom(arr.tail)
      override def toString: String =
        arr.mkString("Arr(", ", ", ")")
      override def equals(that: Any): Boolean =
        that match
          case that: ArrImpl[_] => Array.equals(arr, that.arr)
          case _ => false
  }
  trait RollingAveragesTest extends M1 {
    def tabulate[T](n: Int)(f: Int => T): Arr[T] =
      val arr = Arr.ofLength[T](n)
      for i <- 0 until n do
        arr(i) = f(i)
      arr

    def basicTests() = {
      assertEquals(rollingAveragesParallel(Arr()), Arr[Double]())
      assertEquals(rollingAveragesParallel(Arr(1)), Arr[Double](1))
      assertEquals(rollingAveragesParallel(Arr(1, 2, 3, 4)), Arr(1, 1.5, 2, 2.5))
      assertEquals(rollingAveragesParallel(Arr(4, 4, 4, 4)), Arr[Double](4, 4, 4, 4))
    }

    def normalTests() = {
      assertEquals(rollingAveragesParallel(Arr(Array.tabulate(64)(identity): _*)), Arr(Array.tabulate(64)(_.toDouble / 2): _*))
      assertEquals(rollingAveragesParallel(Arr(4, 4, 4, 4)), Arr[Double](4, 4, 4, 4))
      assertEquals(rollingAveragesParallel(Arr(4, 8, 6, 4)), Arr[Double](4, 6, 6, 5.5))
      assertEquals(rollingAveragesParallel(Arr(4, 3, 2, 1)), Arr(4, 3.5, 3, 2.5))
      assertEquals(rollingAveragesParallel(Arr(Array.tabulate(64)(identity).reverse: _*)), Arr(Array.tabulate(64)(i => 63 - i.toDouble / 2): _*))
      assertEquals(rollingAveragesParallel(Arr(Array.tabulate(128)(i => 128 - 2*i).reverse: _*)), Arr(Array.tabulate(128)(i => -126d + i): _*))
    }

    def largeTests() = {
      assertEquals(rollingAveragesParallel(Arr(Array.tabulate(500)(identity): _*)), Arr(Array.tabulate(500)(_.toDouble / 2): _*))
      assertEquals(rollingAveragesParallel(Arr(Array.tabulate(512)(identity): _*)), Arr(Array.tabulate(512)(_.toDouble / 2): _*))
      assertEquals(rollingAveragesParallel(Arr(Array.tabulate(1_000)(identity): _*)), Arr(Array.tabulate(1_000)(_.toDouble / 2): _*))
      assertEquals(rollingAveragesParallel(Arr(Array.tabulate(10_000)(identity): _*)), Arr(Array.tabulate(10_000)(_.toDouble / 2): _*))
    }
  }
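The expected counts in `parallelismTest` follow from the shape of the `upsweep` tree: a range of `n` elements is split (one `parallel` call) when `n >= THRESHOLD`, into halves of `n / 2` and `n - n / 2` elements, and `downsweep` walks the same tree, so it makes the same number of calls. The following sketch, with hypothetical names, recomputes those numbers; for example, 1024 elements give 1 + 2 + 4 + 8 + 16 = 31 splits per phase, hence 62 calls.

```scala
object ParallelCallCount {
  // Same value as `THRESHOLD` in the `Lib` trait.
  final val THRESHOLD = 33

  /** Number of `parallel` calls `upsweep` makes on a range of `n` elements:
   *  one per internal `Node`, with halves of n / 2 and n - n / 2 elements
   *  (matching `mid = from + (to - from) / 2`). */
  def upsweepCalls(n: Int): Int =
    if (n < THRESHOLD) 0
    else 1 + upsweepCalls(n / 2) + upsweepCalls(n - n / 2)

  /** `downsweep` walks the same tree, so it makes as many calls as `upsweep`. */
  def totalCalls(n: Int): Int = 2 * upsweepCalls(n)

  def main(args: Array[String]): Unit =
    for (n <- Seq(0, 1, 4, 16, 32, 33, 64, 128, 256, 1000, 1024))
      println(s"$n -> ${totalCalls(n)}")
}
```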
