However, keep in mind that we will test your assignment on the full data set.
You can downsample for experimentation, but make sure your algorithm
works on the full data set when you submit for grading.
**Note 2:**
The variable `langSpread` controls how far apart languages are from
the clustering algorithm's point of view. For a value of 50000, the languages are
too far apart to be clustered together at all, resulting in a clustering that only
takes scores into account within each language (similar to partitioning the data
across languages and then clustering based on the score). A more interesting (but
less scientific) clustering occurs when `langSpread` is set to 1 (we can't
set it to 0, as that would lose language information completely), where we cluster
mainly according to the score. See which language dominates the top questions now?
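To get a feel for why `langSpread` changes the clustering, here is a minimal sketch. It assumes each question is turned into a 2D point `(langIndex * langSpread, score)` and that distances are squared Euclidean. The object `LangSpreadSketch` and the helpers `toPoint` and `sqDist` are hypothetical names used only for illustration, not part of the assignment skeleton.

```scala
object LangSpreadSketch {
  // Assumed encoding: the language index is scaled by langSpread,
  // the score is used as-is.
  def toPoint(langIndex: Int, score: Int, langSpread: Int): (Int, Int) =
    (langIndex * langSpread, score)

  // Squared Euclidean distance, computed in Double to avoid Int overflow
  // (50000 * 50000 does not fit in an Int).
  def sqDist(a: (Int, Int), b: (Int, Int)): Double = {
    val dx = (a._1 - b._1).toDouble
    val dy = (a._2 - b._2).toDouble
    dx * dx + dy * dy
  }

  // True if a question in a neighbouring language with the same score is
  // closer than a question in the same language with a very different score.
  def otherLanguageIsCloser(langSpread: Int): Boolean = {
    val base          = toPoint(0, 10, langSpread)
    val sameLangFar   = toPoint(0, 5000, langSpread)
    val otherLangNear = toPoint(1, 10, langSpread)
    sqDist(base, otherLangNear) < sqDist(base, sameLangFar)
  }
}
```

Under these assumptions, `LangSpreadSketch.otherLanguageIsCloser(50000)` is false (the language gap dominates), while `LangSpreadSketch.otherLanguageIsCloser(1)` is true (the score dominates).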
## Computing Cluster Details
After the call to `kmeans`, we have the following code in the `main` method:
```scala
val results = clusterResults(means, vectors)
printResults(results)
```
Implement the `clusterResults` method, which, for each cluster, computes:
- (a) the dominant programming language in the cluster;
- (b) the percent of answers that belong to the dominant language;
- (c) the size of the cluster (the number of questions it contains);
- (d) the median of the highest answer scores.
Once this value is returned, it is printed on the screen by the `printResults`
method.
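The four per-cluster values can be sketched on plain Scala collections before moving the logic onto an RDD. The sketch below is not the reference solution: the object `ClusterSummarySketch`, the function `summarize`, and its parameters are hypothetical. It assumes each point is `(langIndex * langSpread, highestAnswerScore)`, counts each point once when computing the percentage, and, for an even-sized cluster, takes the integer average of the two middle scores as the median.

```scala
object ClusterSummarySketch {
  // Returns (dominant language, percent in that language, cluster size, median score)
  // for the points of a single, non-empty cluster.
  def summarize(points: Seq[(Int, Int)],
                langs: List[String],
                langSpread: Int): (String, Double, Int, Int) = {
    require(points.nonEmpty, "cluster must be non-empty")
    val size = points.size

    // (a) Recover the language index of each point and pick the most frequent one.
    val (domIdx, domCount) = points
      .groupBy(_._1 / langSpread)
      .map { case (idx, ps) => (idx, ps.size) }
      .maxBy(_._2)

    // (b) Share of the cluster that belongs to the dominant language.
    val percent = domCount * 100.0 / size

    // (d) Median of the scores (assumed convention for even sizes: integer mean
    // of the two middle values).
    val sorted = points.map(_._2).sorted
    val median =
      if (size % 2 == 1) sorted(size / 2)
      else (sorted(size / 2 - 1) + sorted(size / 2)) / 2

    // (c) is simply `size`.
    (langs(domIdx), percent, size, median)
  }
}
```

For example, with `langs = List("JavaScript", "Java")` and `langSpread = 50000`, the cluster `Seq((0, 10), (0, 20), (50000, 30))` has dominant language "JavaScript", size 3, and median 20.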
## Questions
- Do you think that partitioning your data would help?
- Have you thought about persisting some of your data? Can you think of why persisting your data in memory may be helpful for this algorithm?
- Of the non-empty clusters, how many clusters have "Java" as their label (based on the majority of questions, see above)? Why?
- Only considering the "Java clusters", which clusters stand out and why?
- How are the "C# clusters" different compared to the "Java clusters"?
**Hint:** If you exceed the grader's time or memory constraints, think about how partitioning or persisting could, if at all, help you gain some performance. Please note that our grader only runs unit tests against your methods. It won't run the **main** method, so make sure to place any caching or partitioning code outside the **main** method.