-
Notifications
You must be signed in to change notification settings - Fork 9
Implementing Combiners
Here we will see how the combiners are implemented for various collections, and learn about data structures that are more amenable for parallel computations.
So while the emphasis in the previous week was on data parallel APIs and parallel programming obstructions, this week's lecture will be more algorithmic in nature. It will give you an insight into how to pick the right data structure, and organize the data in a parallel program.
Recall that a transformer operation is a collection operation that creates another collection instead of just a single value. Methods such as filter
, map
, flatMap
and groupBy
are examples of transformer operations. By contrast, methods, such as fold
, sum
, and aggregate
are not transformer operations.
We have seen Builders and Combiners in the last lecture:
Builders are used in sequential collection methods:
trait Builder[T, Repr] {
def +=(elem: T): this.type
def result: Repr
}
trait Combiner[T, Repr] extends Builder[T, Repr] {
def combine(that: Combiner[T, Repr]): Combiner[T, Repr]
}
How can we implement the combine
method efficiently?
- when Repr is a set or a map, combine represents union
- when Repr is a sequence, combine represents concatenation
The combine operation must be efficient, i.e. execute in O(log n + log m)
time, where n and m are the sizes of two input combiners.
Question: Is the method combine efficient?
def combine(xs: Array[Int], ys: Array[Int]): Array[Int] = {
val r = new Array[Int](xs.length + ys.length)
Array.copy(xs, 0, r, 0, xs.length)
Array.copy(ys, 0, r, xs.length, ys.length)
r
}
- Yes.
- No.
Answer:
Arrays cannot be efficiently concatenated.
Typically, set data structures have efficient lookup, insertion and deletion.
- hash tables –
expected O(1)
- balanced trees –
O(log n)
- linked lists –
O(n)
Most set implementations do not have efficient union operation.
Operation complexity for sequences can vary.
- mutable linked lists –
O(1)
prepend and append,O(n)
insertion - functional (cons) lists –
O(1)
prepend operations, everything elseO(n)
- array lists – amortized
O(1)
append,O(1)
random accesss, otherwiseO(n)
Mutable linked list can have O(1)
concatenation, but for most sequences, concatenation is O(n)
.
Week 1
- Introduction to Parallel Computing
- Parallelism on the JVM
- Running Computations in Parallel
- Monte Carlo Method to Estimate Pi
- First Class Tasks
- How fast are parallel programs?
- Benchmarking Parallel Programs
Week 2
- Parallel Sorting
- Data Operations
- Parallel map()
- Parallel fold()
- Associativity I
- Associativity II
- Parallel Scan (Prefix Sum) Operation
Week 3
- Data Parallel Programming
- Data Parallel Operations
- Scala Parallel Collections
- Splitters and Combiners
Week 4