Parallel Sort
Parallel Sort
Range-Partitioning Sort
Choose processors P0, ..., Pm, where m ≤ n -1 to do sorting.
Create range-partition vector with m entries, on the sorting attributes
Redistribute the relation using range partitioning
all tuples that lie in the ith range are sent to processor Pi
Pi stores the tuples it received temporarily on disk Di.
This step requires I/O and communication overhead.
Each processor Pi sorts its partition of the relation locally.
Each processors executes same operation (sort) in parallel with other
processors, without any interaction with the others (data parallelism).
Final merge operation is trivial: range-partitioning ensures that, for 1 j
m, the key values in processor Pi are all less than the key values in Pj.
Parallel Sort (Cont.)
Parallel External Sort-Merge
Assume the relation has already been partitioned among disks
D0, ..., Dn-1 (in whatever manner).
Each processor Pi locally sorts the data on disk Di.
The sorted runs on each processor are then merged to get the
final sorted output.
Parallelize the merging of sorted runs as follows:
The sorted partitions at each processor Pi are range-partitioned
across the processors P0, ..., Pm-1.
Each processor Pi performs a merge on the streams as they are
received, to get a single sorted run.
The sorted runs on processors P0,..., Pm-1 are concatenated to get
the final result.
Comments
Post a Comment