Fragment-and-Replicate Join
Fragment-and-Replicate Join
Partitioning not possible for some join conditions
e.g., non-equijoin conditions, such as r.A > s.B.
For joins were partitioning is not applicable, parallelization can
be accomplished by fragment and replicate technique
Depicted on next slide
Special case – asymmetric fragment-and-replicate:
One of the relations, say r, is partitioned; any partitioning technique
can be used.
The other relation, s, is replicated across all the processors.
Processor Pi then locally computes the join of ri with all of s using
any join technique.
General case: reduces the sizes of the relations at each
processor.
r is partitioned into n partitions,r0, r1, ..., r n-1;s is partitioned into m
partitions, s0, s1, ..., sm-1.
Any partitioning technique may be used.
There must be at least m * n processors.
Label the processors as
P0,0, P0,1, ..., P0,m-1, P1,0, ..., Pn-1m-1.
Pi,j computes the join of ri with sj. In order to do so, ri is replicated to
Pi,0, Pi,1, ..., Pi,m-1, while si is replicated to P0,i, P1,i, ..., Pn-1,i
Any join technique can be used at each processor Pi,j.
Both versions of fragment-and-replicate work with any join condition,
since every tuple in r can be tested with every tuple in s.
Usually has a higher cost than partitioning, since one of the
relations (for asymmetric fragment-and-replicate) or both relations
(for general fragment-and-replicate) have to be replicated.
Sometimes asymmetric fragment-and-replicate is preferable even
though partitioning could be used.
E.g., say s is small and r is large, and already partitioned. It may be
cheaper to replicate s across all processors, rather than repartition r
and s on the join attributes.
Comments
Post a Comment