Fragment-and-Replicate Join


Fragment-and-Replicate Join

􀂄 Partitioning not possible for some join conditions
􀃌 e.g., non-equijoin conditions, such as r.A > s.B.
􀂄 For joins were partitioning is not applicable, parallelization can
be accomplished by fragment and replicate technique
􀃌 Depicted on next slide
􀂄 Special case – asymmetric fragment-and-replicate:
􀃌 One of the relations, say r, is partitioned; any partitioning technique
can be used.
􀃌 The other relation, s, is replicated across all the processors.
􀃌 Processor Pi then locally computes the join of ri with all of s using
any join technique.

􀂄 General case: reduces the sizes of the relations at each
processor.
􀃌 r is partitioned into n partitions,r0, r1, ..., r n-1;s is partitioned into m
partitions, s0, s1, ..., sm-1.
􀃌 Any partitioning technique may be used.
􀃌 There must be at least m * n processors.
􀃌 Label the processors as
􀃌 P0,0, P0,1, ..., P0,m-1, P1,0, ..., Pn-1m-1.
􀃌 Pi,j computes the join of ri with sj. In order to do so, ri is replicated to
Pi,0, Pi,1, ..., Pi,m-1, while si is replicated to P0,i, P1,i, ..., Pn-1,i
􀃌 Any join technique can be used at each processor Pi,j.

􀂄 Both versions of fragment-and-replicate work with any join condition,
since every tuple in r can be tested with every tuple in s.
􀂄 Usually has a higher cost than partitioning, since one of the
relations (for asymmetric fragment-and-replicate) or both relations
(for general fragment-and-replicate) have to be replicated.
􀂄 Sometimes asymmetric fragment-and-replicate is preferable even
though partitioning could be used.
􀃌 E.g., say s is small and r is large, and already partitioned. It may be
cheaper to replicate s across all processors, rather than repartition r
and s on the join attributes.


Comments

Popular posts from this blog

Handling of Skew

USER INTERFACE DESIGN FOR ANNA UNIVERSITY SYLLABUS