Query Optimization
Query optimization in parallel databases is significantly more complex
than query optimization in sequential databases.
Cost models are more complicated, since we must take into account
partitioning costs and issues such as skew and resource contention.
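To illustrate why skew matters in such a cost model, here is a minimal sketch. It assumes an operation's elapsed time is the partitioning cost plus the time of its slowest processor. The function name and the cost constants are hypothetical and not taken from any real optimizer.

```python
def parallel_cost(partition_sizes, cost_per_tuple=10, partitioning_cost_per_tuple=1):
    """Rough elapsed-time estimate for one operation run on several processors.

    All constants are made-up cost units, used only for illustration.
    """
    total_tuples = sum(partition_sizes)
    # Assumption: every tuple is redistributed once before the operation runs.
    partitioning = partitioning_cost_per_tuple * total_tuples
    # Processors work in parallel, so the largest partition determines
    # when the whole operation finishes.
    slowest = cost_per_tuple * max(partition_sizes)
    return partitioning + slowest


# Same 1000 tuples on 4 processors, with and without skew.
balanced = parallel_cost([250, 250, 250, 250])
skewed = parallel_cost([700, 100, 100, 100])
print(balanced, skewed)
```

With the same total input, the skewed distribution is estimated to be far slower, because one processor ends up doing most of the work.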
When scheduling an execution tree in a parallel system, we must decide:
How to parallelize each operation and how many processors to use for it.
What operations to pipeline, what operations to execute independently in
parallel, and what operations to execute sequentially, one after the other.
Determining how many resources to allocate to each operation is a further
difficulty.
E.g., allocating more processors than optimal can result in high
communication overhead.
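The trade-off can be sketched with a toy model. It assumes the useful work divides evenly among n processors while startup and communication overhead grow linearly with n. The function names and constants are hypothetical, chosen only to make the effect visible.

```python
def elapsed_time(n, work=1200.0, overhead_per_proc=3.0):
    """Toy estimate: ideal speedup on the work plus overhead that grows with n."""
    return work / n + overhead_per_proc * n


def best_processor_count(max_procs, work=1200.0, overhead_per_proc=3.0):
    """Return the processor count in 1..max_procs with the lowest estimate."""
    return min(
        range(1, max_procs + 1),
        key=lambda n: elapsed_time(n, work, overhead_per_proc),
    )


best = best_processor_count(64)
print(best, elapsed_time(best), elapsed_time(64))
```

Under these assumed constants the estimate is lowest at 20 processors; using all 64 gives a noticeably higher estimated time because the overhead term dominates.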
Long pipelines should be avoided, since the final operation may wait a long
time for its inputs while holding precious resources such as memory.
The number of parallel evaluation plans to choose from is much larger
than the number of sequential evaluation plans.
Therefore, heuristics are needed during optimization.
Two alternative heuristics for choosing parallel plans:
Use neither pipelining nor inter-operation parallelism; just parallelize
every operation across all processors.
Finding the best plan is now much easier: use a standard optimization
technique, but with a new cost model.
The Volcano parallel database popularized the exchange-operator model
– an exchange operator is introduced into query plans to partition and
distribute tuples
– each operation works independently on local data on each
processor, in parallel with other copies of the operation
First choose the most efficient sequential plan, and then choose how best to
parallelize the operations in that plan.
Pipelined parallelism can be explored as an option.
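The exchange-operator idea from the first heuristic can be sketched as follows. This is a simplified, single-process illustration, not Volcano's actual implementation. The names exchange, local_count_by_key and parallel_group_count are hypothetical, and the "workers" are simulated with a plain loop rather than real parallel execution.

```python
from collections import defaultdict


def exchange(tuples, key, num_workers):
    """Hash-partition tuples so that equal keys land on the same worker."""
    partitions = [[] for _ in range(num_workers)]
    for t in tuples:
        partitions[hash(key(t)) % num_workers].append(t)
    return partitions


def local_count_by_key(partition, key):
    """The ordinary sequential operator, run unchanged on one worker's data."""
    counts = defaultdict(int)
    for t in partition:
        counts[key(t)] += 1
    return dict(counts)


def parallel_group_count(tuples, key, num_workers):
    partitions = exchange(tuples, key, num_workers)
    # In a real system each call below would run on its own processor.
    local_results = [local_count_by_key(p, key) for p in partitions]
    merged = {}
    for result in local_results:
        # Key sets are disjoint across workers, so a plain update suffices.
        merged.update(result)
    return merged


orders = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e"), (1, "f")]
counts = parallel_group_count(orders, key=lambda t: t[0], num_workers=4)
print(counts)
```

The point of the design is that the counting operator knows nothing about parallelism; only the exchange step does.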
Choosing a good physical organization (partitioning technique) is
important to speed up queries.
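As an illustration of how the physical organization affects a query, the sketch below compares hash and range partitioning for a range query. The boundary values and function names are hypothetical. With range partitioning only the partitions overlapping the queried interval need to be scanned, whereas with hash partitioning a range query generally has to visit every partition.

```python
import bisect


def hash_partition(key, num_partitions):
    """Partition number under hash partitioning."""
    return hash(key) % num_partitions


def range_partition(key, boundaries):
    """Partition number under range partitioning.

    boundaries is a sorted list; partition i holds keys in
    [boundaries[i-1], boundaries[i]), with open-ended first and last ranges.
    """
    return bisect.bisect_right(boundaries, key)


def partitions_for_range_query(lo, hi, boundaries):
    """Partitions a range-partitioned table must scan for lo <= key <= hi."""
    first = range_partition(lo, boundaries)
    last = range_partition(hi, boundaries)
    return list(range(first, last + 1))


boundaries = [100, 200, 300]  # four partitions in total
narrow = partitions_for_range_query(120, 180, boundaries)
wide = partitions_for_range_query(150, 250, boundaries)
print(narrow, wide)
```

Here the narrow query touches a single partition and the wider one touches two, while a hash-partitioned layout of the same table would scan all four in both cases.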