Query Processing

Query Processing Overview

Main weapons are:

  1. clever implementation techniques for operators

  2. exploiting ‘equivalencies’ of relational operators

  3. using cost models to choose among alternatives

Workflow

Selections

Schema for Examples

Sailors (S): each tuple is 50 bytes long, 80 tuples per page, 500 pages

Reserves (R): each tuple is 40 bytes long, 100 tuples per page, 1000 pages

Simple Selections

The best way to perform a selection depends on:

  1. available indexes/access paths

  2. expected size of the result (number of tuples and/or number of pages)

Estimate result size (reduction factor)

Size of result approximated as:

$size\_of\_relation \times \prod(reduction\_factors)$

Reduction factor is usually called selectivity. It estimates what portion of the relation will qualify for the given predicate, i.e. satisfy the given condition.

This is estimated by the optimizer.
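For example, using the Reserves schema above ($100 \times 1000 = 100000$ tuples) and assuming 100 distinct bid values, uniformly distributed: the predicate bid=5 has $RF \approx \frac{1}{100}$, so the estimated result size is $100000 \times \frac{1}{100} = 1000$ tuples, i.e. about 10 pages.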

Alternatives for Simple Selections

  1. With no index, unsorted: We must scan the whole relation, i.e. perform Heap Scan.

    Cost = $[R]$ (we write $[R]$ for the number of pages of a relation $R$, and $|R|$ for its number of tuples)

  2. With no index, but file is sorted:

    Cost = cost of binary search + $[R_{condition}]$ = $\log_2([R]) + (RF \times [R])$

  3. With an index on selection attribute:

    Use index to find qualifying data entries, then retrieve corresponding data records

    Cost depends on the number of qualifying tuples

    Clustering is important when calculating the total cost

    1. Clustered index Cost = $([I] + [R]) \times RF$

    2. Unclustered index Cost = $([I] + |R|) \times RF$
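As a worked example (the index size $[I] = 50$ pages and $RF = 10\%$ are assumed for illustration), for a selection on Reserves: a clustered index costs roughly $(50 + 1000) \times 0.1 = 105$ page I/Os, while an unclustered index costs roughly $(50 + 100000) \times 0.1 \approx 10005$ I/Os, since each qualifying tuple may live on a different page.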


General Selection Conditions

Typically queries have multiple predicates (conditions)

Example: day<8/9/94 AND rname='Paul' AND bid=5 AND sid=3

A B+ tree index matches (a combination of) predicates that involve only attributes in a prefix of the search key (upon which we build the index).

Index on <a, b, c> matches predicates on (a, b, c), on (a, b), and on (a)

This implies that only the reduction factors of the matching predicates (or primary conjuncts) that are part of the prefix are used to determine the index cost.

Selection Approach

  1. Find the cheapest access path

    An index or file scan with the least estimated page I/O

  2. Retrieve tuples using it

    Predicates that match this index reduce the number of tuples retrieved (and impact the cost)

  3. Apply the predicates that don't match the index (if any) later on

    These predicates are used to discard some retrieved tuples, but do not affect the number of tuples/pages fetched (nor the total cost)

    In this case selection over other predicates is said to be done “on-the-fly”

Example: day < 8/9/94 AND bid = 5 AND sid = 3

A B+ tree index on day can be used: $RF = RF(day)$. Then bid=5 and sid=3 must be checked on the fly for each retrieved tuple.

Similarly, a hash index on <bid, sid> could be used: $\prod RF = RF(bid) \times RF(sid)$. Then, day < 8/9/94 must be checked on the fly.

Projections

The issue with projection is removing duplicates.

Projection can be done based on sorting or hashing. With sorting, duplicates become adjacent to each other. With hashing, duplicates fall into the same bucket.

Sorting

Basic approach is to use sorting

  1. Scan R, extract only the needed attributes

  2. Sort the result set (typically using external merge sort)

  3. Remove adjacent duplicates

Usually, we bring data from disk to memory to sort it. But what if the data is too large to fit in memory?

A Simple Two-way Merge Sort

When sorting a file, several sorted subfiles are typically generated in intermediate steps. Each such sorted subfile is called a run.

In the first pass, the pages in the file are read in one at a time. After a page is read in, the records on it are sorted and the sorted page is written out.

In subsequent passes, pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long.

If the number of pages in the input file is $2^k$, for some $k$, then:

Pass 0 produces $2^k$ sorted runs of one page each.

Pass 1 produces $2^{k-1}$ sorted runs of two pages each.

Pass 2 produces $2^{k-2}$ sorted runs of four pages each.

Pass $k$ produces $1$ sorted run of $2^k$ pages.

In each pass, we read every page in the file, process it, and write it out. We have 2 I/Os per page, per pass. The overall cost is $2N(\lceil \log_2 N \rceil + 1)$ I/Os.
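For example, sorting a file of $N = 8 = 2^3$ pages takes $\lceil \log_2 8 \rceil + 1 = 4$ passes, for a total of $2 \times 8 \times 4 = 64$ I/Os.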

Only three buffer pages are needed: two for input and one for output. To merge two sorted runs, we compare the current records of the two runs and move the smaller to the output page; once a page of one run is exhausted, we load that run's next page. So two input buffer pages suffice, plus one output buffer page.

External Merge Sort

Two optimizations as opposed to Two-way merge sort:

  • In the initial “conquer pass”, we load $B$ pages at a time and sort them.

  • Merge more than 2 runs at a time. Because we need an output buffer page, we can merge $B-1$ runs at a time.

The number of I/Os needed is $2N \times (1 + \lceil \log_{B-1} \lceil \frac{N}{B} \rceil \rceil)$.
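As a worked example (buffer size assumed): sorting $N = 1000$ pages with $B = 10$ buffers yields $\lceil 1000/10 \rceil = 100$ initial runs; merging $9$ runs at a time takes $\lceil \log_9 100 \rceil = 3$ merge passes, so the total cost is $2 \times 1000 \times (1 + 3) = 8000$ I/Os.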

Sort runs: make each group of $B$ pages sorted (these sorted groups are called runs)

Merge runs: Make multiple passes to merge runs

  • Pass 2: Produce runs of length $B(B-1)$ pages

  • Pass 3: Produce runs of length $B(B-1)^2$ pages

  • Pass P: Produce runs of length $B(B-1)^{P-1}$ pages

Projection with external sort:

  1. Scan R, extract only the needed attributes

  2. Sort the result set using EXTERNAL SORT

  3. Remove adjacent duplicates

WriteProjectedPages = $[R] \times PF$

PF: the Projection Factor says how much we are projecting, as a ratio with respect to all attributes (e.g. keeping $\frac{1}{4}$ of the attributes, or 10% of all attributes)

Sorting Cost = $2 \times NumPasses \times ReadProjectedPages$
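A worked example (buffer size assumed): projecting Reserves with $PF = \frac{1}{4}$ scans 1000 pages and writes $1000 \times \frac{1}{4} = 250$ projected pages. With $B = 20$ buffers, the 250 pages sort in 2 passes ($\lceil 250/20 \rceil = 13$ initial runs, then one merge pass), so the sorting cost is $2 \times 2 \times 250 = 1000$, for a total of about $1000 + 250 + 1000 = 2250$ I/Os.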

Hashing

  1. Scan $R$, extract only the needed attributes

  2. Hash data into buckets

    Apply hash function $h_1$ to choose one of $B-1$ output buffers

  3. Remove duplicates within each bucket

    2 tuples from different partitions guaranteed to be distinct

Projection based on External Hashing

Partition data into $B-1$ partitions with hash function $h_1$

Load each partition, hash it with another hash function ($h_2$), and eliminate duplicates (a sketch follows the phase breakdown below)

  1. Partitioning phase

    • Read $R$ using one input buffer

    • For each tuple:

      Discard unwanted fields

      Apply hash function $h_1$ to choose one of $B-1$ output buffers

    • Result is $B-1$ partitions (of tuples with no unwanted fields)

      2 tuples from different partitions guaranteed to be distinct

  2. Duplicate elimination phase

    • For each partition

      Read it and build an in-memory hash table

      Using hash function $h_2$ on all fields

      While discarding duplicates

    • If partition does not fit in memory

      Apply hash-based projection algorithm recursively to this partition
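A minimal, self-contained Python sketch of the two phases (the buffer budget and the toy data are assumptions for illustration; Python's built-in hashing stands in for $h_1$, and a set stands in for the in-memory $h_2$ hash table):

```python
def hash_project(tuples, fields, B=5):
    # Partitioning phase: discard unwanted fields and split the stream
    # into B-1 partitions using a first hash function (h1).
    partitions = [[] for _ in range(B - 1)]
    for t in tuples:
        projected = tuple(t[i] for i in fields)
        partitions[hash(projected) % (B - 1)].append(projected)

    # Duplicate-elimination phase: duplicates always land in the same
    # partition, so each partition is deduplicated independently with an
    # in-memory hash table (a Python set stands in for the h2 table).
    result = []
    for part in partitions:
        seen = set()
        for t in part:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result

# Example: project (sid, bid, day) reservation tuples onto bid only.
rows = [(1, 5, "94/09/08"), (2, 5, "94/09/08"), (1, 7, "94/09/05")]
print(hash_project(rows, fields=[1]))   # [(5,), (7,)] (order may vary)
```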

Joins

To find matches between two lists of records, it is important to sort them first. Otherwise, the process can be fairly inefficient.

Join techniques we will cover:

  • Nested-loops join

  • Sort-merge join

  • Hash join

Cross product is very expensive, so performing a condition join as $R \times S$ followed by a selection is inefficient.

Consider two input tables: the left input is called the outer input and the right input is called the inner input.

These names have nothing to do with inner/outer joins!

Join is associative and commutative

Simple Nested Loops Join

For each tuple in the outer relation R, we scan the entire inner relation S.

Pseudo Code:
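A minimal Python sketch of the loop (names are illustrative; `theta` stands for an arbitrary join predicate):

```python
def simple_nested_loops_join(outer, inner, theta):
    # For each tuple r of the outer relation, scan the entire inner relation.
    for r in outer:              # reads the outer once: [Outer] page I/Os
        for s in inner:          # re-reads the inner once per outer tuple
            if theta(r, s):
                yield (r, s)
```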

Cost = $[Outer] + |Outer| \times [Inner]$

Page-Oriented Nested Loops Join

For each page of R, get each page of S. Write out matching pairs of tuples $<r, s>$, where $r$ is in the R-page and $s$ is in the S-page.

Pseudo code:
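A minimal Python sketch, treating each relation as a list of pages, where each page is a list of tuples (names are illustrative):

```python
def page_nested_loops_join(outer_pages, inner_pages, theta):
    # Scan the inner once per outer *page* rather than once per outer tuple.
    for r_page in outer_pages:          # [Outer] page reads in total
        for s_page in inner_pages:      # [Inner] page reads per outer page
            for r in r_page:
                for s in s_page:
                    if theta(r, s):
                        yield (r, s)
```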

Cost = $[Outer] + [Outer] \times [Inner]$

Block Nested Loops Join

Page-oriented NL doesn’t exploit extra memory buffers

Use one page as an input buffer for scanning the inner S, one page as the output buffer, and use the remaining pages to hold a ‘block’ of the outer R.

For each matching pair of tuples $r$ in the R-block and $s$ in the S-page, add $<r, s>$ to the result. Then read the next R-block, scan S, etc.

Cost = $[Outer] + \#Blocks(Outer) \times [Inner]$

$\#Blocks(Outer) = \lceil \frac{[Outer]}{B-2} \rceil$
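A minimal Python sketch under the same page-list representation (names are illustrative; `B` is the number of buffer pages):

```python
def block_nested_loops_join(outer_pages, inner_pages, theta, B):
    # Hold B-2 outer pages per block; one page scans the inner and one
    # (implicit here) buffers the output.
    block_size = B - 2
    for i in range(0, len(outer_pages), block_size):
        # Gather one block of outer tuples into memory.
        block = [r for page in outer_pages[i:i + block_size] for r in page]
        for s_page in inner_pages:      # one full inner scan per outer block
            for s in s_page:
                for r in block:
                    if theta(r, s):
                        yield (r, s)
```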

Index Nested Loop Join

Cost = $[R] + |R| \times (\text{cost to look up matching records in } S)$
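For instance, with Reserves as the outer relation ($[R] = 1000$, $|R| = 100000$) and a hash index on Sailors.sid, assuming about 1.2 I/Os to find the data entry plus 1 I/O to fetch the matching sailor (sid is a key), the cost is about $1000 + 100000 \times 2.2 = 221000$ I/Os; the per-lookup figures are assumptions for illustration.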

Sort-Merge Join

It is cumbersome to handle identical (duplicate) join keys appearing in both $R$ and $S$.

Average cost = $Sort(R) + Sort(S) + [R] + [S]$

Worst case cost = $Sort(R) + Sort(S) + |R| \times [S]$

$Sort(R) = 2 \times \#Passes \times [R]$
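A worked example using the schema above (buffer size assumed): with $B = 100$, both relations sort in 2 passes, so $Sort(R) = 2 \times 2 \times 1000 = 4000$ and $Sort(S) = 2 \times 2 \times 500 = 2000$, giving an average cost of $4000 + 2000 + 1000 + 500 = 7500$ I/Os.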

Hash join

Requires an equality predicate.

Naive Hash Join

Construct a hash table of $B-2$ pages for $R$, then look up the records of $S$ in the hash table. The cost is $[R] + [S]$. The problem is that $R$ needs to fit in memory.

Grace Hash Join

Two phases:

  • Partition (divide)

  • Build and Probe (conquer)

Partition both relations using hash function $h_1$: $R$ tuples in partition $i$ will only match $S$ tuples in partition $i$. If a partition doesn't fit in memory, we recursively partition it. Write the partitions out to disk.

Read in a partition of tuples of $R$ and build a hash table using hash function $h_2$. Scan the matching partition of $S$, probing the hash table for matches.

In the Partition phase, we read and write both relations.

In the Build and Probe phase, we read both relations.

Cost = $3 \times [R] + 3 \times [S]$
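With the example schema and no recursive partitioning, this is $3 \times (1000 + 500) = 4500$ I/Os.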

Code Example for Grace Hash Join
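A minimal Python sketch of the algorithm (relation contents, key extractors, and the partition count are illustrative; a Python dict stands in for the $h_2$ hash table):

```python
def grace_hash_join(R, S, key_r, key_s, num_partitions=4):
    # Partition phase: h1 sends tuples with equal join keys from both
    # relations to the same partition index.
    def partition(rel, key):
        parts = [[] for _ in range(num_partitions)]
        for t in rel:
            parts[hash(key(t)) % num_partitions].append(t)
        return parts

    r_parts, s_parts = partition(R, key_r), partition(S, key_s)

    # Build-and-probe phase: for each partition pair, build a hash table
    # on the R tuples (h2), then probe it with the matching S partition.
    for r_part, s_part in zip(r_parts, s_parts):
        table = {}
        for r in r_part:                          # build
            table.setdefault(key_r(r), []).append(r)
        for s in s_part:                          # probe
            for r in table.get(key_s(s), []):
                yield (r, s)

# Example: join Reserves (sid, bid) with Sailors (sid, name) on sid.
reserves = [(1, 101), (2, 102), (1, 103)]
sailors = [(1, "Dustin"), (2, "Lubber")]
print(list(grace_hash_join(reserves, sailors,
                           key_r=lambda t: t[0], key_s=lambda t: t[0])))
```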

General Join Conditions

Equalities over several attributes

For inequality conditions, hash join and sort-merge join are not applicable. Block NL is quite likely to be the best join method here.
