Query Processing
Query Processing Overview
Main weapons are:
clever implementation techniques for operators
exploiting ‘equivalences’ of relational operators
using cost models to choose among alternatives
Workflow

Selections
Schema for Examples
Sailors(sid: integer, sname: string, rating: integer, age: real)
Reserves(sid: integer, bid: integer, day: date, rname: string)
Sailors(S) Each tuple is 50 bytes long, 80 tuples per page, 500 pages
Reserves(R) Each tuple is 40 bytes long, 100 tuples per page, 1000 pages
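Thus NTuples(Sailors) = 80 × 500 = 40,000 and NTuples(Reserves) = 100 × 1000 = 100,000; these figures are used in the cost examples below.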
Simple Selections
The best way to perform a selection depends on:
available indexes/access paths
expected size of the result (number of tuples and/or number of pages)
Estimate result size (reduction factor)
Size of result ≈ NTuples(R) × (product of the reduction factors of all predicates)
Reduction factor is usually called selectivity. It estimates what portion of the relation will qualify for the given predicate, i.e. satisfy the given condition.
This is estimated by the optimizer.
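For example, for the predicate bid = 5 on Reserves, assuming bid values are uniformly distributed over 100 distinct boats, the reduction factor is 1/100 and the estimated result size is 100,000 × 1/100 = 1000 tuples.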
Alternatives for Simple Selections
With no index, unsorted: We must scan the whole relation, i.e. perform Heap Scan.
Cost = NPages(R), since every page of the relation must be read.
With no index, but file is sorted:
Cost = cost of binary search + number of pages with qualifying tuples ≈ log2(NPages(R)) + RF × NPages(R)
With an index on selection attribute:
Use index to find qualifying data entries, then retrieve corresponding data records
Cost depends on the number of qualifying tuples
Clustering is important when calculating the total cost
Clustered index: Cost = (cost of traversing the index) + RF × NPages(R)
Unclustered index: Cost = (cost of traversing the index) + RF × NTuples(R) (potentially one I/O per qualifying tuple)
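Continuing the bid = 5 example (assuming RF = 1/100): a heap scan costs 1000 I/Os; a sorted file costs about log2(1000) + 0.01 × 1000 ≈ 20 I/Os; a clustered index fetches roughly 0.01 × 1000 = 10 pages plus a few index pages; an unclustered index may cost up to 0.01 × 100,000 = 1000 I/Os, i.e. as much as the full scan.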

General Selection Conditions
Typically queries have multiple predicates (conditions)
Example: day<8/9/94 AND rname='Paul' AND bid=5 AND sid=3
A B+ tree index matches (a combination of) predicates that involve only attributes in a prefix of the search key (upon which we build the index).
Index on <a, b, c>
matches predicates on (a, b, c)
(a, b)
(a)
This implies that only the reduction factors of the matching predicates (or primary conjuncts) that form a prefix of the search key are used to estimate the index cost.
Selection Approach
Find the cheapest access path
An index or file scan with the least estimated page I/O
Retrieve tuples using it
Predicates that match this index reduce the number of tuples retrieved (and impact the cost)
Apply the predicates that don't match the index (if any) later on
These predicates are used to discard some retrieved tuples, but do not affect number of tuples/pages fetched (nor the total cost)
In this case selection over other predicates is said to be done “on-the-fly”
Example: day < 8/9/94 AND bid = 5 AND sid = 3
A B+ tree index on day can be used to retrieve tuples satisfying day < 8/9/94. Then bid = 5
and sid = 3
must be checked on the fly for each retrieved tuple.
Similarly, a hash index on <bid, sid> could be used to retrieve tuples with bid = 5 AND sid = 3. Then day < 8/9/94
must be checked on the fly.
Projections
Issue with projection is removing duplicates
SELECT DISTINCT R.sid, R.bid
FROM Reserves R;
Projection can be done based on sorting or hashing. With sorting, duplicates end up adjacent to each other; with hashing, duplicates fall into the same bucket.
Sorting
Basic approach is to use sorting (a minimal sketch follows these steps)
Scan R, extract only the needed attributes
Sort the result set (typically using external merge sort)
Remove adjacent duplicates
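A minimal in-memory sketch of these three steps (the struct, field names, and the choice to keep only (sid, bid) are illustrative assumptions; the real algorithm sorts on disk, as described next):

#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified Reserves tuple (illustrative field types).
struct Reservation { uint32_t sid; uint32_t bid; uint32_t day; };

// SELECT DISTINCT sid, bid FROM Reserves, the sort-based way.
std::vector<std::pair<uint32_t, uint32_t>>
distinct_sid_bid(const std::vector<Reservation>& reserves) {
    // 1. Scan Reserves, extract only the needed attributes.
    std::vector<std::pair<uint32_t, uint32_t>> projected;
    projected.reserve(reserves.size());
    for (const auto& r : reserves) projected.emplace_back(r.sid, r.bid);
    // 2. Sort the result set, so duplicates become adjacent.
    std::sort(projected.begin(), projected.end());
    // 3. Remove adjacent duplicates.
    projected.erase(std::unique(projected.begin(), projected.end()),
                    projected.end());
    return projected;
}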
Usually, we bring data from disk to memory to sort it. But what if the data is too large to fit in memory?
A Simple Two-way Merge Sort
When sorting a file, several sorted subfiles are typically generated in intermediate steps. Each sorted subfile is called a run.
In the first pass, the pages in the file are read in one at a time. After a page is read in, the records on it are sorted and the sorted page is written out.
In subsequent passes, pairs of runs from the output of the previous pass are read in and merged to produce runs that are twice as long.
If the number of pages in the input file is 2^k, for some k, then:
Pass 0 produces 2^k sorted runs of one page each.
Pass 1 produces 2^(k−1) sorted runs of two pages each.
Pass 2 produces 2^(k−2) sorted runs of four pages each.
…
Pass k produces one sorted run of 2^k pages.
In each pass, we read every page in the file, process it, and write it out, so we have 2 I/Os per page, per pass. With N = 2^k pages the number of passes is k + 1 = ⌈log2 N⌉ + 1, so the overall cost is 2 × N × (⌈log2 N⌉ + 1) I/Os.
Only three buffer pages are needed in memory: two input pages and one output page. To merge two sorted runs, we compare the current records of the two input pages and write the smaller one to the output page; once all records on an input page have been consumed, we load the next page of that run. Therefore, merging two sorted runs needs only two input buffer pages plus one output buffer page.
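For example, sorting a file of N = 8 pages takes ⌈log2 8⌉ + 1 = 4 passes, for a total of 2 × 8 × 4 = 64 I/Os.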
External Merge Sort
Two optimizations as opposed to Two-way merge sort:
In the initial “conquer pass”, we load B pages at a time and sort them.
Merge more than two runs at a time: since we need one output buffer page, we can merge B − 1 runs at a time.
The number of I/Os needed is 2 × N × (1 + ⌈log_(B−1) ⌈N/B⌉⌉).
Sort runs: sort B pages at a time, producing ⌈N/B⌉ sorted runs of B pages each
Merge runs: Make multiple passes to merge runs
Pass 2: produce runs of length B × (B − 1) pages
Pass 3: produce runs of length B × (B − 1)^2 pages
…
Pass P: produce runs of length B × (B − 1)^(P − 1) pages
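For example, sorting the 1000 pages of Reserves with B = 11 buffer pages: the sort pass produces ⌈1000 / 11⌉ = 91 runs of 11 pages; pass 2 merges 10 runs at a time into 10 runs; pass 3 merges those into the final sorted run. That is 3 passes in total, so the cost is 2 × 1000 × 3 = 6000 I/Os.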

Projection with external sort:
Scan R, extract only the needed attributes
Sort the result set using EXTERNAL SORT
Remove adjacent duplicates

WriteProjectedPages = NPages(R) × PF
PF: Projection Factor, i.e. how much we are projecting, as a ratio with respect to all attributes (e.g. keeping a quarter of the attributes, i.e. 25% of them)
Sorting Cost = 2 × NumPasses × ReadProjectedPages, where ReadProjectedPages = NPages(R) × PF
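For example, projecting Reserves onto sid and bid, assuming PF = 0.25 and B = 20 buffer pages: the initial scan reads 1000 pages and writes 250 projected pages; sorting 250 pages takes 2 passes, i.e. 2 × 2 × 250 = 1000 I/Os; the total is roughly 1000 + 250 + 1000 = 2250 I/Os.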
Hashing
Scan R, extract only the needed attributes
Hash data into buckets
Apply hash function h1 to choose one of B − 1 output buffers
Remove duplicates within each bucket
Two tuples from different partitions are guaranteed to be distinct

Projection based on External Hashing
Partition data into B − 1 partitions with hash function h1 (a code sketch follows the two phases below)
Load each partition, hash it with another hash function h2 (<> h1), and eliminate duplicates

Partitioning phase
Read R using one input buffer
For each tuple:
Discard unwanted fields
Apply hash function h1 to choose one of B − 1 output buffers
Result is B − 1 partitions (of tuples with no unwanted fields)
Two tuples from different partitions are guaranteed to be distinct
Duplicate elimination phase
For each partition
Read it and build an in-memory hash table
Using hash function h2 (<> h1) on all fields
While discarding duplicates
If partition does not fit in memory
Apply hash-based projection algorithm recursively to this partition
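A minimal in-memory sketch of the two phases (the tuple type, partition count, and both hash functions are illustrative assumptions; a real implementation streams partitions to and from disk and recursively re-partitions any partition that does not fit in memory):

#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// A projected Reserves tuple: (sid, bid).
using Projected = std::pair<uint32_t, uint32_t>;

constexpr size_t kPartitions = 8;  // stands in for the B - 1 output buffers

// h1: decides which partition a tuple goes to.
size_t h1(const Projected& t) {
    return std::hash<uint64_t>{}((uint64_t(t.first) << 32) | t.second) % kPartitions;
}

// h2: used inside a partition to detect duplicates (must differ from h1).
struct H2 {
    size_t operator()(const Projected& t) const {
        return std::hash<uint64_t>{}((uint64_t(t.second) << 32) | t.first);
    }
};

std::vector<Projected> hash_based_projection(const std::vector<Projected>& input) {
    // Partitioning phase: route each tuple to one partition using h1.
    std::vector<std::vector<Projected>> partitions(kPartitions);
    for (const auto& t : input) partitions[h1(t)].push_back(t);

    // Duplicate elimination phase: per partition, build a hash set using h2;
    // tuples in different partitions cannot be duplicates of each other.
    std::vector<Projected> result;
    for (const auto& part : partitions) {
        std::unordered_set<Projected, H2> unique_tuples(part.begin(), part.end());
        result.insert(result.end(), unique_tuples.begin(), unique_tuples.end());
    }
    return result;
}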
Joins
To find matches between two lists of records, such as the two ID columns below, it helps to sort them first; otherwise the matching process can be fairly inefficient.
Table A (ID): 1, 2, 7, 3, 9
Table B (ID): 7, 3, 8, 2, 9
Join techniques we will cover:
Nested-loops join
Sort-merge join
Hash join
Cross product is very expensive, so computing a condition join as a cross product followed by a selection is inefficient.
Consider a join of two input tables, R ⋈ S:
The left input is called the outer input and right input is called the inner input.

These terms have nothing to do with inner/outer joins!
Join is associative and commutative.
Simple Nested Loops Join
For each tuple in the outer relation R, we scan the entire inner relation S.
Pseudo Code:
foreach tuple r in R do
    foreach tuple s in S do
        if ri == sj then add <r, s> to result
Cost = NPages(R) + NTuples(R) × NPages(S), with R as the outer relation.
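For the example schema, with Reserves as the outer relation: 1000 + 100 × 1000 × 500 = 50,001,000 I/Os.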
Page-Oriented Nested Loops Join
For each page of R, get each page of S. Write out matching pairs of tuples <r, s>, where r is in the R-page and s is in the S-page.
Pseudo code:
foreach page bR in R do
    foreach page bS in S do
        foreach tuple r in bR do
            foreach tuple s in bS do
                if ri == sj then add <r, s> to result
Cost = NPages(R) + NPages(R) × NPages(S)
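For the example schema, with Reserves as the outer relation: 1000 + 1000 × 500 = 501,000 I/Os.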
Block Nested Loops Join
Page-oriented NL doesn’t exploit extra memory buffers
Use one page as an input buffer for scanning the inner S, one page as the output buffer, and use the remaining B − 2 pages to hold a ‘block’ of the outer R.
For each matching pair of tuples r in the R-block and s in the S-page, add <r, s> to the result. Then read the next R-block, scan S again, and so on.

Cost = NPages(R) + NPages(S) × ⌈NPages(R) / (B − 2)⌉
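For the example schema, with B = 102 buffer pages and Reserves as the outer relation, R is read in ⌈1000 / 100⌉ = 10 blocks, so the cost is 1000 + 10 × 500 = 6000 I/Os.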
Index Nested Loop Join
Requires an index on the join attribute of the inner relation S.
foreach record ri in R:
    foreach record sj in S where condi(ri, sj) == true:   // sj is found by probing the index
        yield <ri, sj>
Cost = NPages(R) + NTuples(R) × (cost of probing the index + cost of retrieving the matching S tuples)
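For the example schema, with Reserves as the outer relation and a hash index on the sid key of Sailors (about 1.2 I/Os per probe, one matching Sailors tuple per Reserves tuple): 1000 + 100,000 × (1.2 + 1) = 221,000 I/Os.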
Sort-Merge Join
It is cumbersome to handle identical (duplicate) join keys in R and S; the pseudo code below does so by marking and restoring the current position in S.
function sort_merge_join(R, S):
    sort(R)
    sort(S)
    p := first tuple of R
    q := first tuple of S
    result := []
    while !(either p or q reaches the end of its relation):
        if R[p].join_attribute < S[q].join_attribute:
            increment(p)
        else if R[p].join_attribute > S[q].join_attribute:
            increment(q)
        else:  // R[p].join_attribute == S[q].join_attribute
            mark(q)
            while R[p].join_attribute == S[q].join_attribute:
                result.add(join(R[p], S[q]))
                increment(q)
            restore(q)   // rewind q to the marked position for the next p
            increment(p)
    return result
Average cost = (cost of sorting R) + (cost of sorting S) + NPages(R) + NPages(S)
Worst case cost = (cost of sorting R) + (cost of sorting S) + NPages(R) × NPages(S), which occurs when many tuples of R and S share the same join-attribute value.
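For the example schema, with 100 buffer pages both relations can be sorted in two passes, so the cost is 4 × 1000 + 4 × 500 + 1000 + 500 = 7500 I/Os.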
Hash join
Requires equality predicate.
Naive Hash Join
Construct an in-memory hash table of roughly NPages(R) pages for the smaller relation R. Look up the records of S in the hash table. The cost is NPages(R) + NPages(S). The problem is that R needs to fit in memory.
Grace Hash Join
Two phases:
Partition (divide)
Build and Probe (conquer)
Partition both relations using hash function h1: R tuples in partition i will only match S tuples in partition i. If a partition does not fit in memory, we recursively partition it. Write the partitions out to disk.
Read in a partition of R tuples and build an in-memory hash table using hash function h2 (different from h1). Then scan the matching partition of S and probe the hash table for matches.
In the partition phase, we read and write both relations.
In the build-and-probe phase, we read both relations.
Cost = 3 × (NPages(R) + NPages(S))
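For the example schema: 3 × (1000 + 500) = 4500 I/Os.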
Code Example for Grace Hash Join
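The skeleton below is filled in as a simplified, fully in-memory sketch: the partition count and the two hash functions are arbitrary illustrative choices, partitions are held in vectors instead of being written to disk, and oversized partitions are not re-partitioned recursively.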
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// A record is a (join key, payload) pair; a table is a vector of records.
using Record = std::pair<uint32_t, std::string>;
using Table = std::vector<Record>;

// Number of partitions produced by the partitioning phase
// (in a real system this would be B - 1, one per available output buffer).
constexpr size_t kNumPartitions = 8;

// Hash function h1: used to partition both input tables on the join key.
struct Hash1 {
    size_t operator()(uint32_t key) const {
        // Mix the key so that h1 and h2 distribute keys differently.
        return std::hash<uint32_t>{}(key * 2654435761u);
    }
};

// Hash function h2: used to build the in-memory hash table within a partition.
// It must differ from h1, otherwise all keys of a partition collide again.
struct Hash2 {
    size_t operator()(uint32_t key) const {
        return std::hash<uint32_t>{}(key);
    }
};

// Partitioning phase: assign every record to one of kNumPartitions partitions
// according to h1 applied to its join key. (A real implementation would
// recursively re-partition any partition that does not fit in memory.)
std::vector<Table> partition(const Table& table) {
    std::vector<Table> partitions(kNumPartitions);
    Hash1 h1;
    for (const auto& record : table) {
        partitions[h1(record.first) % kNumPartitions].push_back(record);
    }
    return partitions;
}

// Grace hash join: partition both tables with h1, then join matching
// partitions pairwise by building a hash table (h2) on one side and
// probing it with the other side.
Table grace_hash_join(const Table& table1, const Table& table2) {
    std::vector<Table> partitions1 = partition(table1);
    std::vector<Table> partitions2 = partition(table2);

    Table result;
    for (size_t i = 0; i < kNumPartitions; ++i) {
        // Build phase: hash table over partition i of table1, keyed by join key.
        std::unordered_map<uint32_t, std::vector<std::string>, Hash2> hash_table;
        for (const auto& record1 : partitions1[i]) {
            hash_table[record1.first].push_back(record1.second);
        }
        // Probe phase: table2 records can only match records in the same partition.
        for (const auto& record2 : partitions2[i]) {
            auto it = hash_table.find(record2.first);
            if (it == hash_table.end()) continue;
            for (const auto& payload1 : it->second) {
                // Concatenate payloads to represent the joined tuple.
                result.emplace_back(record2.first, payload1 + record2.second);
            }
        }
    }
    return result;
}
General Join Conditions
Equalities over several attributes (e.g. R.sid = S.sid AND R.rname = S.sname): sort-merge and hash join can still be applied by treating the combination of attributes as the join key.
For inequality conditions (e.g. R.rname < S.sname), hash join and sort-merge join are not applicable; block nested loops is quite likely to be the best join method here.