/Beyond-LeetCode-SQL

Analysis of SQL Leetcode and classic interview questions. Common pitfalls, anti-patterns and handy tricks are discussed. Sample databases are provided.

Primary LanguageTSQLMIT LicenseMIT

Beyond LeetCode SQL

This repository covers supplementary analysis of SQL for LeetCode and classic interview questions, tradeoff between performance optimization and developmental efficiency, and how it relates to general database design consideration (e.g. indexing and join). Specific sample databases are provided to illustrate tricky interview questions.


LeetCode Selected Probems

Only high-quality problems are selected. Pathological problems such as Find Median Given Frequency of Numbers are not discussed. Entry-level syntax problems such as Combine Two Tables are not discussed.

# Problems Solutions Level Concept
262 Trips and Users MySQL Hard Three-way join; filtering
185 Department Top Three Salaries MySQL, MS SQL Hard Non-equijoin; aggregation; window functionsample
579 Cumulative Salary of Employee MySQL, MS SQL Hard Self-join; left join; aggregation
601 Human Traffic of Stadium MySQL, MS SQL Hard Self-join; de-duplication; window
615 Average Salary MySQL Hard Case; aggregation, join
618 Students Report By Geography MySQL, MS SQL Hard Full join, pivoting

Other undiscussed problems have solutions lumped here.


Classic Interview Questions

This section covers commonly tested concepts during interviews. Many notebooks are inspired by problems people who violated their confidentiality agreement and posted interview materials on Quora, Glassdoor, and 1point3acres. Data are either synthetic or from my personal challenge project.

# Problems Solutions Concept
1 Facebook Advertiser Status MySQL Transition diagram; conditional update
2 Spotify Listening History MySQL Update cumulative sum
3 Monthly Active User MySQL Functional dependency; aggregation; filtering
4 Page Recommendation MySQL Undirected edge; aggregation; existance
5 Pivoting Numeric Data MySQL Pivoting numeric data with case statement
6 Pivoting Text Data MySQL Pivoting text data with self-join
7 Un-pivoting Tables MySQL Un-pivoting tables using cross-join
8 Group by Bins MySQL Create custom column to group by
9 Consecutive Active Users MySQL Self-join, LAG()
10 Recommend Friends MySQL Self-join, de-duplication, aggregation
11 Spotify Similar Friends MySQL Three-way join, de-duplication, aggregation
12 Invalid Search MySQL NULL handling, rate calculation
13 Text Confirmation MySQL Rate calculation
14 Facebook Common Friends MySQL self join, three-way join
15 Facebook Recommend Friend MySQL Self join, four-way join
16 Instagram Common Follower MySQL Self join, Directed edge

Hacks

This section covers esoteric details of SQL language and use cases that may be completely useless in interview. Nevertheless, they come handy when judgement calls and some are simply fun to explore.

# Concept Notebook
1 Random Sampling from Groups MySQL8
2 NULL Pathological Study MySQL8
3 Full Join MySQL8
4 Dynamic Query (Python) MySQL8
5 Stored Procedure MySQL8
6 Hacking Aggregation MySQL8
6 Multi Column Partition MySQL8

Table Optimization

Avoid full-table scan. Used for primary key (automatic), foreign key, commonly used columns.

  • cardinality: the uniqueness of the data.

Index types:

  • single column index
  • unique index (e.g. SSN)
    • cannot create unique index on column with duplicate or NULL values
  • composite index
    • column that get's most recently queried gets placed first
  • implicit index: primary key (automatically unique)

Create index when column:

  • is frequently referneced in order by or group by
  • contains lots of unique values

Avoid index when column:

  • in small table
  • return high percentage of matching data
  • require frequent batch update (drop before updating)
  • contains lots of NULL
  • gets frequently manipulated
  • extremely long string

best practice: rebuild index frequently to reduce fragmentation


Query Optimization

alt-text

  • Place smaller table first when joining multiple tables
  • Largest table is the base table
    • base table is placed on right hand side of equal sign (where clause)
  • Place most restrictive condition last:
    • The condition in the WHERE clause of a statement that returns the fewest rows of data
    • the most restrictive condition was listed last in the WHERE clause,
  • try to use indexed column
FROM TABLE1,   -- Smallest table
     TABLE2,   -- to
     TABLE3    -- Largest table, also base table
WHERE TABLE1.COLUMN = TABLE3.COLUMN    -- Join condition
  AND TABLE2.COLUMN = TABLE3.COLUMN    -- Join condition
[ AND CONDITION1 ]                     -- Filter condition
[ AND CONDITION2 ]                     -- Filter condition
  • Using the like operator and wildcards (flexible search)
  • Avoiding the or operator, use in operator
    • data retrieval is measurably faster by replac- ing OR conditions with the IN predicate
  • Avoiding the HAVING clause
    • try to frame the restriction earlier (where clause)
    • try to keep HAVING clause simple (use constant, not function)
  • avoiding large sort operations
    • it is best to schedule queries with large sorts as periodic batch processes during off-peak database usage so that the performance of most user processes is not affected.
  • Prefer stored procedure
    • compiled and permanently stored in the database in an executable format.
  • Disabling indexes during batch loads
    • When the batch load is complete, you should rebuild the indexes.
    • reduction of fragmentation that is found in the index
  • cost-based optimization: check database server manual
  • Using view: keep the levels of code in your query as flat as possible and to test and tune the statements that make up your views

Formatting for Readability

  • Always begin a new line with each clause in the statement. For example, place the FROM clause on a separate line from the SELECT clause. Then place the WHERE clause on a separate line from the FROM clause, and so on.

  • Use tabs or spaces for indentation when arguments of a clause in the statement exceed one line.

  • Use tabs and spaces consistently.

  • Use table aliases when multiple tables are used in the statement. The use of the full table name to qualify each column in the state- ment quickly clutters the statement and makes reading it difficult.

  • Use remarks sparingly in SQL statements if they are available with- in your specific implementation. Remarks are great for documenta- tion, but too many of them clutter a statement.

  • Begin a new line with each column name in the SELECT clause if many columns are being selected.

  • Begin a new line with each table name in the FROM clause if many tables are being used.

  • Begin a new line with each condition of the WHERE clause. You can easily see all conditions of the statement and the order in which they are used.


Anti-patterns

This section WILL discusses common pitfalls such as nested selects, redundant temporary tables, unnecessary cross join, unnecessary hashset using distinct key word.

# Anti-patterns Notebook
1 Ambiguous Group MySQL8
2 Bad Subquery MySQL8
3 Fail to Use Index MySQL8