SQL performance tuning and query optimisation

1. Figure out what and why you need to optimise.

2. Specify the fields that are needed instead of *

Instead of

select 
*
from table_name

Use

select 
col_1,col_2...
from table_name


3. If you are using cross join and where filter, instead use inner join.

Instead of

select 
*
from table_1 a,table_2 b
where a.id=b.id

Use

select 
col_1,col_2...
from table_1 a
inner join table_2 b on a.id=b.id


4. Create indexes to run queries with filters and aggregations faster.

If few set of columns are used regularly as filter or in join columns then create index in them.

create index index_name
on table_name (col_1,col2..)

Note that using indexes on production write tables might hamper you write response time.

5. When creating complex queries always try and modularise your code into multiple CTE.

This would help you optimise individual logical parts and also narrow down the issues quickly when debugging.

with sample_cte as 
(
    select 
    col_1,col2,sum(value) as sum_value
    from test_table
    group by 1,2
)
,
sample_cte_2 as 
(
    select 
    col_1,col2,sum(value) as sum_value
    from test_table
    group by 1,2
)
select 
*
from sample_cte a
inner join sample_cte_2 b on a.col_1=b.col_2

6. Avoid using computation heavy view which are run frequently and turn them into a table.

7. Make the very primary table without any duplicates, so that you never have to use select distinct ... from .... or select col_1,col_2 ... from .... group by 1,2 type of statements.

8. Avoid joins based on string columns rather use integer columns to join e.g. id columns.

9. If you joining based on datetime or date columns, store another column as the unix_epoch of the datetime/date column.

select 
date_part('epoch',current_date) 
;
1680134400


10. Use EXPLAIN ANALYZE to time and analyse various version of you code.

11. If data allows, set you tables to incremental update rather than truncate and load.

12. If odering does not matter, don't use oder by clause.

13. Maintain and update documentation of tables, views, procedure and data pipelines.

14. Do regular catch up with peers and have a healthy knowledge sharing environment where everyone knows a common most optimised way of quering a particular data point.