- For SQL, use PostgreSQL: it’s the most loved relational database (StackOverflow survey 2018) and a multi-model database (K/V store, document store via jsonb, foreign data wrappers, and much more). Any questions?
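As a tiny illustration of the document-store side (the event table and its payload column are hypothetical), jsonb columns can be queried directly:

create table event (
    event_id uuid primary key default gen_random_uuid(),
    payload  jsonb not null
);

-- containment query on the document; add a GIN index on payload if needed
select payload->>'type' from event where payload @> '{"source": "api"}';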
- Don’t use BETWEEN (why)
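The usual reason, and likely the one behind the link: BETWEEN is inclusive on both ends, which silently double-counts boundary rows, especially with timestamps. A minimal sketch, assuming a hypothetical payment table with a created_at timestamptz column:

-- a row created exactly at 2024-02-01 00:00:00 is counted in January as well
select count(*) from payment
where created_at between '2024-01-01' and '2024-02-01';

-- prefer explicit half-open ranges instead
select count(*) from payment
where created_at >= '2024-01-01' and created_at < '2024-02-01';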
- SET search_path=pg_catalog to force explicitly specifying schema names in every object declaration (besides triggers). This lowers bugs and gives developers a better understanding, because explicit > implicit.
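A minimal sketch, assuming a hypothetical app_user role and a billing schema:

-- lock the search_path down for the role your application connects with
alter role app_user set search_path to pg_catalog;

-- from now on unqualified names fail, so every statement spells out its schema
select invoice_id from billing.invoice;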
- singular (e.g. team, not teams) (Here is why)
- snake_case (e.g. block_theme_version)
- double underscore for n-n tables (e.g. user__organization)
- snake_case (for example: created_at, not createdAt or CreatedAt), because in PostgreSQL keywords and unquoted identifiers are case-insensitive
- double underscore for PK and FK columns (e.g. (PK) user__id, (FK) organization__id)
- NOT NULL by default; NULL is the exception (think of it as the Maybe monad)
- No abbreviations unless they are both well-known and very long, like i18n
- No reserved keywords (Complete list)
- Use UUID as PK and FK (Here is why); do not use serial, rely on gen_random_uuid() (benchmark)
- Use timestamptz everywhere you need to store a date (e.g. created_at TIMESTAMPTZ DEFAULT now()) (Here is why)
- updated_at TIMESTAMPTZ DEFAULT now(), unless you plan to leverage event-sourcing
- deleted_at TIMESTAMPTZ DEFAULT NULL, unless you plan to leverage event-sourcing; don’t forget to handle deleted_at in your queries
- Comment each column: explain your rationale and your decisions, in plain English
- Boolean columns must start with either is or has
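A minimal sketch combining the column rules above; the team table and its columns are hypothetical:

create table team (
    team__id    uuid        primary key default gen_random_uuid(),
    name        text        not null,
    is_archived boolean     not null default false,
    created_at  timestamptz not null default now(),
    updated_at  timestamptz not null default now(),
    deleted_at  timestamptz default null
);

-- every column gets a plain-English comment
comment on column team.is_archived is 'Archived teams are hidden from the UI but kept for reporting purposes.';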
General rule is: {tablename}_{columnname(s)}_{suffix} (e.g. table_name_column_name_a__pkey), where the suffix is one of the following:
- Primary Key constraint: pk
- Foreign key: fk
- Unique constraint: key
- Check constraint: chk
- Exclusion constraint: exl
- Any other kind of index: idx
- table_name_column_name_pk in case of a single-column PK
- table_name_column_name1_column_name2_column_name3_pk in case of multiple columns as primary key (column_name1, column_name2, column_name3)
- table_name_from_column_table_name_to_column__fk
- Always specify ON DELETE and ON UPDATE, in order to force you to think about reference consequences
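A minimal sketch of the constraint-naming and ON DELETE / ON UPDATE rules, reusing the hypothetical team table from the sketch above and reading the naming patterns as literally as possible:

create table team_member (
    team_member__id uuid not null default gen_random_uuid(),
    team__id        uuid not null,
    email           text not null,
    -- {tablename}_{columnname(s)}_{suffix}
    constraint team_member_team_member__id_pk primary key (team_member__id),
    constraint team_member_email_key unique (email),
    -- table_name_from_column_table_name_to_column__fk
    constraint team_member_team__id_team_team__id__fk foreign key (team__id)
        references team (team__id)
        on delete cascade
        on update cascade
);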
There are 3 types of functions: notify functions, private functions and public functions.
- notify, format: notify_[schema_name]_[table_name]_[event] (e.g. notify_authentication_user_created(user_id)): should only format the notification message underneath and use pg_notify. Beware of the 8000-character limit: only send metadata (ids); data should be fetched by workers through the API. If you really wish to send data, then pg_kafka might be a better alternative.
- private, format: _[function_name] (e.g. _reset_failed_login): must never be exposed through the public schema. Used mainly for consistency and business rules.
- public, format: [function_name] (e.g. log_in(email, password)): must be exposed through the public schema.
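A minimal sketch of a notify function, assuming a hypothetical authentication schema with a user table; it only ships metadata (the id) through pg_notify:

create function notify_authentication_user_created(_user__id uuid)
returns void
language sql
as $$
    -- only metadata (the id) is sent; workers fetch the full row through the API
    select pg_notify(
        'authentication_user_created',
        json_build_object('user__id', _user__id)::text
    );
$$;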
Don’t use enums: you will have issues over time because you cannot remove elements from an enum.
Always use true and false, without single quotes. The PostgreSQL documentation says that TRUE and FALSE should be preferred because they are more SQL-compliant but hey, LET’S STOP YELLING WHEN WE WRITE SQL, SHALL WE?
- Use BCNF (above 3NF) (cf. normal forms)
- leverage using, so instead of:
select <fields> from table_1
inner join table_2
  on table_1.table_1_id = table_2.table_1_id
use:
select <fields> from table_1
inner join table_2
  using (table_1_id)
- don’t use PostgreSQL enums: you will have issues when you need to remove some values over time. Use a dedicated table instead (see the sketch after this list).
- use the right PostgreSQL types:
  - inet (IP address)
  - timestamp with time zone
  - point (2D point)
  - tstzrange (time range)
  - interval (duration)
- prefer jsonb to json and SQL arrays
- constraints should live inside your database as much as possible:
create table reservation(
  reservation_id uuid primary key,
  dates tstzrange not null,
  exclude using gist (dates with &&)
);
- use row-level security to ensure R/U/D access on each table’s rows (source); a sketch follows this list
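For the dedicated-table replacement of enums mentioned above, a minimal sketch with hypothetical order_status and customer_order tables:

create table order_status (
    order_status__id text primary key  -- e.g. 'pending', 'shipped'
);

create table customer_order (
    customer_order__id uuid primary key default gen_random_uuid(),
    order_status__id   text not null references order_status (order_status__id)
        on delete restrict
        on update cascade
);

And a minimal row-level-security sketch, assuming a hypothetical message table plus a session variable (app.current_user_id) carrying the current user id:

alter table message enable row level security;

-- each user only sees (and can update/delete) their own rows
create policy message_owner_policy on message
    using (user__id = current_setting('app.current_user_id')::uuid);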
To format SQL queries you can rely on sqlparse, for example through Docker:
docker run --rm --network=none guriandoro/sqlparse:0.3.1 "SELECT several, columns from a_table as a join another_table as b where a.id = 1;"
Since we do want to limit everything in space and time, configure statement_timeout on the role to let your database abort any statement that takes more than the specified amount of time (in ms).
-- Limit in time SQL queries => improve overall reliability
-- https://www.postgresql.org/docs/current/runtime-config-client.html
-- PostgreSQL WILL ABORT any statement that takes more than the specified amount of time (in milliseconds)
-- If you do have an issue with that, please first (from first to last):
-- - .. check that your query is relying on indices (did you use EXPLAIN (ANALYZE, BUFFERS) ?)
-- - .. consider materialized views
-- - .. ensure pg cache settings are OK
-- - .. ensure the disk is SSD and fast enough
-- - .. ensure the server has enough CPU & RAM
-- - .. check if it's for analytics purposes; if so, requesting a postgres replica might be a better idea
-- When all the points above have been evaluated *then* we can talk about increasing the values below :)
alter role APP_ROLE_THAT_DOES_THE_QUERY set statement_timeout to '250ms';
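To double-check what a role will pick up at login (APP_ROLE_THAT_DOES_THE_QUERY being whatever role your application connects with):

select rolname, rolconfig
from pg_roles
where rolname = 'APP_ROLE_THAT_DOES_THE_QUERY';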
Your cache hit ratio tells you how often your data is served from memory vs. having to go to disk. Serving from memory vs. going to disk will be orders of magnitude faster, thus the more you can keep in memory the better. Of course you could provision an instance with as much memory as you have data, but you don’t necessarily have to. Instead, watching your cache hit ratio and ensuring it is at 99% is a good metric for proper performance. (Source)
SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM
pg_statio_user_tables;
Under the covers Postgres is essentially a giant append only log. When you write data it appends to the log, when you update data it marks the old record as invalid and writes a new one, when you delete data it just marks it invalid. Later Postgres comes through and vacuums those dead records (also known as tuples). All those unvacuumed dead tuples are what is known as bloat. Bloat can slow down other writes and create other issues. Paying attention to your bloat and when it is getting out of hand can be key for tuning vacuum on your database. (Source)
WITH constants AS (
SELECT current_setting('block_size')::numeric AS bs, 23 AS hdr, 4 AS ma
), bloat_info AS (
SELECT
ma,bs,schemaname,tablename,
(datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
(maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
) AS nullhdr
FROM pg_stats s, constants
GROUP BY 1,2,3,4,5
) AS foo
), table_bloat AS (
SELECT
schemaname, tablename, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta
FROM bloat_info
JOIN pg_class cc ON cc.relname = bloat_info.tablename
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = bloat_info.schemaname AND nn.nspname <> 'information_schema'
), index_bloat AS (
SELECT
schemaname, tablename, bs,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols
FROM bloat_info
JOIN pg_class cc ON cc.relname = bloat_info.tablename
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = bloat_info.schemaname AND nn.nspname <> 'information_schema'
JOIN pg_index i ON indrelid = cc.oid
JOIN pg_class c2 ON c2.oid = i.indexrelid
)
SELECT
type, schemaname, object_name, bloat, pg_size_pretty(raw_waste) as waste
FROM
(SELECT
'table' as type,
schemaname,
tablename as object_name,
ROUND(CASE WHEN otta=0 THEN 0.0 ELSE table_bloat.relpages/otta::numeric END,1) AS bloat,
CASE WHEN relpages < otta THEN '0' ELSE (bs*(table_bloat.relpages-otta)::bigint)::bigint END AS raw_waste
FROM
table_bloat
UNION
SELECT
'index' as type,
schemaname,
tablename || '::' || iname as object_name,
ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS bloat,
CASE WHEN ipages < iotta THEN '0' ELSE (bs*(ipages-iotta))::bigint END AS raw_waste
FROM
index_bloat) bloat_summary
ORDER BY raw_waste DESC, bloat DESC
Postgres makes it simple to query for unused indexes, so you can easily win back some performance by removing them (Source)
SELECT
schemaname || '.' || relname AS table,
indexrelname AS index,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
idx_scan as index_scans
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT indisunique AND idx_scan < 50 AND pg_relation_size(relid) > 5 * 8192
ORDER BY pg_relation_size(i.indexrelid) / nullif(idx_scan, 0) DESC NULLS FIRST,
pg_relation_size(i.indexrelid) DESC;
pg_stat_statements is useful for monitoring your database query performance. It records a lot of valuable stats about which queries are run, how fast they return, how many times they’re run, etc. Checking in on this set of queries regularly can tell you where it is best to add indexes or optimize your application so your query calls may not be so excessive. (Source)
SELECT query,
calls,
total_time,
total_time / calls as time_per,
stddev_time,
rows,
rows / calls as rows_per,
100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements
WHERE query not similar to '%pg_%'
and calls > 500
--ORDER BY calls
--ORDER BY total_time
order by time_per
--ORDER BY rows_per
DESC LIMIT 20;
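Note that pg_stat_statements only collects data once the module is preloaded and the extension is created; the required setup looks like this:

-- postgresql.conf (requires a restart)
-- shared_preload_libraries = 'pg_stat_statements'
create extension if not exists pg_stat_statements;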
- Google Cloud PostgreSQL
  - Pros
  - Cons
    - No support for plv8
    - Any features that require superuser privileges are not supported
      - postgres role is not a superuser
      - Can create roles
      - Can not select from tables that are restricted by default, like pg_shadow
        - Thus can not edit pg_catalog.pg_class (in order to change row level security activation for example)
      - Can read from all necessary tables other than pg_authid
- Scaleway Managed PostgreSQL
  - Pros
    - multi-schema support
    - configuration options are editable
    - user/role management is self-service
  - Cons
    - /
- OVH Cloud SQL
  - Pros
    - /
  - Cons
    - no multi-schema support
== Where is this convention used?
- @Netwo
- @OuestFrance
- @MotionDynamic_
- @Oxmoto
- @iAdvize
- @Bringr
- @Redsmin