On programming and stuff

Python in Production (II) - Postgresql

| Comments


Database tuning is often considered unnecessary, and many people leave it for the very end of development or completely skip that part. I'll be covering logging, backups, indexing and orm role, in order to give you some insight into different database related tasks.


The thing that many people don't get right when it comes to configuring postgresql properly is logging.

$ vi postgresql.conf
  log_destination = 'stderr' # replace with csvlog when you use scripts/tools to analyze logs

  logging_collector = on # switch log functionality on

  log_directory = '/var/log/postgresql' # where our logs will go

  log_filename = 'postgresql-%Y-%m-%d.log' # log file naming strategy

  log_rotation_age = 1d # log files larger older than 512MB will be rotated

  log_rotation_size = 512MB # log files larger than 512MB will be rotated

  log_min_duration_statement = 50 # each query above 50ms will be logged

  log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u ' # how our logs will be written

  log_checkpoints = on
  log_connections = on
  log_disconnections = on
  log_hostname = on
  log_lock_waits = on
  log_temp_files = 0  

connections, disconnections, lock_waits, temp_files, checkpoints not only give us some overview but may be helpful to debug other configuration parameters that rely for instance on memory.

You can rely on that logging configuration for most of you projects - having them rotated daily with reasonable metadata will surely help you analyse the problems and bottlenecks of you DB setup.

Logging - PgBadger

Logging itself is a useful little feature, but we can do better :) We can have our logs analyzed by pgBadger. Which will not only aggregate some useful information like crud statistics, but provide a nice graphical representation of what's being done in our database.

Typical setup involves:

  1. Downloading & installing pgbadger according to official documentation.
  2. Setting up a cronjob that uses bash script which handles log files from /var/log/postgresql on a daily basis, creates pgbadger report and stores/uploads it somehwere.

There're many tutorials and guides how to setup pgbadger and tie it with cron: http://www.antelink.com/blog/using-pgbadger-monitor-your-postgresql-activity.html. In the end You'll be able to get visual representation of database queries which may look for instance like that:



There is no golden rule here, backup setup depends on project's business value and couple of other factors. For a small project having a small bash script which performs pg_dump and later pushing the output file to remote destination should be enough.

touch ~/scripts/database_backup.sh && vi ~/scripts/database_backup.sh

Add following script to the created file.


pg_dump -h localhost -U <user> --no-password -F t <database> > /srv/backups/database/db-backup-`date +"%s"`.sql.tar

Save the file and create a cron entry:

crontab -e
# Database backup

0 0,12,19 * * * sh /home/<user>/scripts/database_backup.sh

It's an anti-pattern to store backups on the same server so either connect to your database from remote location or have it another way and create a script that will push your backups to another destination.

Backups - WAL-e

WAL-e is a tool created by heroku that provides continous archiving of WAL segments. To put it simply: if your app can't afford to loose transactions it's a way to go.

All your WAL files will be stored on S3 for backups, in case You need them. WAL is basically a utility which is used by postgresql when it comes to keeping track of database changes before they're applied. Archiving WAL segments with WAL-E will allow you to restore your database to the state from just before a crash.


It's not exactly part of this tutorial, but since I've mentioned creating cronjobs I think it's a good time to introduce You to flock. Most of linux distributions have a command called flock, which will run a command only if it can get a lock on a certain file.

So changing our entry in crontab to:

flock -n /tmp/database_backup.lock -c sh /home/<user>/scripts/database_backup.sh

will prevent us from running duplicate cronjobs, which not only slows server but may lead to hard to track errors. That's how You can easily take care of problems that your cronjobs make cause. Personally, I wrap all crontab entries with flock and I recommend You to do so.


No golden rule here. First play with EXPLAIN ANALYZE to find slow queries. If that's not an option at some point use pgbadger and logs to find queries that may suffer without indexes.

Use partial indexes (also known as filtered) for stuff like:

  • querying database against a column which is NULL for most of the rows
  • querying database against some business value (salary, point of time, status or kind field)

for instance:

SELECT * from orders where payment_deadline IS NOT NULL;


SELECT * from orders where owner_id WHERE value > 1000;

This will index only subset of rows, which will keep index size significantly smaller compared to situation in which you'd index whole table.

Use composite indexes (known also as multicolumn) to index queries that rely constantly on same filtering conditions, like:

SELECT * from orders where owner_id = <owner_id> AND status = <status>;

Generally, indexing foreign keys can be considered good practice (some SQL databases do that automatically). For that You may find the following query useful (shamelessly borrowed from here):

WITH fk_actions ( code, action ) AS (
    VALUES ( 'a', 'error' ),
        ( 'r', 'restrict' ),
        ( 'c', 'cascade' ),
        ( 'n', 'set null' ),
        ( 'd', 'set default' )
fk_list AS (
    SELECT pg_constraint.oid as fkoid, conrelid, confrelid as parentid,
        conname, relname, nspname,
        fk_actions_update.action as update_action,
        fk_actions_delete.action as delete_action,
        conkey as key_cols
    FROM pg_constraint
        JOIN pg_class ON conrelid = pg_class.oid
        JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid
        JOIN fk_actions AS fk_actions_update ON confupdtype = fk_actions_update.code
        JOIN fk_actions AS fk_actions_delete ON confdeltype = fk_actions_delete.code
    WHERE contype = 'f'
fk_attributes AS (
    SELECT fkoid, conrelid, attname, attnum
    FROM fk_list
        JOIN pg_attribute
            ON conrelid = attrelid
            AND attnum = ANY( key_cols )
    ORDER BY fkoid, attnum
fk_cols_list AS (
    SELECT fkoid, array_agg(attname) as cols_list
    FROM fk_attributes
    GROUP BY fkoid
index_list AS (
    SELECT indexrelid as indexid,
        pg_class.relname as indexname,
        indpred is not null as has_predicate,
        pg_get_indexdef(indexrelid) as indexdef
    FROM pg_index
        JOIN pg_class ON indexrelid = pg_class.oid
    WHERE indisvalid
fk_index_match AS (
    SELECT fk_list.*,
        indkey::int[] as indexatts,
        array_length(key_cols, 1) as fk_colcount,
        array_length(indkey,1) as index_colcount,
        round(pg_relation_size(conrelid)/(1024^2)::numeric) as table_mb,
    FROM fk_list
        JOIN fk_cols_list USING (fkoid)
        LEFT OUTER JOIN index_list
            ON conrelid = indrelid
            AND (indkey::int2[])[0:(array_length(key_cols,1) -1)] @> key_cols

fk_perfect_match AS (
    SELECT fkoid
    FROM fk_index_match
    WHERE (index_colcount - 1) <= fk_colcount
        AND NOT has_predicate
        AND indexdef LIKE '%USING btree%'
fk_index_check AS (
    SELECT 'no index' as issue, *, 1 as issue_sort
    FROM fk_index_match
    WHERE indexid IS NULL
    SELECT 'questionable index' as issue, *, 2
    FROM fk_index_match
    WHERE indexid IS NOT NULL
        AND fkoid NOT IN (
            SELECT fkoid
            FROM fk_perfect_match)
parent_table_stats AS (
    SELECT fkoid, tabstats.relname as parent_name,
        (n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) as parent_writes,
        round(pg_relation_size(parentid)/(1024^2)::numeric) as parent_mb
    FROM pg_stat_user_tables AS tabstats
        JOIN fk_list
            ON relid = parentid
fk_table_stats AS (
    SELECT fkoid,
        (n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) as writes,
        seq_scan as table_scans
    FROM pg_stat_user_tables AS tabstats
        JOIN fk_list
            ON relid = conrelid
SELECT nspname as schema_name,
    relname as table_name,
    conname as fk_name,
FROM fk_index_check
    JOIN parent_table_stats USING (fkoid)
    JOIN fk_table_stats USING (fkoid)
WHERE table_mb > 9
    AND ( writes > 1000
        OR parent_writes > 1000
        OR parent_mb > 10 )
ORDER BY issue_sort, table_mb DESC, table_name, fk_name;

To put it simply: this query finds foreign keys that are not indexed :)

Know Your ORM

At the very end - I don't intend to post here another ORM showdown. What I want I want to outline is that it's good to learn your ORM, consider:

We will be playing with database with couple thousands of rows and following sqlalchemy models. Note that foreign keys are indexed already.

class Country(Base):
    __tablename__ = 'countries'

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)

class Player(Base):
    __tablename__ = 'players'

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    country_id = Column(ForeignKey(u'countries.id'))
    team_id = Column(ForeignKey(u'teams.id'))

    country = relationship(u'Country')
    team = relationship(u'Team')

class Team(Base):
    __tablename__ = 'teams'

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)

First let's start simple. Let's fetch all player ids, names and their team_ids;

djangodb=> EXPLAIN ANALYZE SELECT id, name, team_id FROM players;
                                                   QUERY PLAN                                                   
 Seq Scan on players  (cost=0.00..3274.00 rows=200000 width=15) (actual time=0.010..33.241 rows=200000 loops=1)
 Total runtime: 42.780 ms

Ok, let's see how fast sqlalchemy will deal with that query


3.5 seconds. No surprise here - we have to create objects, allocate memory ang generally process everything. But can we do better ?

Let's try fetching particular columns first

session.query(Player.id, Player.name, Player.team_id).all()

0.6 seconds. Not bad. But wait, we can drop declarative and try using core

players = Player.__table__
statement = select([players.c.id, players.c.name, players.c.team_id])
return engine.execute(statement).fetchall()

0.4 seconds. Now, we're talking

Now, let's do some joining

EXPLAIN ANALYZE SELECT p.id, p.name, c.name, t.name FROM players p LEFT OUTER JOIN countries c ON p.country_id = c.id LEFT OUTER JOIN teams t ON p.team_id = t.id;
                                                              QUERY PLAN                                                              
 Hash Left Join  (cost=13118.00..29534.00 rows=200000 width=25) (actual time=160.197..450.810 rows=200000 loops=1)
   Hash Cond: (p.team_id = t.id)
   ->  Hash Left Join  (cost=6559.00..16404.00 rows=200000 width=22) (actual time=85.288..226.012 rows=200000 loops=1)
         Hash Cond: (p.country_id = c.id)
         ->  Seq Scan on players p  (cost=0.00..3274.00 rows=200000 width=19) (actual time=0.016..16.286 rows=200000 loops=1)
         ->  Hash  (cost=3082.00..3082.00 rows=200000 width=11) (actual time=85.046..85.046 rows=200000 loops=1)
               Buckets: 4096  Batches: 16  Memory Usage: 558kB
               ->  Seq Scan on countries c  (cost=0.00..3082.00 rows=200000 width=11) (actual time=0.016..26.853 rows=200000 loops=1)
   ->  Hash  (cost=3082.00..3082.00 rows=200000 width=11) (actual time=74.713..74.713 rows=200000 loops=1)
         Buckets: 4096  Batches: 16  Memory Usage: 558kB
         ->  Seq Scan on teams t  (cost=0.00..3082.00 rows=200000 width=11) (actual time=0.004..23.784 rows=200000 loops=1)
 Total runtime: 458.158 ms

So, using all those 3 techiniques:

session.query(Player.id, Player.name, Country.name, Team.name).\
players = Player.__table__
countries = Country.__table__
teams = Team.__table__
statement = select(
    [players.c.id, players.c.name, countries.c.name, teams.c.name]

takes respectively: 9.5, 1.3 and 0.95 seconds.

Apart from powerful core that allows You to build low level queries, sqlalchemy also comes with yield_per, bundles, powerful join system and from_statement, which are really handy when queries need to perform slightly better.

Stay tuned for part 3!


comments powered by Disqus