On programming and stuff

Python in Production (I) - Linux



The aim of this guide (I plan to write a couple of separate parts covering different aspects of running a web application in production) is to introduce you to key concepts and problems that may occur on a live server. I'll touch on the server itself, the database, the application server and deployment, and provide you with the configuration files and workflow that I use and rely on. Note that it's extremely subjective, but I think the configuration may be considered at least a "reasonable default". Let's dig in, shall we?


Most of the posts refer to the stack I work on, which is:
- nginx
- haproxy
- uwsgi
- redis
- postgresql
- python2.7


By Linux I mean Ubuntu, since that's the distribution I use for most of my projects. By playing with the Linux configuration we can mostly affect three things:
- I/O
- Network
- Memory

Max open files

By default it is set to 1024. Since sockets, which count as open files, are used for communication between the different tools, this limit affects the number of concurrent connections our stack is able to handle. If we expect high traffic, it's good practice to tune that setting:

$ ulimit -n 99999
$ vi /etc/security/limits.conf
    nginx       soft    nofile  99999
    nginx       hard    nofile  99999

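Entries in limits.conf are applied by PAM when a new session starts, so log in again for them to take effect. Kernel parameters kept in /etc/sysctl.conf (used further below) are reloaded with:

$ sudo sysctl -p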
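
To confirm that the raised limit actually reached a running process, it can be checked from inside Python. This is only a minimal sketch using the standard resource module; the helper name nofile_limits is mine and not part of any tool from the stack.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = nofile_limits()
# A value equal to resource.RLIM_INFINITY means "unlimited".
print("max open files: soft={0}, hard={1}".format(soft, hard))
```

Running it under the same user (and the same way) the application is started gives a more honest answer than running ulimit -n in your own shell.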

Kernel queue for accepting new connections

By default set to 128, this is the maximum length of the kernel queue that holds connections waiting to be accepted.

$ sysctl -w net.core.somaxconn=99999
$ vi /etc/sysctl.d/haproxy-tuning.conf
    net.core.somaxconn=99999
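
The backlog an application passes to listen() is capped by this setting, so raising the backlog in the application alone is not enough. A rough sketch of that relationship follows; effective_backlog and read_somaxconn are hypothetical helper names, and the /proc path is assumed to exist on Linux only.

```python
def read_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Return the kernel limit, or None when the file is unavailable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (IOError, OSError, ValueError):
        return None


def effective_backlog(requested, somaxconn):
    """The kernel caps the backlog requested via listen() at somaxconn."""
    if somaxconn is None:
        return requested  # limit unknown, nothing to compare against
    return min(requested, somaxconn)


requested = 1024  # example value an application might ask for
print("requested {0}, effective {1}".format(
    requested, effective_backlog(requested, read_somaxconn())))
```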

Usable ports

By default set to 32768-61000, this is the range of local ports our system can use for outgoing connections. The size of the range limits the number of concurrent outgoing connections to a single destination.

$ sysctl -w net.ipv4.ip_local_port_range="10000 61000"
$ vi /etc/sysctl.d/haproxy-tuning.conf
    net.ipv4.ip_local_port_range=10000 61000
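
To get a feeling for how much the change buys us, here is a small sketch that counts the ports in a range. The function name ephemeral_port_count is made up for this example, and it assumes both ends of the range are inclusive.

```python
def ephemeral_port_count(port_range):
    """Count the ports in a 'low high' range string (both ends inclusive)."""
    low, high = (int(part) for part in port_range.split())
    if low > high:
        raise ValueError("low port must not exceed high port")
    return high - low + 1


print(ephemeral_port_count("32768 61000"))  # default range: 28233
print(ephemeral_port_count("10000 61000"))  # tuned range:   51001
```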

Socket recycling


A common misconception is that fast recycling should be enabled (most tuning guides give such advice) so that sockets do not stay in TIME_WAIT, like this:

$ vi /etc/sysctl.conf
  net.ipv4.tcp_tw_recycle = 1
  net.ipv4.tcp_tw_reuse = 1

However, it's highly discouraged: tcp_tw_recycle in particular can silently drop connections from clients behind NAT.
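
Before touching any of these settings it is worth checking whether TIME_WAIT sockets are a problem at all. A minimal sketch that counts them from the text of /proc/net/tcp, where the state column uses the code 06 for TIME_WAIT; the function name and the shortened sample rows are made up for illustration (real rows have more columns).

```python
TIME_WAIT = "06"  # TCP state code used in /proc/net/tcp


def count_time_wait(proc_net_tcp_text):
    """Count sockets in TIME_WAIT in the text of /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == TIME_WAIT:
            count += 1
    return count


sample = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000
   1: 0100007F:1F90 0100007F:C350 06 00000000:00000000
   2: 0100007F:1F90 0100007F:C351 01 00000000:00000000
"""
print(count_time_wait(sample))  # 1
```

On a real server the text would come from open("/proc/net/tcp").read().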

Filesystem access

To improve I/O we can tell Linux not to record the last access time of files, which it keeps track of by default. To change that, modify the mount options of the partition your files reside on. Change the entry from:

$ vi /etc/fstab
    UUID=<UUID> /               ext4    errors=remount-ro 0       1

to:

$ vi /etc/fstab
    UUID=<UUID> /               ext4    noatime,nodiratime,errors=remount-ro 0       1

noatime affects files and nodiratime affects directories.
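
After remounting (or rebooting) you can verify that the options are really in effect. A small sketch that parses text in the format of /proc/mounts; mount_options is a hypothetical helper and the sample entries are invented.

```python
def mount_options(mounts_text, mount_point):
    """Return the set of options for mount_point, or None if not mounted."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()  # device, mount point, fs type, options, ...
        if len(fields) >= 4 and fields[1] == mount_point:
            options = set(fields[3].split(","))  # a later entry wins
    return options


sample = """\
/dev/sda1 / ext4 rw,noatime,nodiratime,errors=remount-ro 0 0
tmpfs /tmp tmpfs rw,nosuid,noatime 0 0
"""
print("noatime" in mount_options(sample, "/"))  # True
```

On a live system, pass open("/proc/mounts").read() instead of the sample.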

In-memory filesystem for /tmp

Adding the following line to your /etc/fstab file replaces the filesystem of the /tmp directory with an in-memory one:

$ vi /etc/fstab
    tmpfs /tmp               tmpfs    defaults,nosuid,noatime 0       0

This greatly improves I/O performance for file uploads. Note that it may become a bottleneck when the uploaded files are large or when you are short on RAM.

At the very end, mount the new filesystem:

$ mount /tmp
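
Because space in an in-memory /tmp is RAM, an application may want to check how much is left before accepting a large upload. This is just a sketch under my own assumptions: the names free_bytes and fits_in_tmp and the 64 MB reserve are made up, and tempfile.gettempdir() is only /tmp when TMPDIR is not set to something else.

```python
import os
import tempfile


def free_bytes(path):
    """Bytes available to unprivileged users on the filesystem holding path."""
    stats = os.statvfs(path)
    return stats.f_bavail * stats.f_frsize


def fits_in_tmp(upload_size, reserve=64 * 1024 * 1024):
    """True if an upload fits while still leaving `reserve` bytes free."""
    return upload_size + reserve <= free_bytes(tempfile.gettempdir())


print("free in temp dir: {0} bytes".format(free_bytes(tempfile.gettempdir())))
```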

Getting swap right

When you're forced to add some swap to your system, be sure to tune two settings, vm.swappiness and vm.vfs_cache_pressure, in your sysctl.conf:

$ vi /etc/sysctl.conf
  vm.vfs_cache_pressure = 50

A low vm.swappiness tells our system not to move data from RAM to swap that often, and a vm.vfs_cache_pressure below the default of 100 tells it to keep directory and inode information cached, so it doesn't have to be looked up as frequently.
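
To see which values are currently in effect, the settings can be read straight from /proc/sys, where every dotted sysctl name maps to a file path. A minimal sketch; sysctl_path and read_sysctl are names I made up, and the simple dot-to-slash mapping is an assumption that holds for the keys used here.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        with open(sysctl_path(name, root)) as handle:
            return handle.read().strip()
    except (IOError, OSError):
        return None


for key in ("vm.swappiness", "vm.vfs_cache_pressure"):
    print("{0} = {1}".format(key, read_sysctl(key)))
```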

The three things I want you to remember after this part are:

  1. The default configuration of your system is good, but it may not be properly tuned for high loads and for getting the maximum out of the tools in your stack (nginx, haproxy).
  2. No one knows all that stuff by heart (at least I don't), so if you find this article useful, save it somewhere so you can look it up later when it comes to configuration, or create an ansible playbook that deals with that stuff :-).
  3. Stay tuned for part 2!


