Apache Airflow Installation on Ubuntu

Taufiq Ibrahim
Oct 2, 2017

This post documents how to install Apache Airflow using Ubuntu on Windows.

Preparation

pip

Apache Airflow is installed with pip. Run the command below.

sudo apt-get install python-pip

If the installed pip is not the latest version, consider upgrading it:

sudo pip install --upgrade pip

Installing Database Backend (PostgreSQL)

sudo apt-get install postgresql postgresql-contrib

Postgres is now installed. Next, we need to create:

  • a database for Airflow
  • a user having access to the database

Creating Postgres User and Database

  • Create a new Linux user named airflow.
sudo adduser airflow
[sudo] password for tole:
Adding user `airflow' ...
Adding new group `airflow' (1001) ...
Adding new user `airflow' (1001) with group `airflow' ...
Creating home directory `/home/airflow' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for airflow
Enter the new value, or press ENTER for the default
Full Name []: airflow
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y
  • Use usermod to add the new user airflow to the sudo group.
sudo usermod -aG sudo airflow
  • Switch to a shell running as user airflow.
su - airflow
  • Run psql as user postgres to create a Postgres role for airflow.
sudo -u postgres psql

Note: when I ran the command above in Ubuntu on Windows, it failed with the following error:

sudo: setresuid() [1000, 113, 1000] -> [-1, 0, -1]: Operation not permitted
sudo: unable to set runas group vector: Operation not permitted
sudo: PERM_ROOT: setresuid(0, -1, 0): Operation not permitted

As a temporary workaround, I used:

user@server:~$ sudo su postgres
postgres@server:/home/user$ psql

We are now at the postgres=# prompt.

Still in the psql console, create the Postgres user and database for airflow.

postgres=# CREATE USER airflow PASSWORD 'a1rfl0w';
CREATE ROLE
postgres=# CREATE DATABASE airflow;
CREATE DATABASE
postgres=# GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
GRANT
postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 airflow   |                                                            | {}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

Now let’s check whether the database is set up and can be accessed by user airflow. Exit psql with \q and run the following as the Linux user airflow:

psql -d airflow
psql (9.5.9)
Type "help" for help.
airflow=> \conninfo
You are connected to database "airflow" as user "airflow" via socket in "/var/run/postgresql" at port "5432".
airflow=>

If you see output like the above, the database is ready.

Change pg_hba.conf Setting

We also need to reconfigure pg_hba.conf to allow connections from Airflow.

sudo nano /etc/postgresql/9.5/main/pg_hba.conf

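Change the IPv4 address to 0.0.0.0/0 and the IPv4 method to trust. This lets any host connect over IPv4 without a password, so it is only suitable for a local test machine.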

# IPv4 local connections:
host all all 0.0.0.0/0 trust

Restart the service:

sudo service postgresql restart
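PostgreSQL applies the first pg_hba.conf rule that matches a connection, so after editing it can help to list the active host rules in file order. The following is a minimal sketch; show_host_rules is a hypothetical helper name, not a PostgreSQL tool, and the file path in the usage comment is assumed from the step above.

```shell
# List the uncommented "host" rules of a pg_hba.conf file, in file order.
# show_host_rules is a hypothetical helper, not part of PostgreSQL.
# With no argument it reads from standard input.
show_host_rules() {
    # awk splits each line on whitespace; a commented rule has "#host"
    # (or "#") as its first field, so it does not match "host".
    awk '$1 == "host" { print $1, $2, $3, $4, $5 }' "${1:--}"
}

# Usage (pg_hba.conf is usually readable only by postgres, hence sudo):
#   sudo cat /etc/postgresql/9.5/main/pg_hba.conf | show_host_rules
```

If a stricter rule appears above the new line, that earlier rule is the one PostgreSQL will use for matching connections.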

Next, we configure postgresql.conf.

sudo nano /etc/postgresql/9.5/main/postgresql.conf

#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------

# - Connection Settings -

#listen_addresses = 'localhost' # what IP address(es) to listen on;
listen_addresses = '*' # for Airflow connection

Restart the service:

sudo service postgresql restart
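It is easy to leave the new listen_addresses line commented out, or to have it overridden further down the file, since PostgreSQL uses the last occurrence of a setting. A minimal sketch for checking which value is in effect follows; active_setting is a hypothetical helper name, and it assumes simple `name = value` lines whose values do not contain a `#`.

```shell
# Print the value of the last uncommented occurrence of a setting in a
# postgresql.conf-style file. active_setting is a hypothetical helper name.
active_setting() {
    name="$1"
    conf_file="$2"
    # Keep lines that start with the setting name (commented lines start
    # with '#'), take the last one, then strip the "name =" prefix and any
    # trailing comment.
    grep -E "^[[:space:]]*${name}[[:space:]]*=" "$conf_file" \
        | tail -n 1 \
        | sed -E 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*#.*$//'
}

# Usage (path assumed from the step above):
#   active_setting listen_addresses /etc/postgresql/9.5/main/postgresql.conf
```

With the edit shown above, the helper should print '*'; empty output means no uncommented listen_addresses line was found.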

Airflow Installation

Set Up Airflow Default Home

export AIRFLOW_HOME=~/airflow
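The export above only lasts for the current shell session. One way to make it persistent is to append the same line to a shell startup file, as in this minimal sketch; persist_airflow_home is a hypothetical helper name, and ~/.bashrc is an assumption about which startup file your shell reads.

```shell
# Append the AIRFLOW_HOME export to a shell startup file, at most once.
# persist_airflow_home is a hypothetical helper, not part of Airflow.
persist_airflow_home() {
    rc_file="$1"
    line='export AIRFLOW_HOME=~/airflow'
    # grep -qxF: quiet, whole-line, fixed-string match. Append only when
    # the exact line is not already in the file.
    if ! grep -qxF "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Usage (assumes bash reads ~/.bashrc on startup):
#   persist_airflow_home ~/.bashrc
```

Running the helper twice leaves a single export line in the file.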

Installing Airflow

pip install "airflow[postgres,mssql,celery,rabbitmq]"

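See the Airflow installation documentation for more details on the list of subpackages and what they enable. Note that newer releases are published on PyPI under the name apache-airflow.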

Starting Up Airflow Database

Once the packages are installed, we can initialize Airflow’s database by issuing:

airflow initdb

The command generates an airflow.cfg file in the Airflow home directory we set up earlier.

Set up airflow.cfg

  • Use CeleryExecutor instead of SequentialExecutor if we want to run the pipeline from the web UI.
executor = CeleryExecutor
  • Pass along the connection info of the PostgreSQL database airflow we just created.
sql_alchemy_conn = postgresql+psycopg2://airflow:a1rfl0w@localhost:5432/airflow

Save the file and run airflow initdb again so that the tables are created in the PostgreSQL database.
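The sql_alchemy_conn value is a SQLAlchemy URL of the form dialect+driver://user:password@host:port/database. As a minimal sketch, it can be assembled from its parts as below; build_sql_alchemy_conn is a hypothetical helper name, and the example values assume the airflow role, password, and database created earlier in this post.

```shell
# Build a SQLAlchemy URL for a PostgreSQL backend using the psycopg2 driver.
# build_sql_alchemy_conn is a hypothetical helper, not an Airflow command.
# Arguments: user, password, host, port, database.
build_sql_alchemy_conn() {
    printf 'postgresql+psycopg2://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Values assumed from the role and database created earlier.
build_sql_alchemy_conn airflow a1rfl0w localhost 5432 airflow
# prints: postgresql+psycopg2://airflow:a1rfl0w@localhost:5432/airflow
```

The sketch does no escaping, so a password containing characters such as @ or / would need to be URL-encoded first.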

Disabling Examples

nano airflow/airflow.cfg

...
# Whether to load the examples that ship with Airflow. It's good to
# get started, but you probably want to set this to False in a production
# environment
load_examples = False
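The same change can be made without opening an editor. The following is a minimal sketch using sed; disable_examples is a hypothetical helper name, and the path in the usage comment assumes airflow.cfg lives in the Airflow home directory set earlier.

```shell
# Set load_examples = False in an airflow.cfg file without opening an editor.
# disable_examples is a hypothetical helper, not an Airflow command.
disable_examples() {
    cfg_file="$1"
    # -i.bak edits the file in place and keeps a backup at "$cfg_file.bak".
    # Only a line starting with "load_examples" is rewritten.
    sed -i.bak 's/^load_examples[[:space:]]*=.*/load_examples = False/' "$cfg_file"
}

# Usage (assumes the config sits in the Airflow home directory):
#   disable_examples "${AIRFLOW_HOME:-$HOME/airflow}/airflow.cfg"
```

The backup file makes it easy to restore the previous setting if needed.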

Starting Airflow Web Server

This command starts the web server on localhost at port 8080.

airflow webserver

Next, we will cover how to integrate Airflow with a Microsoft SQL Server database.
