Apache Airflow Installation on Ubuntu

Preparation

pip

Apache Airflow is installed with pip, so install pip first by running the command below.

sudo apt-get install python-pip

Installing Database Backend (PostgreSQL)

sudo apt-get install postgresql postgresql-contrib

With PostgreSQL installed, Airflow needs two things from it:
  • a database for Airflow
  • a user having access to the database

Creating Postgres User and Database

  • Create a new Linux user named airflow.
sudo adduser airflow
[sudo] password for tole:
Adding user `airflow' ...
Adding new group `airflow' (1001) ...
Adding new user `airflow' (1001) with group `airflow' ...
Creating home directory `/home/airflow' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for airflow
Enter the new value, or press ENTER for the default
Full Name []: airflow
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y
  • Use usermod to add the new airflow user to the sudo group.
sudo usermod -aG sudo airflow
  • Now switch to the airflow user.
su - airflow
  • Now we use the postgres user to create a PostgreSQL role for airflow. If sudo -u postgres psql is refused as shown here, switch to the postgres user first and then start psql.
sudo -u postgres psql
sudo: setresuid() [1000, 113, 1000] -> [-1, 0, -1]: Operation not permitted
sudo: unable to set runas group vector: Operation not permitted
sudo: PERM_ROOT: setresuid(0, -1, 0): Operation not permitted
user@server:~$ sudo su postgres
postgres@server:/home/user$ psql
postgres=# CREATE USER airflow PASSWORD 'a1rfl0w';
CREATE ROLE
postgres=# CREATE DATABASE airflow;
CREATE DATABASE
postgres=# GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
GRANT
postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 airflow   |                                                            | {}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
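  • Quit psql, return to the airflow user's shell, and connect to the new database to verify the role.
psql -d airflow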
psql (9.5.9)
Type "help" for help.
airflow=> \conninfo
You are connected to database "airflow" as user "airflow" via socket in "/var/run/postgresql" at port "5432".
airflow=>
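The interactive session above can also be scripted. The following is a minimal sketch, assuming the same role name, password, and database name as in the session. airflow_db_sql is a hypothetical helper (not part of PostgreSQL or Airflow) that only prints the SQL, so the statements can be reviewed before they are piped into psql. It makes the role the owner of the database rather than issuing a separate GRANT, which is a swapped-in simplification.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL that creates a role and a database
# owned by that role. Nothing runs until the output is piped into psql.
airflow_db_sql() {
    role=$1
    password=$2
    database=$3
    # Double any single quote so the password stays a valid SQL literal.
    escaped=$(printf '%s' "$password" | sed "s/'/''/g")
    printf "CREATE USER %s PASSWORD '%s';\n" "$role" "$escaped"
    printf "CREATE DATABASE %s OWNER %s;\n" "$database" "$role"
}

# Review the statements first:
airflow_db_sql airflow 'a1rfl0w' airflow

# Then apply them as the postgres superuser (left commented out on purpose;
# use whichever way of becoming postgres works on your machine):
# airflow_db_sql airflow 'a1rfl0w' airflow | sudo -u postgres psql
```

Printing the SQL instead of executing it keeps the helper harmless to run and easy to check.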

Change pg_hba.conf Setting

We also need to edit pg_hba.conf so that PostgreSQL accepts connections from Airflow.

sudo nano /etc/postgresql/9.5/main/pg_hba.conf
# IPv4 local connections:
host all all 0.0.0.0/0 trust
# Restart the service
sudo service postgresql restart
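The trust rule above accepts connections from any address without a password, which is only reasonable on an isolated test machine. As a hedged alternative, the sketch below prints a narrower record that limits access to the airflow role, the airflow database, and the local machine, and asks for a password (md5). hba_record is a hypothetical helper used only to make the column order explicit.

```shell
#!/bin/sh
# Hypothetical helper: print one pg_hba.conf record in the
# "TYPE DATABASE USER ADDRESS METHOD" column order.
hba_record() {
    printf 'host\t%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Password-checked access to the airflow database from this machine only.
hba_record airflow airflow 127.0.0.1/32 md5
```

The printed line would replace the 0.0.0.0/0 trust line in pg_hba.conf; widen the address range only as far as your workers actually need.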
sudo nano /etc/postgresql/9.5/main/postgresql.conf
#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------
# - Connection Settings -
#listen_addresses = 'localhost' # what IP address(es) to listen on;
listen_addresses = '*' # for Airflow connection
# Restart the service
sudo service postgresql restart
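Because postgresql.conf keeps the original commented line next to the new one, it can be worth checking which value is actually in effect. A minimal sketch, assuming the file path used above; effective_listen_addresses is a hypothetical helper, and it relies on the fact that the last uncommented setting in the file wins.

```shell
#!/bin/sh
# Hypothetical helper: print the last uncommented listen_addresses value
# in a postgresql.conf-style file (later lines override earlier ones).
effective_listen_addresses() {
    sed -n "s/^[[:space:]]*listen_addresses[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1" | tail -n 1
}

# Path taken from the steps above; adjust it for your PostgreSQL version.
conf=/etc/postgresql/9.5/main/postgresql.conf
if [ -r "$conf" ]; then
    effective_listen_addresses "$conf"
fi
```

An empty result means no uncommented listen_addresses line was found, in which case PostgreSQL falls back to its default of localhost.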

Airflow Installation

Set Up Airflow Default Home

export AIRFLOW_HOME=~/airflow
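An export only lasts for the current shell session, so the variable is gone after the next login. The sketch below shows one way to persist it, assuming bash is the login shell and ~/.bashrc is read for new sessions. append_once is a hypothetical helper that avoids adding the same line twice.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if it is not already
# there, so repeating the setup does not duplicate the export.
append_once() {
    line=$1
    file=$2
    touch "$file"
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Set the variable for the current session.
export AIRFLOW_HOME="$HOME/airflow"

# To persist it for future sessions (assumes bash as the login shell):
# append_once 'export AIRFLOW_HOME="$HOME/airflow"' "$HOME/.bashrc"
```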

Installing Airflow

pip install "apache-airflow[postgres,mssql,celery,rabbitmq]"

Starting Up Airflow Database

Once the packages are installed, initialize Airflow’s metadata database by running:

airflow initdb

Set up airflow.cfg

  • Use CeleryExecutor instead of SequentialExecutor if we want to run the pipeline from the web UI.
executor = CeleryExecutor
  • Pass along the connection info for the PostgreSQL database airflow we just created, using the airflow role and its password.
sql_alchemy_conn = postgresql+psycopg2://airflow:a1rfl0w@localhost:5432/airflow
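The two settings above can also be applied without opening an editor. This is a minimal sketch, assuming airflow.cfg already exists under AIRFLOW_HOME and already contains both keys. set_cfg_option is a hypothetical helper, and the connection URL assumes the airflow role and password created earlier in this guide.

```shell
#!/bin/sh
# Hypothetical helper: replace the value of an existing "key = value" line
# in an INI-style file. It does not add keys that are missing, and the
# value must not contain "|", "&" or "\" because it is passed to sed.
set_cfg_option() {
    key=$1
    value=$2
    file=$3
    tmp=$(mktemp)
    # "|" is the sed delimiter because the connection URL contains "/".
    sed "s|^$key[[:space:]]*=.*|$key = $value|" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

cfg="${AIRFLOW_HOME:-$HOME/airflow}/airflow.cfg"
if [ -f "$cfg" ]; then
    set_cfg_option executor CeleryExecutor "$cfg"
    set_cfg_option sql_alchemy_conn \
        'postgresql+psycopg2://airflow:a1rfl0w@localhost:5432/airflow' "$cfg"
fi
```

Writing through a temporary file and copying the result back keeps the original file's permissions and avoids depending on sed's in-place flag.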

Disabling Examples

nano airflow/airflow.cfg
....
# Whether to load the examples that ship with Airflow. It’s good to
# get started, but you probably want to set this to False in a production
# environment
load_examples = False
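As an alternative to editing the file, Airflow also reads settings from environment variables named AIRFLOW__SECTION__KEY (double underscores), which take precedence over airflow.cfg. A short sketch of the same change expressed that way; whether you prefer the file or the environment is a deployment choice, not something this guide prescribes.

```shell
#!/bin/sh
# Override [core] load_examples through the environment instead of airflow.cfg.
export AIRFLOW__CORE__LOAD_EXAMPLES=False

# Show which Airflow overrides are active in this shell.
env | grep '^AIRFLOW__' | sort
```

Like any export, this only affects processes started from the same shell, so it has to be set wherever the webserver and scheduler are launched.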

Starting Airflow Web Server

The following command starts the web server on localhost at port 8080.

airflow webserver
