Apache Airflow Installation on Ubuntu
This guide documents how to install Apache Airflow on Ubuntu running on Windows.
Preparation
pip
Apache Airflow is installed with pip, so install pip first:
sudo apt-get install python-pip
If the installed pip is not the latest version, you may want to upgrade it:
sudo pip install --upgrade pip
Installing Database Backend (PostgreSQL)
sudo apt-get install postgresql postgresql-contrib
Postgres is now installed. Next, we need to create:
- a database for Airflow
- a user having access to the database
Creating Postgres User and Database
- Create a new Linux user named airflow.
sudo adduser airflow
[sudo] password for tole:
Adding user `airflow' ...
Adding new group `airflow' (1001) ...
Adding new user `airflow' (1001) with group `airflow' ...
Creating home directory `/home/airflow' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for airflow
Enter the new value, or press ENTER for the default
Full Name []: airflow
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y
- Use usermod to add the new user airflow to the sudo group.
sudo usermod -aG sudo airflow
- Switch the shell to the airflow user.
su - airflow
- As the postgres user, open psql to create a role for airflow.
sudo -u postgres psql
Note: when I ran the command above in Ubuntu on Windows, it failed with the following error:
sudo: setresuid() [1000, 113, 1000] -> [-1, 0, -1]: Operation not permitted
sudo: unable to set runas group vector: Operation not permitted
sudo: PERM_ROOT: setresuid(0, -1, 0): Operation not permitted
As a temporary workaround, I used:
user@server:~$ sudo su postgres
postgres@server:/home/user$ psql
We are now at the postgres=# prompt.
Still in the psql console, create the Postgres user and database for Airflow.
postgres=# CREATE USER airflow PASSWORD 'a1rfl0w';
CREATE ROLE
postgres=# CREATE DATABASE airflow;
CREATE DATABASE
postgres=# GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO airflow;
GRANT
postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 airflow   |                                                            | {}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
Now check whether the database is set up and can be accessed by the user airflow.
psql -d airflow
psql (9.5.9)
Type "help" for help.
airflow=> \conninfo
You are connected to database "airflow" as user "airflow" via socket in "/var/run/postgresql" at port "5432".
airflow=>
If you see output like the above, the database is ready.
Change pg_hba.conf Setting
We also need to reconfigure pg_hba.conf to allow connection from airflow.
sudo nano /etc/postgresql/9.5/main/pg_hba.conf
In the IPv4 section, change the address to 0.0.0.0/0 and the method to trust.
# IPv4 local connections:
host all all 0.0.0.0/0 trust
Restart the service:
sudo service postgresql restart
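To make the fields of that pg_hba.conf record explicit, here is a minimal Python sketch that splits a host line into its parts. The HbaRule type and parse_host_rule function are hypothetical helpers written for this illustration, and the sketch only handles the simple five-field form used above (no netmask column, no auth options).

```python
from typing import NamedTuple


class HbaRule(NamedTuple):
    """One simple 'host' record from pg_hba.conf (hypothetical helper type)."""
    conn_type: str
    database: str
    user: str
    address: str
    method: str


def parse_host_rule(line: str) -> HbaRule:
    """Split a five-field 'host' record into named fields."""
    # Drop any trailing comment, then split on whitespace.
    fields = line.split("#", 1)[0].split()
    if len(fields) != 5 or fields[0] != "host":
        raise ValueError(f"not a five-field host record: {line!r}")
    return HbaRule(*fields)


rule = parse_host_rule("host all all 0.0.0.0/0 trust")
print(rule.database, rule.user, rule.address, rule.method)
```

Reading the record this way shows what the line grants: every database, every user, from any IPv4 address, with the trust method.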
Next, we configure postgresql.conf.
sudo nano /etc/postgresql/9.5/main/postgresql.conf
#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------
# - Connection Settings -
#listen_addresses = 'localhost' # what IP address(es) to listen on;
listen_addresses = '*' # for Airflow connection
Restart the service:
sudo service postgresql restart
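If you would rather script the listen_addresses change than edit the file by hand, the following Python sketch shows one possible approach. The set_listen_addresses function is a hypothetical helper, not part of PostgreSQL or Airflow; it works on the file's text, so you still have to read and write postgresql.conf yourself (with sudo) and restart the service afterwards.

```python
import re


def set_listen_addresses(conf_text: str, value: str = "*") -> str:
    """Return conf_text with listen_addresses set to value.

    Replaces the first listen_addresses line (commented out or not);
    appends a new line if the setting is absent.
    """
    new_line = f"listen_addresses = '{value}'"
    pattern = re.compile(r"^[ \t]*#?[ \t]*listen_addresses[ \t]*=.*$", re.MULTILINE)
    if pattern.search(conf_text):
        # A function replacement avoids backslash handling in the new text.
        return pattern.sub(lambda m: new_line, conf_text, count=1)
    sep = "" if not conf_text or conf_text.endswith("\n") else "\n"
    return conf_text + sep + new_line + "\n"


sample = "#listen_addresses = 'localhost' # what IP address(es) to listen on;\nport = 5432\n"
print(set_listen_addresses(sample))
```

Note that this sketch drops the trailing comment on the replaced line; the other lines are left untouched.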
Airflow Installation
Set Up Airflow Default Home
export AIRFLOW_HOME=~/airflow
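Scripts that need to find airflow.cfg can resolve the same directory. The sketch below assumes the convention used here: the AIRFLOW_HOME environment variable if it is set, otherwise ~/airflow. The airflow_home function is a hypothetical helper for illustration.

```python
import os
from pathlib import Path


def airflow_home(env=None) -> Path:
    """Return AIRFLOW_HOME from the environment, falling back to ~/airflow."""
    env = os.environ if env is None else env
    # expanduser() turns a leading ~ into the current user's home directory.
    return Path(env.get("AIRFLOW_HOME", "~/airflow")).expanduser()


print(airflow_home() / "airflow.cfg")
```

Keep in mind that export only affects the current shell session; add the line to ~/.bashrc if you want it set in every new shell.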
Installing Airflow
pip install "airflow[postgres,mssql,celery,rabbitmq]"
See the Airflow documentation for more details on the list of subpackages and what they enable.
Starting Up Airflow Database
Once the packages are installed, initialize Airflow’s database by running:
airflow initdb
The command generates the airflow.cfg file in the Airflow home directory we set up earlier.
Set up airflow.cfg
- Use CeleryExecutor instead of SequentialExecutor if you want to run pipelines from the web UI.
executor = CeleryExecutor
- Pass along the connection info of the airflow PostgreSQL database we just created.
sql_alchemy_conn = postgresql+psycopg2://airflow:a1rfl0w@localhost:5432/airflow
Save the file and run airflow initdb again so that the metadata tables are created in the Postgres database.
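The connection string follows the SQLAlchemy URL form dialect+driver://user:password@host:port/database. As a hedged sketch, the Python snippet below assembles it from the airflow role, password, and database created in the psql step; postgres_url is a hypothetical helper, and the percent-encoding only matters if your password contains characters such as @ or /.

```python
from urllib.parse import quote_plus


def postgres_url(user, password, host, port, database):
    """Build a postgresql+psycopg2 URL, percent-encoding user and password."""
    auth = quote_plus(user)
    if password:
        auth += ":" + quote_plus(password)
    return f"postgresql+psycopg2://{auth}@{host}:{port}/{database}"


# Values assumed from the role and database created earlier in this guide.
print("sql_alchemy_conn =", postgres_url("airflow", "a1rfl0w", "localhost", 5432, "airflow"))
```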
Disabling Examples
nano airflow/airflow.cfg
....
# Whether to load the examples that ship with Airflow. It’s good to
# get started, but you probably want to set this to False in a production
# environment
load_examples = False
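The airflow.cfg edits in this guide (executor, load_examples) can also be applied from a script. The sketch below is one possible approach, not an Airflow API: set_option is a hypothetical helper that rewrites the first uncommented "key = ..." line in the file's text and leaves comments intact. It ignores section headers, so it assumes the key appears only once, and the sample text is an illustrative fragment rather than a real airflow.cfg.

```python
import re


def set_option(cfg_text: str, key: str, value: str) -> str:
    """Replace the first uncommented 'key = ...' line, keeping everything else."""
    pattern = re.compile(rf"^{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    if not pattern.search(cfg_text):
        raise KeyError(f"{key} not found")
    return pattern.sub(lambda m: f"{key} = {value}", cfg_text, count=1)


# Illustrative fragment only; a generated airflow.cfg is much longer.
sample = (
    "[core]\n"
    "executor = SequentialExecutor\n"
    "# Whether to load the examples that ship with Airflow.\n"
    "load_examples = True\n"
)
updated = set_option(sample, "load_examples", "False")
updated = set_option(updated, "executor", "CeleryExecutor")
print(updated)
```

A plain text replacement is used here instead of configparser because writing the file back through configparser would discard the comments.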
Starting Airflow Web Server
The following command starts the web server on localhost at port 8080.
airflow webserver
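To confirm that the services are listening, a quick TCP check can help. This is a minimal sketch: port_open is a hypothetical helper, and the ports are the ones used in this guide (8080 for the web server, 5432 for Postgres). It only tells you that something accepts connections on the port, not that it is Airflow or Postgres.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for name, port in [("webserver", 8080), ("postgres", 5432)]:
    state = "reachable" if port_open("localhost", port) else "not reachable"
    print(name, port, state)
```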
Next, we will cover how to integrate Airflow with a Microsoft SQL Server database.