智子 (Zhizi) Airflow Configuration Guide

Install Python 3

https://www.yuzhi100.com/tutorial/centos/centos-anzhuang-python36

# Install the EPEL repository
sudo yum install epel-release

# Install the IUS repository
sudo yum install https://centos7.iuscommunity.org/ius-release.rpm

sudo yum install python36u
sudo ln -s /bin/python3.6 /bin/python3

sudo yum install python36u-pip
sudo ln -s /bin/pip3.6 /bin/pip3

Install Airflow

1. Add the environment variable

export SLUGIFY_USES_TEXT_UNIDECODE=yes

2. Install build dependencies

sudo yum install python36u-devel.x86_64

sudo yum install mysql-community-devel.x86_64

# Fixes the build error "sasl/sasl.h: No such file or directory"
sudo yum install gcc-c++ cyrus-sasl-devel.x86_64

3. Configure the metadata database (MySQL)

-- xxxx

CREATE DATABASE airflow;

GRANT all privileges on airflow.* TO 'root'@'localhost' IDENTIFIED BY 'xxxx';

ALTER USER 'root'@'localhost' IDENTIFIED BY 'xxxx' PASSWORD EXPIRE NEVER;

ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'xxxx';

4. Configure $AIRFLOW_HOME/airflow.cfg

Add AIRFLOW_HOME to your environment variables, then set the metadata database connection in airflow.cfg:

sql_alchemy_conn = mysql://root:xxxx@localhost:3306/airflow

Generate a Fernet key and set it as fernet_key in airflow.cfg:

# -*- coding: utf-8 -*-
from cryptography.fernet import Fernet

fernet_key = Fernet.generate_key()
print(fernet_key.decode())  # your fernet_key; keep it in a secure place!

# Install the password-hashing module
pip install flask-bcrypt

Expose port 5001 (the port the web server is started on).
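The two airflow.cfg values from this step can be sanity-checked before they are pasted in. The following is a minimal sketch using only the standard library. The helper names build_sql_alchemy_conn and looks_like_fernet_key are hypothetical and not part of Airflow. It assumes that a password containing characters such as @ or / must be percent-encoded inside the connection URL, and that a Fernet key is URL-safe base64 that decodes to 32 bytes.

```python
import base64
from urllib.parse import quote_plus


def build_sql_alchemy_conn(user, password, host="localhost", port=3306, db="airflow"):
    """Build a mysql:// URL, percent-encoding the user name and password.

    Hypothetical helper: the defaults mirror the values used in this guide.
    """
    return "mysql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, db
    )


def looks_like_fernet_key(key):
    """Return True if `key` is URL-safe base64 that decodes to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(key)
    except (ValueError, TypeError):
        # Bad padding or non-ASCII input: not a usable key.
        return False
    return len(raw) == 32


if __name__ == "__main__":
    # Placeholder credentials, as elsewhere in this guide.
    print(build_sql_alchemy_conn("root", "xxxx"))
    print(looks_like_fernet_key("too-short"))
```

A plain password such as the xxxx placeholder passes through unchanged, so the helper only matters when the real password contains reserved URL characters.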

5. Create a user

# -*- coding: utf-8 -*-
import airflow
from airflow import models, settings
from airflow.contrib.auth.backends.password_auth import PasswordUser

user = PasswordUser(models.User())
user.username = 'alithink'
user.email = 'xxxx'
user.password = 'xxxx'
session = settings.Session()
session.add(user)
session.commit()
session.close()
exit()
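The user created above can only log in if password authentication is switched on in airflow.cfg. The source does not show those settings, so the sketch below rests on an assumption: that, as in Airflow 1.x, the [webserver] section needs authenticate = True and auth_backend = airflow.contrib.auth.backends.password_auth. The function password_auth_enabled is a hypothetical helper that checks a config text for both values.

```python
import configparser

# Assumption: this is the backend matching the PasswordUser import above.
EXPECTED_BACKEND = "airflow.contrib.auth.backends.password_auth"


def password_auth_enabled(cfg_text):
    """Return True if the [webserver] section enables password authentication."""
    # Interpolation is disabled so that literal % characters in the file
    # are not treated as configparser placeholders.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("webserver"):
        return False
    authenticate = parser.getboolean("webserver", "authenticate", fallback=False)
    backend = parser.get("webserver", "auth_backend", fallback="")
    return authenticate and backend.strip() == EXPECTED_BACKEND


if __name__ == "__main__":
    sample = (
        "[webserver]\n"
        "authenticate = True\n"
        "auth_backend = airflow.contrib.auth.backends.password_auth\n"
    )
    print(password_auth_enabled(sample))
```

To check a real installation, read the contents of $AIRFLOW_HOME/airflow.cfg and pass the text to the function.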

Start Airflow

nohup airflow webserver -p 5001 &

# Restart the scheduler after every resetdb
nohup airflow scheduler &
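Because both processes are started in the background with nohup, it can help to confirm that the web server is actually accepting connections on port 5001. The following is a minimal sketch using only the standard library. The helper port_is_open is hypothetical, and it only checks that a TCP connection succeeds, not that Airflow itself is healthy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 5001 is the port passed to `airflow webserver -p` above.
    print(port_is_open("127.0.0.1", 5001))
```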

Airflow tips

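  • Restart Airflow after changing airflow.cfg.
  • Times default to UTC, so account for the time zone when configuring a DAG (the web UI only supports UTC…)
  • Once a DAG is switched on, if the scheduler is already running, the tasks for every schedule point from start_date up to now are executed in order.
  • You can also click to run immediately, which triggers the DAG manually.
  • To see the logs for a node, click the corresponding task and open its task instance log.
  • Because times are in UTC, subtract 8 hours from the local time you intended to set (for UTC+8).
  • catchup: if the start date is earlier than the current time and catchup is set to True, Airflow runs all the schedule points that were "missed" in the past.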
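The UTC offset and catchup tips above can be illustrated with a small standard-library sketch. It does not use Airflow itself. local_to_utc and execution_dates are hypothetical helpers, the local time zone is assumed to be UTC+8 (matching the "subtract 8 hours" tip), and the catchup logic is a simplified model of the scheduler rather than its actual implementation.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the intended local time zone is UTC+8.
LOCAL_TZ = timezone(timedelta(hours=8))


def local_to_utc(local_naive):
    """Interpret a naive datetime as UTC+8 and return the naive UTC equivalent."""
    aware = local_naive.replace(tzinfo=LOCAL_TZ)
    return aware.astimezone(timezone.utc).replace(tzinfo=None)


def execution_dates(start, now, interval, catchup=True):
    """Simplified model of which schedule points get a run.

    A run for the period starting at `t` fires only once `t + interval`
    has passed. With catchup disabled, only the most recent completed
    period is kept.
    """
    dates = []
    t = start
    while t + interval <= now:
        dates.append(t)
        t += interval
    return dates if catchup else dates[-1:]


if __name__ == "__main__":
    # 09:00 local time (UTC+8) corresponds to 01:00 UTC.
    print(local_to_utc(datetime(2019, 1, 1, 9, 0)))
    start = datetime(2019, 1, 1)
    now = datetime(2019, 1, 4, 12, 0)
    print(execution_dates(start, now, timedelta(days=1), catchup=True))
    print(execution_dates(start, now, timedelta(days=1), catchup=False))
```

In this model, a daily DAG with a start date three and a half days in the past produces three back-filled runs with catchup enabled and a single run with it disabled.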