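Azkaban Study
Summary
Intended audience:
Big data developers, DevOps engineers, and operations engineers
What you will learn:
Calling the azkaban API, parameterized workflows, email alerts, the web console, and user management
Why choose it:
It is open source and well documented; compared with airflow its notion of time is clearer, its UI is excellent, and it offers solid user permission control
Use cases:
When data warehouse cleansing and data refresh tasks must run every day, azkaban provides both scheduling and task-processing logic; it also gives DevOps a secure way to deliver
What you will gain:
A quick path to implementing azkaban deployment, the API, ETL parameters, users, and notices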
Overview
azkaban deploy
azkaban user management
azkaban API
azkaban parameterized Run Job
azkaban email notice, azkaban UI console
--------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
Article
-azkaban deploy
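deploy overview:
As a workflow engine, azkaban runs most of its tasks on remote hosts;
the host that provides the azkaban service stores the ETL task scripts;
when many tasks run concurrently, azkaban can be deployed in azkaban-multi-executor mode;
in azkaban-multi-executor mode, adding an executor only requires that it connect to the same mysql and use the same configuration file.
Example of joining an executor: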
.../azkaban/azkaban-exec-server/build/install/azkaban-exec-server/conf/azkaban.properties
# mysql
......
mysql.host=mysqlhost
......
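Because every executor in multi-executor mode has to point at the same mysql, it can help to compare the database keys of two azkaban.properties files before starting a new executor. The following is a minimal sketch, not part of azkaban itself: the helper names are hypothetical, and the sample values (`mysql.port=3306`, the second executor port) are illustrative assumptions.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks, comments and elided lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def database_settings(props):
    """Keep only the keys that decide which database a server talks to."""
    return {k: v for k, v in props.items()
            if k == "database.type" or k.startswith("mysql.")}


def same_database(conf_a, conf_b):
    """True when both property texts carry identical database settings."""
    return (database_settings(parse_properties(conf_a))
            == database_settings(parse_properties(conf_b)))


# Illustrative property texts (values are assumptions, not real hosts).
executor_1 = """
# mysql
database.type=mysql
mysql.host=mysqlhost
mysql.port=3306
executor.port=12321
"""
executor_2 = executor_1.replace("executor.port=12321", "executor.port=12322")
executor_3 = executor_1.replace("mysqlhost", "otherhost")

print(same_database(executor_1, executor_2))  # True: only the executor port differs
print(same_database(executor_1, executor_3))  # False: mysql.host differs
```

Only `database.type` and keys starting with `mysql.` are compared, so executor-specific settings such as the port may differ freely.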
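deploy main roles:
mysql: azkaban's back-end storage database
azkaban-web: host for the UI console and the API
azkaban-executor: host that executes workflow tasks
deploy main steps:
Build the azkaban source
Initialize the azkaban database
Configure azkaban connections, users, task scheduling, and mail settings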
- ** Build from source:
git clone https://github.com/azkaban/azkaban.git
cd azkaban
./gradlew installDist
Main files after the build:
azkaban/azkaban-db/build/sql
azkaban/azkaban-exec-server/build/
azkaban/azkaban-web-server/build/
azkaban/az-exec-util
azkaban/az-examples/flow20-projects/basicFlow20Project.zip
- ** Initialize the database:
sql file: azkaban/azkaban-db/build/sql/create-all-sql-3.82.0-8-g11595ad.sql
- ** Connection settings:
database.type=mysql
- ** Users
azkaban-users.xml
- ** Mail
mail.sender=help@xxx.com
……
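- ** Task scheduling
Each value is the integer weight a comparator gets when azkaban ranks the available executors for a flow; a weight of 0 means that comparator does not influence the ranking.
Last-dispatched comparator: 0, so a flow may go to the executor that was used last time
azkaban.executorselector.comparator.LastDispatched=0
Memory comparator: 0, so low free memory (under 1 GB) does not count against an executor
azkaban.executorselector.comparator.Memory=0
Assigned-flow-count comparator: weight 10, so executors with fewer assigned flows are preferred
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=10
CPU-usage comparator: 0, so high CPU usage does not count against an executor
azkaban.executorselector.comparator.CpuUsage=0
……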
- ** azkaban lib
cd azkaban/az-exec-util/src/main/c
gcc execute-as-user.c -o execute-as-user
chown root execute-as-user
chmod 6050 execute-as-user
azkaban.jobtype.plugin.dir=azkaban/azkaban-exec-server/build/install/azkaban-exec-server/plugins/jobtypes
azkaban.native.lib=azkaban/az-exec-util/src/main/c
--------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
-azkaban user management
azkaban user management is also azkaban permission management
user.manager.xml.file=azkaban/azkaban-web-server/build/install/azkaban-web-server/conf/azkaban-users.xml
lockdown.create.projects=true
Permission levels:
Permissions      Values
ADMIN            Grants all access to everything in Azkaban.
READ             Gives users read-only access to every project and their logs
WRITE            Allows users to upload files, change job properties or remove any project
EXECUTE          Allows users to trigger the execution of any flow
SCHEDULE         Users can add or remove schedules for any flows
CREATEPROJECTS   Allows users to create new projects if project creation is locked down
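The permission names above are attached to roles in azkaban-users.xml, and roles are attached to users. As a rough sketch of how such a file could be generated, the snippet below builds the XML with Python's standard library. The user `etl_runner`, its password, and the role name `runner` are hypothetical values chosen for illustration; only the `azkaban`/`admin` pair mirrors the defaults used elsewhere in this article.

```python
import xml.etree.ElementTree as ET


def build_users_xml(users, roles):
    """Return azkaban-users.xml content as a string.

    users: list of (username, password, [role names])
    roles: dict of role name -> [permission names]
    """
    root = ET.Element("azkaban-users")
    for username, password, user_roles in users:
        ET.SubElement(root, "user", {
            "username": username,
            "password": password,
            "roles": ",".join(user_roles),
        })
    for role_name, permissions in roles.items():
        ET.SubElement(root, "role", {
            "name": role_name,
            "permissions": ",".join(permissions),
        })
    return ET.tostring(root, encoding="unicode")


# Hypothetical accounts: one administrator and one account limited to
# reading, running and scheduling flows.
users_xml = build_users_xml(
    users=[
        ("azkaban", "azkaban", ["admin"]),
        ("etl_runner", "change-me", ["runner"]),
    ],
    roles={
        "admin": ["ADMIN"],
        "runner": ["READ", "EXECUTE", "SCHEDULE"],
    },
)
print(users_xml)
```

Giving the ETL account only READ, EXECUTE and SCHEDULE keeps it from uploading or deleting projects, which is the kind of separation the permission table is meant to support.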
--------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
-azkaban API
SessionId:
curl -k -X POST --data "action=login&username=azkaban&password=azkaban" http://localhost:8081
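The same login call can be prepared from a script. The sketch below only builds the POST request and parses a response body; it does not contact a server. The function names are hypothetical, and the sample response is an assumed shape based on the JSON the login endpoint returns on success (a `session.id` field).

```python
import json
import urllib.parse
import urllib.request


def build_login_request(base_url, username, password):
    """Build the form-encoded POST used to log in and obtain a session id."""
    body = urllib.parse.urlencode({
        "action": "login",
        "username": username,
        "password": password,
    }).encode("utf-8")
    return urllib.request.Request(base_url, data=body, method="POST")


def extract_session_id(response_text):
    """Pull session.id out of the JSON reply, or fail loudly."""
    payload = json.loads(response_text)
    if "session.id" not in payload:
        raise RuntimeError(f"login failed: {payload}")
    return payload["session.id"]


request = build_login_request("http://localhost:8081", "azkaban", "azkaban")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Assumed sample reply; a real one comes from urllib.request.urlopen(request).
sample_reply = '{"status": "success", "session.id": "example-session-id"}'
print(extract_session_id(sample_reply))
```

To send the request for real, pass it to `urllib.request.urlopen`. The `-k` flag in the curl command skips certificate checks, which this sketch does not reproduce.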
Execute a Flow:
curl -k --get \
--data 'session.id=${session.id}' --data 'ajax=executeFlow' --data 'project=${projectname}' \
--data 'flow=${flowname}' http://localhost:8081/executor
-- - parameter :
failureEmails=xxx@xxx.com, xxy@xxx.com
-- - script parameters & other parameters:
flowOverride[parameter_name]=value
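To show how the execute call and its optional parameters fit together, here is a small sketch that assembles the request URL. The function name and the sample project, flow, and parameter values are hypothetical; the parameter names (`session.id`, `ajax`, `project`, `flow`, `failureEmails`, `flowOverride[...]`) are the ones listed above.

```python
import urllib.parse


def build_execute_url(base_url, session_id, project, flow,
                      overrides=None, failure_emails=None):
    """Assemble the GET URL that triggers one flow execution."""
    params = [
        ("session.id", session_id),
        ("ajax", "executeFlow"),
        ("project", project),
        ("flow", flow),
    ]
    if failure_emails:
        # Several recipients go into one comma-separated value.
        params.append(("failureEmails", ",".join(failure_emails)))
    for name, value in (overrides or {}).items():
        # Each runtime parameter becomes flowOverride[name]=value.
        params.append((f"flowOverride[{name}]", value))
    return f"{base_url}/executor?" + urllib.parse.urlencode(params)


url = build_execute_url(
    "http://localhost:8081",
    session_id="example-session-id",   # hypothetical value
    project="demo_project",            # hypothetical value
    flow="flowname",
    overrides={"parame": "2020-01-01"},
    failure_emails=["xxx@xxx.com", "xxy@xxx.com"],
)
print(url)
```

`urlencode` percent-encodes the brackets, commas and `@` signs; the server decodes them back, so the flow receives `parame=2020-01-01`.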
Schedule a period-based Flow:
curl -k http://HOST:PORT/schedule -d "ajax=scheduleFlow&is_recurring=on&period=5w&projectName=PROJECT_NAME&flow=FLOW_NAME&projectId=PROJECT_ID&scheduleTime=12,00,pm,PDT&scheduleDate=07/22/2014" -b azkaban.browser.session.id=SESSION_ID
-- - parameter:
PROJECT_ID: select * from azkaban.projects where name = 'project_name';
scheduleTime: for Beijing time, subtract 8 hours so that the schedule lines up with the current local time
scheduleDate: the date on which the flow starts
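The 8-hour adjustment is easy to get wrong near midnight, because the date can roll back as well. The sketch below converts a Beijing wall-clock time into `scheduleTime` and `scheduleDate` values. It assumes the server accepts `UTC` as the zone token in `scheduleTime`; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))


def to_schedule_fields(beijing_time):
    """Convert a naive Beijing datetime into scheduleTime/scheduleDate strings."""
    utc_time = beijing_time.replace(tzinfo=BEIJING).astimezone(timezone.utc)
    meridiem = "am" if utc_time.hour < 12 else "pm"
    return {
        # hh,mm,am|pm,ZONE using a 12-hour clock
        "scheduleTime": f"{utc_time.strftime('%I,%M')},{meridiem},UTC",
        # MM/DD/YYYY
        "scheduleDate": utc_time.strftime("%m/%d/%Y"),
    }


# 03:00 on 22 July in Beijing is 19:00 on 21 July in UTC.
print(to_schedule_fields(datetime(2014, 7, 22, 3, 0)))
# {'scheduleTime': '07,00,pm,UTC', 'scheduleDate': '07/21/2014'}
```

Note that in this example both the time and the date change, which a manual "subtract 8 hours" can miss.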
---
--------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
-azkaban parameterized Run Job
Template flow file:
azkaban/az-examples/flow20-projects/basicFlow20Project.zip
-- - flow file: flowname.flow
"${parame}"
-- - parameter file: flowname.job
parame={parame}
-- - flow dependencies
flowname.flow:
dependsOn:
- jobA
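To make the pieces above concrete, the sketch below packs a small Flow 2.0 project into a zip that could be uploaded through the UI. It is an illustration under assumptions, not a copy of basicFlow20Project.zip: the job names, the `echo` commands, and the helper name are hypothetical. `jobA` reads `${parame}`, and `jobB` declares `dependsOn` so it runs after `jobA`.

```python
import io
import zipfile

# Flow definition: jobA prints the parameter, jobB waits for jobA.
FLOW_TEXT = """\
nodes:
  - name: jobA
    type: command
    config:
      command: echo "${parame}"

  - name: jobB
    type: command
    dependsOn:
      - jobA
    config:
      command: echo "jobB runs after jobA"
"""

# Marker file that tells azkaban the project uses Flow 2.0.
PROJECT_TEXT = "azkaban-flow-version: 2.0\n"


def build_project_zip(flow_name):
    """Return the bytes of a zip holding flow20.project and <flow_name>.flow."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("flow20.project", PROJECT_TEXT)
        archive.writestr(f"{flow_name}.flow", FLOW_TEXT)
    return buffer.getvalue()


data = build_project_zip("flowname")
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(sorted(archive.namelist()))
```

Write `data` to a file such as `flowname.zip` to upload it. The value of `parame` can then be supplied at run time, for example through `flowOverride[parame]` in the execute call.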
--------------------------------------------------------------------------------------------------------
-azkaban email notice
config sender
#mail settings
mail.sender=help@xxx.com
mail.host=smtp.xxx.xxx.cn
mail.port=587
mail.user=help@xxx.com
mail.password=xxx
mail.tls=true
notice config:
curl -k --get \
--data 'session.id=${session.id}' --data 'ajax=executeFlow' --data 'project=${projectname}' --data 'failureEmails=xxx@xxx.com, xxy@xxx.com' --data 'flow=${flowname}' http://localhost:8081/executor
---
--------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
-azkaban UI console & run a project
web lib config:
web.resource.dir=azkaban/azkaban-web-server/build/install/azkaban-web-server/web/
executor port:
executor.port=12321
start server
-- - executor:
azkaban/azkaban-exec-server/build/install/azkaban-exec-server/bin/start-exec.sh
curl "http://executorhost:12321/executor?action=activate"
-- - web:
azkaban/azkaban-web-server/build/install/azkaban-web-server/bin/start-web.sh
run a project flow by UI