Airflow

Description

Airflow terms and concepts.
Flashcards by Zdeněk Šimůnek, updated more than 1 year ago
Created by Zdeněk Šimůnek over 3 years ago


Question Answer
DAG Directed Acyclic Graph. - A collection of tasks, their dependencies and settings. - Defined as code in a Python (.py) script.
XCom Feature for cross-communication between tasks.
dags_folder - The folder where Airflow pipelines live. - This path must be absolute. - Airflow looks in your DAGS_FOLDER for modules that contain DAG objects in their GLOBAL NAMESPACE and adds the objects it finds to the DagBag.
DAG Run - An instance of a DAG, containing task instances that run for a specific execution_date. - Created by the Airflow scheduler or an external trigger.
Task - A Task defines a unit of work within a DAG; it is represented as a node in the DAG graph, and it is written in Python. - Each task is an implementation of an Operator.
Operator An operator describes a single task in a workflow.
Sensor An Operator that waits (polls) for a certain time, file, database row, S3 key, etc.
chain(op1, [op2, op3], [op4, op5], op6) Equivalent to: op1 >> [op2, op3]; op2 >> op4; op3 >> op5; [op4, op5] >> op6
Task Instance An instance of a task that has been assigned to a DAG and has a state associated with a specific DAG run (i.e., for a specific execution_date).
execution_date The logical date and time for a DAG Run and its Task Instances.
Jinja Jinja is a modern and designer-friendly templating language for Python, modelled after Django’s templates.
Hooks - Hooks are interfaces to external platforms and databases like Hive, S3, MySQL, Postgres, HDFS, and Pig. - Hooks implement a common interface when possible, and act as a building block for operators.
Pools Airflow pools can be used to limit the execution parallelism on arbitrary sets of tasks.
Connections The information needed to connect to external systems is stored in the Airflow metastore database. A conn_id is defined there, with hostname / login / password / schema information attached to it. Airflow pipelines retrieve centrally managed connection information by specifying the relevant conn_id.
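
The chain card above pairs adjacent lists element by element and fans a single task out to (or in from) a whole list. The following is a minimal pure-Python sketch of that pairing rule, not Airflow's own implementation: chain_edges is a hypothetical helper, tasks are represented by plain strings, and raising ValueError for lists of different lengths is an assumption of this sketch.

```python
def chain_edges(*steps):
    """Return the (upstream, downstream) pairs implied by a chain-style call.

    Hypothetical helper for illustration only; tasks are plain strings here.
    """
    edges = []
    # Walk over each adjacent pair of arguments.
    for up, down in zip(steps, steps[1:]):
        ups = up if isinstance(up, list) else [up]
        downs = down if isinstance(down, list) else [down]
        if isinstance(up, list) and isinstance(down, list):
            # Two adjacent lists are paired element by element.
            if len(ups) != len(downs):
                raise ValueError("adjacent lists must have the same length")
            edges.extend(zip(ups, downs))
        else:
            # A single task fans out to, or in from, every task in the list.
            edges.extend((u, d) for u in ups for d in downs)
    return edges


edges = chain_edges("op1", ["op2", "op3"], ["op4", "op5"], "op6")
for upstream, downstream in edges:
    print(f"{upstream} >> {downstream}")
```

The printed pairs correspond to the four statements on the card: op1 feeds op2 and op3, op2 feeds op4, op3 feeds op5, and both op4 and op5 feed op6.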
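
The Sensor card describes an operator that waits by polling. A rough sketch of such a polling loop in plain Python follows; it does not use Airflow itself. wait_for and file_has_arrived are hypothetical names, and the parameter names poke_interval and timeout are borrowed from Airflow's sensor vocabulary only for familiarity.

```python
import time


def wait_for(poke, poke_interval=0.01, timeout=1.0):
    """Call poke() repeatedly until it returns True or the timeout expires.

    Hypothetical stand-in for a sensor's wait loop, for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        if poke():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met before the timeout")
        # Sleep between checks instead of spinning.
        time.sleep(poke_interval)


# Assumed example condition: it becomes true on the third check.
attempts = {"count": 0}


def file_has_arrived():
    attempts["count"] += 1
    return attempts["count"] >= 3


result = wait_for(file_has_arrived)
```

In a real pipeline the condition would check for a file, a database row, an S3 key and so on, as listed on the card.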
