Course Outline

Introduction to Apache Airflow

  • Understanding workflow orchestration.
  • Key features and benefits of Apache Airflow.
  • Overview of Airflow 2.x improvements and the ecosystem.

Architecture and Core Concepts

  • Scheduler, web server, and worker processes.
  • DAGs, tasks, and operators.
  • Executors and backends (Local, Celery, Kubernetes).

Installation and Setup

  • Installing Airflow in local and cloud environments.
  • Configuring Airflow with various executors.
  • Setting up metadata databases and connections.

Navigating the Airflow UI and CLI

  • Exploring the Airflow web interface.
  • Monitoring DAG runs, tasks, and logs.
  • Using the Airflow CLI for administrative tasks.

Authoring and Managing DAGs

  • Creating DAGs using the TaskFlow API.
  • Utilizing operators, sensors, and hooks.
  • Managing dependencies and scheduling intervals.

Integrating Airflow with Data and Cloud Services

  • Connecting to databases, APIs, and message queues.
  • Executing ETL pipelines with Airflow.
  • Cloud integrations: AWS, GCP, and Azure operators.

Monitoring and Observability

  • Task logs and real-time monitoring.
  • Metrics collection with Prometheus and Grafana.
  • Alerting and notifications via email or Slack.

Securing Apache Airflow

  • Role-based access control (RBAC).
  • Authentication through LDAP, OAuth, and SSO.
  • Secrets management using Vault and cloud secret stores.

Scaling Apache Airflow

  • Parallelism, concurrency, and task queues.
  • Using CeleryExecutor and KubernetesExecutor.
  • Deploying Airflow on Kubernetes with Helm.
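The knobs this module discusses live in `airflow.cfg` (or the matching environment variables). The values below are illustrative starting points, not recommendations:

```ini
# airflow.cfg — illustrative values only; tune for your workload.
[core]
# Maximum task instances running at once across the whole installation.
parallelism = 64
# Default cap on concurrently running tasks per DAG.
max_active_tasks_per_dag = 16

[celery]
# Worker processes per Celery worker (CeleryExecutor only).
worker_concurrency = 16
```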

Best Practices for Production

  • Version control and CI/CD for DAGs.
  • Testing and debugging DAGs.
  • Maintaining reliability and performance at scale.

Troubleshooting and Optimization

  • Debugging failed DAGs and tasks.
  • Optimizing DAG performance.
  • Common pitfalls and strategies to avoid them.

Summary and Next Steps

Requirements

  • Experience with Python programming.
  • Familiarity with data engineering or DevOps concepts.
  • Understanding of ETL processes or workflow orchestration.

Audience

  • Data scientists.
  • Data engineers.
  • DevOps and infrastructure engineers.
  • Software developers.

Duration: 21 hours
