As easy as breathing: manage your workflows with Airflow!
Madison Swain-Bowden
Apache Airflow is an open source workflow management tool that's been called "cron on steroids". As a career data engineer, I've found this tool central to my success in orchestrating and maintaining data pipelines. But Airflow's applications have grown far beyond the purpose for which it was originally built: what was once a machine learning training engine is now a tool I've used extensively over the last 8 years. I've used it across 4 jobs, in several different roles; for side projects and critical infrastructure; for manually triggered jobs and automated workflows; for IT (Ookla/Speedtest.net), science (Allen Institute for Cell Science), the commons (Openverse), the future (Babylist), and liberation (Orca Collective).

In this talk, I'll share a brief overview of what Apache Airflow is and how it might help manage your workflows too! As a long-time Airflow user and contributor, I've seen how quickly this tool can become the hammer for every nail you see. Part of what makes Airflow powerful is that you define its workflows in pure Python; this means you can leverage all of the clever language features and libraries Python has to offer when setting up a job. No more pesky, repetitive YAML files (GitHub Actions) or domain-specific languages (Jenkins). Use the language and libraries you're already familiar with while getting automatic retries, error handling, control flow, and so much more.