I'm sure this is right for someone (everyone has different requirements), but I don't really want a lighter-weight Airflow. I want an Airflow that runs and scales in the cloud, has extensive observability (monitoring, tracing), has a full API, and maybe some clear way to test workflows.
I was looking into how Google's Cloud Composer, their managed Airflow service, is run. They use gcsfuse to mount a directory for logs, because Airflow insists on writing logs to local disk with no cleanup mechanism, even if you configure logs to be shipped to S3/GCS. To health-check the scheduler they query Stackdriver Logging to see if it has logged anything in the last five minutes, because the scheduler has no /healthz endpoint or any other way to check its health. There is no built-in way to monitor workflows, so you can't easily do something like graph failures by workflow; email on failure is about all you get. A GUI-first app that requires local storage is not what I expect these days.
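For the curious, that liveness check amounts to something like the sketch below, using the google-cloud-logging client. The filter string and the five-minute window are my guesses at the shape of what Composer does, not its actual implementation:

    from datetime import datetime, timedelta, timezone
    from google.cloud import logging as gcp_logging

    def scheduler_is_alive(project_id):
        # Consider the scheduler healthy if it has written any log
        # entry in the last five minutes.
        client = gcp_logging.Client(project=project_id)
        cutoff = datetime.now(timezone.utc) - timedelta(minutes=5)
        # Hypothetical filter; the real Composer check may differ.
        log_filter = (
            'logName:"airflow-scheduler" '
            'timestamp>="{}"'.format(cutoff.isoformat())
        )
        entries = client.list_entries(filter_=log_filter)
        return next(iter(entries), None) is not None

Having to stand up something like this yourself, instead of hitting a /healthz endpoint, is exactly the operability gap I mean.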
> I want an Airflow that runs and scales in the cloud
I'd encourage you to look at Reflow [1], which takes a different approach: it's entirely self-managing. You run Reflow like you would a normal programming language interpreter ("reflow run myjob.rf"), and Reflow creates ephemeral nodes that scale elastically and tear themselves down, existing only to run the program.
> has extensive observability (monitoring, tracing)
Reflow includes a good set of observability tools out of the box; we're also working on integrating tracing facilities (e.g., reporting progress to AWS X-Ray).
> has a full API, and maybe some clear way to test workflows.
Reflow's approach to testing is exactly like any other programming language: you write modules that can either be used in a "main" program, or else be used in tests.
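To make that concrete without trying to reproduce Reflow's .rf syntax from memory, here's the same module-reused-in-tests pattern in plain Python (illustrative only; the names are mine):

    # pipeline.py: a module usable from a "main" entrypoint or from tests.
    def normalize(records):
        """Strip whitespace and drop empty records."""
        return [r.strip() for r in records if r.strip()]

    # test_pipeline.py: the same module exercised in an ordinary test.
    from pipeline import normalize

    def test_normalize_drops_empties():
        assert normalize([" a ", "", "b"]) == ["a", "b"]

The point is that workflow logic lives in ordinary modules, so testing it needs no special harness.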
+1 to this. I've had multiple teams shy away from Airflow because of its operability and deployment story, despite needing something with all of its features.
In terms of radically different takes on workflow engines, I'm very interested in Reflow. I haven't used it enough to know whether the rough edges are a deal breaker.
> I want an Airflow that runs and scales in the cloud, has extensive observability (monitoring, tracing), has a full API, and maybe some clear way to test workflows.
I am an engineer over at Astronomer.io, and we are working on making Airflow the best piece of software it can be. We offer both Enterprise and Cloud editions, and we are solving exactly the problems you describe above.
Take a look and feel free to reach out if you have any questions or feedback.
If I may, have a look at StreamSets (https://streamsets.com/). We looked at their open source version in our research and it seems pretty comprehensive. I'm not sure it's a simple single-click deploy on your cloud provider, but if it's deployed on a Spark cluster or with Mesos, I'm sure you've got what you need.