For those of us stuck on AWS, it's sad not having BigQuery, but the thing that really gets me is not having Dataflow.
Most of the industry still seems unaware that no-knobs data query and pipeline systems even exist. If only I had a dollar for every time I saw a PR tweaking the memory settings of some Spark job or Hive query that stopped running as the input data grew...
I'd love to see more people write their workflows using the Apache Beam API so they'll have the option to switch to a no-knobs, scalable pipeline engine in the future even if they're not using one today.