
I'm routinely confused about the role these sorts of tools play. I conceptually understand what they're doing, but weren't these problems solved a long time ago with materialized views and foreign data wrappers?

The hard part of ETL for me has always been gracefully handling the outliers: the data you need to look at in the context of multiple rows to make the correct decision (e.g. duplicate rows that aren't exactly identical, so it's not a simple SELECT DISTINCT), or the entity matching/mapping/categorization that's often necessary. Or the lookup tables that need manual oversight whenever new entries show up that haven't been properly tagged yet. Or, if you're ingesting address data and you want to normalize it through a geocoder, where exactly does that happen in these SQL-templating "pipelines"?
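For what it's worth, the near-duplicate case usually ends up as a window function rather than DISTINCT, something along these lines (table and column names here are made up for illustration):

  -- Keep one "best" row per natural key when duplicates aren't byte-identical.
  -- raw_customers, customer_email, updated_at are hypothetical names.
  WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
             PARTITION BY lower(customer_email)   -- the dedup key
             ORDER BY updated_at DESC, id         -- newest row wins ties
           ) AS rn
    FROM raw_customers
  )
  SELECT * FROM ranked WHERE rn = 1;

And even that only covers the mechanical half: deciding which columns to trust when the surviving rows disagree still takes business logic, or a human.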

I feel like these are basically focused on moving data between different DBs and generating group-by queries to populate some rollup tables?



