Glad this is getting some love. This is seriously good software. Have you guys added support for generic substring search yet? I recall it was not supported as of a few months ago.
No, just curious. I understand how an indexing structure based on SSTables could make it challenging to support substring search in general. I think it's a tradeoff between fast querying and flexible functionality.
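To illustrate the tradeoff, here is a toy sketch in Python. It assumes nothing about your actual implementation: a plain sorted list stands in for the sorted keys of an SSTable, and the function names are made up for the example. A prefix query can binary-search to its starting point, while an arbitrary substring query has nothing to seek on and ends up scanning every key.

```python
from bisect import bisect_left

def prefix_search(sorted_keys, prefix):
    """Seek to the first key >= prefix, then read until the prefix stops matching."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key)
    return matches

def substring_search(sorted_keys, needle):
    """Sort order does not help here: every key has to be inspected."""
    return [key for key in sorted_keys if needle in key]

# Hypothetical data, only for illustration.
keys = sorted(["apple", "application", "banana", "grape", "pineapple"])

print(prefix_search(keys, "app"))       # touches only the matching run of keys
print(substring_search(keys, "apple"))  # touches all keys
```

Structures like n-gram or suffix indexes can speed up the second case, but they cost extra space and write work, which is the flexibility-versus-speed tradeoff I mean.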
Yes, but there is also the inverse carrot problem. E.g. if pilots have radar, they are more liable to rely on it and neglect other aspects of flying. Similarly in business, it is simply harder for folks who grew up rich to develop the level of grit that comes naturally to the less privileged.
I may sound like a rich apologist, but please believe me when I say it is harder to spend 10 hrs a day cranking on a risky startup if you know you could be clubbing with daddy's money instead.
I work on Quokka (https://github.com/marsupialtail/quokka). It supports Iceberg reads. Recently we have been adding SQL support by just parsing the DuckDB logical plan, though that is very challenging as well.
The Python world lacks a standard plug-and-play SQL query optimizer. Apache Calcite is good for the JVM world, but not great if you are trying to cut out the JVM.
While we are on this topic, the challenge with data lakes for Python-based projects like Daft and Quokka (what I work on) is the poor Python support for data lake formats like Delta, Iceberg and Hudi. Delta has the best support, but its Python API is consistently behind the Java one. Iceberg doesn't support Python writes. Hudi doesn't support anything Python.
I have users demanding Iceberg writes and Hudi reads/writes. I don't know what to tell them, since I don't have the resources to add a reader/writer myself for those projects.
Hopefully as DuckDB becomes more popular we will see Python bindings for these popular data lake formats this year.