marsupialtail_2's comments

Butterfly flies 2600 miles across the ocean just to be caught by a human in a jar to be DNA sequenced ...


The sincerest form of flattery is when AWS decides to come up with a big consortium to displace you with some open source.

Incidentally, the most effective way to stall a project according to the CIA is to have a huge guiding committee with clearly diverging interests.

Redis will win because it's focused on its users. Its competitors will lose. Like OpenSearch, like OpenCL etc.



Glad this is getting some love. This is seriously good software. Have you guys supported generic substring search yet? I recall it was not supported as of a few months ago.


Not yet. Only prefixes. Also you could probably cook something with an ngram tokenizer.

Is it for a field with a high cardinality? If you tell us more about your use case, maybe we can find a workaround.
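To make the ngram idea concrete, here is a rough sketch in plain Python. It is not this project's API: `NgramIndex`, `ngrams`, and the trigram size are names and choices made up for illustration. The idea is to index every 3-character window of each value, intersect the posting lists for the query's ngrams to get candidates, then verify the candidates with a real substring check.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of lowercase character n-grams in text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramIndex:
    """Toy in-memory inverted index over character n-grams (hypothetical)."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> set of doc ids
        self.docs = {}                    # doc id -> original text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in ngrams(text, self.n):
            self.postings[gram].add(doc_id)

    def search(self, query):
        grams = ngrams(query, self.n)
        if grams:
            # Every n-gram of the query must occur in a matching document.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # Query shorter than n: the index cannot help, so scan everything.
            candidates = set(self.docs)
        # The intersection can contain false positives, so verify each one.
        needle = query.lower()
        return sorted(d for d in candidates if needle in self.docs[d].lower())
```

The cost is index size: each value contributes roughly one posting per character, which is part of why this tends to be a workaround rather than a default.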


No, just curious. I understand how an indexing structure based on SSTables could find it challenging to support substring search in general. I think it's a tradeoff between fast querying and flexible functionality.
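A toy illustration of that tradeoff, assuming nothing about the real implementation beyond "terms are stored in sorted order": a prefix query can binary-search to the first matching key and read a contiguous run, while a substring query has no such anchor and has to look at every key. `prefix_scan` and `substring_scan` are hypothetical helper names.

```python
import bisect


def prefix_scan(sorted_keys, prefix):
    """Find keys with the given prefix in a sorted list.

    All matches are contiguous, so one binary search plus a short scan suffices.
    """
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order guarantees no further matches
        matches.append(key)
    return matches


def substring_scan(sorted_keys, needle):
    """Find keys containing needle anywhere: sort order does not help, so scan all."""
    return [key for key in sorted_keys if needle in key]


keys = sorted(["apple", "apply", "banana", "grape", "pineapple"])
print(prefix_scan(keys, "app"))
print(substring_scan(keys, "app"))
```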


Thanks for the shoutout!


Yes, but there is also the inverse carrot problem. E.g. if the pilots have radar, they are more liable to rely on it and neglect other aspects of flying. Similarly in business, it is simply harder for folks who grew up rich to develop the level of grit that comes naturally to the less privileged.

I may sound like a rich apologist, but please believe me when I say it is harder to spend 10 hrs a day cranking on a risky startup if you know you could be clubbing with daddy's money.


Hi Justin, you might be interested in my blog: https://github.com/marsupialtail/quokka/blob/master/blog/bac... advocating a cloud based approach.

You don't have to use the system I am building, but it's worth thinking about that design.


Cool, thanks. I'll check it out!


Perhaps I charged too little when I contracted away my 10x random forest inference solution...


SQL support is very challenging.

I work on Quokka (https://github.com/marsupialtail/quokka). I support Iceberg reads. Recently we have been adding SQL support by parsing the DuckDB logical plan, though that is very challenging as well.

The Python world lacks a standard for a plug and play SQL query optimizer. Apache Calcite is good for the JVM world, but not great if you are trying to cut out the JVM.


While we are on this topic, the challenge with data lakes for Python-based projects like Daft and Quokka (what I work on) is the poor Python support for data lake formats like Delta, Iceberg and Hudi. Delta has the best support, but its Python API is consistently behind the Java one. Iceberg doesn't support Python writes. Hudi doesn't support anything Python.

I have users demanding Iceberg writes and Hudi reads/writes. I don't know what to tell them, since I don't have the resources to add a reader/writer myself for those projects.

Hopefully as DuckDB becomes more popular we will see Python bindings for these popular data lake formats this year.

