marsupialtail_2 · on July 1, 2024

Butterfly flies 2600 miles across the ocean just to be caught by a human in a jar to be DNA sequenced ...

marsupialtail_2 · on March 29, 2024

The sincerest form of flattery is when AWS decides to come up with a big consortium to displace you with some open source.

Incidentally, the most effective way to stall a project, according to the CIA, is to have a huge guiding committee with clearly diverging interests.

Redis will win because it's focused on its users. Its competitors will lose. Like OpenSearch, like OpenCL etc.

marsupialtail_2 · on Feb 16, 2024

code: https://github.com/Vince7778/Emojile

marsupialtail_2 · on Jan 9, 2024

Glad this is getting some love. This is seriously good software. Have you guys supported generic substring search yet? I recall it was not supported as of a few months ago.

fulmicoton · on Jan 9, 2024

Not yet. Only prefixes. Also you could probably cook something with an ngram tokenizer.

Is it for a field with a high cardinality? If you tell us more about your use case, maybe we can find a workaround.
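To make the ngram idea concrete, here is a rough sketch in plain Python of how substring search can be layered on an ordinary inverted index. It is not this project's API: `NgramIndex`, `ngrams`, and the choice of trigrams are all made up for illustration. Each document is indexed under every 3-character window it contains; a query is split the same way, the posting lists are intersected, and the surviving candidates are verified with a real substring check, since sharing all the ngrams does not guarantee they appear contiguously.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of all length-n character windows in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramIndex:
    """Toy inverted index keyed by character ngrams (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.docs = []                    # doc_id -> original text
        self.postings = defaultdict(set)  # ngram -> set of doc_ids

    def add(self, text):
        doc_id = len(self.docs)
        self.docs.append(text)
        for gram in ngrams(text.lower(), self.n):
            self.postings[gram].add(doc_id)
        return doc_id

    def search(self, query):
        query = query.lower()
        grams = ngrams(query, self.n)
        if not grams:
            # Query is shorter than n, so the index cannot help: scan everything.
            candidates = range(len(self.docs))
        else:
            # A match must contain every ngram of the query.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        # Verify candidates, because ngram overlap alone can give false positives.
        return sorted(i for i in candidates if query in self.docs[i].lower())


idx = NgramIndex()
idx.add("quickwit search engine")
idx.add("tantivy index")
idx.add("Research notes")
print(idx.search("earch"))  # doc ids whose text contains "earch"
```

The cost is index size: every field value contributes roughly one posting per character, which is why this tends to be a workaround rather than a default.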

marsupialtail_2 · on Jan 9, 2024

No, just curious. I understand how your indexing structure based on SSTables could find it challenging to support substring search in general. I think it's a tradeoff between fast querying and flexible functionality.

marsupialtail_2 · on Sept 12, 2023

Thanks for the shoutout!

marsupialtail_2 · on Aug 13, 2023

Yes, but there is also the inverse carrot problem. E.g. if the pilots have radar, they are more liable to rely on it and neglect other aspects of flying. Similarly in business, it is simply harder for folks who grew up rich to develop the level of grit that comes naturally to the less privileged.

I may sound like a rich apologist, but please believe me when I say it is harder to spend 10 hrs a day cranking on a risky startup if you know you can be clubbing with daddy's money.

marsupialtail_2 · on June 30, 2023

Hi Justin, you might be interested in my blog: https://github.com/marsupialtail/quokka/blob/master/blog/bac... advocating a cloud based approach.

You don't have to use the system I am building, but it's worth thinking about that design.

_zkyx · on June 30, 2023

Cool, thanks. I'll check it out!

marsupialtail_2 · on June 10, 2023

Perhaps I charged too little when I contracted away my 10x random forest inference solution...

marsupialtail_2 · on June 7, 2023

SQL support is very challenging.

I work on Quokka (https://github.com/marsupialtail/quokka). I support Iceberg reads. Recently we have been adding SQL support by parsing the DuckDB logical plan, though that is very challenging as well.

The Python world lacks a standard for a plug and play SQL query optimizer. Apache Calcite is good for the JVM world, but not great if you are trying to cut out the JVM.

marsupialtail_2 · on June 7, 2023

While we are on this topic, the challenge with data lakes for Python-based projects like Daft and Quokka (what I work on) is the poor Python support for data lakes like Delta, Iceberg and Hudi. Delta has the best support, but its Python API is consistently behind the Java one. Iceberg doesn't support Python writes. Hudi doesn't support anything in Python.

I have users demanding Iceberg writes and Hudi reads/writes. I don't know what to tell them, since I don't have the resources to add a reader/writer myself for those projects.

Hopefully as DuckDB becomes more popular we will see Python bindings for these popular data lake formats this year.