We have some preliminary data with llama3.1: the smaller model gets to around 70% with BAML (+20% over base). We'll update this dashboard with the llama3.1 numbers by the end of the week!
I don't disagree with all of your points. That said, what we have built has proven useful to us while building pipelines for customers, and we think it might be useful for others.
Probably the main point where I disagree with you is that RAG is just ETL. If that were the case, all of the AI apps people are building would be AMAZING, because we solved the ETL problem years ago. Yet app after app being released has issues like hallucinations and incorrect data. IMO, the second you insert a non-deterministic entity in the middle of an ETL pipeline, it is no longer just ETL. To try to add value here, our focus has been on adding capabilities to the framework around data synchronization (which is really more of a vector management problem), contextualization of data through metadata, and retrieval (the part where we have spent the least time to date, but are currently spending the most).
There are a couple of areas where we think we are driving some differentiation.
1. The management of metadata as a first-class citizen. This includes capturing metadata at every stage of the pipeline.
2. Being infra-ready. We are still evolving this point, but we want to add abstractions that help developers apply this type of framework to a large-scale distributed architecture.
3. Native support for different types of data synchronization. So far we support both full and delta syncs, and we have work in the pipeline to bring in abstractions for real-time syncing. (There's a rough sketch of the first and third points after this list.)
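To make the first and third points concrete, here is a minimal sketch, with hypothetical names rather than our framework's actual API, of stamping metadata at each stage and switching between a full and a delta sync:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

@dataclass
class Document:
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)

def stamp(doc: Document, stage: str, **extra: Any) -> Document:
    """Attach stage-level metadata so every hop in the pipeline is traceable."""
    doc.metadata[stage] = {
        "processed_at": datetime.now(timezone.utc).isoformat(),
        **extra,
    }
    return doc

def sync(docs: list[Document], last_sync: datetime | None = None) -> list[Document]:
    """Full sync when last_sync is None; delta sync otherwise."""
    if last_sync is not None:
        # Delta: keep only documents the source marked as updated since last_sync
        # (assumes the source stored "updated_at" as a datetime in metadata).
        docs = [d for d in docs if d.metadata.get("updated_at", last_sync) > last_sync]
    return [stamp(d, "sync", mode="delta" if last_sync else "full") for d in docs]
```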
It is dangerous, and that's part of the reason we haven't productized it further. One of the ideas we had for productizing the capability was to leverage edge / lambda functions to compartmentalize the generated code. (Plus it becomes a general extensibility point for folks who are not using semantic code generation and simply want to write their own code.)
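As a rough illustration of the compartmentalization idea (not our actual implementation), the generated snippet could run in its own process; locally, a subprocess stands in for the edge/lambda function, and the stdin/stdout JSON contract is an assumption:

```python
import json
import subprocess
import sys

def run_generated_chunker(chunker_path: str, text: str, timeout: float = 10.0) -> list[str]:
    """Run generated chunking code in its own interpreter process so a bad
    snippet can't take down (or reach into) the host application."""
    proc = subprocess.run(
        [sys.executable, chunker_path],
        input=json.dumps({"text": text}),
        capture_output=True,
        text=True,
        timeout=timeout,  # bound runaway generated code
        check=True,
    )
    return json.loads(proc.stdout)["chunks"]
```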
The idea of auditing the strategy is interesting. The flow we have used for the semantic chunkers to date has been along these lines (a rough harness is sketched after the list), where we:
1) Use the utility to generate the code snippets (and do some manual inspection)
2) Test the code snippets against some sample text
3) Validate the results
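A rough harness for that flow might look like the following; `generate_chunker` and the `chunk(text)` entry point are hypothetical names, and the validation check is just one example of what step 3 could assert:

```python
def audit_chunker(generate_chunker, sample_texts: list[str]) -> bool:
    code = generate_chunker()      # 1) generate the code snippet
    print(code)                    #    ...and eyeball it manually
    namespace: dict = {}
    exec(code, namespace)          # only run it after the inspection step
    chunk = namespace["chunk"]     # assumes the snippet defines chunk(text)
    for text in sample_texts:      # 2) test against sample text
        chunks = chunk(text)
        # 3) validate: every chunk is non-empty and actually came from the input
        if not chunks or any(not c or c not in text for c in chunks):
            return False
    return True
```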
Today, it is mostly about convenience. We provide abstractions in the form of a pipeline that encompasses a data source, an embed definition, and a sink definition. This means you don't have to think about embedding your query or which class you used to add the data into the vector DB.
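As an illustrative sketch (the names are assumptions, not our actual API), the convenience is that the pipeline owns the embed and sink definitions, so querying reuses them automatically:

```python
class Pipeline:
    """Bundles a data source, an embed definition, and a sink definition."""

    def __init__(self, source, embed, sink):
        self.source, self.embed, self.sink = source, embed, sink

    def run(self) -> None:
        # Ingest: embed each document and hand it to the vector store.
        for doc in self.source.load():
            self.sink.store(doc, self.embed(doc.content))

    def search(self, query: str, top_k: int = 5):
        # The same embed model and sink used at ingest are reused here,
        # so the caller never has to think about either.
        return self.sink.query(self.embed(query), top_k=top_k)
```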
In the future, we are adding abstractions that will bring more convenience. For example, we are working on a concept of pipeline collections, so that you can search across multiple indexes but get unified results. We are also adding more automation around metadata: because the pipeline configuration tells us what metadata was added (and gives us examples of it), we can help translate queries into hybrid search. I think about it as the self-query retriever from LangChain or LlamaIndex, but one that automatically has context of the data at hand (no need to provide attributes).
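A sketch of both ideas, again with hypothetical names: a collection fans the query out across pipelines and merges results by score, and the metadata translation step leans on what the pipeline config already knows (a real version would use an LLM for the inference, self-query style):

```python
class PipelineCollection:
    """Fan a query out across several pipelines and merge into one ranking."""

    def __init__(self, pipelines):
        self.pipelines = pipelines

    def search(self, query: str, top_k: int = 5):
        hits = [hit for p in self.pipelines for hit in p.search(query, top_k)]
        # Unified results: re-rank across all indexes by score.
        return sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]

def to_hybrid_search(query: str, known_metadata: dict[str, list]) -> dict:
    """known_metadata comes from the pipeline config (field -> example values).
    The substring match below is just a stand-in for the LLM inference step."""
    filters = {
        field: [v for v in values if str(v).lower() in query.lower()]
        for field, values in known_metadata.items()
    }
    return {"text": query, "filters": {f: v for f, v in filters.items() if v}}
```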
Are there any specific retrieval capabilities you are looking for?
Some engineers find it fun, others might not. Same as everything.
IMO the fun parts are actually prototyping and figuring out the right pattern I want to use for my solution. Once you have done that, scaling and dealing with robustness tend to be a bit less fun.