We have some preliminary data with llama3.1: the smaller model gets to around 70% with BAML (+20% over base). We'll update this dashboard with the llama3.1 numbers by the end of the week!
I don't disagree with all of your points. That said, what we have built has proven useful to us while building pipelines for customers, and we think it might be useful for others.
Probably the main point where I disagree with you is that RAG is just ETL. If that were the case, all of the AI apps people are building would be AMAZING, because we solved the ETL problem years ago. Yet app after app being released has issues like hallucinations and incorrect data. IMO, the second you insert a non-deterministic entity in the middle of an ETL pipeline, it is no longer just ETL. To try to add value here, our focus has been on adding capabilities to the framework around data synchronization (which is really more of a vector management problem), contextualization of data through metadata, and retrieval (the part where we have spent the least time to date, but are currently spending the most).
There are a couple of areas where we think we are driving some differentiation.
1. The management of metadata as a first-class citizen. This includes capturing metadata at every stage of the pipeline.
2. Being infra-ready. We are still evolving this point, but we want to add abstractions that help developers apply this type of framework to a large-scale distributed architecture.
3. Native support for different types of data synchronization. So far we support both full and delta syncs, and we have work in the pipeline to bring in abstractions for real-time syncing. (There's a rough sketch of the first and third points after this list.)
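To make the first and third points concrete, here is a minimal sketch, with hypothetical names rather than our framework's actual API, of stamping metadata at each stage and switching between a full and a delta sync:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

@dataclass
class Document:
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)

def stamp(doc: Document, stage: str, **extra: Any) -> Document:
    """Attach stage-level metadata so every hop in the pipeline is traceable."""
    doc.metadata[stage] = {
        "processed_at": datetime.now(timezone.utc).isoformat(),
        **extra,
    }
    return doc

def sync(docs: list[Document], last_sync: datetime | None = None) -> list[Document]:
    """Full sync when last_sync is None; delta sync otherwise."""
    if last_sync is not None:
        # Delta: keep only documents the source marked as updated since last_sync
        # (assumes the source stored "updated_at" as a datetime in metadata).
        docs = [d for d in docs if d.metadata.get("updated_at", last_sync) > last_sync]
    return [stamp(d, "sync", mode="delta" if last_sync else "full") for d in docs]
```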
It is dangerous, and that's part of the reason we haven't productized it further. One of the ideas we had for productizing the capability was to leverage edge / lambda functions to compartmentalize the generated code. (Plus it becomes a general extensibility point for folks who are not using semantic code generation and simply want to write their own code.)
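As a rough illustration of the compartmentalization idea (not our actual implementation), the generated snippet could run in its own process; locally, a subprocess stands in for the edge/lambda function, and the stdin/stdout JSON contract is an assumption:

```python
import json
import subprocess
import sys

def run_generated_chunker(chunker_path: str, text: str, timeout: float = 10.0) -> list[str]:
    """Run generated chunking code in its own interpreter process so a bad
    snippet can't take down (or reach into) the host application."""
    proc = subprocess.run(
        [sys.executable, chunker_path],
        input=json.dumps({"text": text}),
        capture_output=True,
        text=True,
        timeout=timeout,  # bound runaway generated code
        check=True,
    )
    return json.loads(proc.stdout)["chunks"]
```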
The idea of auditing the strategy is interesting. The flow we have used for the semantic chunkers to date has been along these lines (a rough harness is sketched after the list), where we:
1) Use the utility to generate the code snippets (and do some manual inspection)
2) Test the code snippets against some sample text
3) Validate the results
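A rough harness for that flow might look like the following; `generate_chunker` and the `chunk(text)` entry point are hypothetical names, and the validation check is just one example of what step 3 could assert:

```python
def audit_chunker(generate_chunker, sample_texts: list[str]) -> bool:
    code = generate_chunker()      # 1) generate the code snippet
    print(code)                    #    ...and eyeball it manually
    namespace: dict = {}
    exec(code, namespace)          # only run it after the inspection step
    chunk = namespace["chunk"]     # assumes the snippet defines chunk(text)
    for text in sample_texts:      # 2) test against sample text
        chunks = chunk(text)
        # 3) validate: every chunk is non-empty and actually came from the input
        if not chunks or any(not c or c not in text for c in chunks):
            return False
    return True
```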
Today, it is mostly about convenience. We provide abstractions in the form of a pipeline that encompasses a data source, an embed definition, and a sink definition. This means you don't have to think about embedding your query or which class you used to add the data into the vector DB.
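As an illustrative sketch (the names are assumptions, not our actual API), the convenience is that the pipeline owns the embed and sink definitions, so querying reuses them automatically:

```python
class Pipeline:
    """Bundles a data source, an embed definition, and a sink definition."""

    def __init__(self, source, embed, sink):
        self.source, self.embed, self.sink = source, embed, sink

    def run(self) -> None:
        # Ingest: embed each document and hand it to the vector store.
        for doc in self.source.load():
            self.sink.store(doc, self.embed(doc.content))

    def search(self, query: str, top_k: int = 5):
        # The same embed model and sink used at ingest are reused here,
        # so the caller never has to think about either.
        return self.sink.query(self.embed(query), top_k=top_k)
```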
In the future, we are adding abstractions that will bring more convenience. For example, we are working on a concept of pipeline collections, so that you can search across multiple indexes but get unified results. We are also adding more automation around metadata: because the pipeline configuration tells us what metadata was added (and gives us examples of it), we can help translate queries into hybrid search. I think about it as the self-query retriever from LangChain or LlamaIndex, but one that automatically has context of the data at hand (no need to provide attributes).
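A sketch of both ideas, again with hypothetical names: a collection fans the query out across pipelines and merges results by score, and the metadata translation step leans on what the pipeline config already knows (a real version would use an LLM for the inference, self-query style):

```python
class PipelineCollection:
    """Fan a query out across several pipelines and merge into one ranking."""

    def __init__(self, pipelines):
        self.pipelines = pipelines

    def search(self, query: str, top_k: int = 5):
        hits = [hit for p in self.pipelines for hit in p.search(query, top_k)]
        # Unified results: re-rank across all indexes by score.
        return sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]

def to_hybrid_search(query: str, known_metadata: dict[str, list]) -> dict:
    """known_metadata comes from the pipeline config (field -> example values).
    The substring match below is just a stand-in for the LLM inference step."""
    filters = {
        field: [v for v in values if str(v).lower() in query.lower()]
        for field, values in known_metadata.items()
    }
    return {"text": query, "filters": {f: v for f, v in filters.items() if v}}
```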
Are there any specific retrieval capabilities you are looking for?
Some engineers find it fun, others might not. Same as everything.
IMO the fun parts are actually prototyping and figuring out the right pattern I want to use for my solution. Once you have done that, scaling and dealing with robustness tend to be a bit less fun.