In our experience building louie.ai for a continuous-learning variant of text2query (and for popular DBs beyond SQL), getting syntax right via a symbolic lint phase is a nice speedup, but not the main correctness issue. For syntax, bigger LLMs are generally right on the first shot, and an agent loop autocorrects quickly when the DB returns a syntax error.
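For concreteness, here is a minimal sketch of that autocorrect loop; the names (the LLM wrapper, DB client, and exception type) are illustrative assumptions, not our actual API:

```python
# Minimal sketch of the syntax-autocorrect loop (illustrative, not louie.ai's actual API).
class DBSyntaxError(Exception):
    """Stand-in for whatever syntax error the database driver raises."""

def query_with_autocorrect(llm, db, question, max_attempts=3):
    """Ask the LLM for a query, run it, and feed any syntax error back for a retry."""
    prompt = f"Write a query answering: {question}"
    last_error = None
    for _ in range(max_attempts):
        if last_error:
            prompt = (
                f"Write a query answering: {question}\n"
                f"Your previous attempt failed with: {last_error}\n"
                "Correct the syntax and return only the fixed query."
            )
        candidate = llm.complete(prompt)             # assumed text-completion wrapper
        try:
            return candidate, db.execute(candidate)  # assumed DB client
        except DBSyntaxError as err:
            last_error = str(err)                    # loop again with the error in context
    raise RuntimeError(f"No valid query after {max_attempts} attempts: {last_error}")
```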
Much more time for us goes to things like:
* Getting table and column name spellings right
* Disambiguating typos when users define names, and deciding whether they mean a specific name or are using a shorthand
* Disambiguating which candidate the user means when multiple names refer to the same thing: hint - this needs to be learned from usage, not from static schema analysis (see the sketch after this list)
* Guard rails, such as on perf
* Translation from non-technical user concepts to analyst concepts
* Enterprise DB schemas are generally large and often blow out the LLM context window, or make things slow, expensive, and lossy if you rely on giant context windows (a toy version of the subsetting this forces is also sketched after this list)
* Learning and team modes so the model improves over time. User teaching interfaces are especially tricky once you expose them - fuzzy vs explicit learning modes, avoiding data leakage, ...
* A lot of power comes from being part of an agentic loop with other tools like Python and charting, which creates a 'composition' problem that requires AI optimization across any sub-AIs
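To make the name-disambiguation point concrete, here is a rough sketch of usage-weighted fuzzy matching; the scoring, weights, and example names are illustrative assumptions, not how louie.ai actually does it:

```python
# Rough sketch: resolve a user-typed name to a schema column, blending fuzzy string
# similarity with how often each candidate was used in past accepted queries.
# All names, weights, and counts are illustrative assumptions.
from difflib import SequenceMatcher

def resolve_column(user_term, columns, usage_counts, usage_weight=0.3):
    """Rank candidate columns by string similarity blended with historical usage."""
    total_uses = sum(usage_counts.values()) or 1
    scored = []
    for col in columns:
        similarity = SequenceMatcher(None, user_term.lower(), col.lower()).ratio()
        popularity = usage_counts.get(col, 0) / total_uses
        scored.append((col, (1 - usage_weight) * similarity + usage_weight * popularity))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Example: "cust id" should prefer the column analysts actually query,
# not just the closest string match.
columns = ["customer_id", "cust_identifier_legacy", "customer_uuid"]
usage = {"customer_id": 950, "cust_identifier_legacy": 5, "customer_uuid": 45}
print(resolve_column("cust id", columns, usage))
```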
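And for the schema-size point, a toy version of the subsetting that large schemas force: score tables against the question and only put the top-k descriptions in the prompt. A real system would use embeddings plus learned usage stats rather than word overlap, so treat this purely as a shape-of-the-problem sketch:

```python
# Toy sketch of schema subsetting: rank tables by keyword overlap with the question
# and include only the top-k table descriptions in the prompt. Illustrative only.
def select_tables(question, table_docs, k=3):
    """Pick the k tables whose descriptions share the most words with the question."""
    q_words = set(question.lower().split())
    def overlap(table):
        return len(q_words & set(table_docs[table].lower().split()))
    ranked = sorted(table_docs, key=overlap, reverse=True)
    return ranked[:k]

table_docs = {
    "orders": "order id, customer id, order date, total amount",
    "customers": "customer id, name, signup date, region",
    "web_events": "session id, page url, timestamp, referrer",
}
print(select_tables("total revenue by customer region last quarter", table_docs, k=2))
```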
We have been considering open-sourcing this layer of louie.ai, but it hasn't been a priority for our customers, who are the analyst orgs using our UIs on top (Splunk, OpenSearch, Neo4j, Databricks, ...), and occasionally building their own internal tools on top of our API. Our focus has been on building a sustainable, high-quality project, and OSS projects like this seem very difficult to sustain without also solving that, which is hard enough as-is.