Prophecy.io | Think Cursor for Data Analysts | US (onsite SF, remote)
Prophecy is used by data engineers in the largest enterprises (Series B, $120M raised). Our new product for data analysts is agentic and multi-modal (docs, low-code, code).
- AI Research Engineer
- AI Agentic Software Engineer
Looking for top-tier AI engineers who are excited to ship products at a tremendous pace.
email me - raj at prophecy dot io
Yes, but in addition to being ACID compliant (ANSI serializable), you need to support a million transactions per second - it’s not just about data size.
Just saw the video you shared in the other comment using Prophecy - very cool.
Generally I don’t care much about the embedding, retrieval, connectors, etc. for playing with LLMs - I imagined much more robust tools were already available for that. My focus was more on the prompt development: connecting many prompts together for a better chain-of-thought kind of thing, working out the memory and stateful parts, and so on. I think there might be a case for an “LLM framework” there, and also a case for a small lib to solve it instead of an ETL cannon.
However, I am indeed not experienced with ETL - I have to play more with the available tools to see if and how I can do the things I was building using them.
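To make the “small lib instead of an ETL cannon” idea concrete, here’s a minimal sketch of chaining prompts with shared memory and no framework. call_llm is a hypothetical wrapper around whichever chat API you use (message list in, text out); everything else is plain Python.

    from dataclasses import dataclass, field

    @dataclass
    class Chain:
        memory: list = field(default_factory=list)  # running conversation state

        def step(self, template: str, **kwargs) -> str:
            prompt = template.format(**kwargs)
            self.memory.append({"role": "user", "content": prompt})
            reply = call_llm(self.memory)  # hypothetical: message list in, text out
            self.memory.append({"role": "assistant", "content": reply})
            return reply

    chain = Chain()
    outline = chain.step("Outline an answer to: {q}", q="What is ETL?")
    answer = chain.step("Expand this outline into a full answer:\n{o}", o=outline)

Each step sees the full memory, so later prompts build on earlier ones - that’s the whole stateful-chaining plumbing in about 15 lines.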
It is pointless - LlamaIndex and LangChain are re-inventing ETL - why use them when you have robust technology already?
1. You ETL your documents into a vector database - you run this pipeline every day to keep it up to date. You can run scalable, robust pipelines on Spark for this.
2. You have a streaming inference pipeline with components that make API calls (agents) and data transformations between them. This is Spark Streaming - see the sketch below.
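A rough PySpark sketch of both pipelines - assume embed, make_vector_db_client, and call_llm are hypothetical helpers for your embedding model, vector DB, and LLM API, and the S3 path and Kafka addresses are placeholders. This is the shape of it, not a drop-in implementation:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("genai-etl").getOrCreate()

    # 1. Batch: embed documents and upsert them into the vector DB; rerun daily.
    docs = spark.read.text("s3://my-bucket/docs/")  # placeholder location

    def index_partition(rows):
        client = make_vector_db_client()  # hypothetical vector DB client
        for row in rows:
            client.upsert(id=str(hash(row.value)),
                          vector=embed(row.value),  # hypothetical embedding call
                          metadata={"text": row.value})

    docs.foreachPartition(index_partition)

    # 2. Streaming: read questions, call the LLM between transforms, write answers.
    questions = (spark.readStream.format("kafka")
                 .option("kafka.bootstrap.servers", "broker:9092")
                 .option("subscribe", "questions")
                 .load())

    answer = F.udf(lambda q: call_llm(q))  # hypothetical: question in, answer out
    answers = questions.select(answer(F.col("value").cast("string")).alias("value"))

    (answers.writeStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")
            .option("topic", "answers")
            .option("checkpointLocation", "/tmp/checkpoints")
            .start())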
Prophecy is working with large enterprises to implement generative AI use cases, but they don’t talk about it much on HN.
We also do platform & customer work there (cool pipelines to feed louie.ai, or real-time headless versions), and agreed, those pipelines have simple uses of LLMs where LangChain is mostly useful just for vendor neutrality. Think BYO LLM, as it is now a zoo. Basically Apache NiFi or Spark Streaming with simple LLM & vector DB call-outs. Our harder work here is more at the data engineering level.
But... a lot of our louie.ai work happens in less trivial scenarios, where it isn't just the ETL-NLP-2.0 tier. That logic is much more complicated, so structured programming abstractions matter a LOT more for AI-style business logic. Think talk-to-your-data that generates on-the-fly analytics pushdown with an interactive data viz UI. That's... a lot of code.
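To make “analytics pushdown” concrete, a hand-wavy sketch - call_llm, validate_sql, and run_on_warehouse are hypothetical helpers, and this is the shape of the idea, not louie.ai's implementation. The LLM writes the query, the database does the heavy lifting, and only a small result set reaches the viz layer:

    def answer_question(question: str, schema: str):
        # Ask the LLM to translate the question into SQL over the known schema.
        sql = call_llm(f"Schema:\n{schema}\n"
                       f"Write one SQL query answering: {question}\n"
                       "Return only SQL.")
        validate_sql(sql)             # hypothetical guardrail: parse / allow-list check
        rows = run_on_warehouse(sql)  # pushdown: aggregation runs in the database
        return rows                   # hand the (small) result set to the viz UI

And that's before conversation state, multi-step plans, error recovery, and the UI itself - hence “a lot of code”.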
I agree that it's a little silly, but I mostly use it to abstract over BYO LLMs and to extract information from documents. It's nice to be able to quickly prototype something and swap out the underlying language model, rather than set up a whole pipeline with Apache Tika, ETL, etc. Once the idea proves feasible, then sure.
That said, LangChain is really inefficient, and I often find I can re-implement the pieces I need much faster than dealing with LangChain's bugs and performance issues.
That’s assuming you’re not using low-code. There are built-in connectors to read data, transform data, read/write to Pinecone, and make API calls to LLMs. It is much faster to prototype with Prophecy.io.
Yes, but this is just ETL - LlamaIndex and LangChain are re-inventing it - why use them when you already have robust technology? It's the same two-pipeline setup as in my comment above.
Here’s our talk from Data+AI Summit:
Build a Generative AI App on Enterprise Data in 13 Minutes
Cool! Let’s say I have thousands of documents that I want questions and answers for. Would your solution work for this? I wouldn’t know which documents to send with the prompts, though, as I want info on the aggregate (like trends and the most-mentioned phrases or words).
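For the aggregate part specifically (trends, most-mentioned words), one option is a plain aggregation pass before any prompting - no need to pick documents to stuff into a prompt. A sketch with Spark, as discussed upthread (the path is a placeholder):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("aggregate-trends").getOrCreate()
    docs = spark.read.text("s3://my-bucket/docs/")  # placeholder location

    # Most-mentioned words across all documents; no LLM needed for this part.
    top_words = (docs
                 .select(F.explode(F.split(F.lower("value"), r"\W+")).alias("word"))
                 .where(F.length("word") > 3)  # crude stand-in for a stopword list
                 .groupBy("word").count()
                 .orderBy(F.desc("count"))
                 .limit(50))
    top_words.show()

You could then hand the small aggregate table to the LLM for a narrative summary.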
A lot of B2B startups can technically use the cloud APIs to provide value-added applications to enterprises, but often the banks and healthcare companies will not want their data running through a startup’s pipes to OpenAI’s pipes.
We provide a low-code data transformation product (prophecy.io), and we’ll never close sales at any volume if we have to get an MSA that approves this. Might get easier if we become large :)
You didn’t answer my question. If I had an account at SVB and I put $100k in it, why would SVB take my $100k and invest it? That was the whole issue with FTX. Whatever asset class it was invested in, my money shouldn’t have been invested unless I gave SVB permission, period.
It’s how literally every bank has worked for all of time. They have to cover the interest you are paid for keeping your money at the bank - where do you think that money comes from? They take the money people deposit and invest it, either through loans to others or through other investment vehicles, like treasury bonds in this case. When you put your money in a bank, you are giving them permission to reinvest it somehow. If you just want your money to sit there, your only option is to start stuffing wads of bills under your mattress.
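Back-of-the-envelope, with made-up numbers just to show where the interest comes from (not SVB’s actual rates):

    deposit = 100_000        # your $100k
    bond_yield = 0.04        # what the bank earns on, say, treasuries
    deposit_rate = 0.005     # what it pays you in interest

    earned = deposit * bond_yield      # 4000.0 earned by the bank
    paid_out = deposit * deposit_rate  # 500.0 paid to you as interest
    print(earned - paid_out)           # 3500.0 spread left to run the bank

If deposits just sat in a vault, there would be nothing to pay that interest with.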