
We're working with an enterprise customer on exactly this problem. The hardest part is entity resolution - figuring out who "Acme Inc" actually is from messy transaction data and what they do.

We built an AI agent specifically for this that's backed by 265M legal entities. Last week it tested 160% better than our customer's existing system on their real data.

Still in stealth but happy to share our API docs if anyone's dealing with this: https://docs.savvyiq.ai/api-reference/#tag/entity-resolution

Open to chat about this problem if anyone wants to connect - email is in my HN profile.

(Disclosure: I'm the CTO)



We solved this at Ramp on the expenses/AP side with an agentic RAG implementation and a custom embedding model, backed by D&B/Google/user-submitted corrections.

If curious, details here:

https://engineering.ramp.com/post/transaction-embeddings

https://engineering.ramp.com/post/fixing-merchant-classifica...


Very cool. I read through those links - really sophisticated setup. We're experimenting with something similar on the embeddings side.

Having dealt with this challenge at my last 3 companies, it's easy to hack together something that works most of the time. The hard part is dealing with gnarly customer inputs, the long tail of private businesses globally, and getting close to 100% accuracy (important for legal and risk use cases).

We're building what's essentially an AI-powered version of D&B - combining government registrar data globally with real-time web data at scale. Much more accurate on obscure entities and way faster updates than the legacy providers.

I actually shot you an email - would love to chat more about this if you're up for it.


Entity resolution is the killer feature. Context engineering is the problem with this benchmark attempt. The agent plan seemed to be one-shot, and the fact that the LLMs could write their own tools without validation or specific multi-shot examples is worrisome. To me, way too much is left to the whims of the LLMs, without proper context.


Yes, none of the top LLMs can do entity resolution well yet. I constantly see them conflate entities with similar names - they'll confidently cite 3 sources about what appears to be one company, but the sources are actually about 3 different businesses with similar names.

The fundamental issue is that LLMs don't have a concept of canonical entity identity. They pattern match on text similarity rather than understanding that "Apple Inc" and "Apple Records" are completely different entities. It gets even worse when you realize companies can legally have identical names in the same country - text matching becomes completely unreliable.

Without proper entity grounding, any business logic built on top becomes unreliable.
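
To make the point concrete, here is a minimal sketch of why name similarity alone can't stand in for canonical identity. Everything in it is an assumption for illustration: the `Entity` shape, the `registry_id` field, and the ID values are made up, and `difflib` is only a stand-in for whatever text matching a real system would use.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Entity:
    # Hypothetical canonical identifier, e.g. jurisdiction + registration number.
    registry_id: str
    name: str
    jurisdiction: str


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive text similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_entity(a: Entity, b: Entity) -> bool:
    """Identity comes from the canonical ID, never from the name."""
    return a.registry_id == b.registry_id


# Made-up records: similar names, different legal entities.
apple_inc = Entity("US-CA-0000001", "Apple Inc", "US-CA")
apple_records = Entity("GB-0000002", "Apple Records", "GB")

# Made-up records: identical names in the same country, still different entities.
acme_de = Entity("US-DE-0000003", "Acme Inc", "US-DE")
acme_tx = Entity("US-TX-0000004", "Acme Inc", "US-TX")

if __name__ == "__main__":
    for left, right in [(apple_inc, apple_records), (acme_de, acme_tx)]:
        score = name_similarity(left.name, right.name)
        print(f"{left.name!r} vs {right.name!r}: "
              f"name similarity {score:.2f}, same entity: {same_entity(left, right)}")
```

The two "Acme Inc" records are indistinguishable by name, so any threshold on text similarity would merge them. Only the identifier keeps them apart, which is the grounding step the LLMs skip.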



