We're working with an enterprise customer on exactly this problem. The hardest p...

yunyu · 2025-07-22T14:41:08 1753195268

We solved this at Ramp on the expenses/AP side with an agentic RAG implementation and a custom embedding model, backed by D&B/Google/user-submitted corrections.

If curious, details here:

https://engineering.ramp.com/post/transaction-embeddings

https://engineering.ramp.com/post/fixing-merchant-classifica...

mfrye0 · 2025-07-22T18:12:09 1753207929

Very cool. I read through those links - really sophisticated setup. We're experimenting with something similar on the embeddings side.

Having dealt with this challenge at my last 3 companies, it's easy to hack together something that works most of the time. The hard part is dealing with gnarly customer inputs, the long tail of private businesses globally, and getting close to 100% accuracy (important for legal and risk use cases).

We're building what's essentially an AI-powered version of D&B - combining government registrar data globally with real-time web data at scale. Much more accurate on obscure entities and way faster updates than the legacy providers.

I actually shot you an email - would love to chat more about this if you're up for it.

DrStartup · 2025-07-22T08:26:20 1753172780

Entity resolution is the killer feature. Context engineering is the problem with this benchmark attempt. The agent plan seemed to be one-shot, and the fact that the LLMs could write their own tools without validation or specific multi-shot examples is worrisome. To me, way too much is left to the whims of the LLMs, without proper context.

mfrye0 · 2025-07-22T18:19:48 1753208388

Yes, none of the top LLMs can do entity resolution well yet. I constantly see them conflate entities with similar names - they'll confidently cite 3 sources about what appears to be one company, but the sources are actually about 3 different businesses with similar names.

The fundamental issue is that LLMs don't have a concept of canonical entity identity. They pattern match on text similarity rather than understanding that "Apple Inc" and "Apple Records" are completely different entities. It gets even worse when you realize companies can legally have identical names in the same country - text matching becomes completely unreliable.

Without proper entity grounding, any business logic built on top becomes unreliable.
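To make the point concrete, here is a rough sketch of the difference between matching on name text and matching on a canonical identity. Everything in it is an assumption for illustration: the `Entity` fields, the `jurisdiction` codes, and the `registry_id` values are made up and are not real registrar data or anyone's production schema.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Entity:
    name: str
    jurisdiction: str  # hypothetical field, e.g. a country or state code
    registry_id: str   # hypothetical registrar-issued identifier


def name_similarity(a: str, b: str) -> float:
    """Plain text similarity in [0, 1]; this is what naive matching relies on."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_entity(a: Entity, b: Entity) -> bool:
    """Identity is decided by the canonical key, never by the name."""
    return (a.jurisdiction, a.registry_id) == (b.jurisdiction, b.registry_id)


# Made-up records for illustration only; the IDs are not real registry numbers.
apple_inc = Entity("Apple Inc", "US-CA", "ID-0001")
apple_inc_alias = Entity("APPLE INC.", "US-CA", "ID-0001")
apple_records = Entity("Apple Records", "GB", "ID-0002")
acme_a = Entity("Acme Ltd", "GB", "ID-0003")
acme_b = Entity("Acme Ltd", "GB", "ID-0004")

if __name__ == "__main__":
    # Similar names, different entities.
    print(name_similarity(apple_inc.name, apple_records.name),
          same_entity(apple_inc, apple_records))
    # Identical names in the same country, still different entities.
    print(name_similarity(acme_a.name, acme_b.name),
          same_entity(acme_a, acme_b))
    # Different spelling, same entity.
    print(same_entity(apple_inc, apple_inc_alias))
```

The design choice is that the name is treated as a display attribute only. Two records count as the same business only when their (jurisdiction, registry ID) pair agrees, which is why identical names in the same country do not collapse into one entity.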