

radicalbyte on Aug 6, 2013 | on: SimpleLegal (YC S13) Reduces Legal Bills With Mach...

You'll get 80% of the benefit just by looking at word frequency, highlighting outliers, and then applying a weight based on factors such as length plus some secret-sauce weighting. Bonus points if you're using multiple categorizations (different weights for different industries). NLP / statistical stuff is fun ;)

Are you scanning / OCRing the documents? I never managed to get OCR good enough for invoicing; there always had to be a manual process to fix the (machine-learning-flagged) errors. Or don't you need accurate-to-the-cent invoices?
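For anyone wondering what the word-frequency idea could look like in practice, here is a minimal sketch in Python. It is not anyone's actual implementation: the function names, the `term_weights` parameter (a stand-in for per-industry weighting), the z-score threshold, and the sample line items are all assumptions made up for illustration. Each line item is scored by the average rarity of its words (log inverse document frequency), and items whose score sits well above the rest are flagged.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rarity_scores(docs, term_weights=None):
    """Average per-word rarity (log inverse document frequency) for each doc.

    term_weights is a hypothetical hook for per-industry weighting:
    a dict mapping a word to a multiplier (default 1.0).
    """
    term_weights = term_weights or {}
    token_lists = [tokenize(d) for d in docs]

    # Count in how many documents each word appears at least once.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))

    n = len(docs)
    scores = []
    for tokens in token_lists:
        if not tokens:
            scores.append(0.0)
            continue
        total = sum(
            math.log(n / doc_freq[t]) * term_weights.get(t, 1.0)
            for t in tokens
        )
        # Dividing by length keeps long entries from scoring high by size alone.
        scores.append(total / len(tokens))
    return scores


def flag_outliers(scores, z_threshold=1.5):
    """Return indices whose score is more than z_threshold std devs above the mean."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return []
    return [i for i, s in enumerate(scores) if (s - mu) / sigma > z_threshold]


# Made-up invoice line items, purely for illustration.
line_items = [
    "review contract and draft email to client",
    "draft contract and review email",
    "review email from client and draft reply",
    "call with client to review contract",
    "research obscure maritime salvage precedent",
]

flagged = flag_outliers(rarity_scores(line_items))
print(flagged)
```

Lowering `z_threshold` flags more items, which is where the trade-off between catching problems and generating extra manual review shows up.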

nwenzel on Aug 6, 2013

Word frequency is in use at many larger insurance companies today. You can certainly find problematic bills with word frequency and the hours billed, but you end up with a lot of false positives, so you still have to manually review everything. We go deeper than word frequency.

And, yes! NLP + statistics is fun!
