
OCR, VLM or LLM for such important use cases seems like a problem we should not have in 2025.

The real solution would be to have machine readable data embedded in those PDFs, and have the table be built around that data.

We could then have actual machine-readable financial statements or reports, much like our passports.


The problem is, these PDFs are coming from paper, and this is the step where you add that data.

The world has become much more digitized (for example, for any sale I get both a PDF and an XML version of my receipt, which is great), but not everything comes from computers, and much of what does is made for humans.

We have handwritten notes, printed documents, etc., and OCR has to handle them. On the other hand, desktop OCR applications like Prizmo and the latest versions of macOS already produce much better output than these models. There are also specialized free applications to extract tables from PDF files (a PDF is a bunch of fonts and pixels; it carries no information about layout, tables, etc.).

We have these tools, and they work well. There's even the venerable Tesseract, built to OCR scanned paper, which has had a neural network layer for years. Yet we still throw LLMs at everything, cheer like 5-year-olds when they do 20% of what these systems can, and act as if this technology hasn't existed for two decades.


The funny thing is that sometimes we need to machine-read documents produced by humans on machines, but the actual source is almost always machine-readable data.

Agreed on the handwritten part.


> The funny thing is that sometimes we need to machine-read documents produced by humans on machines, but the actual source is almost always machine-readable data.

Yes, but it's not possible to connect all systems' backends with each other without some big ramifications, so here we are. :)


A lot of the time you are OCRing documents from people who do not care how easy it is for the reader to extract the data. A common example is regulatory filings: the goal is to comply with the law, not to help people read your data. Or perhaps it's from a source that sells the data or holds copyright and doesn't want to make it easy for others to use it beyond its intended purpose. Etc.


Most of the time, if I want to rely on reviews for a restaurant or hotel choice, I go straight to the 1 to 3 star reviews.

If those are complaints like "the waiter was not nice" or "they took 30 minutes to bring my food", I'll assume there's nothing to worry about.

My feeling is that bad reviews are more trustworthy, because they are less likely to be faked (unless it's the competition).


I’ve always relied heavily on Maps reviews, but for me the trust was broken a few weeks ago. I went to a restaurant with thousands of reviews averaging 4.6. The food was good but not amazing, and the service was super kind and proactive, but nothing special enough to deserve a 4.6 in Paris. We understood the trick at the end of the meal. The waiters kindly ask you to scan a QR code to leave a review. You end up on a third-party landing page, and if you select 4 or 5 stars you are redirected to Maps; otherwise they simply take your random-internet stars and bury them forever on that landing page. Clever.


Interesting idea for another type of score: what would the average be if you only counted the 1-3 star reviews?
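As a rough sketch, that score is a one-liner: filter the ratings to the 1-3 star range and average only those. (The `ratings` list and function name here are invented for illustration; a real version would pull ratings from whatever review API you use.)

```python
def low_star_average(ratings):
    """Average of the 1-3 star ratings only; None if there are none."""
    low = [r for r in ratings if 1 <= r <= 3]
    return sum(low) / len(low) if low else None

ratings = [5, 5, 4, 2, 1, 5, 3, 5]
print(low_star_average(ratings))  # 2.0
```

A venue whose low-star average sits near 3 (mild gripes) probably reads very differently from one sitting near 1.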


I used Sublime Text for coding before switching to VS Code, but I could never leave Sublime for some use cases. For example:

The ability to edit large SQL dump files, which I can't even open in VS Code.


Ha, I cannot use Sublime Text for large file editing; in that case VS Code is more performant for me.


So a few months ago, after a buggy change to a stock update sent to the channel manager (an external system for updating pricing and availability of hotel rooms), we lost 21,000 EUR within just 2 hours of the bug being deployed. We exported low-season rates (today’s rates) for the high seasons.

Luckily a human saw the error when he noticed a big flow of reservations coming in at an anomalous pace.

From there on, I’m too scared to make changes to the 20-line function that produces the updates to be sent.

We had no chance to cancel the reservations made through the channels (the websites where the reservations are booked).


> From there on, I’m too scared to make changes to the 20-line function that produces the updates to be sent.

Given enough regression testing coverage, any load-bearing function can be changed.

This sounds to me like your project had insufficient or non-existent software quality practices to let this slip through. Hopefully your team has done a post-mortem analysis to make this a learning experience.


> From there on, I’m too scared to make changes to the 20-line function that produces the updates to be sent.

Isn’t this where automated testing shines?


Indeed.


You need much better testing. That 20-line function probably needs better documentation too, and might even benefit from being broken out further. I hope it is at least a 'pure' function, in other words one that does not reference any globals.
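If the function is pure, pinning it down with table-driven regression tests is cheap. A minimal sketch, assuming a made-up `season_rate` function and rate table (the commenter's actual code is not shown):

```python
# Hypothetical stand-in for the 20-line export function: pure, so the
# output depends only on the argument and nothing global is mutated.
RATES = {"low": 80.0, "mid": 120.0, "high": 200.0}

def season_rate(season: str) -> float:
    try:
        return RATES[season]
    except KeyError:
        raise ValueError(f"unknown season: {season!r}")

def test_season_rate():
    # Golden cases: any change to the mapping must be a deliberate edit here.
    cases = [("low", 80.0), ("mid", 120.0), ("high", 200.0)]
    for season, expected in cases:
        assert season_rate(season) == expected
    # Unknown seasons must fail loudly instead of exporting a wrong rate.
    try:
        season_rate("winter")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

With a test like this in CI, the "low-season rates exported for high season" bug becomes a red build instead of a 21,000 EUR surprise.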


I love Git and I find no reason to look for anything else. Most of the complaints I hear about Git are from people who never learned its logic.

I think a lot of things could be built on top of git.


Part of Fossil's reasoning is that they *DO* understand Git's logic but disagree with it.

See most of the points labelled with links to 2.5, plus the comment about showing what you actually did, with no rewriting and lying about history; the errors you made are often just as important as the correct way of doing things.


Agreed. In over 10 years of development using Git, I have run into an issue perhaps 10 times, which is once a year on average.

I can't think of any other tool that hasn't given me headaches at least once a day.

Git is actually remarkable tech, and it's unfortunate that the prevailing hivemind opinion is that it's bad.


> prevailing hivemind opinion is that its bad.

I don’t think that’s true. I think the hive mind is that it’s good, or at most “haha, it sucks, but I use it all the time.”

I work sort of near “data science”, and there are lots of no-code/low-code people wanting to do data science (or at least bill for it). I’d say the hive mind among non-coders is that Git is bad. But I think that’s more related to “all coding is bad”, and Git is like step 1 of coding, so it’s the first hard step they hit.


This is incredible. Congrats.


Question: How do I prove to the authorities that a subscriber has given consent?

I imagine that an attribute on my "users" table is not enough?


In any reasonable law, you would prove that your procedures require consent before you start sending the emails. If you have to prove things about a specific user, you are already in unreasonable territory.

(But then, I have no idea what places have reasonable rules. I have never seen any with this specific failure for email, but IANAL and I haven't looked much.)


Actually, this sounds like common law to me. But yes, this would be enough for me.

However, if I consent to a User Agreement, do you really think they keep a copy of the specific version of the User Agreement I accepted?


They almost certainly keep a copy of that specific version of the UA. They also very likely keep a log of you agreeing to it. And probably none of those would matter in a court (what you actually say on your site and how reasonable the document is certainly matter a lot more).

Anyway, UA acceptance does not require and does not imply opt-in to your marketing emails.


“Double Opt In” is the way to go.

They sign up, then you send them an email and track when they hit the “I approve” link.
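In code, double opt-in is essentially one random token round-trip. A minimal sketch (in-memory dicts stand in for a database; the URL and function names are invented for illustration):

```python
import secrets
from datetime import datetime, timezone

pending = {}    # token -> email, awaiting confirmation
confirmed = {}  # email -> ISO timestamp of consent (your audit trail)

def sign_up(email: str) -> str:
    """Store the address as pending and return the confirmation token."""
    token = secrets.token_urlsafe(16)
    pending[token] = email
    # In reality you would email a link like:
    # https://example.com/confirm?token=<token>
    return token

def confirm(token: str) -> bool:
    """Mark the address as opted in when its token comes back."""
    email = pending.pop(token, None)
    if email is None:
        return False  # unknown or already-used token
    confirmed[email] = datetime.now(timezone.utc).isoformat()
    return True
```

Keeping the confirmation timestamp (and ideally the requesting IP) is what gives you something to show an authority later, rather than just a boolean column.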


Watch out: if you get an HTTP GET request on the approval link, it could be the mail provider scanning for malware, not the user. You may need triple opt-in :-)


The Git CLI is very good at forcing you to be aware of the underlying model (local state and the repo’s history). This lets the user build an accurate mental model of the system.

Adding an “undo” command would be convenient, but it would hide the underlying potential away. That belongs in a GUI client, and even there I would still want to know what it is actually doing.

Instead of hiding the abstractions behind a “friendlier” CLI, Git shows you its real power and that of VCSs in general, by having a lower-level API.

Not even the multiple GUIs built on top of Git venture into hiding or dumbing down the abstractions that let you do quite complex things with the code and its history.


> Instead of hiding the abstractions behind a “friendlier” CLI, Git shows you its real power and that of VCSs in general, by having a lower-level API.

But that’s simply not true. Git is just inconsistent in how it does things.

Sometimes it wants you to deal with its inner state (typically for undo, where you just need to move HEAD to a previous commit that might only be visible in the reflog). Sometimes it introduces complex concepts that would be simpler if they were exposed for what they are (stashes are just temporary commits).

Honestly Git is a mess. It works but the UX has always been awful.


> The CLI interface of git is very good at enforcing you to be aware of the underlying model (local state and the repo’s history).

Awareness of state is sometimes important; sometimes you just need awareness of operations. And awareness of Git's data structure does not necessarily map 1:1 to understanding the state of the code base. I'm not saying you’re wrong, just that this may or may not be a good thing. I personally lean towards: this could be improved.

Really good advanced tools should also work well for common use cases, so you can master them incrementally. Git creates extreme friction for certain simple tasks, which is evident if you look at Stack Overflow. If a user wants to do something very simple in human terms (like fix a typo) and gets a lecture about Merkle trees and reflogs, then I think there is room for improvement.

The truly important question, though, is whether the UI can be improved enough without changing the data model (i.e., can you replace the UI without replacing your repos?).


Git can have a powerful API and also support higher-level workflows; it just doesn't.

> Adding an “undo” command would be convenient, but it would hide the underlying potential away.

It would not hide any underlying potential away; see https://blog.waleedkhan.name/git-undo/ for a general-purpose CLI implementation.


Great! I love the philosophy around open and clear formats. Like I have said before, a second brain should be as open and reliable as possible.


Trello boards export to JSON; would you consider that open? OneNote notebooks are also an open and well-documented specification, as well as local-first and backed by a very reliable company, which makes them just as open as Obsidian by those standards.

https://learn.microsoft.com/en-us/openspecs/office_file_form...


The Canvas JSON is not an export format; it is the file that the app actually reads and edits. Being explicitly MIT-licensed also gives other people and companies permission to build their own tools on the format.


Exporting is different from "being stored in". Since the export does not represent the full state of the data, no.


OneNote and this canvas format might be equally open and interoperable, but it's hard to justify the claim that OneNote notebooks are as open as a folder of mostly standard Markdown (the two exceptions being wikilinks and embeds).


I peeked at the OneNote format standard [1] and the Obsidian canvas standard. The difference is hilarious. The OneNote standard is painfully complex, provided as a PDF, and binary to boot. Compare that to an example Obsidian canvas: it's obvious, text-based (I could read it with Notepad++), and easy to understand just by reading it:

  {
   "nodes":[
    {"id":"6c711bf8c24c4f5b","x":-226,"y":-62,"width":400,"height":400,"type":"file","file":"testin/2022-10-14.md"},
    {"id":"4dd7d04cdd0b379c","x":-530,"y":-209,"width":250,"height":60,"type":"text","text":"this is a note"}
   ],
   "edges":[
    {"id":"0c589a4d6bbb06aa","fromNode":"4dd7d04cdd0b379c","fromSide":"bottom","toNode":"6c711bf8c24c4f5b","toSide":"left"},
    {"id":"eda9f3edb3ec232a","fromNode":"6c711bf8c24c4f5b","fromSide":"top","toNode":"4dd7d04cdd0b379c","toSide":"right"},
    {"id":"abf404722ba48c3b","fromNode":"4dd7d04cdd0b379c","fromSide":"top","toNode":"6c711bf8c24c4f5b","toSide":"right"}
   ]
  }

[1] https://interoperability.blob.core.windows.net/files/MS-ONE/...
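Because the canvas file is plain JSON, a few lines of stdlib Python are enough to walk it. A sketch using the node/edge field names from the example above (the IDs here are shortened stand-ins, not real Obsidian IDs):

```python
import json

canvas_json = '''{"nodes":[
  {"id":"a1","x":0,"y":0,"width":250,"height":60,"type":"text","text":"this is a note"},
  {"id":"b2","x":300,"y":0,"width":400,"height":400,"type":"file","file":"2022-10-14.md"}],
 "edges":[
  {"id":"e1","fromNode":"a1","fromSide":"right","toNode":"b2","toSide":"left"}]}'''

canvas = json.loads(canvas_json)
# Index nodes by id so edges can be resolved to their endpoints.
nodes = {n["id"]: n for n in canvas["nodes"]}
for edge in canvas["edges"]:
    src, dst = nodes[edge["fromNode"]], nodes[edge["toNode"]]
    # A node carries either inline "text" or a "file" reference.
    print(src.get("text", src.get("file")), "->", dst.get("text", dst.get("file")))
# prints: this is a note -> 2022-10-14.md
```

That's the whole point of the openness argument: no SDK, no spec PDF, just `json.loads` and two dict lookups.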


The Drive [0] by Peter Attia. Smart, educational and possibly life-changing.

[0] - https://peterattiamd.com/podcast/


Episode #226, "The science of happiness", was the best one this year:

https://peterattiamd.com/arthurbrooks/

