Hacker Newsnew | past | comments | ask | show | jobs | submit | quinnjh's commentslogin

I’ve been utilizing this too but I’m not sure it gets closer to truth, rather gives me more stuff to skim over and decide for myself.

Do you find you generally get a well reasoned outcome or do you also find the model stretching to come up with a take that aligns with your skepticism?


Was looking for the DynaDraw shoutout. As a calligrapher, it’s the way to go for something more expressive than fixed lag.

Haeberli used a simple simulation of Hookes law, Where F=-kx F is the force applied to the spring. k is spring constant or stiffness. x is extension distance.

DynaDraw also added damping IIRC

Thx for the links


I don’t think people are suggesting : Build a renderer > build an ocr pipeline > run it on pdfs

I think people are suggesting : Use a readymade renderer > use readymade OCR pipelines/apis > run it on pdfs

A colleague uses a document scanner to create a pdf of a document and sends it to you

You must return the data represented in it retaining as much structure as possible

How would you proceed? Return just the metadata of when the scan was made and how?

Genuinely wondering


You can use an existing readymade renderer to render into structured data instead of raster.


Just to illustrate this point, poppler [1] (which is the most popular pdf renderer in open source) has a little tool called pdf2cairo [2] which can render a pdf into a svg. This means you can delegate all pdf rendering to poppler and only work with actual graphical objects to extract semantics.

I think the reason this method is not popular is that there are still many ways to encode a semantic object graphically. A sentence can be broken down into words or letters. Table lines can be formed from multiple smaller lines, etc. But, as mentioned by the parent, rule based systems works reasonably well for reasonably focused problems. But you will never have a general purpose extractor since rules needs to be written by humans.

[1] https://poppler.freedesktop.org/ [2] https://gitlab.freedesktop.org/poppler/poppler/-/blob/master...


There is also PDF to HTML, PDF to Text, MuPDF also has PDF to XML, both projects along with a bucketful of other PDF toolkits have PDF to PS, and there is many many XML, HTML, and Text outputs for PS.

Rastering and OCR'ing PDF is like using regex to parse XHTML. My eyes are starting to bleed out, I am done here.


It looks like you make a lot of valid points, but also have an extremely visceral reaction because theres a company out there thats using AI in a way that offends you. I mean fair still.

But im a guy who's in the market for a pdf parser service, im happy to pay pretty penny per page processed. I just want a service that works without me thinking for a second about any of the problems you guys are all discussing. What service do I use? Do I care if it uses AI in the lamest way possible? The only thing that matters are the results. There are two people including you in this thread ramming with pdf parsing gyan but from reading it all, it doesn't look like I can do things the right way without spending months fully immersed in this problem alone. If you or anyone has a non blunt AI service that I can use Ill be glad to check it out.


It is a hard problem, yes, but you don't solve it by rastering it, OCR, and then using AI. You render it into a structured format. Then at least you don't have to worry about hallucinations, fancy fonts OCR problems, text shaping problems, huge waste of GPU and CPU to paint an image only to OCR it and throw it away.

Use a solution that renders PDF into structured data if you want correct and reliable data.


pdftotext from poppler has that without doing juggling with formats.


Seems like you need a way to dictate structured workflows, in lieu of actually being able to train them up as soc analyst. Sounds like a fun problem!


This article is a great introduction to the topic of indoor (or rather in-wall) beehives, which I was curious about after seeing a father-son duo construct an impressive setup with hexagonal 3D printed enclosures. The authors voice is very enjoyable. Give it a read if you have a few mins


Forgive me if you’re already familiar, but if you’re interested in this metaphor you may like reading Stafford Beer’s work on organizational and system models. (1959, 1972)


The interesting thing to me about Beer's work, is that the complexity of control systems, is that it's not directed, it's inherent.

This leads me to assumptions about the inherent nature of language, in that it is not the contents of thought, but the vehicle.

Hence, different languages create their culture to perhaps a much larger degree than we tend to credit it.

Assumptions, but I tend towards this view.


This is, intuitively, a really exciting title. Looking forward to reading / seeing similar work.


Interesting question. I ran some quick numbers on Google, take last years profit for uber ($10B) and naively distribute it between all drivers (est 7M) and it comes out to a raise of 1,428. Most drivers would probably be enticed by that, but it’s not going to radically alter their situation. Also, Ubers board and investors are compelled via profit, as the purpose of a company is generating profit for its shareholders.

Would be great to see something like you suggest take off and gain national appeal, but I suspect that VC money helps with getting billboards/ newspaper ads/ etc. You would need to craft a new sort of coop model that can generate enough profits at the top to be an attractive investment but also share some. AFAIK this works best in small teams doing high value work.


That sounds like the number of worldwide drivers, and I imagine some of them would be super happy with a $1400 bonus.


> What’s to stop an attacker from using prompt injection against this firewall?

Clearly you need a firewall-firewall.

..defense in depth?


We'll soon be adding the ability to have multiple models perform the scan in parallel, so any attack would have to bypass all of the models.


So literally a firewall-firewall?


Super cool ! What OCR are you using for confidence ? Tesseract? Curious how you flag for Humam in the loop.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: