The town of Görlitz is truly charming. I visited after discovering that many scenes from my favorite movie, The Grand Budapest Hotel, were filmed there (https://www.discovergoerlitz.com/grandbudapest/)
How does it compare to Datalab/Marker https://github.com/datalab-to/marker ? We evaluated many PDF->MD converters and this one performed the best, though it is not perfect.
As anecdotal evidence, it serves my complex-enough purposes very well - mathematics and code interspersed together. One of my "litmus test" papers is this old paper on a Fortran inverse-Laplace transform algorithm [1] that intersperses inline and display equations, and monospace code blocks, while requiring OCR from scratch, and very few models currently do a satisfactory job, i.e. in the following page transcribed by Marker,
I assume you're using a PDF, and not the image you shared? You need to set force ocr or format lines to get inline math with a PDF (for images, we just OCR everything anyways, so you don't need any settings).
We're working on improving the playground generally now - expect a big update tomorrow, which among other things will default to format lines.
Thanks for the kind words! The team was just me until pretty recently, but we're growing quickly and will be addressing a lot of issues quickly in the next few weeks.
Perfect - it works!
Yes, I’m glad for all the time you’ve spent on this project: one of my ulterior goals is to make technical documentation for old systems and their programming environments accessible to LLMs, so that programming in retro computing can benefit from the advances in productivity that modern languages have. I’m sure you’ll find plenty of other user stories like that :)
In my experience Uber/Lyft/Bolt in their race to the bottom started tolerating cars in bad shape and drivers that don’t care about driving safely. Really hoping to see Waymo or any other robo-taxi in Europe soon.
" 3.1.3(a) “Reader” Apps: Apps may allow a user to access previously purchased content or content subscriptions (specifically: magazines, newspapers, books, audio, music, and video). ... Reader app developers may apply for the External Link Account Entitlement to provide an informational link in their app to a web site ...
It reminded me of a part of "The Tar Pit" essay from The Mythical Man-Month:
> Why is programming fun? What delights may its practitioner expect as his reward?
> First is the sheer joy of making things. As the child delights in his mud pie, so the adult enjoys building things, especially things of his own design.
...
> The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by the exertion of the imagination.
Great progress, but unfortunately, for our use case (converting medical textbooks from PDF to MD), the results are not as good as those by MinerU/PDF-Extract-Kit [1].
Also, the Colab link in the article is broken; I found a working one [2] in the docs.
Yes, I have. The problem with using just an LLM is that while it reads and understands text, it cannot reproduce it accurately. Additionally, the textbooks I've mentioned have many diagrams and illustrations in them (e.g. books on anatomy or biochemistry). I don't really care about extracting text from them; I just need them extracted as images alongside the text, and no LLM does that.
Very sad article to read. I just expect the original Flutter will now die a slow death. I applaud the effort and hope they will be able to find a monetization model that works to support the development. There have been similar projects based on technologies that Microsoft has killed (Silverlight -> OpenSilver and various .NET-based cross-platform technologies) that serve their customers well. Unfortunately, none of them are in any way "mainstream" like Flutter is today.
No, it transitioned it to others after signaling it would for well over a year, so I don't think you could call it sudden:
"In 2011 with the introduction of the Dart programming language, Google stated that GWT would continue to be supported for the foreseeable future while also hinting at a possible rapprochement between the two Google approaches to structured web programming. However, they also mentioned that several of the engineers previously working on GWT are now working on Dart.[6]
In 2012 at their annual I/O conference, Google announced that GWT would be transformed from a Google project to a fully open-sourced project.[7]
In July 2013, Google posted on its GWT blog that the transformation to an open-source project was completed.[8]"
Google funded/helped for some number of years after that.
It is still going, afaik, with GWT 2.11 released in January 2024.
That might be the wrong way to interpret this. It actually validates interest in the framework. This change personally excites me, and it injects some change into the ecosystem that could result in a better future.
Don't you think it's also a problem of the corporate-kindergarten culture, where everyone and everything is considered great, even though, as a salesperson, they haven't closed any deals in the last four months?
We don't have any context to assess that. Depending on the complexity of the product and the target customer's sales cycle, it can be perfectly normal to not close a deal in your first 4 months (one of which is December).