yigitkonur35's comments | Hacker News

from what i've heard, the boom of 'vibe coded' apps has apple overwhelmed, and they're responding by tightening up, which makes it harder for everyone, even legit projects like yours. btw, the product looks really nice - hope you get good traction.


Thank you :)


shows how bad embeddings are in a practical way


finally someone has made something like n8n that's easy to observe but also offers coding flexibility. i was just about to dive into building an agent with a prompt chain that generates n8n workflows when i found you. if llms.txt becomes a bit more practical, it's written simply enough to have the potential to go from prompt to agent. shadcn-level simplicity for writing agents is what's needed. if the right decisions are made, this product may be the new toolkit the market is looking for.


Wow, thank you for this thoughtful message. You really got what we’re trying to build. We’re pushing hard to bring shadcn-level simplicity to agents. Your feedback means a lot!


great stuff! nice to see that remotion is becoming more popular in projects like this.


I really appreciate you sharing your hands-on experience with a real-world scenario. It's interesting how people unfamiliar with traditional OCR often doubt LLMs, but having worked with actual documents, I know how inefficient classic OCR methods can be. So these minor errors don't alarm me at all. Your use case sounds fascinating - I might just incorporate it into my own benchmarks. Thanks again for your insightful comment!


I've found this method really useful for prepping PDFs before running them through AI. I mix it with traditional OCR for a hybrid approach. It's a game-changer for getting info from tricky pages. Sure, you wouldn't bet the farm on it for a big, official project, but it's still pretty solid. If you're willing to spend a bit more, you can use extra prompts to check for any context skips. It's a lot of work, though - probably best left to companies that specialize in this stuff.

I've been testing it out on pitch decks made in Figma and saved as JPGs. Surprisingly, the LLM OCR outperformed top dogs like SolidDocuments and PDFtron. Since I'm mainly after getting good context for the LLM from PDFs, I've been using this hybrid setup, bringing in the LLM OCR for pages that need it. In my book, this API is perfect for these kinds of situations.


You're absolutely right. I use PDFTron (through CloudConvert) for full-document OCR, but for pages with fewer than 100 characters, I switch to this API. It's a great combo – I get the solid OCR performance of SolidDocuments for most content, but I can also handle tricky stuff like stats, old-fashioned text, or handwriting that regular OCR struggles with. That's why I added page numbers upfront.
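
For anyone curious, the routing is roughly this. A minimal sketch, not my exact pipeline: pytesseract stands in for the PDFTron/SolidDocuments pass, the endpoint, deployment name, and prompt are illustrative placeholders, and the 100-character cutoff is the one I mentioned above.

    import base64
    import io

    import pytesseract                      # stand-in for PDFTron/SolidDocuments
    from PIL import Image
    from openai import AzureOpenAI

    client = AzureOpenAI(
        api_key="...",                      # your Azure OpenAI key
        api_version="2024-06-01",
        azure_endpoint="https://your-resource.openai.azure.com",
    )

    MIN_CHARS = 100  # pages below this go to the LLM fallback

    def llm_ocr_page(page_png: bytes) -> str:
        """Transcribe one page image with the vision model."""
        b64 = base64.b64encode(page_png).decode()
        resp = client.chat.completions.create(
            model="gpt-4o",                 # your Azure deployment name
            temperature=0.1,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Transcribe this page to Markdown. Preserve tables."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
        )
        return resp.choices[0].message.content

    def ocr_page(page_png: bytes) -> str:
        """Classic OCR first; fall back to the LLM on sparse pages."""
        text = pytesseract.image_to_string(Image.open(io.BytesIO(page_png)))
        return text if len(text.strip()) >= MIN_CHARS else llm_ocr_page(page_png)

The nice part of the threshold check is that the expensive LLM call only fires on the handful of pages where classic OCR came back nearly empty.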


I ought to test this with Sonnet too and compare the results. I feel it might perform better on OCR tasks. While I went with Azure OpenAI due to fewer rate restrictions, you've got a point - Sonnet could really shine here.


You're spot on. We shouldn't lump all LLMs together. This approach might work wonders for Anthropic and OpenAI's top-tier models, but it could fall flat with smaller, less complex ones.

I purposely set the temperature to 0.1, thinking the LLM might need a little wiggle room when whipping up those markdown tables. You know, just enough leeway to get creative if needed.
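
In case it's useful, that's just the sampling temperature on the chat completion call. An illustrative Azure OpenAI snippet - the endpoint and deployment name are placeholders:

    from openai import AzureOpenAI

    client = AzureOpenAI(
        api_key="...",
        api_version="2024-06-01",
        azure_endpoint="https://your-resource.openai.azure.com",
    )

    def transcribe(messages, temperature: float = 0.1) -> str:
        """0.1 keeps the output near-deterministic while leaving a little
        slack for markdown table layout; 0.0 would lock it down entirely."""
        resp = client.chat.completions.create(
            model="gpt-4o",            # your Azure deployment name
            temperature=temperature,
            messages=messages,
        )
        return resp.choices[0].message.content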


Yes, you can customize this as you wish by adding it to your prompt.

