I'm working on a new component for viewing PDFs in their original format and structure, while highlighting the text that is currently being played by the TTS engine.
This is for my app (https://with.audio), which already supports PDF parsing and TTS of PDF files. WithAudio currently converts the input PDF to Markdown and performs TTS and synchronized text highlighting on the Markdown content. I want to do this on the original rendered PDF content itself.
Initial results are promising. Extracting the text, figuring out which lines belong to the same paragraph, and then trying to map those to the original positions in the PDF...
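To make the idea concrete, here is a minimal sketch of the grouping and mapping steps. It is not how WithAudio does it; the `Line` type, the `gap_ratio` threshold, and the function names are all made up for illustration, and it assumes the PDF library already returns one bounding box per text line, in reading order, with `top` and `bottom` measured from the top of the page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    """One extracted text line with its position on the page (hypothetical shape)."""
    text: str
    x0: float      # left edge
    top: float     # distance from page top to the top of the line
    bottom: float  # distance from page top to the bottom of the line


def group_lines(lines: List[Line], gap_ratio: float = 0.6) -> List[List[Line]]:
    """Group consecutive lines into paragraphs.

    A new paragraph starts when the vertical gap to the previous line is
    larger than gap_ratio times the previous line's height. The threshold
    is an assumption and would need tuning on real documents.
    """
    paragraphs: List[List[Line]] = []
    current: List[Line] = []
    for line in lines:
        if current:
            prev = current[-1]
            gap = line.top - prev.bottom
            if gap > gap_ratio * (prev.bottom - prev.top):
                paragraphs.append(current)
                current = []
        current.append(line)
    if current:
        paragraphs.append(current)
    return paragraphs


def paragraph_text(paragraph: List[Line]) -> str:
    """Text handed to the TTS engine: lines joined by single spaces."""
    return " ".join(line.text for line in paragraph)


def line_at(paragraph: List[Line], offset: int) -> Line:
    """Map a character offset in paragraph_text() back to the source line."""
    start = 0
    for line in paragraph:
        end = start + len(line.text)
        if offset <= end:  # the joining space is attributed to the line before it
            return line
        start = end + 1    # skip the joining space
    raise IndexError("offset is past the end of the paragraph")
```

With something like this, the player only needs the character offset of the word being spoken: `line_at` gives back the box to highlight on the rendered page. Real PDFs need more than a vertical-gap rule (columns, indentation, hyphenation, headers and footers), so treat this as a starting point only.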
This is not good, but the point is that this can be improved much more easily than the human accident rate. Both are very difficult problems, but one is certainly harder.
That's not really true. There are huge discrepancies in human driver accident data across different countries, which shows that there are clear practices one could deploy to significantly reduce driving incidents - people just choose not to implement them.
When it comes to the cost of rebuilding, we can't compare software engineering with other industries (like cars or buildings). I think this makes it much more iterative compared to them.
Living in North America, I feel like 99% of apartments and houses can be grouped into 5-10 floor plans. I think that's because when you are designing a new building or house you really can't take much risk. You do what has already worked. Software also has trends, but they change so often.
You also can't do A/B testing or targeting, or measure every single interaction a potential customer has with your product.
The nature of building "Software" really brings so many options to the table, which increases the number of iterations by an order of magnitude.
This is amazing. The audio feels very natural and it's fairly good at handling complex text-to-speech tasks.
I've been working on WithAudio (https://with.audio). Currently it only uses Kokoros. I need to test this a bit more but I might actually add it to the app. It's too good to be ignored.
A one-time payment app - interesting (I'm also working on something with a similar monetization model).
How are things going? I'd love to hear about the experience of another solopreneur. What stack are you using?
I wonder:
- What are you using to parse PDFs and extract the text? I found that to be a nightmare when I was doing something similar for WithAudio (my app).
- Are you just extracting the text, or are you doing any post-processing to identify which lines belong to the same paragraph?
Things are going slow, but it is a passion project so it's ok :)
A few people have bought a licence and it seems most people who try the app are very happy with it so I'm happy too.
The app is entirely in Java, with JavaFX for the UI and Lucene for the search engine. To read and render PDFs I use PDFium.
My project is WithAudio and is gonna be WithAudio for a while. It's a text-to-speech reader. Initially I decided to generate audio paragraph by paragraph. But that was not a great call, as users sometimes have to wait for the whole paragraph to be ready before they can listen.
Now I'm working on changing it to sentence by sentence. I think that + adding 2 new languages would take most of my January's budgeted time.
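A rough sketch of what sentence-by-sentence generation could look like, just to show why it cuts the wait. This is not WithAudio's code: `synthesize` is a stand-in for whatever TTS call is actually used, and the regex splitter is a naive assumption that will mis-split abbreviations such as "e.g." or "Dr.".

```python
import re
from typing import Callable, Iterator, List, Tuple

# Naive boundary: whitespace that follows ".", "!" or "?".
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(paragraph: str) -> List[str]:
    """Split a paragraph into sentences, keeping the end punctuation."""
    return [s for s in _SENTENCE_BOUNDARY.split(paragraph.strip()) if s]


def stream_audio(
    paragraph: str,
    synthesize: Callable[[str], bytes],  # hypothetical TTS function
) -> Iterator[Tuple[str, bytes]]:
    """Yield (sentence, audio) pairs one at a time.

    Because this is a generator, the caller can start playing the first
    sentence while later sentences have not been synthesized yet.
    """
    for sentence in split_sentences(paragraph):
        yield sentence, synthesize(sentence)
```

The time to first audio then depends on the length of the first sentence instead of the whole paragraph, and each yielded sentence doubles as the unit for highlighting.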
As is obvious in my profile, WithAudio is my personal project that started in 2025. Starting Q3 I decided to plan it quarterly and write a summary every month.
Looking at an empty screen is difficult for me, so I just ran `git log --pretty=oneline --since="3 months ago"` and gave the result to Claude, along with a reference to the other blog post, to get a starting point. Then I read the whole thing 3 times and edited it. So AI was used in writing this, but this is not written by AI.
2025: I'm proud of my 2025. I lost weight at a slow pace. I launched one of my projects and built on top of it, and I'm currently at 99 items sold. Hopefully it will get to 100 by the end of the year. Not gonna lie, bad stuff happened too, but I'm not gonna focus on that.
2026: If everything goes as planned, I'm hoping to see my mother and brother after 4 years. I want to live a bit healthier, and I want to grow the same side project. I really hope I scroll less shitty content.
This is the community I feel safest in, and it really helps me keep going every day.
Hey! This looks cool. I'm curious how it works, can you share more?
You said it's not a subscription and it's "pay for what you use". So how does the pricing work? Do you charge me, or do I bring an API key for some other TTS provider?