And sadly it is also not accessible to screen readers. VS Code, for all its flaws, is really, really good for screen reader accessibility. In fact, I'd go as far as to say that it's not only one of the most accessible code editors we have, but one of the most accessible Electron apps overall. So losing it to this Microsoft stuff would be a huge deal to anyone who relies on screen readers or accessibility tools. :(
I just cracked open macOS VoiceOver for the first time in a while and hoo boy, you weren't kidding. I wonder if you could still "stun" an LLM with this technique while also using some aria-* tags so the original text isn't so incredibly hostile to screen readers; something like the sketch below. Regardless, I think that as neat as this tool is, it's an awful pattern, and hopefully no one uses it except as part of bot-capture stuff.
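Roughly the kind of thing I'm imagining, as a purely hypothetical, untested sketch (the renderStunned helper and its parameters are made up): keep the scrambled glyphs on screen, but take them out of the accessibility tree with aria-hidden and give screen readers a visually hidden copy of the real text instead.

```typescript
// Hypothetical sketch: sighted users see the scrambled glyphs,
// screen readers get the clean text.
function renderStunned(container: HTMLElement, cleanText: string, scrambled: string): void {
  const visual = document.createElement("span");
  visual.textContent = scrambled;
  // Remove the scrambled text from the accessibility tree entirely.
  visual.setAttribute("aria-hidden", "true");

  const forScreenReaders = document.createElement("span");
  forScreenReaders.textContent = cleanText;
  // Classic "visually hidden" styling: exposed to assistive tech,
  // invisible on screen.
  Object.assign(forScreenReaders.style, {
    position: "absolute",
    width: "1px",
    height: "1px",
    overflow: "hidden",
    clipPath: "inset(50%)",
    whiteSpace: "nowrap",
  });

  container.append(visual, forScreenReaders);
}
```

Of course, once the clean text is in the DOM at all, anything that reads the DOM rather than the rendered pixels gets it for free, which probably defeats the "stun" part, so this may be a self-contradictory idea.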
Do screen readers fall back to OCR by now? I could imagine that being critical given the large amount of text in raster images (often used for bad reasons) on the Internet alone.
Sounds like a potentially useful improvement then.
I've had more success exporting text from some PDFs (not scanned pages, but just text typeset using some extremely cursed process that breaks accessibility) that way than via "normal" PDF-to-text methods.
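For what it's worth, a minimal sketch of that kind of OCR fallback, assuming the page has already been rendered to an image (page-001.png is a made-up name) and assuming the tesseract.js library; any OCR engine would do:

```typescript
import Tesseract from "tesseract.js";

// OCR fallback: recognize text from a rendered page image instead of
// trying to extract the (broken) text layer from the PDF itself.
async function ocrPage(pageImagePath: string): Promise<string> {
  const { data } = await Tesseract.recognize(pageImagePath, "eng");
  return data.text;
}

ocrPage("page-001.png").then((text) => console.log(text));
```

It's slower and lossier than a proper text layer, but for cursed typesetting it at least gives you something to work with.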
No, it is not. Simple OCR is slow and much more expensive than an API call to the given process. On top of that, it is error-prone and cannot follow the focus in real time. And no, adding AI does not make it better. AI is useful when everything else fails and it is worth waiting 10 seconds for an incomplete and partially hallucinated screen description.
Huh? Running a powerful LLM over a screenshot can take longer, but macOS's/iOS's default "extract text" feature, for example, has been pretty much instant for me.
is "pretty much instant" true when jumping between buttons, partially saying what you are landing on while looking for something else? can it represent a gui in enough detail to navigate it, open combo boxes, multy selects and whatever? can it make a difference between an image of a button and the button itself? can it move fast enough so that you can edit text while moving back and forth? ocr with possible prefetch is not the same as object recognition and manipulation. screen readers are not text readers, they create a model of the screen which could be navigated and interacted with. modern screen readers have ocr capabilities. they have ai addons as well. still, having the information ready to serve in a manner that allows followup action is much better.
Oh, I don't doubt at all that it's a measure of last resort, and I am indeed not familiar with the screen reader context.
I was mostly wondering how well my experience with human-but-not-machine-readable PDFs transferred to that domain, and was surprised that OCR performance is still an issue.
Just as a quick data point here in case people get worried: yes, it is absolutely possible to program as a blind person, even without language models. Obviously you won't be using your eyes for it, but we have tried-and-tested tools that help and work. And at the end of the day, someone's going to have to review the code that gets written, so either way, you're not going to get around learning those tools.
Source: Am a blind person who was coding for many years before language models existed.
Thank you for sharing your experience. It gives me a bit of comfort to know it's possible for me to keep coding in the event of vision loss, and I'm glad tools exist for people who are blind.
A part of me wants to start using the available tools just to expand my modalities of interfacing with technology. If you have the time, any recommendations? What do you use?
The core thing here is the "I'm sorry you feel this way." This immediately deflects all sense of wrongdoing from the people actually doing wrong onto the people feeling hurt. There are so many other ways to phrase this that are either more neutral or even acknowledge some kind of mistake being made that's not on the volunteer's side, but that's not what's happening here. Essentially this means "We did the right thing and now we need to figure out how to make you understand this," not "Something went wrong and we need to figure out how to come to an understanding, which might include us having done something wrong."
I'd love to test this; however, it seems not to be accessible with screen readers. I assume this is because the GUI library doesn't support accessibility. I found an open issue about this on the Iced GitHub where in 2024 it was mentioned that the version after next should support it, and the last comment was in February of this year (https://github.com/iced-rs/iced/issues/552).
I bookmarked this so hopefully once that effort gets further along I can give it a try!
I figured I'd leave this comment so that some folks can see that there are real people even on HN who require these features and that accessibility work is always appreciated. We definitely exist :)
Heh, that roadmap is also not accessible to screen readers, at least on Firefox. That's unfortunate. But I understand it's a big undertaking with little reward for most people. I do think there are UI libraries with AccessKit integration; egui, I believe?
Ah well. I'll check back on it every now and then either way.
I feel the same way. It also appears to be a lot more difficult to actually find jobs, though that's probably just the general state of the job market and less specifically AI-related. All of it is thoroughly discouraging and demotivating, and the longer this goes on, the less I want to do it. So for me as well it might be time to look beyond software, which will also be difficult, since software is what I've done all my life, and I don't have any formal qualifications for anything else I can do, even if I'm confident I have the relevant skills.
It's not even just that. Every single thing in tech right now seems to be AI this, AI that, and AI is great and all, but I'm just so tired. So very tired. Somehow, even with the tools being impressive and getting more impressive by the day, I just can't find it in me to be excited about it all. Maybe it's just burnout, I'm not sure, but it definitely feels like a struggle.
I keep coming to the same conclusion, which basically is: if I had an LLM write it for me, I just don't care about it. Out of the maybe 50 or so projects that are LLM-generated, there are two I actually care about, and even for those two I cared enough to make changes myself, without an LLM. The rest just sit there because one day I thought, huh, wouldn't it be neat if, and then realized I actually cared more about having that thought than about having the result of that thought.

Then you end up fighting with different models and implementation details, it messes something up, you go back and forth about how you actually want it to work, and somehow this is so much more draining and exhausting than just getting the work done manually, with some slight completion help perhaps, maybe a little bit of boilerplate fill-in. And yes, this is after writing extensive design docs, then having some reasoning LLM figure out the tasks that need to be completed, then having some models talk back and forth about what needs to happen and while it's happening, and then I spent a whole lot of money on what, exactly? Questionably working software that kinda sorta does what I wanted it to do?

If I have a clear idea, or an existing codebase, and I end up guiding it along, agents and such are pretty cool, I guess. But vibe coding? Maybe I'm in the minority here, but as soon as it's a non-trivial app, not just a random small script or a bespoke one-off, it's not fun. I often don't get the results I actually wanted even when I tried to be as specific as I could with my prompting, design docs, example data, and all that; it's expensive; the code is still messy as heck; and at the end I feel like I just spent a whole lot of time literally arguing with my computer. Why would I want to do that?
I’ve written a full-stack monorepo with over 1,000 files on my own now. I started with AI doing a lot of the work, but the percentage keeps going down. For me, a good codebase is not about how much you’ve written but about how it’s architected. I want an app with the best possible user and dev experience, meaning it’s easy to maintain and easy to extend. This is achieved by making code easy to understand, for yourself and for others.
In my case it’s more about developing a mindset and building a framework than pushing feature after feature. I would think it’s like that for most companies. You can get an unpolished version of most apps easily, but polishing takes 3-5x the time.
Let’s not even talk about development robustness, backend security, etc. AI just has way too many slip-ups for me in these areas.
However, I would still consider myself a heavy AI user, though I mainly use it to discuss plans (what Google used to be for) or to check whether I’ve forgotten anything.
For most features in my app, I’m faster typing it out exactly the way I want it (with a bit of auto-complete). The whole brain coordination works better.
I guess that’s a long way of saying: you’re not alone, trust your instinct. You don’t seem narrow-minded.
It’s nothing special, not in the realm of anything technically outstanding. I just stated that to emphasize that it’s a slightly bigger project than the default single-dev SaaS projects that are just a single wrapper. We have workers, multiple white-labeled applications sharing a common infrastructure, data-scraping modules, AI-powered services, and email-processing pipelines.
I’ve had an impossible learning curve over the last year, and even though I should if anything be biased toward vibe coding, I still use less AI now to make sure the result is more consistent.
I honestly think the two camps differ in terms of skill, but also in terms of needs. Of course you’re faster vibe-coding a front-end than writing the code manually, but building a robust backend/processing system is a different tier.
So instead of picking a side, it’s usually best to stay as unbiased as possible and choose the right tool for the task.
We just had a story last night about a Python cryptography maintainer using Claude to add formally verified optimizations to LLVM. I think the ship has sailed on skepticism about whether LLMs are going to produce valuable code; you can follow Simon Willison's blog for more examples.
I don't understand people who are sceptical about whether LLMs can give value. We're way past that, now at the stage where we're trying to figure out how to extract the most value out of them, but I guess humans don't like change much.
The jury is still out. They have spent hundreds of billions, trillions even, and want trillions in ROI.
It does really cool stuff now, while it is given away for free, but how cool will it be when they want you to pay what it actually costs, with ROI and profits on top?