Hacker News | renmillar's comments

GP’s car just isn’t trained well enough


That last part is a real one though, mine tried to debug a Dockerfile by poking around my local environment outside of Docker today.


I’ve had it make some pretty obvious mistakes. I have to hold back the impulse to “unstick” it manually. In my case, it’s been surprisingly good at eventually figuring out what it was doing wrong - though sometimes it burns a few minutes of tokens in the process.


Claude's willingness to poke outside of its present directory can definitely be a little worrying. Just the other day, it started trying to access my jails after I specifically told it not to.


On a Mac, I use built-in sandboxing to jail Claude (and every other agent) to $CWD so it doesn’t read/write anything it shouldn’t, doesn’t leak env, etc. This is done by dynamically generating access policies and I open sourced this at https://agent-safehouse.dev
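A minimal sketch of what "dynamically generating access policies" could look like, assuming a sandbox-exec (SBPL) profile built from a deny-all default plus an allowance for the working directory. The `build_profile` helper is hypothetical and is not taken from agent-safehouse; a usable policy would need many more rules (for example, read access to system libraries).

```python
from pathlib import Path


def build_profile(workdir: str) -> str:
    """Return an illustrative sandbox-exec profile scoped to workdir.

    Hypothetical helper: this is not the policy any real tool generates.
    """
    # Resolve symlinks, since sandbox rules are matched against real paths.
    root = str(Path(workdir).resolve())
    # Escape backslashes and quotes so the path is a valid string literal.
    quoted = root.replace("\\", "\\\\").replace('"', '\\"')
    rules = [
        "(version 1)",
        "(deny default)",  # deny everything, then open up only what is listed
        f'(allow file-read* (subpath "{quoted}"))',
        f'(allow file-write* (subpath "{quoted}"))',
        "(allow process-exec)",
        "(allow process-fork)",
    ]
    return "\n".join(rules) + "\n"


if __name__ == "__main__":
    print(build_profile("."), end="")
```

The output could be saved to a file and passed to `sandbox-exec -f <profile> <command>`, though this bare-bones profile is probably too strict for most real programs to start.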


By any chance, do you know what Claude Code's sandbox feature uses under the hood and how that relates to your solution? From what I remember it also uses the native macOS sandbox framework, but I haven't looked too deep into it and don't trust it fully.


Claude Code sandboxing uses the same basic OS primitive but grants read access to the entire filesystem and includes escape hatches (some commands bypass sandboxing). Also, I wanted something solid I can use to limit every agent (OpenCode, Pi, Auggie, etc).


On Linux in a pinch you can use bubblewrap to hide and replace directories for a given process
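A rough sketch of the bubblewrap idea, assuming you want the whole filesystem read-only, the project directory writable, and a few sensitive directories replaced by empty tmpfs mounts. The `bwrap_argv` helper and the example paths are hypothetical; the flags used (`--ro-bind`, `--bind`, `--tmpfs`, `--dev`, `--proc`, `--chdir`, `--die-with-parent`) are standard bwrap options.

```python
import os


def bwrap_argv(workdir, hidden_dirs, command):
    """Build a bwrap command line (hypothetical helper, paths are examples)."""
    workdir = os.path.abspath(workdir)
    argv = [
        "bwrap",
        "--ro-bind", "/", "/",       # whole filesystem visible, read-only
        "--dev", "/dev",             # fresh minimal /dev
        "--proc", "/proc",           # fresh /proc
        "--bind", workdir, workdir,  # project directory stays writable
    ]
    # Mounts are applied in order, so these tmpfs mounts shadow the
    # directories exposed by the read-only bind above.
    for directory in hidden_dirs:
        argv += ["--tmpfs", os.path.abspath(directory)]
    argv += ["--chdir", workdir, "--die-with-parent"]
    return argv + list(command)


if __name__ == "__main__":
    print(" ".join(bwrap_argv(".", [os.path.expanduser("~/.ssh")], ["bash"])))
```

Passing the resulting list to `subprocess.run` would start the command inside the restricted view; add `--unshare-net` if the process should have no network either.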


For anyone reading this later, Claude Code's sandbox code is at https://github.com/anthropic-experimental/sandbox-runtime/


This is great!

Did you have any thoughts about how to restrict network access on macOS too?


I haven’t found an easy way, but I have a working theory -

sandbox-exec cannot filter based on domain names, but it can restrict outbound network connections to a specific IP/port (and drop the rest). If I can run a proxy on localhost:19999, I can allow agents to connect through it and filter connections by hostname. From my research, most agents support $HTTP_PROXY, so I'll try redirecting their HTTP requests through my security proxy. IIRC, if I do this at the CONNECT level, I don't need to MITM their traffic nor require a trusted root cert.

Recently, Codex CLI implemented something like DNS filtering for their sandbox, so I'd investigate their repo.
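To make the CONNECT idea concrete, here is a sketch of just the decision step such a proxy would need: read the hostname from the CONNECT request line and check it against an allowlist, without touching the encrypted payload. The function names, the allowlist format, and the port-443-only rule are my assumptions, and the socket handling and tunnelling are left out.

```python
def parse_connect_target(request_line):
    """Parse 'CONNECT host:port HTTP/1.1' into (host, port), or None."""
    parts = request_line.strip().split()
    if len(parts) != 3 or parts[0] != "CONNECT":
        return None
    host, sep, port = parts[1].rpartition(":")
    if not sep or not host or not port.isdigit():
        return None
    # Strip the brackets used around IPv6 literals, normalise case.
    return host.strip("[]").lower(), int(port)


def is_allowed(host, allowed_domains):
    """True if host equals an allowed domain or is a subdomain of one."""
    host = host.rstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def decide(request_line, allowed_domains):
    """Return the status line the proxy would send back to the client."""
    target = parse_connect_target(request_line)
    if target is None:
        return "HTTP/1.1 400 Bad Request\r\n\r\n"
    host, port = target
    if port != 443 or not is_allowed(host, allowed_domains):
        return "HTTP/1.1 403 Forbidden\r\n\r\n"
    return "HTTP/1.1 200 Connection Established\r\n\r\n"
```

After a 200 response the proxy would just relay bytes in both directions, which is why no root cert is involved. This only constrains clients that actually honor the proxy setting; the sandbox rule limiting outbound connections to localhost:19999 is what would stop everything else.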


Some commercial firewalls will snoop on the SNI field in the TLS ClientHello and send a RST towards the client if the hostname isn’t on a whitelist. Reasonably effective. If there’s a way with the macOS sandboxing to intercept socket connections you might find some proxy software that already supports this.

The HTTP_PROXY approach might be simpler though.


For the moment it’s best practice to run it and all of your dev stuff in a VM.


> Hiding applications is a pretty key concept in MacOS. Shortcuts are pretty straightforward? Cmd+H to hide, Cmd+Q to quit. Spaces aren’t hidden- there’s lots of ways to access them, but it seems you haven’t bothered to learn them.

They're not talking about Cmd+H hiding or virtual desktops - those exist on Windows too. The issue is how macOS handles window placement with zero visual feedback.

For example, when you open a new window from a fullscreen app, it just silently appears on another space. No indicator, no notification. You're left guessing whether it even opened and where it went. The placement depends on arcane rules about space layout, fullscreen ordering, and external displays - and it's basically random half the time. You either memorize the exact behavior or manually search through all your spaces.


Did you use metallic nail polish? Or is your skin just barely not making contact with the screen?


I'm someone who never puts any sort of product on my finger nails, and I can confirm that my nails work on my iPhone screen, both when I've lazily forgotten to cut them, letting them get annoyingly long, and if I turn my finger over so that my nail is just about flat on the screen - I checked while writing this, and confirmed the screen was responsive despite my skin definitely not touching it. (I'm a man who doesn't have any experience growing my nails to extend more than a few mm beyond my finger tips so I can't speak to that scenario.)


I got anxious about autocorrect potentially inserting the wrong words and what kind of social fallout that could cause, so I just disabled it entirely. Takes longer to type everything manually but at least my anxiety has gone down.


I'm running Tahoe on an M1 Air with 16GB RAM and it's been smooth for me. Might be worth trying a fresh OS install? Something seems off with your setup.


>Might be worth trying a fresh OS install?

Very "Windows-y", no?


Yup. That's where macOS is now.


It really isn’t like that for me though. The bugs are consistently there regardless of how old the OS install is. I don’t get more when I haven’t done a fresh install in a couple years.


In my experience, this only holds true for small scripts. When you're doing scientific computing or deep learning with data flowing between different libraries, the lack of type safety makes development much slower if you don't maintain strict discipline around your interfaces.


> what do you mean a latitude and longitude doesn’t mean anything without a bunch more info?!

Is the more info just the coordinate system like WGS84, or am I missing something else?


There are many different coordinate systems that aren’t WGS84, and there are different epochs associated with them as well. Some of them use latitude and longitude, some of them use easting/northing, etc.

The worst part is that if you don’t get it exactly right, you’ll still get answers that look right but are shifted by maybe 1-3m. As an example, we had a field team out with a Trimble survey stick with RTK (nominal accuracy 1-2cm) that they were using to cross-check data from our aerial survey platform. We had laid out a bunch of targets on the ground, which they surveyed the corners for. Most of the time there was a fantastic match between the aerial survey data and the ground truth data, but occasionally there was a pretty large offset. As I discovered WAY too late, exactly one of the cellphones that ran the Trimble app had its coordinate system set to one of the Canadian CSRS frames instead of WGS84: https://natural-resources.canada.ca/science-data/science-res...

Edit: naturally, they just handed me the coordinates in a CSV file that they’d captured. The Trimble app + whatever data collection app didn’t actually record the reference frame.
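One way to guard against this kind of mix-up, sketched under my own assumptions: store the reference frame and epoch alongside every coordinate and refuse to compare points that disagree. The `GeoPoint` type, the frame labels, and the strict-equality rule are illustrative only; a real pipeline would transform between frames rather than just reject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoPoint:
    lat_deg: float
    lon_deg: float
    frame: str    # reference frame label, e.g. "WGS84" (labels are illustrative)
    epoch: float  # decimal year the coordinates refer to


def check_comparable(a: GeoPoint, b: GeoPoint) -> None:
    """Raise ValueError unless both points share a frame and an epoch."""
    if a.frame != b.frame:
        raise ValueError(f"reference frame mismatch: {a.frame} vs {b.frame}")
    if a.epoch != b.epoch:
        raise ValueError(f"epoch mismatch: {a.epoch} vs {b.epoch}")


if __name__ == "__main__":
    aerial = GeoPoint(45.0, -75.0, "WGS84", 2020.0)
    # Placeholder label; the comment does not say which CSRS frame it was.
    ground = GeoPoint(45.0, -75.0, "SOME_CSRS_FRAME", 2020.0)
    try:
        check_comparable(aerial, ground)
    except ValueError as err:
        print(err)
```

A CSV of bare latitude/longitude columns cannot pass a check like this, which is the point: the missing metadata becomes an error up front instead of a silent offset.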


Step 4: it's someone else's problem, win


If they didn't want to keep maintaining it, they could just skip to step 4 and just not maintain it in the first place. The problem is that they do want to, and it's not like they haven't done it successfully for years now. Between "put up with whatever bullshit comes your way" and "give up entirely" is a wide spectrum of "try to find ways to reduce the bullshit to get to focus on the important parts", and most of the ways probably don't boil down to a handful of pithy steps that could fit in an (original size) tweet.


If someone competent wanted to take over my important work projects (deployment systems, core code maintenance, etc.), I'd gladly hand them over. I could orphan them right now claiming I need time for immediate tasks, but I don't want to dump unmaintained code on my team. I'd guess open source maintainers feel even more responsible since they see their work as community service. Maybe dropping the project is what's needed to trigger a well-funded fork or get corporate attention, similar to how Heartbleed affected OpenSSL.


If you flip one upside down and attach it to another one, you end up with a stable phone that has twice the battery life.

