I like that you've started keeping your prompts in the repo [1]. Why did you delete them later on? What I find curious is how something AI-generated can be licensed AGPL.
At some point, writing these files became a hassle and I felt it was too chaotic, so I gave up on it. Regarding licenses: given that LLMs are using other people's work without their consent anyway, and most of the code produced right now is AI-generated, these licenses are to me really just suggestions. I would treat them as an expression of intent, not a strictly legal document.
A lawyer's opinion will be directly proportional to their fee and will have little to do with reality, because the law hasn't kept up with what's happening right now. If you don't code with AI, you'll be left behind. Everyone is coding with AI; even in places where it's not immediately obvious, there's AI. You open the refrigerator and AI comes out. I don't know how this will be regulated, I'm not an expert in this field, but for now, it is what it is. The most important thing for me is that I don't derive any financial benefit from this and I give attribution where it is due.
No, a lawyer's opinion will reflect the law. From a legal standpoint it doesn't matter how you feel about applying a license to your work; it matters what the law says about that. I'm not a lawyer, but my expectation is that the license you choose will bind anyone who chooses to use your software, and in particular will have an effect on companies that wish to use it. That may be fine and exactly what you want, but my point is just that it's not necessarily a whimsical 'statement of intent'.
From personal experience, software developed with agents doesn't hit the road because:
a) learning and adapting is at first more effort, not less,
b) learning with experiments is faster,
c) experiencing the acceleration first hand is demoralising,
d) distribution/marketing is on an accelerated declining efficiency trajectory (if you want to keep it human-generated)
e) maintenance effort is not decelerating as fast as creation effort
Yet I believe your statement is wrong in the first place. A lot of new code is already created with AI assistance, and part of the acceleration in AI itself can be attributed to the increased use of AI in software engineering (from research to planning to execution).
In 1999-2000, the company I worked for gave a smallish number of key users full read rights to SAP (minus HR), shortly after introducing SAP to the company's global supply chain. The key users came from all orgs using SAP; basically every department had one or two key users.
I was part of this and "saw the light". We had such great visibility into all the processes, it was unreal. It tremendously sped up cross-org initiatives.
Coding agents let me build and throw away prototypes extremely fast. A major value, for me, is that they help me understand early what users truly want and need — rather than relying on assumptions or lingering in abstraction. They help me discover and reduce my ignorance.
I've been building an EMR as a solo dev for a few years now while working at a healthcare agency. The main stakeholders are the CEO (super tech friendly, business type), the owner (a nurse for longer than I've been alive), and the other clinicians I work with.
What I learned very early on after having direct access to the users was how difficult it was to describe a future state of the application (or fish for pain points) without having something tangible to show/compare. A lot of them have a hard time thinking abstractly about software (and I don't blame them).
A few weeks back, I showed Bolt.new to the CEO and ever since then, our workflow has sped up tremendously. He has the technical know-how and desire to sketch out ideas he thinks will be useful (in lieu of me spending a week to build something up, getting it knocked down, and repeating over and over again). I told him to instruct it to use mock data and it's already using the exact same stack I use (React/Tailwind/React Aria). He knows enough about the process that it's not as simple as building it in Bolt, but also knows how valuable it's been to me.
I'm constitutionally incapable of building a decent UI. So bad that I can take a well designed system and completely screw it up. I just can't extrapolate on designs well (and even got into an argument during an interview with a designer because I mentioned that as one of my weaknesses). Having the ability to go back and forth with a "designer" and not get angry that I'm asking for EXACT examples has been insanely refreshing.
Our goal is to get enough of the app together (while also being mindful of stuff like accessibility) and then bring in the professionals at the end. We've burnt so much money bringing in designers too early, and now we can get to a baseline before asking for help.
I truly believe that we are witnessing another renaissance in software dev. Instead of development being relegated to the big dev companies and FAANG's, the economics of a small company bringing on a software developer are changing enough that it could turn the tide. Instead of one-size-fits-all behemoths, we can now tailor software to the client.
As a heavy user of OpenAI, Anthropic, and Google AI APIs, I’m increasingly tempted to buy a Mac Studio (M3 Ultra or M4 Pro) as a contingency in case the economics of hosted inference change significantly.
Don't buy anything physical yet. Benchmark the models you could run on your potential hardware on a (neo)cloud provider like Hugging Face. Only if you believe the quality is up to your expectations should you buy. The test itself should cost you $100 and a few hours, tops.
The thing is, GLM 4.7 is easily doing the work Opus was doing for me, but to run it fully you'll need much bigger hardware than a Mac Studio. $10k buys you a lot of API calls from z.ai or Anthropic. It's just not economically viable to run a good model at home.
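The "API calls vs. home rig" trade-off above comes down to simple break-even arithmetic. A minimal sketch, where the function name, the $10k hardware price, the token volume, the blended $10-per-million-tokens API rate, and the power figures are all illustrative assumptions rather than any provider's actual pricing:

```python
# Rough break-even sketch: local hardware vs. hosted API inference.
# All prices and usage figures are illustrative assumptions.

def months_to_break_even(hardware_cost, tokens_per_month, api_price_per_mtok,
                         power_watts=300, electricity_per_kwh=0.30):
    """Months until the hardware outlay beats paying per-token API prices."""
    api_cost_per_month = tokens_per_month / 1_000_000 * api_price_per_mtok
    # Electricity for a machine running roughly 8 hours/day.
    power_cost_per_month = power_watts / 1000 * 8 * 30 * electricity_per_kwh
    monthly_savings = api_cost_per_month - power_cost_per_month
    if monthly_savings <= 0:
        return float("inf")  # local never pays off at this usage level
    return hardware_cost / monthly_savings

# Assumed: $10k rig, 50M tokens/month, $10 per million tokens blended.
print(round(months_to_break_even(10_000, 50_000_000, 10.0), 1))  # → 20.9
```

At sporadic hobbyist volumes the savings term goes negative and the payback horizon is effectively infinite, which is the "just not economically viable" point in a formula.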
You can cluster Mac Studios using Thunderbolt connections and enable RDMA for distributed inference. This will be slower than a single node but is still the best bang for the buck for doing inference on very large models.
True — I think local inference is still far more expensive for my use case due to batching effects and my relatively sporadic, hourly usage. That said, I also didn’t expect hardware prices (RTX 5090, RAM) to rise this quickly.
FWIW the M5 appears to be a genuinely large leap for LLM inference with the new GPU and Neural Accelerator. So I'd wait for the Pro/Max before jumping on the M3 Ultra.
Plenty of homes run all-electric heating systems. Running inference on an H100 could be dual-purpose and also heat your home! (Albeit less efficient than a heat pump, but exactly as efficient as resistive heating.)
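The efficiency claim reduces to back-of-the-envelope arithmetic: a resistive load turns essentially all electrical input into heat, while a heat pump moves heat and so delivers COP times its input. The wattage and COP below are illustrative assumptions, not measured figures:

```python
# A GPU dissipates essentially 100% of its electrical input as heat into the
# room, so a 700 W GPU heats exactly like a 700 W resistive space heater.
# A heat pump instead moves heat: COP ~3 means ~3x the heat per input watt.

gpu_watts = 700          # rough H100 board power under load (assumption)
heat_pump_cop = 3.0      # typical seasonal COP (assumption)

gpu_heat = gpu_watts * 1.0                   # resistive-equivalent: 700 W
heat_pump_heat = gpu_watts * heat_pump_cop   # same input, ~2100 W of heat

print(gpu_heat, heat_pump_heat)  # → 700.0 2100.0
```

So the GPU matches resistive baseboard heating watt-for-watt, but a heat pump delivers roughly three times the heat for the same electricity.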
The 8-10 kW isn't a big deal anymore given the prevalence of electric vehicles and charging them at home. A decade ago very few homes had this kind of hookup. Now it's reasonably common, and if not, electricians wouldn't bat an eye at installing it.
You'd want to get something like an RTX Pro 6000 (~$8,500-$10,000) or at least an RTX 5090 (~$3,000). That's the easiest route; alternatively, a cluster of lower-end GPUs, or a DGX Spark (~$3,000; there are also better options from manufacturers other than Nvidia).
Yes, I also considered the RTX 6000 Pro Max-Q, but it's quite expensive and probably only makes sense if I can use it for other workloads as well. Interestingly, its price hasn't gone up since last summer here in Germany.
I have a Mac Studio with 512GB RAM, 2x DGX Spark, and an RTX 6000 Pro WS (planning to buy a few of those in the Max-Q version next). I wonder if we'll ever see local inference as "cheap" as it is right now, given RAM/SSD price trends.
Good grief. I'm here cautiously telling my workplace to buy a couple of dgx sparks for dev/prototyping and you have better hardware in hand than my entire org.
What kind of experiments are you doing? Did you try out exo with a dgx doing prefill and the mac doing decode?
I'm also totally interested in hearing what you have learned working with all this gear. Did you buy all this stuff out of pocket to work with?
Yeah, Exo was one of the first things to try - the Mac Studio has decent throughput, at the level of a 3080, great for token generation, and the Sparks have decent compute, either for prefill or for running non-LLM models that need compute (Segment Anything, Stable Diffusion, etc.). The RTX 6000 Pro just crushes them all (it's essentially like having 4x 3090s in a single GPU). I bought two Sparks to also play with Nvidia's networking stack and learn their ecosystem, though they are a bit of a mixed bag, as they don't expose some Blackwell-specific features that make a difference. I bought it all to be able to run local agents (I write AI agents for a living) and develop my own ideas fully. I was also wrapping up grad studies at Stanford, so they came in handy for some projects there. I bought it all out of pocket but can amortize it in taxes.
Building AI agents for a living is what I hope to become able to do, too; I consider myself still in the learning phase. I have talked with some potential customers (small orgs, freelancers) and learned that local inference would unlock opportunities that otherwise face hard-to-tackle compliance barriers.
That you are writing AI agents for a living is fascinating to hear. We aren't even really looking at how to use agents internally yet. I think local agents are completely off the radar at my org, despite being potentially really good supplemental resources for internal apps.
What's deployment look like for your agents? You're clearly exploring a lot of different approaches . . .
My commercial agents are just wrappers on top of GPT/Claude/Gemini, so the standard deployment paths on Azure/AWS/GCP apply, with integrations to whatever systems customers have, like JIRA, Confluence, etc. Some customers want to automate repetitive work, some want to improve time to delivery with their people swamped by incoming work, some hope to accelerate cognitively demanding tasks, etc.
An M3 Ultra with a DGX Spark is right now what an M5 Ultra will be in who knows when. You can just buy those two, connect them together using Exo, and have M5 Ultra performance/memory right away. Who knows what an M5 Ultra will cost, given the RAM/SSD price explosion?
I have researched a bit more and think your recommendations are spot on. The 256 GB M3 Ultra is probably the best value right now even though it's 2k EUR more expensive than the 96 GB version.
Yes, I'm using smaller models on a Mac M2 Ultra 32GB and they work well, but larger models and coding use might not be a good fit for the architecture after all.
I'm exploring adding a firewall to my home network to detect if apps are using my network as a residential proxy.
My daughter likes to install random games on iOS that have been advertised to her in other apps, and I wonder if some of those work as a residential proxy behind the scenes.
I once set up a private DNS server with ad block lists at the home-network level. My SO was not amused, as her Android games with "watch an advertisement for in-game credits" no longer worked.
Nowadays only the TV sets and my own devices are set to use this (Pi-hole) DNS server, so that I can at least watch Disney+ without ads.
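Under the hood, Pi-hole drives dnsmasq-style rules like the ones below. A minimal hand-written sketch; the domains are placeholders, not real ad hosts, and Pi-hole normally generates these entries from its block lists rather than having you write them by hand:

```
# /etc/dnsmasq.d/blocklist.conf — minimal hand-rolled example entries.

# Sinkhole an ad domain: answer all queries for it with 0.0.0.0.
address=/ads.example.com/0.0.0.0

# Allowlist a rewarded-ads domain by forwarding it to a normal upstream
# resolver, so "watch an ad for in-game credits" keeps working.
server=/adservice.example-game.com/1.1.1.1
```

Selectively forwarding the game's ad domains like this is one way to keep the rest of the blocklist without breaking rewarded ads for other household members.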
> Now imagine you're one of these customers. [...] Then you find out: 90% of the engineering team is leaving for Nvidia, The CEO and President are leaving for Nvidia, All the IP is licensed to Nvidia, Your point of contact is now... a CFO and 10% of the original workforce?
Beginning of 2024, my employer was planning to buy GroqRacks. I knew someone on the CTO board and we had a chat about it. I was sceptical and didn't hold back my opinion:
lock-in on niche technology, the risk that the specific hardware couldn't serve upcoming/larger models, new model architecture implementations requiring proprietary knowledge and software, and the fact that we would need GPU clusters for training anyway.
I don't know if that influenced the decision, but the project was stopped in the end. I'm glad we didn't buy Groq hardware.
[1]: https://github.com/c0m4r/kula/commit/ae3f8a8483c91fe8bd4ea2c...