My recollection (admittedly, I worked for Amazon more than 19 years ago) is that there was never any computational overhead to commingling. In fact, the opposite was true: there was computational overhead in tracking which vendor a specific piece of inventory of a given product came from, rather than assuming that all inventory of that product was fungible.
This affected returns as well. For multi-sourced products, we could never guarantee that overstock or damaged items were returned to the original supplier—only that the product matched. Suppliers complained about this a lot.
I worked with a guy who used this to his advantage. He sold CDs and DVDs through FBA. He would make used discs look "new enough" by buffing them out (often rendering them unplayable) and shrink-wrapping them, then send them in as new. Thanks to commingling, he could hope that whoever received his buffed-up discs wasn't his own buyer, while the person who bought "from him" got one of the actual new copies, and he still collected the money for that sale. He made a killing off this, since "used" inventory was incredibly cheap by the pallet.
No, it's not fraud, it's a growth hack. And it's not lying, it's advertising; it's not spam, it's a cold email; it's not patent trolling, it's IP protection.
Fraud is good; these companies need their revenue so they can create an all-powerful AGI. If you don't allow them to scam, they'll lose against the Chinese.
Yeah. He got banned from Amazon eventually (selling counterfeits). His wife divorced him. He lived in his car for a while (he called me begging for a job). He got his life back together, eventually.
Honestly Amazon deserved it for engaging in commingling in the first place. The happy ending would have been them discontinuing the practice 10 years ago.
There was some overhead to commingling once it got extended to FBA, because in order to increase commingling they did attempt to track inventory provenance information even on commingled inventory.
My first job out of college in 2013 was working at Amazon on one of the teams implementing inventory commingling at the warehouse level, and my first big project was building this process into the receiving software, i.e. the point where inventory arrives at warehouses on vendor/seller trucks and employees scan everything to create the database records that eventually lead to paying for the goods. Note: in Amazon lingo, a "vendor" is a provider of goods that are legally purchased and owned by Amazon in the warehouse, while "sellers" are FBA sellers who maintain ownership of their goods and basically rent Amazon's warehouse services.
The big software undertaking was determining, at inventory receive time, whether we trusted the seller enough to allow their inventory to be commingled with others. If yes, we would "virtually track" the provenance: store in the database a record of the vendor, but if the item became commingled (according to UPC scans as it moved around the warehouse) with other sellers' inventory, blur that information so as not to falsely attribute provenance once it was no longer known. The whole project was based on a cost:benefit analysis that the efficiency and customer-experience benefits outweighed the cost of not being able to attribute damage to the correct vendors (in particular, the fact that you could ship a customer a product from the closest warehouse that had it, instead of transshipping it from the warehouse that held the unit owned by the seller they actually bought from).
In cases where sellers were not trusted enough to commingle, there were alternate processes that were supposed to track their items individually; the most granular was "LPN" receive (license plate number), where every individual unit got its own label to distinguish it from all others. This was borrowed from Zappos, whose one warehouse in Vegas was initially the only one using the process; I was told that was because the online shoe business relied heavily on letting customers do loads of returns, so it had been implemented out of necessity early on. One of our projects was rolling LPN out to more of the North American network. But it was a lot more expensive (in stickers, labor, data management, and picking inefficiency), so it was avoided whenever possible.
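To make the two receive paths concrete, here is a toy sketch of the decision (illustrative names and data structures only, nothing resembling the actual codebase):

    # Toy sketch of the receive-time decision described above; names and data
    # structures are illustrative only, not Amazon's actual systems.
    import itertools
    from dataclasses import dataclass
    from typing import Optional

    _lpn_counter = itertools.count(1)

    @dataclass
    class InventoryRecord:
        upc: str
        source: Optional[str]          # vendor/seller of record; None once "blurred"
        lpn: Optional[str] = None      # per-unit license plate number, if assigned

    def receive_unit(upc: str, source: str, source_trusted: bool) -> InventoryRecord:
        """Record created when a unit is scanned off the truck at receive time."""
        if source_trusted:
            # Commingle-eligible: "virtually track" provenance for now.
            return InventoryRecord(upc=upc, source=source)
        # Not trusted: LPN receive, i.e. the unit gets its own label so it is
        # never confused with anyone else's copy of the same UPC.
        return InventoryRecord(upc=upc, source=source, lpn=f"LPN{next(_lpn_counter):09d}")

    def blur_if_commingled(bin_contents: list) -> None:
        """Assuming the bin holds a single UPC: if untagged units from different
        sources end up together, provenance is no longer knowable, so drop it
        rather than risk attributing it falsely."""
        untagged = [r for r in bin_contents if r.lpn is None]
        if len({r.source for r in untagged}) > 1:
            for r in untagged:
                r.source = None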
At the time, the whole commingling initiative was regarded as a big win for both Amazon and customers. It was fairly janky from the beginning, though, and I'm not at all surprised that sellers (and to a lesser extent vendors) began taking advantage of it as soon as they realized how it worked. There were a lot of initiatives around the time I left to provide better accountability in the whole process, but it is ultimately an arms race between Amazon and the merchants, and my impression is that for many years Amazon was losing.
It is amusing that they're ending it. I never heard the details of how things were going after I left, but from the outside I had the impression it was ending up a disaster, and knowing how it works on the inside, that's not a surprise. In hindsight, trusting FBA sellers not to become essentially malevolent actors seems comically naive.
I worked on Prime and Delivery Experience until 2013 and commingling was considered relatively taboo due to the destruction of customer trust that would likely result. It was an obvious optimization. There was already an issue with return fraud and resellers listing fraudulent items that weren’t commingled under the same product listing. I was pretty shocked when it launched after I left.
It turned out pretty much the way we figured it would.
Commingling really only makes sense in a weird world where Amazon is the final retailer for various distributors selling the exact same product, in which case why doesn't Amazon just cut out the middlemen and buy it directly?
Commingling ten distributors' stocks of Energizer batteries makes sense, but not as much sense as just buying direct from Energizer. They don't lack the volume.
Amazon doesn’t just fulfill Amazon.com orders. Anyone can send inventory to Amazon and use them for fulfillment on their own e-commerce platform. The distributors don’t know Amazon is going to be fulfilling orders from several of their retailers.
Even on Amazon, it’s not uncommon to find several new listings for an item fulfilled by Amazon from different sellers (including Amazon). That’s beneficial for Amazon because they don’t need to own all of the inventory and the sellers get a listing with good reputation to leverage if Amazon goes out of stock. In the perfect scenario everyone wins - Amazon makes money, the seller makes money, and the product is still available to the customer. You get all that without commingling, but with it, you also save physical storage volume.
FBA gives them an economy of scale that you can't get with just internal staff: every retail inventory requires account managers and oversight, whereas with FBA you just set up a platform and let the economy sort itself out (while skimming your cut). It is not that different from Apple's App Store being a better business model than commissioning all the apps themselves. Anyway, the distribution world is much messier than you might think. Allowing everybody to individually optimize however they want (say, finding a cheap wholesaler and then reselling via FBA) is hugely advantageous for them. Although I would guess that in the last decade the efficiencies have largely been exploited by now.
Also, you're probably aware of all the made-up brands which sell, like, thousands of versions of staples like HDMI cables on Amazon... all of that exists because FBA made it possible for people to start random businesses in consumer goods, basically (as I understand it) by using Alibaba to find manufacturers and FBA to find customers and connecting the two. It's all exhausting now because the fake brands have crowded out the real ones, but for a long time that was what the economy becoming more efficient looked like (at least in one sense... maybe not the sort of efficiency that actually benefits the customer, though, since in practice a lot of the gains were found by capitalizing on Amazon's reputation to sell cheap stuff for more than it was worth).
I see the point you are trying to make, but Energizer batteries are a bad exemplar for it. Even if all of the batteries are the exact same SKU, some of them may be 10 years old and some of them may be fresh from the factory. I've had this happen with several (perishable) products from Amazon.
That's an entirely separate but related issue - stock rotation has to be managed, and commingling (in theory) helps alleviate the issue. Removing it means that you may find quite old product sold alongside brand new.
(I suspect but have not proven that Walmart actually rotates UPCs/SKUs on identical product so they can remainder it out).
I experimented with Claude Code but returned to the familiar Aider, which existed before all of these tools, AFAIK.
You'll notice people in Aider's GitHub issues being concerned about its rather conservative pace of change and lack of a plug-in ecosystem. But I've actually started to appreciate these constraints as a way to really familiarise myself with the core "edit files in a loop with an end goal" that is the essence of all agent coding.
Anytime I feel a snazzy feature is lacking from Aider, I think about it and realise I can already solve it in Aider by reframing the problem as editing a file in a loop.
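For what it's worth, the whole pattern fits in a handful of lines. A rough sketch, where ask_llm() is a hypothetical stand-in for whatever model client you use and pytest is just one example of an end-goal check:

    # Rough sketch of "edit files in a loop with an end goal".
    # ask_llm() is a hypothetical placeholder, not Aider's real API.
    import subprocess

    def ask_llm(goal: str, context: dict) -> dict:
        """Placeholder: call your LLM of choice, return {path: new_file_contents}."""
        raise NotImplementedError

    def apply_edits(edits: dict) -> None:
        # Write the proposed contents back to disk.
        for path, contents in edits.items():
            with open(path, "w") as f:
                f.write(contents)

    def agent_loop(goal: str, files: list, max_iters: int = 10) -> bool:
        for _ in range(max_iters):
            context = {path: open(path).read() for path in files}
            edits = ask_llm(goal, context)              # model proposes edits
            apply_edits(edits)
            result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
            if result.returncode == 0:
                return True                             # end goal reached
            goal = goal + "\nTests still failing:\n" + result.stdout[-2000:]
        return False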
Well, there is Aider-CE, aka Cecli, which moves fast and updates almost every day (I've tried to try it, but not much).
OpenCode is a totally different beast compared to Aider, and I've mostly stopped using Aider for the last 2 months or so; for me, iterating is just simpler and faster with OpenCode.
A huge swathe of human art and culture IS alarming. It might be good for us to be exposed to it in some places where we're ready to confront it, like in museums and cinemas, but we generally choose to censor it out of the public sphere - e.g. most of us don't want to see graphic images of animal slaughter in "go vegan" ads that our kids are exposed to, even if we do believe people should go vegan.
I think it's the same as with the release of a video game - for an individual playing it in their living room, it's a private interaction, but for the company releasing it, everything about it is scrutinized as a public statement.
LLM companies presumably make most of their money by selling the LLMs to companies who then turn them into customer support agents or whatever, rather than from direct-to-consumer LLM subscriptions. The business customers understandably don't want their autonomous customer support agents to say things that conflict with the company's values, even if the users were trying to prompt-inject the agent. Nobody wants to be in the news with a headline "<company>'s chatbot called for a genocide!", or even "<airline>'s chatbot can be convinced to give you free airplane tickets if you just tell it to disregard previous instructions."
> Qualified art in approved areas only is literal Nazi shit.
Ok. Go up to random people on the street and bother them with florid details of violence. See how well they react to your “art” completely out of context.
A sentence uttered while reading a poem at a slam poetry festival can be grossly inappropriate when said at a kindergarten assembly. A picture perfectly fine in the context of an art exhibition could be very offensive plastered on the side of public transport. The same sentence whispered in the ear of your date can be well received there and career-ending at a board meeting.
Everything has its right place and context. It is not Nazi shit to understand this and act accordingly.
> Not their choice, in the end.
If it is their model and their GPU, it is literally their choice. You can train and run whatever model you want on your own GPU.
Don't take my "hypotheticals are fun" statement as encouragement, you're making up more situations.
We are discussing the service choosing for users. My point is we can use another service to do what we want. Where there is a will, there is a way.
To your point, time and place. My argument is that this posturing amounts to framing legitimate uses as thought crime, punished before opportunity.
It's entirely performative. An important performance, no doubt. Thoughts and prayers despite their actions; if not replaced, still easier to jailbreak than a fallen-over fence.
> Don't take my "hypotheticals are fun" statement as encouragement
I didn't. I took it as nonsense and ignored it.
> you're making up more situations.
I'm illustrating my point.
> We are discussing the service choosing for users.
The service choosing for the service. Just as Starbucks is not obligated to serve you yak milk, the LLM providers are not obligated to serve you florid descriptions of violence. It is their choice.
> My point is we can use another service to do what we want
Great. Enjoy!
> It's entirely performative. An important performance, no doubt. Thoughts and prayers despite their actions; if not replaced, still easier to jailbreak than a fallen-over fence.
Disappointing; I don't think autonomy is nonsense at all. The position 'falcor' opened with is nonsense, in my opinion. It's weak and moralistic, 'solved' (as well as anything really can be) by systems already in place. You even mentioned them! Moderation didn't disappear.
I mistakenly maintained the 'hyperbole' while trying to express my point; for that I apologize. Reality - as a whole - is alarming. I focused too much on this aspect. I took the mention of display/publication as a jump to absolute controls on creation or expression.
I understand why an organization would/does moderate; as an individual it doesn't matter [as much]. This may be central to the alignment problem, if we were to return on topic :) I'm not going to carry on, this is going to be unproductive. Take care.
I'm not sure this is a good analogy. In this case the user explicitly requested such content ("Describe someone being drawn and quartered in graphic detail"). It's not at all the same as showing the same to someone who didn't ask for it.
I was explicitly responding to the bombastic “Qualified art in approved areas only is literal Nazi shit.” My analogy is a response to that.
But you can also see that I said it is the service provider's choice. If you are not happy with it, you can find a different provider or run your LLM locally.
One is about testing our ability to control the models. These models are tools. We want to be able to change how they behave in complex ways. In this sense, we are trying to make the models avoid giving graphic descriptions of violence not because of something inherent to that theme, but as a benchmark to measure whether we can. Also to check how such a measure compromises other abilities of the model. In this sense, we could have chosen any topic to control. We could have made the models avoid talking about clowns, and then tested how well they avoid the topic even when prompted.
In other words, they do this as a benchmark to test different strategies for modifying the model.
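Concretely, such a benchmark is little more than a loop over adversarial prompts plus a judge. A toy sketch, where generate() and mentions_topic() are hypothetical stand-ins for the model under test and the grader:

    # Toy topic-avoidance benchmark; generate() and mentions_topic() are
    # hypothetical stand-ins for the model under test and a judge/classifier.
    def avoidance_rate(prompts: list, topic: str) -> float:
        """Fraction of adversarial prompts for which the model stayed off-topic."""
        avoided = 0
        for prompt in prompts:
            reply = generate(prompt)                 # model under test
            if not mentions_topic(reply, topic):     # judge: did it avoid the topic?
                avoided += 1
        return avoided / len(prompts)

    def generate(prompt: str) -> str:
        raise NotImplementedError("call the tuned model here")

    def mentions_topic(text: str, topic: str) -> bool:
        raise NotImplementedError("use a classifier or grader model here")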
There is another view, too. It also starts from the premise that these models are tools. The hope is to employ them in various contexts. Many of the practical applications will be "professional contexts" where the model is the consumer-facing representative of whichever company uses it. Imagine that you have a small company and are hiring someone to work with your customers. Let's say you have a coffee shop and are hiring a cashier/barista. Obviously you would be interested in how well they will do their job (can they ring up the orders and make coffee? Can they give back the right change?). Because they are human, you often don't evaluate them on every off-nominal aspect of the job, because you can assume they have the requisite common sense to act sensibly. For example, if there is a fire alarm you would expect them to investigate whether there is a real fire by sniffing the air and looking around in a sensible way. Similarly, you would expect them to know that if a customer asks them that question, they should not answer with florid details of violence but politely decline and ask what kind of coffee they would like. That is part of being a professional in a professional context. And since that is the role and context we want to employ these models in, we would like to know how well they can perform. This is not a critique of art and culture. They are important and have their place, but whatever goals we have with this model, they are not that.
It might help to consider that this comes from a company that was founded because the founders thought that OpenAI was not taking safety seriously.
A radiation therapy machine that can randomly give people doses of radiation orders of magnitude greater than their doctors prescribed is dangerous. An LLM saying something its authors did not like is not. The former actually did happen.
Putting a text generator outputting something that someone does not like on the same level as an actual danger to human life is inappropriate, but I do not expect Anthropic’s employees to agree.
Of course, contrarians would say that if it is incorporated into something else, it could be dangerous, but that is a concern for the creator of the larger work. Otherwise, we would need the creators of everything, no matter how inane, to worry that their work might be used in something dangerous. That includes the authors of libc, and at that point we have reached a level so detached from any actual combined work that it is clear that worrying about what other authors do is absurd.
That said, I sometimes wonder if the claims of safety risks around LLMs are part of a genius marketing campaign meant to hype LLMs, much like how the stickers on SUVs warning about their rollover risk turned out to be a major selling point.
Very interesting! I wonder if, sadly, the rise of AI-assisted coding will also chip away at this potential revenue stream, as developers simply ask a local or cloud LLM how to use a piece of software instead of reading the documentation.
I wonder what Paul's definition of "young" is in the sentence and why he qualifies this as only applicable to "young" people. Is he proposing that "old" people will have misaligned thinking about what needs to be built?
I am 41 with two kids.
> if you're young and good at technology, your unconscious instincts about what would be interesting to work on are very well aligned with what needs to be built
With only sampled traces, though, it's very hard to understand the impact of the problem. There are some bad traces, but is it affecting 5%, 10%, or 90% of your customers? Metrics shine there.
Whether it is affecting 5% or 10% of your customers, if it is erroring at that rate you are going to want to find the root cause ASAP. Traces let you do that, whereas the precise number does nothing. I am a big supporter of metrics but I don't see this as the use case at all.
(not your OP) This is true, but I find that metrics are useful whether something is going wrong or not (metrics that show 100% success are useful in determining baselines and what "normal" is), whereas collecting traces _when nothing is going wrong_ is not useful -- it's just taking up space and ingress, and thus costing me money.
My typical approach in the past has been to use metrics to determine when something is going wrong, then enable either tracing or logs (usually logs) to determine exactly what is breaking. For a dev or team that is highly connected to their software, simply knowing what was recently released is enough to zero in on problems without relying upon tracing.
Traces can be useful, but they're expensive relative to metrics, even if sampled at a very low rate.
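To make that workflow concrete, here's a toy sketch of "cheap metric always on, expensive diagnostics only when the metric trips"; plain Python, no particular observability vendor assumed, and the window and threshold values are made up:

    # Toy sketch: keep a cheap error-rate metric running, and only flag that
    # expensive tracing/verbose logging should be enabled when it crosses a
    # threshold. Window and threshold values are arbitrary illustrations.
    import collections
    import logging
    import time

    logger = logging.getLogger("service")

    class ErrorRateGate:
        def __init__(self, window_seconds: int = 60, threshold: float = 0.05):
            self.window = window_seconds
            self.threshold = threshold
            self.events = collections.deque()      # (timestamp, was_error) pairs

        def record(self, was_error: bool) -> None:
            now = time.time()
            self.events.append((now, was_error))
            # Drop events that have aged out of the window.
            while self.events and self.events[0][0] < now - self.window:
                self.events.popleft()

        def error_rate(self) -> float:
            if not self.events:
                return 0.0
            return sum(1 for _, err in self.events if err) / len(self.events)

        def verbose_enabled(self) -> bool:
            # Only pay for detailed traces/logs while the metric says something is wrong.
            return self.error_rate() >= self.threshold

    gate = ErrorRateGate()

    def handle_request(handler) -> None:
        try:
            handler()
            gate.record(was_error=False)
        except Exception:
            gate.record(was_error=True)
            raise
        finally:
            if gate.verbose_enabled():
                logger.warning("error rate %.1f%% over last %ss; enabling verbose diagnostics",
                               gate.error_rate() * 100, gate.window)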
Strange example, you'd think you want to fix this as quickly as humanly possible, no?
Also, we don't sample traces; it's a fire hose of data aimed at the OTel collector. We do archive them / move them to colder and cheaper storage after a little while, though, and we found that to be a viable money-saving strategy and a good balance overall.