
Agree re:hallucinations/safety issues, that was likely one of the main blockers.

And here's the sad part: they had this back in 2019... see this paper released in Jan 2020: https://blog.research.google/2020/01/towards-conversational-...


This (innovator's dilemma / too afraid of disrupting your own ads business model) is the most common explanation folks are giving for this, but it seems to be some sort of post-rationalization of why such a large company full of competent researchers/engineers would drop the ball this hard.

My read (having seen some of this on the inside) is that it was a mix of being too worried about safety issues (OMG, the chatbot occasionally says something offensive!) and being too complacent (too comfortable with incremental changes in Search, no appetite for launching an entirely new type of product / doing something really out there). There are many ways to monetize a chatbot; OpenAI, for example, is raking in billions in subscription fees.


Google gets much more scrutiny than smaller companies, so it's understandable to be worried. Pretty much any small mistake of theirs turns into clickbait on here and the other tech news sites, and you get hundreds of comments about how evil Big Tech is. Of course it's their own fault that their press skews negative so frequently, but it's still understandable why they were so shy.


Sydney, when initially released, was much less censored, and the vast majority of responses online were positive ("this is hilarious/cool"), not "OMG, Sydney should be banned!".


You have clearly not heard about Tay and Galactica.


It's understandable that people at Google are worried because it's likely very unpleasant to see critical articles and tweets about something you did. But that isn't really bad for Google's business in any of the ways that losing to someone on AI would be.


Google is constantly being sued for nearly everything they do. They create a Chrome Incognito mode like Firefox's private browsing mode and they get sued. They start restricting app permissions on Android: sued. They add a feature where Google Maps lets you select the location of your next appointment as a destination in a single click: sued (that's leveraging your calendar monopoly to improve your maps app).

Google has its hands in so many fields that any change they make that disrupts the status quo brings down antitrust investigations and lawsuits.

That's the reason why Firefox and Safari dropping support for third-party cookies gets a yawn from regulators, while Google gets pinned between the CMA, which wants to slow down or stop third-party cookie deprecation to avoid disrupting the ads market, and the ICO, which wants Google to drop support yesterday.

This is not about bad press or people feeling bad about news articles. Google has been hit by billion dollar fines in the past and has become hesitant to do anything.

Where smaller companies can take the "Elon Musk" route and just pay fines and settle lawsuits as just the cost of doing business, Google has become an unwieldy juggernaut unable to move out of fear of people complaining and taking another pound of flesh. To be clear, I don't agree with a strategy of ignoring inconvenient regulations, but Google's excess of caution has severely limited their ability to innovate. But given previous judgements against Google, I can't exactly say that they're wrong to do so. Even Google can only pay so many multi-billion dollar fines before they have to close shop, and I can't exactly say the world would be better off if that happened.


That's true for Google, sure. But what about individual workers and managers at Google?

You can push things forward hard, battle the many stakeholders all of whom want their thing at the top of the search results page, get a load of extra headcount to make a robust and scalable user-facing system, join an on-call rota and get called at 2am, engage in a bunch of ethically questionable behaviour skirting the border between fair use and copyright infringement, hire and manage loads of data labellers in low-income countries who get paid a pittance, battle the internal doubters who think Google Assistant shows chatbots are a joke and users don't want it, and battle the internal fearmongers who think your ML system is going to call black people monkeys, and at the end of it maybe it's great or maybe it ends up an embarrassment that gets withdrawn, like Tay.

Or you can publish some academic papers. Maybe do some work improving the automatic transcription for youtube, or translation for google translate. Finish work at 3pm on a Friday, and have plenty of time to enjoy your $400k salary.


>There are many ways to monetize a chatbot; OpenAI, for example, is raking in billions in subscription fees.

Compared to Google, OpenAI's billions are peanuts, and they cost a fortune to generate. GPT-4 doesn't seem profitable (if it was, would they need to throttle it?)


> GPT-4 doesn't seem profitable (if it was, would they need to throttle it?)

Maybe? Hardware supply isn’t perfectly elastic


Wouldn't Google be better able to integrate ads into a "ChatGoogle" service than OpenAI is into ChatGPT?


The cost per ad is still astronomically different between search ads and LLMs


There could be an opposite avenue: ad-free Google Premium subscription with AI chat as a crown jewel. An ultimate opportunity to diversify from ad revenue.


There's not enough money in it, at Google's scale.

Especially because the people who'd pay for Premium tend to be the most prized people from an advertiser perspective.

And most people won't pay, under any circumstances, but they will click on ads which make Google money.


The low operating margin of serving a GPT-4 scale model sounds like a compelling explanation for why Google stayed out of it.

But then why did Microsoft put its money behind it? Alphabet's revenue is around $300bn, and Microsoft's is around $210bn, which is lower but the same order of magnitude.


YouTube does it, at Google scale. And these same people do pay $20/mo for ChatGPT anyway.


YouTube isn't comparable - YouTube revenue is roughly $30B/year, while Search revenue is roughly $175B/year.

Advertisers are willing to pay far more than $20/mo per user, and search costs way less per query than LLM inference.


Monetizing a chatbot is one thing. Beating revenues every year when you are already making $300b a year is a whole different ball game. There must be tens of execs who understand this, but their payout depends on keeping the status quo.


uv has been really awesome as a replacement for pip: https://github.com/astral-sh/uv

So fast it finally made virtual environments usable for me. But it's not (yet) a full replacement for conda, e.g. it won't install things that aren't Python packages.


How about prefix then? https://prefix.dev/blog/uv_in_pixi


This looks pretty cool! What's the catch? E.g. why isn't this already implemented in accelerators? Is it really just a forgotten algorithm, or does it have implications for the cost of building the accelerator, or something else?


It's not just a software algorithm. It's a hardware architecture optimization. To benefit, you have to build hardware that matches the dimensions of the algorithm. That's an expensive commitment.


> you have to build hardware that matches the dimensions of the algorithm

Yes, the benefits are realized in custom hardware designs as opposed to software. However, the hardware architectures work for multiplying matrices of arbitrary dimensions by splitting larger matrices into smaller tiles, then summing the tile products to form the final larger matrix product (i.e. GEMM).
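
To make the tiling point concrete, here is a minimal NumPy sketch of the idea (my own illustration of the arithmetic, not the paper's hardware): arbitrary-sized matrices are covered by fixed-size tiles, and the tile products are accumulated into the output, which is what a fixed-size GEMM engine does one tile at a time.

    import numpy as np

    def tiled_matmul(A, B, tile=4):
        # Accumulate products of small fixed-size tiles into the output,
        # the way a fixed-size hardware GEMM unit covers arbitrary shapes.
        m, k = A.shape
        _, n = B.shape
        C = np.zeros((m, n), dtype=A.dtype)
        for i in range(0, m, tile):
            for j in range(0, n, tile):
                for p in range(0, k, tile):
                    C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
        return C

    A = np.random.randint(-8, 8, size=(8, 12))
    B = np.random.randint(-8, 8, size=(12, 6))
    assert np.array_equal(tiled_matmul(A, B), A @ B)  # bit-exact for integers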


Not so much in FPGAs ... although I'm not sure top-end FPGAs would beat Nvidia GPUs even with this algorithm, and even if cost were not a consideration.


IMHO, for fixed-point MM accelerators, there is no catch; I think it's just an overlooked algorithm. It's based on an algorithm by Winograd, who coincidentally also proposed another, unrelated algorithm that later became very popular for CNN acceleration, which may have taken some visibility away from this one... But that is speculative.


On the other hand, if you tried it with floating point, you'd lose significant digits. Since the approach is to sum (a[i] + b[i+1])(a[i+1] + b[i]) and subtract the sums of a[i]a[i+1] and b[i]b[i+1] in the end to get a[i]b[i] + a[i+1]b[i+1], you may be taking the difference of two large values to get a small value, losing precision.
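
Here's a small Python sketch of that identity (my own illustration, assuming even-length vectors): it is bit-exact for integers, but the mixed (a + b) terms are exactly where floating point can lose digits.

    import numpy as np

    def winograd_dot(a, b):
        # ~n/2 multiplications that mix a and b, plus correction terms
        # involving only a or only b (reusable across many dot products,
        # which is where the multiplier savings come from).
        cross = sum((a[i] + b[i+1]) * (a[i+1] + b[i]) for i in range(0, len(a), 2))
        corr_a = sum(a[i] * a[i+1] for i in range(0, len(a), 2))
        corr_b = sum(b[i] * b[i+1] for i in range(0, len(b), 2))
        return cross - corr_a - corr_b

    rng = np.random.default_rng(0)
    a = rng.integers(-128, 128, size=16)
    b = rng.integers(-128, 128, size=16)
    assert winograd_dot(a, b) == a @ b        # identical in fixed point

    x, y = rng.normal(size=16) * 1e8, rng.normal(size=16) * 1e-8
    print(winograd_dot(x, y) - x @ y)         # floats: a visible error appears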




On a tangent, go is so elegant.


LLM hype and this submission in particular keep making me think of a lecturer I had for Topics in Large Dimensional Data Processing, circa 2016: as I recall he was enthusiastically adamant that the most important thing, breakthroughs etc., in years/decades to come was going to be faster matrix operations. Anyway, I'm pretty sure I recognise FIP (not FFIP of course) from that course.

I wish I could remember his name; I believe he left academia after my year and went to work in industry, and I'd just be curious to see what he's up to now. I'm not saying it was a particularly novel or prescient comment/attitude (we may not have had quite such ML hype, but 'big data' was certainly all the rage at the time); it's just something that's stuck in my mind. It's one of those areas I always meant to study more, but realistically I probably never had the mathematical chops for it, and the ones I did have certainly atrophied.


Maybe I’m joking, but: our society is just a vehicle for economics at this point, our economy is built around science, our science has mostly been turned into observations about engineering, some time ago we changed all of engineering into differential equations, and differential equations can be solved by discretizing them and doing linear algebra, and most of linear algebra can be done with matrix multiplications (triangular solves and orthonormalizations if you are fancy). All you need is matmul.


> our science has mostly been turned into observations about engineering

You may be joking but that in particular seems pretty astute.

Superficially it seems accurate, and reasonably ascribable to economic forces: fewer concentrations of capital in people (aristocrats) spending it on a hobby interest or academic pursuit of their own - today's equivalents mostly prefer philanthropy (Musk is, I suppose, for whatever else you might think of him, a notable exception, preferring to explore space, AI, etc., not, it seems, for personal monetary gain). But I wonder if that is fair to modern scientists, or is it just that 'all the low-hanging stuff's been done'?


For life sciences you need grad students / postdocs to do the grunt work of pipetting, dissecting, plating, etc. And whatever the equivalent is in chemistry (titration/GC/mass transfer, I guess)?

But those tools created by engineers are pretty darn important, and allow plenty of experiments/observations to be performed that were previously out of reach.


So, what you're saying is ...

... that the Matrix creates the world around us.

Thanks.


Unless he was a guest lecturer, if the course was for credit, wouldn't his name appear on your official transcript?


I don't think so, though that may be the case in your country, of course. I may well have recorded it in my notes if I dig them out, but this was a fourth-year course and my notes certainly degraded over the years.


There are a lot of matrix multiplication algorithms out there with a lot of pluses and minuses. It's always a balance of accuracy, runtime, and scaling. This one probably has bad accuracy in floating point.


For everyone discussing the reduced accuracy/numerical stability of the algorithms in floating-point, this is true. But note that the application of the algorithms in the work is explored for fixed-point MM/quantized integer NN inference, not floating-point MM/inference. Hence, there is no reduction in accuracy for that application of it compared to using conventional fixed-point MM.


"Conventional fixed-point MM" is a large suite of algorithms. It is correct that this is a 2x reduction in MULs compared to naive fixed-point matrix multiply, but there is a large body of literature out there with other circuits. This is a cool trick to add to the group.


The inference world is gradually switching from INT formats to FP formats. FP8 is already supported in modern hardware, and FP4 support is coming. In my experiments I get better perplexity in language models with FP4 than with INT4.


How is FP fundamentally different than integers? I've done FPGA programming and it just seems like the programmer has to decide where/when to do the shifting based on the expected range of the data. I'm not sure how this is "natively supported" in hardware.


If you have designed FPUs you should know that FP computation involves a lot more than just shifting (e.g. rounding, subnormals, and special-value handling). That's why, for example, CPUs use different hardware blocks for INT vs FP computation.

But that’s not the point. The point is, this particular method to speed up matmul is not suitable for FP.


I'm no expert but I suspect this is wrong. To me, this is like saying you don't need to worry about integer overflow because your operations are only working on fixed integers. Really? You don't care if you multiply or add two large numbers and they spill over?

The more appropriate answer, I suspect, is that the numerical precision and stability sacrifices are more than adequate for normal usage.

If I'm wrong about this, I would certainly like to know.


In hardware, you control your integer widths completely, so if you add two 32-bit ints to a 33-bit int, there is no chance of overflow. The same goes for multiplications, etc.


Yeah, with shifts you can guarantee no overflow, but you have to decide under what circumstances avoiding overflow/clipping is worth the loss of precision.

Fixed point typically requires a lot more programming effort, but sometimes it's worth it if you know what the data ranges are.
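
As a toy illustration of that trade-off (my own hypothetical example, not anything from the paper), here is a Q8 fixed-point multiply in Python; in hardware the same thing is just a wider product register plus a shift, and the designer decides how many bits to keep.

    FRAC_BITS = 8  # Q*.8 format: 8 fractional bits

    def to_q(v):
        return round(v * (1 << FRAC_BITS))

    def from_q(v):
        return v / (1 << FRAC_BITS)

    def q_mul(x, y):
        # The raw product has 2*FRAC_BITS fractional bits and is roughly
        # twice as wide as the inputs; in hardware you size the result
        # register accordingly, so nothing overflows. The shift is where
        # precision is deliberately thrown away.
        return (x * y) >> FRAC_BITS

    a, b = to_q(3.25), to_q(-1.5)
    print(from_q(q_mul(a, b)))   # -4.875, exact because it fits the format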


I don't know why this answer is getting downvoted. This is absolutely correct.

W. Miller has a paper showing that, under conditions of numerical stability, O(n^3) multiplications are necessary [0]. Any algorithm that gets sub-cubic runtime for matrix multiplication, like Strassen's or Coppersmith's, must sacrifice some amount of precision or stability.

[0] https://epubs.siam.org/doi/10.1137/0204009



The document said it outputs the exact same values as the conventional method. There is no accuracy trade off here.


The paper cited is about hardware, where there is no accuracy tradeoff because you control the numerical precision completely and use fixed point. In a software implementation, neither is true. There is no chance that you will get the exact same values out of this method that you do out of other FP matmuls.


For floating point? Are you sure?


Opening statement of README

    This repository contains the source code for ML hardware architectures that 
    require nearly half the number of multiplier units to achieve the same 
    performance, by executing alternative inner-product algorithms that trade 
    nearly half the multiplications for cheap low-bitwidth additions, while still 
    producing identical output as the conventional inner product.


I just looked at the paper: the answer is no, floating point is not supported.


It's not quite forgotten. It kind of lives on in the pseudo-dot product Wegman-Carter authenticators like UMAC. See Section 3 of [1] for context.

[1] https://cr.yp.to/antiforgery/pema-20071022.pdf


I’ve only glanced at it so someone correct me if I’m wrong, but IIUC this is not a replacement for matrix multiplication but rather an approximation that only gives decent-ish results for the types of linear systems you see in AI/ML. But for that use case it is totally fine?


It produces identical/bit-equivalent results to conventional/naive matrix multiplication for integer/fixed-point data types.


Perhaps it's less of a hidden gem and more of a spotlight moment.


This is the definition of a strawman. "Advocate for killing all humans" sounds like someone advocating for genocide, but instead it's just the same transhumanist thinking (which Yudkowsky also believes in, FYI).


Commenter: "If AI replaced us, it's fine because they're a worthy descendants?"

Beff Jezos / Verdon: "Personally, yes."

Yudkowsky is a transhumanist in that he is hopeful people could voluntarily extend their biological selves, but he isn't advocating for the elimination of all biological selves in pursuit of other artificial intelligences.

Voluntary /= involuntary eradication


Your comment (+ username) reads like what I would have written once upon a time when I was fully in the EA bubble.

Truly no offense meant, as I was deeply into the EA movement myself, and still consider myself one (in the original "donate money effectively" sense), but the movement has now morphed into a death cult obsessed with things like:

* OMG we're all going to die any time now (repeated every year since circa 2018)

* What is your pDoom? What are your timelines? (aka: what is your totally made up number that makes you feel like you're doing something rational/scientific)

I'm deep in the weeds w/ LLMs, e.g. I probably finetune an average of one model a day and work with bleeding-edge models... and AI safety just sounds so silly. Wanting to take drastic measures today to prevent an upcoming apocalypse makes as much sense as taking the same drastic measures when gradient descent was invented.


My username was created before I knew anything about EA or adjacent movements. I'm not in any EA movement, though I am sympathetic. I've spent 100x more time on HN, with people mostly in denial, than I have in EA or adjacent forums, and I haven't met any of them.

It's sadly twisted that mentioning this -- that the majority of leaders doing cutting-edge research on AGI think it has a significant chance of killing humanity -- is considered being part of a "cult" movement.

Your analogy is like early Intel engineers being completely unaware that their chips would bring on the ramifications of social media. "In the weeds" and yet unable to foresee the trajectory and wider impact. Same with the physics that led to nuclear weapons.

> Wanting to take drastic measures today to prevent an upcoming apocalypse makes as much sense as taking the same drastic measures when [nuclear fission] was invented.


> Your analogy is the same as early Intel engineers completely unaware that those chips would bring on the ramifications of social media

Exactly! As they should be. (for both Intel engineers developing chips, and physicists developing nuclear research)

There were a billion more potential dangers from those technologies that never materialized, and never will.

I'm glad we didn't stop them in their tracks because a poll of 10 leaders in the field thought they were too dangerous and progress should stop. (Note that no one is against regulating dangerous uses of AI, e.g. autonomous weapons, chemical warfare; the problem is regulating AI research and development in the first place.)


The importance (i.e. attention) needs to be dynamic: one token will be important to some other tokens but not others.

tf-idf and similar heuristics are what we were using before attention came along, e.g. a tf-idf weighted bag-of-words representation of word2vec embeddings. That approach fails in so many cases.
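
For anyone unfamiliar with that baseline, here's a rough sketch of what it looks like (random vectors stand in for real word2vec embeddings, and the smoothed idf is my choice): every occurrence of a word gets the same fixed weight and the same vector, no matter the context, which is where it breaks down.

    import math
    import numpy as np

    docs = [["the", "bank", "of", "the", "river"],
            ["the", "bank", "approved", "the", "loan"]]
    vocab = sorted({w for d in docs for w in d})
    rng = np.random.default_rng(0)
    vecs = {w: rng.normal(size=8) for w in vocab}   # stand-in for word2vec

    # Smoothed idf: frequent words get low, but always fixed, weights.
    idf = {w: math.log((1 + len(docs)) / (1 + sum(w in d for d in docs))) + 1
           for w in vocab}

    def doc_vector(doc):
        # Weighted average of word vectors; "bank" gets the same weight and
        # the same vector in both sentences, regardless of "river" vs "loan".
        w = np.array([idf[t] for t in doc])
        M = np.stack([vecs[t] for t in doc])
        return (w[:, None] * M).sum(axis=0) / w.sum()

    print(doc_vector(docs[0])[:3])
    print(doc_vector(docs[1])[:3])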


Attention in transformers works because over time the model learns token importance based on frequency and context.

If you don't have attention and need a fast substitute for "forgetting" unimportant tokens, then BM25 is an intuitive hypothesis.


To use your metaphor, TF-IDF will result in ‘fixed’ weights.

Attention makes it so that the weights of each token can be different in each sequence of tokens. Same token gets different weights depending on who its ‘neighbors’ in the sequence end up being.

This property allows the models to solve a variety of natural language problems and gets ‘used’ by the model to express context-aware dependencies.
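
To illustrate the "same token, different weights" point with a toy example (random projections stand in for learned ones; this is only the attention-weight computation, not a full transformer):

    import numpy as np

    d = 8
    rng = np.random.default_rng(1)
    Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    emb = {w: rng.normal(size=d) for w in ["the", "bank", "river", "loan"]}

    def attn_weights(tokens, query="bank"):
        # Scaled dot-product attention weights for one query token.
        # The scores depend on every token in the sequence, so "bank"
        # attends differently next to "river" than next to "loan",
        # even though its embedding and the projections are fixed.
        q = emb[query] @ Wq
        K = np.stack([emb[t] for t in tokens]) @ Wk
        s = (q @ K.T) / np.sqrt(d)
        e = np.exp(s - s.max())
        return e / e.sum()

    print(attn_weights(["the", "bank", "river"]))
    print(attn_weights(["the", "bank", "loan"]))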


Given that GP explicitly said "if you don't have attention", and we're in a thread about a language model whose main characteristic is not using attention, I don't understand why you insist on talking about attention …


I mean, if we are going to get past attention (very much on board with the idea!), then it might help to know what it is really contributing to a model.

My response was trying to clarify some confusion.

I am all for alternatives to attention. I don’t think BM25 cuts it. I don’t think anything that samples tokens based on BM25 weights (the idea in this subthread) would cut it.


What confusion? I know exactly how BM25 works and how Transformers work. I stated a hypothesis and asked if anyone has tried it. You say it won’t work. That’s just your opinion. Do you have proof or evidence? This is science. Dismissal of ideas without evidence goes against scientific principles.


Just catching up to this thread again. You had said:

"I was wondering if anyone has tried setting importance of a token as a TF-IDF or BM25 lookup."

So, I take it back. This is not a confusion. You are right to call it out. :)

I like this idea directionally. A lot of energy (literally) would be saved if we could get to the model accuracy outcomes with static weights like this.

However, I do think that this (as stated in your original message) would not work as well as transformer or SSM and I explained my reasoning as to why, already. I don't have an empirical proof (not having run the experiment) but if you believe in it, you should try it and share your findings.


Yep that's pretty much it! That's what they call needle in a haystack. See: https://github.com/gkamradt/LLMTest_NeedleInAHaystack


Not if you account for the increase in ad revenue from the company paying them.


> The technology looks at hundreds of signals that could indicate a booking is higher risk for this type of incident, like the duration of the trip the guest is trying to book, how far the listing is from their location, the type of listing they’re booking, and if the reservation is being made at the last-minute, among many more.

This is most likely a (simple) ML model trained on previous reports of such bookings.

Not newsworthy, but probably not a bunch of if-statements.


I'm betting it probably doesn't even exist, or if it does it won't be implemented. Imagine the lawsuits from people who get rejected. No, as someone mentioned above, it's pure PR to hop on the hype train, à la Q* a few weeks back.


> Airbnb brought in anti-party measures last NYE that saw thousands of people globally blocked from booking an entire home listing on the platform, including approximately 63,550 people in the United States, 13,200 in the UK, and 5,400 in Australia.

Generally speaking, unless you're in the EU, you won't be successful in suing a company that blocks you and doesn't provide a reason.


Decision tree learning is a part of machine learning, which is a field of AI.

The result is a bunch of if-statements. ;)
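
For fun, here's a toy sketch of both points (the upthread risk-model guess and the "bunch of if-statements" quip), with made-up booking signals and an invented labeling rule purely for demonstration:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)
    X = np.column_stack([
        rng.integers(1, 10, 500),     # nights booked
        rng.uniform(0, 300, 500),     # distance from home, km
        rng.integers(0, 2, 500),      # last-minute booking flag
    ])
    # Synthetic labels from an invented rule, just so the tree has something to learn.
    y = ((X[:, 0] <= 2) & (X[:, 1] < 40) & (X[:, 2] == 1)).astype(int)

    clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(export_text(clf, feature_names=["nights", "distance_km", "last_minute"]))
    # The printout is literally nested if/else thresholds, learned from data
    # rather than hand-written.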

