ulber's comments

>I think we need something that at least captures the general mechanism by which LLMs predict the next token.

This varies a lot, since the capabilities involved in prediction depend on the data. If the text is a math book and the prompt is "... three plus five apples is a total of ", the crucial capability is arithmetic (plus, of course, natural-language capabilities). On the other hand, if you're completing a post from /r/relationship_advice, the capabilities involved will be (vaguely) maintaining literary tone, theory of mind, psychology, etc. Within a single text the capabilities needed will also vary a lot: you might need theory of mind at a few crucial inflection points, but most of the time it's already clear what is going to be said and the model just has to get the wording right.

So, my take would be to really think hard about the data to understand how predictions might be made.


The way LLMs process text is a lot like the way a typical human does (a top-of-the-head answer vs. actually analyzing the question in depth). This is a recurring pattern with the current LLMs I've interacted with (e.g., A: "x" Q: "but what about y?" A: "oh, you're right, now that I [actually] think about it, really z"). Example (Anthropic's Claude):

    Q: if one basket contains 5 apples and another basket contains 3 plums, and another basket contains 2 walnuts how many fruits are there combined?
    A: The first basket contains *5 apples*. The second basket contains *3 plums*. The third basket contains *2 walnuts*. Adding up the numbers of fruits in each basket, there are a total of *5 + 3 + 2 = 10 fruits combined*.
    Q: is a walnut a fruit?
    A: You're right, a walnut is not technically a fruit. A walnut is a type of tree nut. To correctly answer the original question:
    The first basket contains 5 apples. The second basket contains 3 plums. While the third basket contains 2 walnuts, walnuts are not fruits. So the total number of fruits combined is 5 apples + 3 plums = 8 fruits.


An interesting fundamental limitation of transformer LLMs is that the architecture doesn't give the model the option to think a bit longer or to review its answer: the quantity and structure of the computations it performs are fixed by the architecture, without any conditionals. But repeating the same prompt means the model gets to run a second iteration with the result of the first iteration as input data, and that can enable it to do more.

Of course, if we wanted to generalize this, we'd need a structure where the model itself can decide to continue working for one more iteration, and also retain some of its temporary work-in-progress notes (currently it can only see what the previous iteration produced as part of the answer).
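The "run a second iteration on your own output" trick can be sketched as a loop. This assumes a hypothetical generate(prompt) -> str wrapper around whatever LLM API you use; the prompt wording is made up for illustration:

```python
# Minimal sketch of re-prompting: feed each draft answer back in and let
# the model revise it. `generate` is a hypothetical LLM call.
def iterative_answer(generate, question, max_rounds=3):
    answer = generate(question)
    for _ in range(max_rounds - 1):
        revised = generate(
            f"{question}\n\nDraft answer:\n{answer}\n\n"
            "Review the draft and give an improved final answer."
        )
        if revised == answer:  # the model kept its answer; stop early
            break
        answer = revised
    return answer
```

This only approximates the real fix, since the model still can't decide on its own to keep going; the loop decides for it.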


There's an interesting new paper about this problem: https://arxiv.org/abs/2310.02226

"Think before you speak: Training Language Models With Pause Tokens"

Basic idea is to teach the LLM to occasionally insert a "pause" token, which outputs nothing but gives it a chance to perform another round of operations on the way to the answer.
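A toy illustration of that idea (the "&lt;pause&gt;" literal and function name here are made up; in the paper the pause token is a learned embedding):

```python
# Toy version of pause-token training data: insert k pause tokens between
# the prompt and the answer, so the model gets k extra forward passes of
# computation before it has to commit to an answer.
def add_pauses(prompt_tokens, answer_tokens, k=3):
    return prompt_tokens + ["<pause>"] * k + answer_tokens

# During training the loss is taken only on the answer positions, and at
# inference whatever the model emits at pause positions is discarded.
```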


"three plus five apples is a total of..." is a really interesting example, because it doesn't actually require arithmetic at all.

A language model trained on enough text will be able to complete this just based on having encountered the pattern "three plus five SOMETHING is a total of..." enough times in its training data.

This becomes even more apparent when you work with smaller models - the 7B etc models which can run on a laptop. They can often solve small arithmetic problems like this while having no chance at all of working with larger numbers that they haven't ever encountered in their training data.

I really like using those smaller models as tools to better understand how this technology works.


I bet they forgot to include that it can generate that 770MWh every hour.


770 MWh every hour is just 770 MW.

The hours cancel out!


But then the next sentence goes like this: "For reference, the city of New York City now uses about 100,000 kilowatt-hours per minute."

So...


Minutes and hours still cancel out; you just have to do a little more arithmetic in the process.
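For instance, putting both of the article's figures into plain power:

```python
# kWh per unit time is just power. Comparing the article's two figures.
plant_mw = 770                            # 770 MWh every hour = 770 MW
nyc_kwh_per_minute = 100_000
nyc_mw = nyc_kwh_per_minute * 60 / 1000   # kWh/min -> kWh/h -> MW
# nyc_mw comes out to 6000: NYC draws about 6 GW, roughly 8x the plant
```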


I forgot for a moment where I was going with this, but now back on track: if they wanted to make the two values comparable, the output of Edison's plant should also have been given per minute.


> if they wanted to make the two values comparable

When was the last time you saw a journalism piece try to make units comparable?

That number could mean absolutely anything; there's no telling what people were thinking at each step of the telephone game from transcribing the source all the way to the finished, edited piece.


You are not wrong, and now I'm more confused. Unfortunately the linked report 404s, but an old copy is available through the Wayback Machine (it explores market needs for photovoltaic systems in NYC). The introduction states that the city's total electrical consumption in 2015 was 52836 GWh.

Math time: (52836 × 1000 × 1000) / (365 × 24 × 60) = 100525 kWh of energy consumed in a minute. So that checks out.

On the other end of the comparison, by the early 1900s AC had largely won and plants were appearing left and right like flowers in a field. I can't find the exact station or its capabilities just by searching for the 1920 date.

Edison's first commercial station in Pearl Street from 1882 (still DC, I think) had 6 dynamos producing 100 kW of power each: https://en.wikipedia.org/wiki/Pearl_Street_Station Which is... let's see... 600 kWh every hour! :) Or 10 kWh per minute.

If the author is suggesting that present-day NYC consumes merely about seven times what Edison's plant produced, that doesn't seem quite right either: 770000 kWh in an hour is 12833 kWh per minute, in which case you'd need to build about eight Edison-plants to match the demand.

(I divided so many numbers in this comment, I sincerely hope that I did them right)
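Redoing the divisions once in code, everything in kWh per minute:

```python
# Re-running the arithmetic from the comment above.
annual_kwh = 52836 * 1000 * 1000             # 52836 GWh consumed in 2015
nyc_per_min = annual_kwh / (365 * 24 * 60)   # ~100525, matches the article
pearl_per_min = 600 / 60                     # 6 dynamos x 100 kW = 10 kWh/min
plant_per_min = 770_000 / 60                 # ~12833 for the 770 MWh figure
plants_needed = nyc_per_min / plant_per_min  # ~7.8, call it eight plants
```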


I think the 2015 electric consumption was 10000x what the 600 kW Edison plant could generate?

This is a great example of how things get simpler if we drop the "over time" part of the units and simplify to just the average power draw.

So in 2015 NYC consumed 52836 GWh. The average power draw is then 52836 GWh / (365 × 24 hours) ≈ 6031510 kW. As in, at any given moment in 2015, NYC was on average pulling 6031510 kW, or about 6.03 GW.

The Edison Pearl Street station could output 600 kW (and that's the theoretical peak of all 6 dynamos; probably less output in practice).

6031510 kW / 600 kW = 10052.5, so I think our current consumption is about 10000x higher, not 7x-10x higher, than the Pearl St station's output!
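That ratio can be checked in two lines (figures from the report and the Wikipedia page above):

```python
# Average power draw from the 2015 annual consumption figure.
nyc_gwh_2015 = 52836
avg_kw = nyc_gwh_2015 * 1_000_000 / (365 * 24)  # GWh/year -> average kW
ratio = avg_kw / 600   # vs Pearl Street's 600 kW nameplate
# avg_kw ~ 6.03e6 (6.03 GW); ratio ~ 10052, i.e. roughly 10000x
```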


These are two different power plants. I could not find more info about the one that opened in 1920 and is actually in the article.


And 100,000kWh / minute is just 6 gigawatts or 6,000,000 kilowatts. Google is great at unit math like this: https://www.google.com/search?q=100000+kWh+%2F+1+minute+in+G...

Journalists consistently use silly or incorrect units when discussing power usage. At least for this article the units aren't flat-out wrong, just silly, and I can see how "kilowatt-hours per minute" could be a bit more intuitive to readers.

(And don't get me started on how USB battery manufacturers advertise capacity in obtuse units like 27000 mAh @ 3.7 volts instead of just using 99.9 watt-hours or 27 amp-hours.)
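The battery conversion, for the record (27000 mAh at the 3.7 V nominal cell voltage):

```python
# mAh at the cell's nominal voltage converts straight to watt-hours.
mah, volts = 27000, 3.7
wh = mah / 1000 * volts   # 27 Ah * 3.7 V = 99.9 Wh
```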


That's the joke


Maybe we don't, but we should let market forces sort it out by doing our best to tax the externalities of power generation too.


Let me know when you sign up for a $5/gallon carbon tax on gasoline.


That is something like $600 per ton of CO2. For that amount you can literally capture the CO2 from the air today.
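Rough arithmetic behind that figure (the ~8.9 kg of CO2 per gallon is the commonly cited EPA estimate, an assumption here):

```python
# Converting a $/gallon gasoline tax into $/ton of CO2.
tax_per_gallon = 5.00
kg_co2_per_gallon = 8.9   # roughly, for burning one gallon of gasoline
tax_per_ton = tax_per_gallon / (kg_co2_per_gallon / 1000)
# tax_per_ton ~ 562, in the ballpark of the $600/ton quoted above
```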


Alas there is a backlog of a century of emissions to handle. You know, externalizing costs on other people?

At this point we'll need active carbon capture not for neutrality, which is its own political boondoggle, but for actual active mitigation of some of the damage.

Don't worry, nothing like this will come to pass. Pollute away.


The argument the article makes is indeed a bit blunt.

The idea here is that tipping enables better price discrimination, which in turn allows a higher "quantity" of coffee to be produced. With coffee shops that "quantity" will translate to things like 1) higher density or 2) more attractive placement of coffee shops, both of which increase coffee consumption by making it easier to pick one up, or 3) higher quality (think small batch) coffee being produced, which increases the quantity of labor being sold in a cup.

I still don't enjoy this tipping culture, but the argument being made makes sense when you fill in the details.


Places without an enforced tipping culture still seem to have a decent number of coffee shops.


As an aside, Universal Paperclips is the best clicker game I've played and has zero monetization.

https://www.decisionproblem.com/paperclips/


As a crude point of comparison, you'd be buying this service at about 917 euros/hour. With a pretty lax 1600 hours/year you'd have to make almost 1.5M per year to match the rate.
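The comparison in numbers:

```python
# Annual salary needed to match a 917 EUR/hour rate at 1600 hours/year.
hourly_eur = 917
annual_eur = hourly_eur * 1600   # 1,467,200 EUR, i.e. almost 1.5M
```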


This was a great read. I kept expecting it to reference the "Curse of Dimensionality" [1] as an explanation for why we see no close-to-average samples in high-dimensional spaces. I'm guessing the book this is an excerpt from might. In any case, I enjoyed this as a relatable instance of the concept!

[1]: https://en.m.wikipedia.org/wiki/Curse_of_dimensionality
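The effect is easy to see numerically: samples from a standard Gaussian in d dimensions concentrate in a thin shell at distance about sqrt(d) from the mean, so in high dimensions no sample is anywhere near "average". A quick stdlib-only sketch:

```python
import math
import random

random.seed(0)

def mean_distance_from_mean(d, n=200):
    """Average Euclidean distance of n standard-normal samples from the origin."""
    return sum(
        math.sqrt(sum(random.gauss(0, 1) ** 2 for _ in range(d)))
        for _ in range(n)
    ) / n

# In 2-D the samples sit close to the mean; in 1000-D every sample sits
# near radius sqrt(1000) ~ 31.6, far from the average point.
```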


I get the point but also fully disagree. At the point where computer cognition is good enough for humans to truly fall in love, build friendships or come to hate as an enemy, I personally think not recognizing it as AI will become a fringe belief.


I feel this description overstates the freedom. Academia is full of rather hard deadlines, as the conference you're targeting for a paper might be the only fitting venue for many months. Missing these deadlines can delay the whole "pipeline" of research: you'd like to build on your existing work and properly cite it, but publishing the subsequent work first might make the original result harder to publish. Grant application deadlines are similarly hard.

In contrast, deadlines in engineering are often not even expected to be hit.

The "freedom" to travel for conferences is an integral part of your job as a researcher: either you network and sell your ideas, or you stay in obscurity. Of course travelling on someone else's dime can be fun, but the same is true of all business travel. It stops being fun the moment it becomes a chore and you'd rather be home putting time into your hobbies or family. Then it's just more work.

If writing grant applications is your hobby and you're married to your research, academia can be great, but the freedom doesn't include a balance with all the other parts of life. I know I'm not fully contradicting you, there is indeed a lot of freedom to choose what you work on. I just think it's important that people considering academia understand what the job actually consists of.


I agree with you. Academia gives freedom with how you structure your time, but you still need to put in the 60+ hours a week. Compared to other careers that require 60 hour weeks, however, academia is the most flexible.

Importantly though, within those 60 hours you're free to choose which conferences to go to, what kind of grants you want to write, and what kind of people you want to work with. I think this autonomy and these self-imposed goals are part of what leads to workaholism.


You could also train for this kind of interconnectedness by designing tasks that are explicitly multi-modal. For example, you could:

- Stack boxes collaboratively by controlling your own arm and communicating with another agent helping you.

- First produce a plan in text that another agent has to use to predict how you're going to control the arm. You'd get rewarded for both stacking correctly and being predictable based on the stated plan.

