
Great, thanks for the clarification.

And how does the NN represent the token at the output layer? Is it a binary representation of the token number?

Or does it have a neuron for each token it knows and ChatGPT takes the most activated neuron as the answer?



Tokens are integers that map to pieces of text.

A token is usually part of a word: roughly 4 characters, or about 75% of a word, on average.

On output, the model gives a list of tokens with their probabilities.

It scores every token in its vocabulary; what you usually see is a short list of the ones with the highest probabilities.

Temperature controls which token gets picked: 0 usually means the top one only (consistent results), while values closer to 1 mean more randomness (more "creativity").
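
A minimal sketch of how temperature can change the pick, assuming the model hands back one raw score (logit) per token. The 4-token vocabulary, the scores, and the function names are made up for illustration; real implementations differ in the details.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw per-token scores into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_token(logits, temperature, rng=random):
    """Return the index of the chosen token."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    # Sample one index in proportion to its probability.
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical vocabulary and scores, not taken from any real model.
vocab = ["cat", "dog", "the", "ran"]
logits = [2.0, 1.0, 0.5, -1.0]

print(vocab[pick_token(logits, 0)])    # always "cat"
print(vocab[pick_token(logits, 1.0)])  # varies from run to run
```

Lower temperatures sharpen the distribution toward the top token, and higher ones flatten it, which is where the extra randomness comes from.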


Since we’re here: Does a “reused” token count as a second token?

For example: if you limited all input/output to the same 100 words, could you stay within the token limit permanently?


so a glorified Markov chain?


Yes, in the same sense a modern digital camera is a glorified photodiode. In both cases, light comes in, voltage comes out, and we can use it to count how much light came in.


Why stop there, it's just ones and zeroes.

It's a "glorified Markov chain" in the same sense that SQLite is just a "glorified bubble sort".


Don't you know that "attention is all you need"? Attention is non-Markovian. It's all-to-all with some masking, not a chain.
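
A rough sketch of what "all-to-all with some masking" can look like, assuming a causal mask where each position may look at itself and everything before it. The function name and the uniform scores are made up for illustration, and this leaves out queries, keys, values, and multiple heads.

```python
import math

def causal_attention_weights(scores):
    """Row i is a softmax over scores[i][0..i]; later positions are masked to 0."""
    n = len(scores)
    weights = []
    for i in range(n):
        visible = scores[i][: i + 1]  # position i sees positions 0..i
        m = max(visible)
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights

# Made-up uniform scores for a 4-token sequence.
scores = [[0.0] * 4 for _ in range(4)]
weights = causal_attention_weights(scores)
for row in weights:
    print([round(w, 2) for w in row])
```

The last row puts weight directly on the first token, so the newest position reads every earlier position at once rather than only the one just before it.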


Basically the latter


The tokenization algorithms I encountered all had around 50000 tokens, which fits nicely into (and makes good use of) a 16-bit number. Is this just a coincidence or does it have advantages for the token to be a 16-bit representable number?


I suspect using 16-bit instead of 32-bit IDs means they can be packed more tightly, so some instructions can operate on more of them in parallel.

But I personally think it's a coincidence, and it just so happens that 50k tokens are enough for the level of complexity the models have right now.
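
A quick sketch of the packing argument, using only the standard library. The token IDs are arbitrary example values, and `fits_in_uint16` is a hypothetical helper, not part of any tokenizer.

```python
import struct

# Arbitrary example IDs, all below 65536.
token_ids = [15496, 995, 50256]

# "<" forces standard sizes: H is 2 bytes, I is 4 bytes.
packed16 = struct.pack(f"<{len(token_ids)}H", *token_ids)
packed32 = struct.pack(f"<{len(token_ids)}I", *token_ids)
print(len(packed16), len(packed32))  # 6 12

def fits_in_uint16(token_id):
    """True if the ID can be stored in an unsigned 16-bit integer."""
    return 0 <= token_id <= 0xFFFF

print(fits_in_uint16(50256))   # True
print(fits_in_uint16(100000))  # False
```

Half the bytes per ID is the whole saving; it only holds while the vocabulary stays at or below 65,536 entries.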


Probably a coincidence. The tokenizer used by GPT-4 and GPT-3.5 has about 100k tokens, which no longer fits in 16 bits.



