Yes, in the same sense that a modern digital camera is a glorified photodiode. In both cases, light comes in, voltage comes out, and we can use it to count how much light came in.
The tokenizers I've encountered all have vocabularies of around 50,000 tokens, which fits nicely into (and makes good use of) a 16-bit integer. Is this just a coincidence, or is there an advantage to tokens being 16-bit representable numbers?
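For concreteness, here's a minimal sketch of what I mean (the vocabulary size is GPT-2's; the specific token IDs are made up): such a vocabulary fits in a `uint16`, which halves the storage for token ID arrays compared to `int32`:

```python
import numpy as np

# GPT-2's BPE vocabulary has 50,257 entries; a uint16 holds 0..65,535,
# so every token ID fits in two bytes.
VOCAB_SIZE = 50_257

assert VOCAB_SIZE - 1 <= np.iinfo(np.uint16).max

token_ids = np.array([464, 2068, 7586, 21831], dtype=np.uint16)  # hypothetical IDs
print(token_ids.nbytes)  # 8 bytes for 4 tokens, vs. 16 bytes with int32
```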
And how does the NN represent the token at the output layer? Is it a binary representation of the token number?
Or does it have one neuron for each token it knows, with ChatGPT taking the most activated neuron as the answer?
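To make that second option concrete, here's a minimal sketch (the hidden size and all names are assumptions for illustration, not any particular model's internals): a final linear layer produces one logit per vocabulary token, softmax turns the logits into probabilities, and greedy decoding would pick the argmax:

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 50_257   # GPT-2-sized vocabulary (assumption)
HIDDEN_SIZE = 768     # hidden-state width, made up for illustration

# Final projection: one output ("neuron") per token in the vocabulary.
W_out = rng.standard_normal((HIDDEN_SIZE, VOCAB_SIZE)) * 0.02

hidden_state = rng.standard_normal(HIDDEN_SIZE)  # stand-in for the model's last hidden state
logits = hidden_state @ W_out                    # shape: (VOCAB_SIZE,)

# Softmax turns the logits into a probability per token...
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# ...and greedy decoding takes the most activated output as the next token.
next_token_id = int(np.argmax(probs))
print(next_token_id)
```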