> For instance, why not use whole words as tokens?
Word-only tokenizers are what people used in the RNN/LSTM days. They offer no functional improvement over subword schemes like BPE or even WordPiece/SentencePiece, and they result in worse quality since you can't exploit meaningful semantic hints such as punctuation, and every out-of-vocabulary word collapses into a single unknown token.
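To make the failure mode concrete, here's a toy sketch (with a hypothetical hand-picked vocabulary, not any real tokenizer): a word-level tokenizer maps every unseen word to `<unk>`, while a greedy subword scheme in the spirit of BPE can still reconstruct the word from smaller pieces.

```python
# Hypothetical toy vocabulary, for illustration only -- real BPE vocabularies
# are learned from corpus statistics and contain tens of thousands of pieces.
vocab = {"the", "cat", "sat", "un", "believ", "able"}

def word_tokenize(text: str) -> list[str]:
    """Word-level: split on whitespace; unseen words collapse to <unk>."""
    return [w if w in vocab else "<unk>" for w in text.split()]

def subword_tokenize(text: str) -> list[str]:
    """Greedy longest-match over subword pieces, BPE-style fallback."""
    tokens = []
    for word in text.split():
        i = 0
        while i < len(word):
            # Try the longest vocabulary piece starting at position i.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                tokens.append("<unk>")  # not even a single character matched
                i += 1
    return tokens

print(word_tokenize("the cat sat unbelievable"))
# ['the', 'cat', 'sat', '<unk>']  -- the rare word is lost entirely
print(subword_tokenize("the cat sat unbelievable"))
# ['the', 'cat', 'sat', 'un', 'believ', 'able']  -- recovered from pieces
```

The subword version degrades gracefully: a word the tokenizer has never seen still yields pieces the model has trained on, whereas the word-level version throws that information away.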