What if I want to fine-tune with long documents, say AI papers that are ~10 pages long on average? How would they be tokenized, given that max_seq_length is 512?
> What is the "output" when the input is just a (logically coherent) chunk of text?
It probably won't change much if it's just a single sample. If you put in a large corpus of samples that repeat the same theme, the model will be "tuned" toward that theme. If you increase the number of epochs, you can overtrain it, meaning it will just spit the training text back out.
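On the original question about documents longer than max_seq_length: the usual approach is to split each document into overlapping chunks of at most 512 tokens, each of which becomes its own training sample. A minimal sketch, assuming the Hugging Face `transformers` library with a fast tokenizer (the model name, stride, and placeholder text are just examples, not from the original discussion):

```python
from transformers import AutoTokenizer

# Placeholder tokenizer; swap in whatever model you are fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Stand-in for the contents of a ~10-page AI paper.
long_text = "word " * 5000

encoded = tokenizer(
    long_text,
    max_length=512,                  # matches max_seq_length
    truncation=True,
    stride=50,                       # tokens of overlap between consecutive chunks
    return_overflowing_tokens=True,  # keep every chunk instead of only the first
)

# Each entry in encoded["input_ids"] is one chunk of up to 512 tokens,
# and each chunk can be fed to the trainer as a separate sample.
print(f"{len(encoded['input_ids'])} chunks of up to 512 tokens each")
```

The stride just adds some overlap so sentences cut at a chunk boundary still appear intact in the next chunk; without it, each 512-token window is fully disjoint.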