Also, a different version of the same og Colab still couldn't get the 135M model to learn the XML tags, so do you think 8B should be the minimum for this?
Yep - bigger models should help! Definitely give Llama a try!
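One easy way to sanity-check this before committing to a bigger run is to measure how often sampled completions actually contain the expected tags. Here's a minimal sketch, assuming the format is `<reasoning>...</reasoning><answer>...</answer>` (swap in whatever tag names your Colab's reward functions use):

```python
import re

# Hypothetical tag names -- adjust to match your notebook's format.
XML_PATTERN = re.compile(
    r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>", re.DOTALL
)

def format_compliance(completions: list[str]) -> float:
    """Fraction of sampled completions that follow the XML format."""
    hits = sum(bool(XML_PATTERN.search(c)) for c in completions)
    return hits / max(len(completions), 1)

# e.g. compare format_compliance(samples_135m) vs format_compliance(samples_8b)
# on the same prompts to see whether the bigger model picks up the format.
```

If the 135M model sits near 0% compliance after training while an 8B model climbs quickly, that's a good sign the issue is capacity rather than your reward setup.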