There is a fairly brief technical report discussing the training etc [0].
The abstract: "This preliminary technical report describes the development of GPT4All, a chatbot trained over a massive curated corpus of assistant interactions including word problems, story descriptions, multi-turn dialogue, and code. We openly release the collected data, data curation procedure, training code, and final model weights to promote open research and reproducibility. Additionally, we release quantized 4-bit versions of the model allowing virtually anyone to run the model on CPU."
The abstract: "This preliminary technical report describes the development of GPT4All, a chatbot trained over a massive curated corpus of assistant interactions including word problems, story descriptions, multi-turn dialogue, and code. We openly release the collected data, data curation procedure, training code, and final model weights to promote open research and reproducibility. Additionally, we release quantized 4-bit versions of the model allowing virtually anyone to run the model on CPU."
[0] https://tinyurl.com/yj4yj8xe