Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Ask HN: Training a model on all HN data?
3 points by tmaly 9 months ago | hide | past | favorite | 4 comments
I just had a thought, maybe dang could chime in. Has anyone considered training a model or fine tuning a model on all of hacker news discussions?


It's relatively straightforward to download all HN submissions/comments via BigQuery and then finetune an LLM, there's just not much point to it.

You can safely assume all modern LLMs have been trained in part on HN data.


HN was part of the training set for ChatGPT. But it might be interesting to train/fine tune on HN alone. You could weight by karma or conversely you might identify shortcomings in the karma system.


Comment vote data is not public, which is the data you would need to make such a system useful.


To what end?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: