Hacker News
Ask HN: Training a model on all HN data?
3 points by tmaly 9 months ago
4 comments
I just had a thought, maybe dang could chime in. Has anyone considered training or fine-tuning a model on all Hacker News discussions?
minimaxir
9 months ago
It's relatively straightforward to download all HN submissions/comments via BigQuery and then finetune an LLM, there's just not much point to it.
You can safely assume all modern LLMs have been trained in part on HN data.
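For anyone who wants to try it anyway, here is a minimal sketch of the data-prep step, turning exported rows into parent/reply pairs for finetuning. It assumes each row is a dict with `id`, `parent`, `type`, `title`, and `text` keys, mirroring the fields HN items carry. The table name in `EXAMPLE_QUERY` and the prompt/completion JSONL layout are assumptions, not something the thread specifies; check both against your own export and trainer.

```python
import html
import json
import re

# Assumption: the public BigQuery table name and column names below should be
# verified before use. The query is illustrative and is not executed here.
EXAMPLE_QUERY = """
SELECT id, parent, type, title, text
FROM `bigquery-public-data.hacker_news.full`
WHERE type IN ('story', 'comment')
"""


def clean(text):
    """Turn an HN text field (HTML-ish) into plain text."""
    if not text:
        return ""
    text = text.replace("<p>", "\n\n")      # HN separates paragraphs with <p>
    text = re.sub(r"<[^>]+>", "", text)     # drop any remaining tags
    return html.unescape(text).strip()      # decode entities such as &amp;


def build_pairs(rows):
    """Pair every comment with the story or comment it replies to."""
    by_id = {row["id"]: row for row in rows}
    pairs = []
    for row in rows:
        if row.get("type") != "comment":
            continue
        parent = by_id.get(row.get("parent"))
        reply = clean(row.get("text"))
        if parent is None or not reply:
            continue  # skip orphans and empty comments
        # A story may have a title, a body, or both; a comment has only text.
        parts = [clean(parent.get("title")), clean(parent.get("text"))]
        prompt = "\n\n".join(p for p in parts if p)
        if prompt:
            pairs.append({"prompt": prompt, "completion": reply})
    return pairs


def to_jsonl(pairs):
    """Serialize pairs as one JSON object per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)
```

Calling `to_jsonl(build_pairs(rows))` on the exported rows gives one training example per line. Comments whose parent is missing from the export are skipped rather than guessed at.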
anigbrowl
9 months ago
HN was part of the training set for ChatGPT. But it might be interesting to train/fine-tune on HN alone. You could weight by karma or, conversely, you might identify shortcomings in the karma system.
minimaxir
9 months ago
Comment vote data is not public, which is the data you would need to make such a system useful.
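If per-item scores were available from some source, the weighting idea from the parent comment could be sketched roughly like this. The `scores` input is hypothetical, and the log damping and `floor` parameter are illustrative choices rather than anything proposed in the thread.

```python
import math


def sample_weights(scores, floor=1.0):
    """Map raw scores to training weights with mean 1.0.

    A log damps very large scores so a few popular items do not dominate,
    and `floor` keeps zero-score items from being dropped entirely.
    """
    if not scores:
        return []
    raw = [floor + math.log1p(max(s, 0)) for s in scores]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]
```

The resulting weights could be passed to a trainer as per-example loss weights or used as sampling probabilities after normalizing.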
pavel_lishin
9 months ago
To what end?