Hacker News
prats226 | 78 days ago | on: Building better AI tools
Interestingly, the DeepSeek paper mentions RL with a process reward model. However, they report that it failed to align the model correctly, because deciding whether an intermediate step in the process is right or wrong involves a lot of subjectivity.
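To make the subjectivity point concrete, here is a rough toy sketch, not anything taken from the paper. It contrasts an outcome reward (check only the final answer) with a process reward (score every intermediate step). The function names and the two "judges" are hypothetical stand-ins for human or model annotators who disagree about what counts as a correct step.

```python
from typing import Callable, List

# Hypothetical per-step scorer: returns 1.0 for a "correct" step, else 0.0.
StepJudge = Callable[[str], float]


def outcome_reward(final_answer: str, reference: str) -> float:
    """Outcome-based reward: only the final answer is checked."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def process_reward(steps: List[str], judge: StepJudge) -> float:
    """Process-based reward: mean of the judge's per-step scores."""
    if not steps:
        return 0.0
    return sum(judge(s) for s in steps) / len(steps)


def strict_judge(step: str) -> float:
    # Assumption for illustration: a step counts only if it states an equation.
    return 1.0 if "=" in step else 0.0


def lenient_judge(step: str) -> float:
    # Assumption for illustration: any non-empty step counts.
    return 1.0 if step.strip() else 0.0


steps = ["Let x be the unknown.", "2x + 3 = 11", "x = 4"]

print(outcome_reward("4", "4"))              # same value whoever checks it
print(process_reward(steps, strict_judge))   # depends on the judge's criteria
print(process_reward(steps, lenient_judge))  # a different judge, a different reward
```

The same reasoning trace gets a different process reward depending on who labels the steps, while the outcome reward does not change. That is the kind of ambiguity the comment is pointing at.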