


2025-04-14

RL for LLM

I'm writing this more for my own understanding than to teach anyone.

An LLM solves the following problem:

RL for LLM solves the following problem:

Major doubt I have:

To do RL for an LLM, you have to train the following:

How to train these models:

RL for safety
