GPT

Precursors

  1. Proximal Policy Optimization (PPO) - an RL algorithm that performs comparably to or better than state-of-the-art approaches while being much simpler to implement and tune; it became the default reinforcement learning algorithm at OpenAI.

  2. Learning from human preferences (human in the loop) - a method for inferring what humans want by asking them which of two proposed behaviors is better.

  3. InstructGPT - GPT-3 models fine-tuned with humans in the loop; reported to follow user intentions better than GPT-3 while also being more truthful and less toxic.
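
InstructGPT combines the two ideas above: a reward model is trained on human comparisons of model outputs, and the language model is then optimized against that reward with PPO. A minimal sketch of the two losses involved, assuming the standard clipped PPO surrogate with a clip range of 0.2 and a Bradley-Terry style pairwise loss for the reward model. The function names are hypothetical, and the sketch omits details such as the KL penalty used in practice; it is not OpenAI's implementation.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed clip range
) -> float:
    """Negated PPO clipped surrogate objective, averaged over samples."""
    if not advantages:
        raise ValueError("need at least one sample")
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms,
        # so the policy gains nothing from moving the ratio outside the clip range.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing this loss maximizes the surrogate objective.
    return -total / len(advantages)


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise reward-model loss: -log(sigmoid(r_chosen - r_rejected))."""
    diff = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-diff)) for large |diff|.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))
```

For example, if the new policy makes an action 1.5 times more likely and its advantage is 1, the ratio is clipped to 1.2, so `ppo_clip_loss([math.log(1.5)], [0.0], [1.0])` gives about -1.2 rather than -1.5. The preference loss is log 2 when the reward model scores both responses equally, and shrinks as it scores the human-preferred response higher.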

Articles

  1. What Is ChatGPT Doing ... and Why Does It Work? (Stephen Wolfram) - explains next-word prediction in detail.

Competitions

Tools
