What Is ChatGPT Doing … and Why Does It Work?

Snicklefritz

"I go among trees and sit still."
Supporter
Joined
Feb 3, 2023
Messages
151
Location
Lancaster, Pennsylvania, USA
This is an excellent, if a bit wonky, article on how AI chat does what it does.

.. purpose here is to give a rough outline of what’s going on inside ChatGPT—and then to explore why
it is that it can do so well in producing what we might consider to be meaningful text


 

CultureCitizen

Well-Known Member
Joined
Feb 14, 2023
Messages
120
And he is just covering part of the architecture. As good as that explanation is, it doesn't cover the "reinforcement learning" part.


Methods
We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup. We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their responses. We mixed this new dialogue dataset with the InstructGPT dataset, which we transformed into a dialogue format.


To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality. To collect this data, we took conversations that AI trainers had with the chatbot. We randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them. Using these reward models, we can fine-tune the model using Proximal Policy Optimization. We performed several iterations of this process.
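To make the comparison step above a bit more concrete, here is a rough sketch of how a ranking from a trainer can be turned into a training signal for a reward model. It uses the pairwise logistic loss described for InstructGPT, where the model is pushed to score the preferred response higher than the rejected one. The function names and the toy scores are made up for illustration; this is not OpenAI's code, and the PPO fine-tuning step is not shown.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps the input order, so the first item of each
    pair is always the one the trainer ranked higher.
    """
    return list(combinations(ranked, 2))

def pairwise_loss(score_preferred, score_rejected):
    """-log(sigmoid(r_preferred - r_rejected)).

    Small when the preferred response already scores higher,
    large when the reward model has the order backwards.
    """
    diff = score_preferred - score_rejected
    return math.log1p(math.exp(-diff))

def ranking_loss(ranked, scores):
    """Average pairwise loss over every pair implied by one ranking."""
    pairs = ranking_to_pairs(ranked)
    return sum(pairwise_loss(scores[a], scores[b]) for a, b in pairs) / len(pairs)

# Hypothetical example: a trainer ranked three completions A > B > C.
ranked = ["A", "B", "C"]
agrees = {"A": 2.0, "B": 0.5, "C": -1.0}     # reward model agrees with the trainer
disagrees = {"A": -1.0, "B": 0.5, "C": 2.0}  # reward model has it backwards

print(ranking_loss(ranked, agrees))     # lower loss
print(ranking_loss(ranked, disagrees))  # higher loss
```

In a real setup the scores would come from a neural network and the loss would be minimized by gradient descent; the resulting reward model is what the reinforcement learning step then optimizes against.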
 

Harpo

Please revive in 2035
Joined
Sep 23, 2006
Messages
2,687
Location
HS2 0RL
GPT4:

[Attached image: 1679905586678.jpeg]
 
