I. Transformers
II. How Feedforward Transformers Work Compared to the Brain
III. Predictive Coding in Neuroscience
IV. Vector Context and Tokens
V. Model Reasoning and Basins of Attraction
VI. AI–Neural Correspondence
VII. The Role Emotions Play in Decision Making
VIII. Memory
IX. AI History: The Neural Network
X. Symbolists vs Connectionists
XI. Miscellaneous
LLM transformer architectures share almost no design heritage with biological neural circuits; any convergences between the two were empirical surprises, not deliberate design.
A Transformer is a type of neural network architecture designed to process sequences (text, code, audio, DNA, etc.) by letting every element in the sequence look at every other element and decide what matters. The key mechanism, attention, determines which parts of the input matter most for predicting the next output.
Instead of reading information strictly left-to-right like older models, a Transformer can consider the entire context at once and weigh which parts are relevant.
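The "weigh which parts are relevant" step can be sketched as scaled dot-product attention, the core operation inside a Transformer. This is a minimal NumPy illustration, not a full implementation: the function name and the toy sequence shapes are made up for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query scores every key, the scores become weights via
    softmax, and the output is a weighted mix of the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every query to every key
    # Softmax over each row (numerically stabilized): rows sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy "sequence" of 4 tokens, each an 8-dimensional vector
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))

# Self-attention: queries, keys, and values all come from the same sequence,
# so every token attends to every other token at once
out, w = scaled_dot_product_attention(x, x, x)
```

Each row of `w` is one token's attention distribution over the whole sequence, which is what lets the model consider the entire context at once rather than strictly left-to-right.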
A Transformer does two main things: