Thousand-Brain Systems: Sensorimotor Intelligence for Rapid, Robust Learning and Inference: A Plain-Language Explainer
Read the paper here: https://arxiv.org/abs/2507.04494
Meet Monty: The Future of Machine Intelligence Rooted in Neuroscience
Imagine a toddler exploring a living-room table. She reaches out, taps a mug, runs a finger along its rim, peeks inside, and very quickly learns this mysterious new object without anyone showing her a giant photo dataset of mugs. Monty works the same way.
Where Monty comes from
For two decades Jeff Hawkins and colleagues studied how the neocortex learns by moving and sensing. In 2025 those ideas took shape in the Thousand Brains Project’s first working system, named Monty.
How Monty learns
Unlike most existing AI systems, Monty doesn’t stare at still pictures. It moves (turns a camera, slides a fingertip sensor), senses tiny “patches” of the world, and stitches them together until it has learned a complete representation of the object.
A learning module learning a new object.
Reference frames – Monty’s maps
Each object gets its own invisible 3-D grid, a “you-are-here” map that only exists for that object. As Monty moves, it updates where the sensor is on the grid, so it can predict, “If I slide left 2 cm, I’ll feel the handle.”
An object stored in a reference frame and a path showing how one sensor patch might be moving over that object.
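As a rough intuition for what such a map might look like, here is a toy sketch in Python. The names and data structures are hypothetical, not the real Monty API: an object's model is simply a lookup from 3-D locations in the object's own frame to the feature sensed there.

```python
# Toy sketch of a reference frame (hypothetical names, not the real Monty API).
# An object's model maps 3-D locations, in the object's own frame, to the
# feature sensed at that location.

mug = {
    (0.0, 0.0, 0.0): "rim",
    (-2.0, 0.0, 0.0): "handle",   # 2 cm to the left of the rim
    (0.0, -5.0, 0.0): "base",
}

def predict(model, sensor_location, movement):
    """Predict what we'll sense if we move the sensor by `movement`."""
    new_location = tuple(p + m for p, m in zip(sensor_location, movement))
    return model.get(new_location, "unknown")

# "If I slide left 2 cm from the rim, I'll feel the handle."
print(predict(mug, (0.0, 0.0, 0.0), (-2.0, 0.0, 0.0)))  # → handle
```

Because the map is tied to the object rather than to the viewer, the same model can be reused no matter where the mug sits in the room.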
The Core Idea: Learning and Recognizing by Sensing and Moving
Monty learns through interaction. It moves over objects, senses small patches of them, and builds up structured internal models. These models aren’t pixel maps or statistical correlations. They’re 3D representations grounded in space, tied to how the system moves and what it senses at each location.
Each sensory observation comes from a small patch of the object. On its own, that patch is almost always ambiguous. Monty therefore integrates information through movement. By combining many of these patches over time with their relative locations, Monty forms a structured model of the sensed object inside a reference frame. That frame is like a 3D map tied to the object itself.
A learning module learning a visual model of a cup.
Inference also works through sensorimotor interaction. Monty maintains multiple hypotheses about what object it might be sensing and where it is in that object’s reference frame. Each new observation gets compared against those models. If the observation matches what’s expected at a given location, that hypothesis gets stronger. If it’s inconsistent, it weakens.
Movement is essential, because a single patch of local sensory input isn’t enough to identify an object. But once Monty starts moving, it can rapidly eliminate incorrect hypotheses.
This is a core difference from most modern AI systems. Monty acts and tests its hypothesis space against reality. It can have multiple possible hypotheses if it hasn’t seen enough disambiguating input yet. It also has a most likely hypothesis and associated confidence at any point in time, allowing it to act even on incomplete information. And because its knowledge is organized by reference frames and updated locally, it can learn new things rapidly without forgetting old knowledge. It doesn’t need to retrain, and it doesn’t forget when the input distribution changes.
A learning module in Monty disambiguating what it is sensing through hypothesis testing.
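The strengthen-or-weaken loop described above can be sketched in a few lines. This is an illustration only, with made-up objects and a trivially simple evidence update, not Monty's actual evidence computation:

```python
# Minimal sketch of hypothesis testing during inference (illustrative only,
# not Monty's actual implementation or evidence values).

hypotheses = {"mug": 1.0, "bowl": 1.0, "can": 1.0}  # object -> evidence

# Each candidate model predicts what should be sensed at the current location.
predictions = {"mug": "handle", "bowl": "smooth rim", "can": "ridge"}

def update(hypotheses, predictions, observation, reward=1.0, penalty=1.0):
    """Strengthen hypotheses whose prediction matched; weaken the rest."""
    for obj, predicted in predictions.items():
        if predicted == observation:
            hypotheses[obj] += reward
        else:
            hypotheses[obj] -= penalty
    return hypotheses

update(hypotheses, predictions, "handle")
best = max(hypotheses, key=hypotheses.get)
print(best)  # → mug
```

At any point, the hypothesis with the highest evidence is the system's best guess, which is why it can act before every alternative has been ruled out.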
Monty in Action: What the Figures Reveal
Robust by Design
Even when we distort the visual feed with heavy noise, or flip an object to an orientation Monty has never seen, the system can still accurately classify the object’s ID and orientation. It builds a 3‑D reference‑frame model from only 14 single‑color views, yet still recognizes the same shape in new colors and from novel points of view. This kind of out-of-distribution generalization is something conventional vision networks stumble over.
A Human‑Like Shape Bias
Monty groups similar objects together based primarily on morphology.
Monty cares more about the shape of an object than its surface paint. That mirrors the shape bias we see in human cognition, and stands in contrast to the texture‑driven bias of deep‑learning vision transformers, which leaves them open to adversarial attacks.
Spotting Symmetry on the Fly
Apart from random rotations (“rand”, rightmost bar), the cups along the bottom look identical but are in fact rotated. Monty has identified the first three orientations as symmetric. The low Chamfer distance shows that these orientations are indeed very close to the ground-truth orientation.
Without ever being told an object has mirror symmetry, Monty automatically infers this from its observations. This is a practical advantage for real‑world interaction. While recognizing whether a rotation of an object is symmetric seems self-evident to us as humans, it is a surprisingly difficult property to bake into deep-learning systems.
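To see why the Chamfer distance is a natural tool here, consider this small sketch. It is purely illustrative (2-D points and a hand-rolled distance function, not Monty's actual procedure): a rotation is a symmetry exactly when the rotated point cloud lands back on itself, i.e. when the Chamfer distance is near zero.

```python
# Sketch: flagging a symmetric orientation with the Chamfer distance
# (illustrative only; Monty's actual symmetry detection differs in detail).
import math

def chamfer(a, b):
    """Symmetric Chamfer distance between two 2-D point sets."""
    def one_way(xs, ys):
        return sum(min(math.dist(x, y) for y in ys) for x in xs) / len(xs)
    return one_way(a, b) + one_way(b, a)

# A square is unchanged by a 90-degree rotation, but not by a 45-degree one.
square = [(1, 0), (0, 1), (-1, 0), (0, -1)]
rot90 = [(-y, x) for x, y in square]
rot45 = [((x - y) / math.sqrt(2), (x + y) / math.sqrt(2)) for x, y in square]

print(chamfer(square, rot90))  # ~0: this rotation is a symmetry
print(chamfer(square, rot45))  # clearly > 0: this one is not
```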
Moving to Know Faster
Monty can perform principled actions using model-free and model-based policies. These policies increase accuracy and reduce the number of steps required for a confident classification.
Monty uses its internal models and hypotheses to disambiguate between a fork and a spoon by moving to the head of the spoon.
Disambiguating the pose of a cup by moving to where Monty’s most likely hypothesis expects the handle to be.
Monty can leverage its learned models to move intelligently in the world. It can choose movements that test its current hypotheses, disambiguating them for faster inference. In the same way, its models could support many other kinds of policies.
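The fork-versus-spoon example above can be sketched as a tiny policy. Everything here is hypothetical (the object models, locations, and function names are invented for illustration): the idea is simply to move to wherever the remaining hypotheses predict the most different features.

```python
# Sketch of a model-based disambiguation policy (hypothetical names and
# models, not the real Monty API).

fork = {(0, 5): "tines", (0, 0): "thin handle"}
spoon = {(0, 5): "bowl", (0, 0): "thin handle"}
hypotheses = {"fork": fork, "spoon": spoon}

def most_informative_location(hypotheses):
    """Pick the location where the hypotheses predict the most distinct features."""
    locations = set().union(*hypotheses.values())
    return max(
        locations,
        key=lambda loc: len({model.get(loc) for model in hypotheses.values()}),
    )

# Both models agree at the handle, so the policy moves to the head instead.
print(most_informative_location(hypotheses))  # → (0, 5)
```

One observation at that location then settles the question, instead of many uninformative ones along the handle.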
Many Sensors, One Neocortex
Monty can leverage multiple sensors for faster inference. Because learning modules share a common language, even LMs that receive input from different modalities can communicate with each other. While Monty can reliably recognize objects and their poses with just one small sensor patch, it does so faster when using more sensor patches.
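One simple way to picture that common language is as evidence scores that any module can contribute, whatever its modality. This sketch is an assumption-laden simplification (the real exchange between Monty's LMs is richer than a single score per object):

```python
# Sketch of voting between learning modules (illustrative; the real common
# language between Monty's LMs carries more than one score per object).

# Each module keeps its own evidence over objects, regardless of modality.
vision_lm = {"mug": 3.0, "bowl": 2.5}
touch_lm = {"mug": 2.0, "bowl": 0.5}

def vote(*modules):
    """Combine evidence from several modules into one consensus answer."""
    combined = {}
    for module in modules:
        for obj, evidence in module.items():
            combined[obj] = combined.get(obj, 0.0) + evidence
    return max(combined, key=combined.get)

print(vote(vision_lm, touch_lm))  # → mug
```

More sensors mean more evidence arriving per step, which is why inference speeds up with more patches.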
What’s Broken in Today’s Deep-Learning Systems?
Deep learning and Thousand Brains systems are built on very different architectures. Deep learning started from the simplified idea of a point neuron, a minimal way of simulating a neuron, and then connected tens of millions of them into what we now know as services such as ChatGPT and Midjourney.
This approach has led to some impressive results, but it is intrinsically limited in a number of ways:
They require massive, often supervised, datasets
Training is offline, centralized, and compute-intensive, often taking days or weeks on large GPU clusters
They suffer from catastrophic forgetting in continual-learning settings, constantly overwriting knowledge they have already internalized
Model parameters are therefore typically fixed post-training, and incorporating significant amounts of new knowledge requires retraining on the entire dataset
Generalization is fragile, with distributional shift leading to unusable or hallucinated outputs
They rely on static feedforward mappings rather than interactive, closed-loop learning
They lack an internal sense of space, objectness, or movement, treating perception as flat input rather than something grounded in a body moving through a world
They learn and represent knowledge in a fundamentally different way from our brains, which leads them to make mistakes humans would not make (just Google ‘adversarial attacks’)
Conversely, Thousand Brains systems are built on high-level principles deduced from the structure of the neocortex. Below, we show how this approach addresses each of these problems.
Monty vs. Vision Transformers: David Beats Goliath
Monty is a new paradigm of intelligence, and this shines through in our comparison with deep-learning vision transformers. In the charts that follow we measure floating-point operations (FLOPs), a standard way of assessing the compute efficiency of a system.
Let’s look at the results.
Training Efficiency
Monty requires thirty-three thousand times fewer computations than a vision transformer, and if you compare it against pretraining plus finetuning, it requires five hundred and twenty-seven million times fewer computations. That is more than eight orders of magnitude!
Figure comparing the amount of FLOPs required for training Monty vs a Vision Transformer network.
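A quick back-of-the-envelope check on the "orders of magnitude" framing, using only the ratios quoted above (not the raw FLOP counts from the paper):

```python
# Sanity check on the quoted compute ratios (the ratios come from the text;
# the raw FLOP counts are in the paper).
import math

training_ratio = 33_000          # Monty vs. ViT trained on the task
pretrain_ratio = 527_000_000     # Monty vs. ViT pretraining + finetuning

print(math.log10(training_ratio))  # ~4.5 orders of magnitude
print(math.log10(pretrain_ratio))  # ~8.7, i.e. more than eight orders
```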
Monty can also learn quickly from very little data. Seeing each object just once leads to performance significantly above chance (Monty at ~50% accuracy, the ViT at ~1%). After seeing each object in 8 orientations, Monty already reaches an accuracy of around 90%. A ViT that saw the same amount of data still performs close to chance (~1-2%). If the ViT repeatedly trains on the same data 75 times, it performs a bit better, but its accuracy improves at a rate ~4x slower than Monty’s. The ViT only reaches accuracy comparable to Monty’s when pretrained on millions of images and finetuned for several epochs on the small dataset.
Here we see that Monty compares favorably to a highly tuned ViT given 25 epochs’ worth of data from the specific task, plus millions of images shown during pre-training. If we give a ViT the same amount of data as Monty gets (1 epoch), it performs at chance.
Efficiency and Accuracy During Inference
What good is training on a tiny amount of data if you can’t then recognize the objects you have learned? The chart below shows that inference with Monty uses fewer FLOPs than a vision transformer performing the same task, while also being markedly more accurate.
Chart showing the average amount of FLOPs used in recognizing an object.
Here we see Monty accurately and efficiently identifying previously unseen rotations of an object. ViTs, even those pretrained on millions of examples, struggle to identify object rotations as the task is too far out of their pre-training distribution.
Continual Learning and Catastrophic Forgetting
An amazing property of a Thousand Brains system is that learned objects are stored in their own reference frames. Updates to an object’s model are local to that object’s reference frame and the locations within it. This means that learning more objects does not interfere with the memory of existing ones. That is not the case for deep-learning networks, where the backpropagation algorithm performs global updates to all weights.
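The locality argument can be made concrete with a toy memory, one model per object. This is a deliberately simplified sketch (hypothetical structures, not Monty's storage format): adding a new object never touches an existing object's model, so nothing is forgotten.

```python
# Sketch of why per-object, local updates avoid catastrophic forgetting
# (illustrative data structures, not Monty's actual storage format).

memory = {}  # one reference-frame model per learned object

def learn(memory, name, observations):
    """Store or extend one object's model; no other model is touched."""
    model = memory.setdefault(name, {})
    model.update(observations)   # local update, confined to this object

learn(memory, "mug", {(0, 0): "rim", (-2, 0): "handle"})
mug_before = dict(memory["mug"])

# Learning a brand-new object leaves the mug's model completely intact.
learn(memory, "fork", {(0, 5): "tines"})
assert memory["mug"] == mug_before
```

A gradient step in a neural network, by contrast, nudges every shared weight, so old objects are gradually overwritten unless they keep reappearing in training.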
The chart below shows a comparison between Monty and a Vision Transformer and how their accuracy changes as new objects are given to the system to learn. Without continued exposure to previously learned objects, deep learning systems will experience “catastrophic forgetting”. As you can see, Monty holds its accuracy at near 100%, while the Vision Transformer drops drastically after only a few new objects are added to its training set.
A chart showing catastrophic forgetting in a Vision Transformer and continual learning in Monty.
What Next? How to use Monty, contribute, and follow the project.
Monty exists because we believe that Thousand Brains Systems will be the future of AI and robotics. If you’ve ever wondered how a neocortex-inspired learning system performs on your own machine, you’re in exactly the right place.
The Thousand Brains Project is more than an implementation of an idea. It’s a collaborative, open-research, open-source non-profit dedicated to bringing about this new future. Our roadmap lays out the next set of challenges. Discussions on Discourse, detailed RFCs in the repo, and our documentation give you a voice in everything from design debates to performance deep-dives. You can also jump in and try a tutorial to get started right away.
So whether you’re here to test Monty in a new setting, write documentation, or help steer the future of AI, the links that follow will show you how to get involved.
Follow the Project
Our Bluesky and Twitter accounts for important updates – https://bsky.app/profile/thousandbrains.org & https://x.com/1000brainsproj
Discourse Forum – ask questions, share experiments, show your work – https://thousandbrains.discourse.group
GitHub – star & watch the repo to get commit and release notifications – https://github.com/thousandbrainsproject/tbp.monty
The website for info, socials and newsletter sign up – http://thousandbrains.org/
YouTube playlists for our quick-start series, deep-dive seminars, and recorded research meetings – https://www.youtube.com/@thousandbrainsproject
Read more of our published papers here: https://thousandbrainsproject.readme.io/docs/further-reading#our-papers
Roadmap Highlights
The live roadmap tracks everything that we plan to work on over the next few years; tasks are tagged by Monty component and capability so you can jump in where you add the most value.
Tip: grab the planning spreadsheet linked at the top of the roadmap page for a view of what’s done, in-progress, and up next.
https://thousandbrainsproject.readme.io/docs/project-roadmap