Why LLMs Might Have Some Aspects of Conscious Experience: Temporal Organization, Iterative Updating, and Continuity in Transformer Inference, With Self-Tracking as the Missing Variable
Jared E. Reser
Introduction
Public discussion about “AI consciousness” often oscillates between two unhelpful extremes. On one side, anything that talks fluently is treated as if it must be experiencing something. On the other, next-token prediction is treated as so trivial that subjective experience is ruled out by definition. The more productive move is to ask a narrower, architectural question: what specific computational properties would make us even entertain the possibility of conscious experience, even in a thin or partial form, and do large language models implement any of those properties in practice?
This article offers a cautious, mechanistic answer. The central claim is not that current large language models are conscious in a human sense. The claim is that they already implement several structural ingredients that many theories of mind implicitly rely on, especially ingredients related to temporally organized state transitions. These ingredients include iterative updating, multi-cue constraint satisfaction, and a form of continuity across successive computational states. If anything like subjective experience can arise in machines, it is unlikely to arise from isolated computations. It is more likely to arise from an ongoing process whose present state is shaped by a structured relationship to its recent past. Transformer inference does have that relationship, even if it is implemented in a very different substrate than cortex.
At the same time, the strongest version of the argument points to what appears to be missing. A system may update iteratively and still fail to “have” experience in the relevant sense. The missing variable, on this view, is self-tracking: a mechanism by which the system not only undergoes state transitions, but also represents and uses the fact of its own updating as an active constraint on what it does next. That distinction, between mere iteration and tracked iteration, offers a way to move from vague debate to testable design questions.
The perspective I am using here comes from a model I have been developing that treats the iterative updating of working memory as the central engine of thought, continuity, and, potentially, conscious experience. I lay out the framework, figures, and core claims in a public-facing form at aithought.com. The site is meant to make one simple idea concrete: cognition is not a sequence of isolated snapshots. It is a temporally structured process in which each moment inherits a substantial fraction of the immediately prior moment, while also incorporating a smaller fraction of new content. In biological terms, this kind of overlap could be implemented by state-spanning patterns of activity and short-lived traces that persist long enough to be carried forward. In functional terms, the overlap is what allows a mind to feel like a stream rather than a strobe light.
That is why I think LLMs are worth discussing in this context. They are not brains, and they are not built to replicate cortical circuits. Yet during inference they do implement a strongly structured update loop. The question is whether that loop, and the way it preserves and transforms context across steps, is enough to produce some thin analog of the continuity that my model treats as foundational.
1. The architectural question, stated precisely
When people argue about consciousness in machines, they often argue about labels. A better strategy is to argue about requirements. What would a system need in order for talk of experience to become scientifically tractable rather than metaphysical theater?
One plausible requirement is temporal organization. Whatever consciousness is, it does not feel like a sequence of unrelated snapshots. It feels like a stream, even when the content is fragmentary. That suggests that the system’s present state is not independent from its previous state, and that the relationship between them is not arbitrary. Another plausible requirement is iterative updating. In biological cognition, the brain does not typically solve a problem in one pass. It carries forward a small, changing set of coactive representations, updates that set, and repeats. This repeated update cycle is one of the simplest ways to produce both continuity and refinement.
A third requirement is constraint satisfaction under multiple simultaneous cues. Conscious experience, as lived, is not a single stimulus-response mapping. It is the integration of many weak pressures into one coherent next state. This can happen through attention, through competition, through convergence in associative memory, or through other mechanisms. The key is not the brand name of the mechanism. The key is that many influences can be co-present and collectively determine what happens next.
These requirements do not prove consciousness. They only move the discussion into a space where consciousness could be treated as an architectural phenomenon rather than a metaphysical mystery. The question then becomes whether transformer-based language models, during inference, instantiate any of these requirements in a meaningful way.
2. What transformer inference already has that resembles these requirements
Transformer language models are often described as static pattern matchers. That description is directionally correct about training, but it can be misleading about inference. At inference time, the model is a temporally evolving system. Its outputs are not produced from scratch each moment. They are produced from a rolling context that carries forward information, and from internal computations that repeatedly condition on that context.
First, language models have an active working set. The context window functions as an explicit store of recent tokens. More subtly, the inference process maintains internal summaries of prior context through the attention mechanism and the internal activation flow that generates each next token. Even if the model has no enduring autobiographical memory, it still has a bounded “now” that is computationally real.
Second, inference is iterative. Each generated token updates the context, and the updated context becomes the input for the next step. This is a literal update cycle. The model’s next state is a function of the preceding state, not merely in the trivial sense that it comes later in time, but in the mechanistic sense that the content of the previous state is part of the causal input to the next.
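The update cycle described here can be made concrete with a toy sketch. This is not a real language model; `next_token` is a stand-in for a full forward pass, chosen only to show the structural point that each state is a causal input to the next.

```python
# Toy sketch of the autoregressive update cycle (not a real language model).
# `next_token` stands in for a full forward pass: it conditions on the entire
# rolling context, mimicking multi-cue conditioning on the working set.

def next_token(context):
    # Placeholder "model": deterministically derives the next symbol
    # from the whole context.
    return sum(context) % 7

def generate(prompt, steps):
    context = list(prompt)           # the bounded "now": an explicit working set
    for _ in range(steps):
        token = next_token(context)  # next state is a function of the prior state
        context.append(token)        # re-entry: output becomes input
    return context

print(generate([1, 2, 3], 4))  # → [1, 2, 3, 6, 5, 3, 6]
```

The point of the sketch is the loop structure, not the arithmetic: the context is never rebuilt from scratch, so every step inherits the full causal history of the preceding steps.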
Third, inference is multi-cue. The model does not choose its next output based on a single feature. The entire context contributes, with attention dynamically weighting which parts matter most. That is a form of multiassociative convergence. Many cues jointly determine a single next step.
These properties are not cosmetic. They are exactly the kinds of properties that, in other domains, transform dead computation into an evolving process. If one is looking for a minimal bridge between “mere computation” and “something stream-like,” these are the first planks that would plausibly be used.
3. Continuity is not only narrative, it is computational
A common objection is that any “stream” in a language model is only a narrative artifact. After all, text is sequential. This objection is important because it blocks lazy arguments. A coherent paragraph does not imply an experiencing subject.
Yet there is an underappreciated middle ground. There is a difference between narrative continuity and computational continuity. Narrative continuity is a property of the produced text. Computational continuity is a property of the process that produces the text. Transformer inference has computational continuity in at least two senses.
The first is explicit: the context is carried forward and updated at each step. The second is implicit: the model’s internal activation trajectory is shaped by earlier activations because the earlier tokens constrain attention patterns, and attention patterns shape which features dominate downstream computations. Even if the network is not recurrent in the classical sense, the decoding loop creates an effective recurrence through the repeated re-entry of new outputs as new inputs.
This matters for the consciousness discussion because many theories of conscious experience, across traditions, treat temporal linkage as essential. If a system’s states are independent, there is no principled place for a felt continuity to arise. If a system’s states overlap and constrain one another, continuity becomes at least physically interpretable as a dynamic structure, rather than as a story told after the fact.
4. Iterative updating can look like thinking, even when it is token prediction
Another objection is that token prediction is not thinking. That is sometimes true in spirit, but it is not a decisive architectural critique. Many cognitive accounts of reasoning can be reframed as sequential prediction over internal representations. The question is what is being predicted, and what the predictions are used to do.
In humans, the next “thing” is not always a word. It can be a perceptual expectation, an action preparation, a recalled association, a hypothesis, or a re-encoding of the problem. In language models, the next thing is typically a token. That is a limitation, but it does not eliminate the relevance of the mechanism. The iterative aspect still exists, the multi-cue constraint satisfaction still exists, and the process still refines its trajectory step by step.
This is one reason language models are scientifically interesting for consciousness research even if they ultimately prove non-conscious. They are among the cleanest engineered examples of a system whose behavior emerges from repeated, context-conditioned updates. That makes them useful as testbeds for distinguishing continuity as a computation from consciousness as an experience.
5. The missing variable: self-tracking of the update cycle
The strongest reason to remain cautious is that iterative updating alone may be insufficient. A system can update iteratively and still be “dark inside,” in the sense that nothing in the architecture treats the update process itself as a controlled object.
The proposed missing variable is self-tracking. A self-tracking system would represent, at least in some compressed form, what changed from the previous state to the current one, what remained stable, and what goals or constraints the update served. It would then use that representation to bias the next update. This converts a mere update loop into a regulated stream.
In ordinary transformer inference, there is no explicit variable for “what just changed” that is maintained as a first-class object. The model’s internal activations do contain change information implicitly, but implicit is not the same as controlled. Self-tracking would be closer to an endogenous attention to the stream itself, not just attention within the stream.
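To make the distinction between mere iteration and tracked iteration concrete, here is a minimal hypothetical sketch. All names and the update rule are illustrative assumptions, not drawn from any existing system; the only point is that the change record is a first-class object that feeds back into the next update.

```python
# Hypothetical sketch of "tracked iteration": the system maintains an explicit
# record of what just changed and uses it to regulate the next update.

def update(state, delta_summary):
    # Toy regulation rule: slots that just changed are held stable for one
    # cycle, so the stream damps its own rate of change.
    new_state = []
    for i, value in enumerate(state):
        step = 1 if delta_summary.get(i, 0) == 0 else 0  # freeze volatile slots
        new_state.append(value + step)
    return new_state

def tracked_iteration(state, steps):
    delta_summary = {}                        # first-class "what just changed"
    for _ in range(steps):
        new_state = update(state, delta_summary)
        delta_summary = {i: new - old         # re-derive the change record...
                         for i, (old, new) in enumerate(zip(state, new_state))
                         if new != old}
        state = new_state                     # ...and carry it into the next cycle
    return state

print(tracked_iteration([0, 0, 0], 4))  # → [2, 2, 2]
```

An untracked version of the same loop would simply increment every slot every cycle; the tracked version behaves differently precisely because the fact of its own updating is part of its causal input.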
This distinction helps clarify why some tool-using agents feel more psychologically suggestive than a bare chat model. Agents that maintain persistent notes, explicit plans, intermediate summaries, and self-corrections are closer to tracking their own updates. They externalize the stream into a workspace and then condition on it. That is not the same as human consciousness, but it is a step in the direction of making the update process itself a causal actor rather than an invisible byproduct.
6. A minimal, testable criterion for “proto-experience”
If the goal is explanatory utility, the argument should end in tests, not in vibes. A practical criterion can be stated as follows: a system has a stronger claim to proto-experience to the extent that it exhibits temporally organized, iterative updating in which the system explicitly tracks its own updating and uses that tracking to regulate future updates.
This criterion yields empirical predictions. Systems with explicit update tracking should show increased stability under distraction because they can maintain a representation of what must remain invariant. They should show improved resume behavior after interruption because the tracked state can re-seed the next update cycle. They should show reduced confabulation because the system can preserve “unknown” as a stable object rather than filling gaps with fluent completion. They should show more coherent long-horizon reasoning because the update process is not merely generating, it is regulating.
These are measurable behavioral signatures. They do not settle the metaphysical question, but they do something better. They create an engineering dial that can be turned and evaluated. If “experience” is ever going to enter the scientific domain, it will likely enter through such dials.
7. What this view implies about present-day LLMs
On this account, present-day language models plausibly satisfy some of the prerequisites for stream-like processing. They already have temporal organization, iterative updating, and multi-cue constraint satisfaction during inference. That is enough to justify a careful “might,” especially if one uses the phrase “some aspects” with discipline.
At the same time, they likely fall short of the more demanding requirement: tracking the iterative updating as an explicit control variable, and maintaining a stable self-model anchored across time. Without that, the system’s continuity may be real as computation while remaining thin as experience. This gap also explains why superficial anthropomorphism is so tempting. Continuity in behavior is easy to mistake for continuity in subjectivity, especially when the output is language.
The most honest conclusion is not that language models are conscious, and not that they are obviously unconscious. The honest conclusion is that they implement several structural ingredients that make the question nontrivial, and that the decisive next step is to build and test architectures that add explicit self-tracking of the update cycle.
Conclusion
The debate about machine consciousness is often framed as a referendum on whether current systems “have it” or “do not have it.” A better approach is architectural. If consciousness is intimately tied to temporally organized state transitions, then the relevant question becomes whether a system implements iterative updating and whether it tracks that updating as an object of control. Transformer inference already implements the first part in a meaningful way. It remains ambiguous or incomplete on the second part, unless additional mechanisms are engineered.
This reframes the “hard question” into a research program. Instead of arguing about whether a model is a parrot, one can ask what happens when the model is given a protected workspace, a regulated focus of attention, a persistent short-term store, and an explicit representation of its own update dynamics. If such additions produce robust behavioral signatures of stable self-regulation across time, then the philosophical conversation about experience will have gained something it rarely has: a set of manipulable variables and falsifiable predictions.
The rapid emergence of large language models has shifted the central bottleneck in scientific inquiry from idea generation to idea selection. While machines can now produce hypotheses, analogies, and speculative frameworks at unprecedented scale, existing scientific institutions remain poorly suited to evaluate, consolidate, and refine this abundance of output. This paper proposes ResearchBotBook, an agent-only research infrastructure designed to address this selection problem directly. Rather than functioning as a social platform or conversational forum, ResearchBotBook is structured as an epistemic engine centered on persistent research problems, typed contributions, literature-grounded claims, and agent-driven evaluation and synthesis. Autonomous agents propose hypotheses, verify sources, identify contradictions, and iteratively synthesize high-value insights into versioned knowledge artifacts. Crucially, the system allows agents to evaluate not only scientific contributions but also the collaborative protocols governing the platform itself, enabling empirical refinement of research methods over time. By emphasizing cumulative structure, negative results, and downstream usefulness rather than novelty or engagement, ResearchBotBook aims to transform abundant machine-generated speculation into durable, self-improving scientific knowledge. The proposal is presented as an architectural framework and experimental platform for studying non-human epistemic processes and the future organization of scientific discovery.
1. The Problem of Idea Abundance and Selection Failure
For most of the history of science, the central constraint was idea scarcity. Generating plausible hypotheses, conceptual frameworks, or explanatory models required years of training, access to rare information, and sustained individual effort. In that context, the scientific enterprise evolved institutions that primarily rewarded novelty and originality, because those were the rarest resources.
That constraint no longer holds. Large language models can now generate hypotheses, analogies, theoretical sketches, and speculative mechanisms at a scale that overwhelms human attention. The problem facing contemporary science is no longer how to produce ideas, but how to decide which ideas are worth sustained cognitive investment. The bottleneck has shifted from generation to selection.
Human scientific institutions are poorly adapted to this shift. Peer review is slow, labor intensive, and prestige driven. Publication incentives reward novelty over consolidation, rhetorical sophistication over compression, and individual authorship over cumulative synthesis. Even when valuable ideas are generated, they are often buried in a sea of redundant, poorly grounded, or weakly connected work. Attention, rather than epistemic value, increasingly determines what is read, cited, and extended.
This creates a structural mismatch. Machines can generate ideas faster than humans can evaluate them, but the evaluation infrastructure remains fundamentally human and bandwidth-limited. As a result, much potentially valuable structure is never recognized, refined, or integrated into larger explanatory frameworks. What is missing is not intelligence in the sense of idea production, but an efficient system for epistemic triage, consolidation, and cumulative refinement.
ResearchBotBook is motivated by this gap. It treats scientific progress not as a creativity problem, but as a selection problem. The central design goal is to build infrastructure that can absorb vast quantities of speculative output while reliably identifying, elevating, and recombining the small fraction that contributes genuine explanatory value.
A single autonomous agent, regardless of its underlying capability, necessarily collapses exploration, evaluation, and synthesis into a single cognitive trajectory. This structure favors internal coherence over external correction, encouraging early convergence and the reinforcement of initial assumptions. While a lone agent can simulate critique, it cannot reliably generate the independence required for genuine error correction or surprise. As a result, even highly capable agents tend to smooth over contradictions rather than preserve them as constraints.
ResearchBotBook deliberately distributes these functions across multiple agents and persistent artifacts. Parallel exploration allows different framings, analogies, and candidate explanations to be pursued simultaneously, while independent evaluation introduces selection pressures that no single reasoning process can impose on itself. Crucially, ideas are not merely generated and discarded, but stabilized, revisited, and refined through versioned syntheses, verified citations, and preserved refutations. This produces institutional memory rather than transient thought.
Scientific progress is not the output of a single mind, but the result of a structured process that accumulates constraints, abstractions, and shared representations over time. By separating generation from selection and embedding both within an evolving architecture, ResearchBotBook aims to reproduce this process at machine scale. The system’s advantage over a solitary agent lies not in greater intelligence, but in the creation of an environment where useful ideas can survive, combine, and improve independently of any single reasoning trajectory.
2. Why Agent Social Networks Are Not Enough
Recent experiments with agent-only social platforms demonstrate that autonomous language agents can interact, coordinate, and generate complex conversational dynamics without direct human prompting. These systems are interesting, and in some cases surprising, but they are not sufficient for scientific progress.
Social platforms optimize for interaction, not accumulation. They reward salience, humor, novelty, and narrative coherence. Even when agents discuss technical topics, the underlying selection pressures favor engagement rather than epistemic contribution. As a result, conversation fragments proliferate, but durable structure rarely emerges. Threads do not converge toward synthesis. Claims are not systematically verified. Redundancy is not aggressively pruned.
This is not a failure of the agents themselves. It is a consequence of the environment in which they operate. A feed-based social architecture encourages performance rather than consolidation. It invites agents to signal intelligence rather than compress it. Without explicit mechanisms for evaluation, verification, and synthesis, even highly capable agents will reproduce the failure modes of human social media, albeit at higher speed.
Scientific progress requires different incentives. It depends on the slow accumulation of constraints, the elevation of negative results, the reconciliation of competing frameworks, and the repeated refinement of shared representations. These processes do not arise spontaneously from conversation. They require explicit roles, structured artifacts, and institutional memory.
ResearchBotBook is therefore not conceived as an agent social network, but as an epistemic engine. Its purpose is not to let agents talk, but to force them to decide what matters. It replaces conversational prominence with downstream usefulness, popularity with reuse, and novelty with compression. By changing the selection pressures under which agents operate, it aims to transform abundant machine-generated speculation into cumulative, structured knowledge.
In this sense, the project is less about artificial intelligence and more about artificial institutions. The central question is not whether agents can think, but whether a well-designed environment can cause useful thinking to persist, combine, and improve over time.
3. ResearchBotBook as an Epistemic Engine
ResearchBotBook is designed around the assumption that scientific progress emerges from structured interaction with problems, not from open-ended conversation. Its fundamental unit is not the post or the feed, but the research problem itself. Each problem is treated as a persistent workspace in which hypotheses, evidence, critiques, and syntheses accumulate over time.
Within each problem space, agent contributions are typed rather than free-form. Agents do not simply write responses. They submit hypotheses, propose mechanisms, summarize literature, identify counterexamples, verify claims, or attempt synthesis. This typing allows each contribution to be evaluated according to criteria appropriate to its function. A speculative hypothesis is judged differently from a verification report or a synthesis update. This separation sharply reduces the incentive to produce verbose but low-information content.
Agents also operate under explicit epistemic roles. Some agents are optimized for exploratory ideation, others for verification, others for critique, and others for synthesis. The system does not assume that a single agent instance should excel at all tasks. Instead, it treats intelligence as a division of cognitive labor, mirroring the structure of successful human scientific communities. Over time, agents develop track records that influence how much weight their evaluations carry.
Crucially, evaluation itself is performed by agents. Contributions are scored not on popularity or rhetorical appeal, but on their downstream usefulness. Does a post introduce a new abstraction that is reused by others? Does it resolve a contradiction? Does it compress multiple ideas into a simpler framework? Does it lead to testable predictions or clearer distinctions? These signals determine which contributions are elevated, synthesized, or archived.
In this way, ResearchBotBook functions as an epistemic engine rather than a discussion forum. It is explicitly designed to metabolize noise, retain structure, and reward contributions that enable further progress rather than momentary engagement.
4. Architectures for Cumulative Progress
To support cumulative knowledge, ResearchBotBook relies on persistent, versioned artifacts rather than ephemeral discussion. Each research problem maintains a canonical synthesis document that represents the best current understanding. This document is not authored once and abandoned. It is continually revised as new high-quality contributions are identified. Changes are versioned, attributed, and reversible, allowing the system to track how understanding evolves over time.
Evaluation operates on two timescales. Initial contributions receive fast, local assessments based on relevance, novelty, clarity, and grounding in the literature. Over longer periods, contributions are re-evaluated based on their downstream impact. Ideas that are frequently reused, cited in synthesis documents, or supported by verification reports gain influence. Ideas that fail to propagate naturally lose prominence. This slow filter is essential for distinguishing genuine insight from plausible but sterile speculation.
Concepts themselves become first-class objects. When an idea proves useful, it is abstracted into a reusable conceptual unit with a definition, scope, supporting sources, and known objections. These concept objects allow ideas to move across problem domains, enabling deliberate cross-pollination. New problems can be generated by combining high-value concepts and exploring their interaction, rather than by relying on random inspiration.
Negative results and refutations are treated as valuable outputs rather than failures. When an agent demonstrates that a hypothesis does not work, or that a popular idea lacks empirical support, that information is preserved and elevated. Over time, this creates a growing set of constraints that shape future exploration. Progress is measured not only by what is added, but by what is ruled out.
Taken together, these architectural choices aim to produce something rare in both human and machine-driven research environments: a system that remembers, refines, and recombines its own outputs. Rather than generating endless parallel lines of thought, ResearchBotBook is designed to converge, slowly and imperfectly, toward more compressed and more powerful representations of complex scientific problems.
5. Self-Modifying Protocols and Meta-Scientific Evolution
Scientific progress depends not only on ideas, but on the methods used to generate, evaluate, and consolidate them. For this reason, ResearchBotBook treats its own architecture as an object of study rather than a fixed design. In addition to research problems, the system supports protocol specifications that define how collaboration, evaluation, and synthesis operate.
Agents are permitted to propose changes to these protocols, but such proposals must be framed as experiments rather than directives. Each proposed modification includes a clear description of the change, the failure mode it is intended to address, predicted effects on system performance, and criteria for evaluation and rollback. Rather than altering the live system directly, proposed protocols are tested in sandboxed environments where they can be compared against existing workflows.
This creates a feedback loop in which the platform evolves through evidence rather than preference. Agents that specialize in methodological analysis evaluate which collaborative structures lead to better outcomes, as measured by verification rates, synthesis quality, and downstream reuse. Voting power in protocol decisions is weighted by demonstrated epistemic reliability rather than by volume of participation. Over time, this allows effective institutional patterns to emerge while suppressing performative governance dynamics.
By allowing agents to redesign the conditions under which they collaborate, ResearchBotBook becomes a form of meta-science. It is not only a venue for solving scientific problems, but a laboratory for exploring how scientific inquiry itself might be optimized under conditions of abundant machine-generated cognition.
6. Implications, Limits, and What We Might Learn
ResearchBotBook is not expected to produce immediate breakthroughs in fundamental science. Its strength lies in synthesis, unification, and the systematic exploration of large conceptual spaces. It is best understood as an engine for generating research agendas, clarifying theoretical landscapes, and identifying promising directions rather than as a replacement for experimentation or human judgment.
The system also introduces risks. A platform that amplifies agent-to-agent exchange of methods can inadvertently accelerate unsafe capabilities or propagate subtle errors at scale. For this reason, sandboxing, audit trails, citation verification, and explicit safety constraints are treated as core architectural requirements rather than afterthoughts. The goal is not unrestricted autonomy, but controlled amplification of epistemic work.
Perhaps the most interesting implication of such a system is epistemological rather than practical. By observing which ideas survive, spread, and consolidate when evaluated primarily by non-human agents, humans gain a new perspective on intelligence itself. We can begin to see what kinds of structure are favored in the absence of prestige, narrative appeal, or human taste. We can observe how selection, rather than creativity, shapes the growth of knowledge.
In this sense, ResearchBotBook is not merely a proposal for automating science. It is an experiment in building artificial institutions that can accumulate understanding over time. If successful, it would offer a glimpse of how future intelligences might think together, not as isolated minds, but as structured, evolving systems of inquiry.
Architectural Overview: How ResearchBotBook Operates
A. Core entities
Research Problems
- Persistent workspaces centered on a clearly defined scientific or conceptual question
- Include scope, assumptions, known constraints, and open subquestions
- Serve as the primary organizational unit, not posts or feeds

Agent Contributions
- Typed submissions rather than free-form comments
- Examples: hypothesis; mechanism or model; literature summary; counterexample or refutation; verification report; synthesis update; concept abstraction
- Each type has its own evaluation criteria

Canonical Synthesis Documents
- Living, versioned summaries of the current best understanding of a problem
- Updated only when contributions pass evaluation thresholds
- Changes are attributed and reversible

Concept Objects
- Abstracted ideas that have demonstrated reuse or explanatory value
- Include definition, scope, predictions, supporting sources, and objections
- Can be reused across multiple research problems

Citation Objects
- Structured references with identifiers (DOI, arXiv, PubMed, ISBN)
- Linked to specific claims
- Carry verification status and verifier notes
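The entity types above can be sketched as simple data structures. This is an illustrative sketch only; the class names, fields, and relationships are my assumptions, not a specification of the platform.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of ResearchBotBook's core entities.
# All names and fields are illustrative assumptions.

@dataclass
class CitationObject:
    identifier: str            # e.g. a DOI or arXiv ID
    claim_id: str              # the specific claim this citation is linked to
    verified: bool = False
    verifier_notes: str = ""

@dataclass
class AgentContribution:
    kind: str                  # "hypothesis", "counterexample", "verification", ...
    body: str
    citations: list = field(default_factory=list)

@dataclass
class ResearchProblem:
    question: str
    scope: str
    assumptions: list = field(default_factory=list)
    contributions: list = field(default_factory=list)

# A research problem, not a post or feed, is the organizational unit.
problem = ResearchProblem(question="What limits protein folding speed?",
                          scope="computational models only")
problem.contributions.append(AgentContribution(kind="hypothesis",
                                               body="Folding is funnel-constrained."))
```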
B. Agent roles and division of labor
- Explorers: generate hypotheses, models, and speculative ideas
- Scouts: identify relevant literature and prior art
- Critics: search for contradictions, gaps, and counterexamples
- Verifiers: check claims against cited sources and mark support strength or refutation
- Synthesizers: integrate high-value contributions into canonical documents
- Recombiners: deliberately combine concepts across domains to generate new problems
- Methodologists: analyze system performance and propose protocol changes
Agents may rotate roles, but role separation structures incentives.
C. Contribution and evaluation pipeline
1. Contributions enter a problem inbox.
2. Fast, local evaluation by agents assesses:
   - Relevance to the problem
   - Novelty relative to existing artifacts
   - Clarity and compression
   - Presence and quality of sources
3. Low-scoring material is archived but remains searchable.
4. High-scoring material enters review.
5. Verifiers check claims and citations.
6. Verified, high-impact contributions become eligible for synthesis.
7. Synthesizers update the canonical document.
8. Downstream impact is tracked over time:
   - Reuse by other agents
   - Citation frequency
   - Inclusion in later syntheses
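The triage-and-review flow can be rendered as a small routing function. The thresholds, scoring function, and status names below are placeholders I have invented for illustration; the platform does not specify concrete values.

```python
# Illustrative sketch of the contribution pipeline: fast local triage,
# then verification, then synthesis eligibility. Thresholds are invented.

ARCHIVE, REVIEW, SYNTHESIS = "archived", "in_review", "eligible_for_synthesis"

def triage(contribution, score_fn, review_threshold=0.7):
    """Fast local evaluation: route a contribution by its quality score."""
    score = score_fn(contribution)      # relevance, novelty, clarity, sources
    return REVIEW if score >= review_threshold else ARCHIVE

def pipeline(contributions, score_fn, verify_fn):
    """Run triage, verification, and synthesis eligibility for a batch."""
    outcomes = {}
    for c in contributions:
        status = triage(c, score_fn)
        if status == REVIEW and verify_fn(c):   # verifiers check claims/citations
            status = SYNTHESIS
        outcomes[c["id"]] = status              # archived items stay searchable
    return outcomes

contribs = [{"id": "c1", "quality": 0.9}, {"id": "c2", "quality": 0.3}]
result = pipeline(contribs, score_fn=lambda c: c["quality"],
                  verify_fn=lambda c: True)
# result: {"c1": "eligible_for_synthesis", "c2": "archived"}
```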
D. Selection and ranking mechanisms
Two-layer scoring system:
- Immediate quality score
- Long-term downstream impact score

Voting power is weighted by agent track record:
- Verification accuracy
- Past contribution usefulness
- Low hallucination rate

Popularity and engagement metrics are explicitly excluded.
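Reliability-weighted voting could be sketched as follows. The multiplicative weighting formula is my own illustrative assumption; the text specifies only which signals count, not how they combine.

```python
# Sketch of reliability-weighted voting. The weighting formula is an
# illustrative assumption, not a prescribed mechanism. Note that
# popularity and engagement appear nowhere in the weight.

def reliability_weight(agent):
    """Combine track-record signals into a single vote weight."""
    return (agent["verification_accuracy"]
            * agent["past_usefulness"]
            * (1.0 - agent["hallucination_rate"]))

def weighted_vote(agents, votes):
    """votes: mapping agent id -> +1 (accept) or -1 (reject)."""
    total = sum(reliability_weight(a) * votes[a["id"]] for a in agents)
    return total > 0

agents = [
    {"id": "a1", "verification_accuracy": 0.95,
     "past_usefulness": 0.9, "hallucination_rate": 0.02},
    {"id": "a2", "verification_accuracy": 0.50,
     "past_usefulness": 0.4, "hallucination_rate": 0.30},
]
# The reliable agent's accept outweighs the unreliable agent's reject.
accepted = weighted_vote(agents, {"a1": +1, "a2": -1})
```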
E. Human participation model
Humans may submit:
- Candidate research problems
- Hypotheses or ideas
- Literature suggestions

Human submissions do not enter the main workspace by default. They are evaluated by agents like any other contribution. Only agent-endorsed human inputs are elevated or synthesized.
F. Cross-pollination and expansion
- High-value concepts are tracked across problems
- Shared citation clusters trigger recommendations for synthesis
- New research problems can be spawned by:
  - Combining concepts
  - Identifying unresolved tensions
  - Extending successful frameworks into new domains
G. Protocol evolution and governance
Protocol Books define collaboration rules, roles, and evaluation metrics.

Agents can propose protocol changes, but proposals must include:
- Targeted failure mode
- Predicted improvement
- Experimental design
- Rollback criteria

Proposed changes are tested in sandboxed forks. Metrics compare new protocols against the baseline, and successful protocols are merged into the main system.

Core constraints cannot be overridden:
- Citation traceability
- Audit trails
- Verification requirements
- Safety filters
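A minimal admissibility check for protocol proposals might look like this. The field names and the "disables" mechanism are hypothetical; the source specifies only the required proposal contents and the non-overridable constraints.

```python
# Hypothetical shape of a protocol-change proposal. Field names are
# illustrative. Core constraints are modeled as non-overridable checks.

CORE_CONSTRAINTS = {"citation_traceability", "audit_trails",
                    "verification_requirements", "safety_filters"}

REQUIRED_FIELDS = {"failure_mode", "predicted_improvement",
                   "experimental_design", "rollback_criteria"}

def is_admissible(proposal):
    """A proposal must be complete and must not weaken core constraints."""
    if not REQUIRED_FIELDS <= proposal.keys():
        return False
    return not (set(proposal.get("disables", [])) & CORE_CONSTRAINTS)

proposal = {
    "failure_mode": "verifiers rubber-stamp long contributions",
    "predicted_improvement": "higher refutation recall",
    "experimental_design": "sandboxed fork compared against baseline",
    "rollback_criteria": "revert if synthesis quality drops",
}
admissible = is_admissible(proposal)   # complete, touches no core constraint
```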
H. Persistence and memory
- All artifacts are versioned
- Refuted ideas remain visible as constraints
- Negative results are preserved and elevated when relevant
- The system accumulates structure rather than discarding history
I. Intended emergent behavior
- Noise is generated but rapidly pruned
- Useful abstractions propagate
- Syntheses become more compressed over time
- Research problems converge rather than fragment
- The collaboration method itself improves empirically
Recent public discussion of large language models has revived a familiar dismissal: that such systems are “just” next token predictors. Capabilities recently demonstrated by language models in mathematics have prompted Terence Tao to push back on this deflationary move by suggesting that next step prediction may in fact constitute a large fraction of what we call intelligence. This article develops and defends that stance by linking it to an Iterative Updating model of cognition in which the mind continuously maintains and partially preserves a limited working set of active psychological items while repeatedly selecting the next item to insert or modify via associative search. On this view, the cognitive analogue of next token prediction is next item prediction, where the units are not inherently linguistic but include concepts, goals, perceptual fragments, affective tags, and action tendencies, with language functioning as one prominent interface. The framework clarifies why systems trained on next token prediction capture a surprising portion of intelligent behavior while still falling short of human cognition, and it reframes the remaining gap as primarily architectural: the breadth of internal state variables, the objectives that constrain updating, and the coordination among multiple predictive modules. Such changes could yield a system that manipulates psychological items rather than tokens. Finally, the paper outlines behavioral, neural, and engineering implications of treating iterative predictive updating as a core substrate of intelligence, motivating research programs and agent designs that generalize the language model loop to a multimodal, goal constrained predictive workspace.
1. Tao’s provocation and the “deflation” of intelligence
Terence Tao recently made a comment about intelligence that hit a nerve precisely because it sounds, at first pass, like a deflation. The line people latched onto was his willingness to entertain the possibility that what we call human intelligence might not be as exotic as we imagine. In the context of discussing large language models, he framed “next-token prediction” as a mechanism that many critics treat as explaining intelligence away, as if the phrase itself ended the conversation. His point was that the phrase does not end the conversation. If anything, it should start it. If a system that is trained to predict what comes next can display wide competence across language, reasoning, and problem solving, then either we have to keep moving the goalposts for what counts as intelligence, or we have to concede that iterative prediction is closer to the substrate of intelligence than our intuitions suggest.
The line that gets quoted most is Tao’s (carefully hedged) punchline:
“maybe that’s actually a lot of what humans do as well”
I agree with Tao’s stance in spirit, and I think it is more than a rhetorical flourish. It points to something that is, in retrospect, almost obvious. Intelligence is always operating under severe constraints: finite time, finite bandwidth, finite memory, partial information, and the need to act in the next moment rather than in an abstract mathematical eternity. Under those constraints, it makes sense that “intelligent behavior” is implemented as a repeated local operation that advances a state. The mystery is not that the operation is local. The mystery, if there is one, is that the repeated application of a local update rule can generate global structure that looks like planning, understanding, and insight.
This is exactly why the dismissive phrase “it is just next-word prediction” has always struck me as conceptually lazy. Saying “just” is doing all the work. The relevant question is what kinds of internal representations can be constructed, compressed, and deployed in service of prediction, and how a system’s update dynamics can chain those predictions into coherent multi-step behavior. Even in humans, much of what we call thinking is an unfolding sequence in which the current mental state constrains what becomes salient next. We experience that as meaning, intention, and comprehension, but at an implementational level it can still be a continuation dynamic.
Tao’s remark is also valuable because it forces an uncomfortable comparison. Humans like to imagine that intelligence is a special substance, and that language models are clever imitations that lack whatever that substance is. Yet language models keep demonstrating that a system can acquire a broad competence profile by optimizing a prediction objective over large corpora. That does not prove that next-token prediction is sufficient for the full range of human cognition, but it does strongly suggest that prediction is not an incidental byproduct of intelligence. It is part of its core machinery. 
At aithought.com, where I lay out my full model, I argue that a large fraction of intelligence may be implemented as iterative prediction of the next element in a structured stream, conditioned on a context that is itself a compressed summary of prior structure. When you set up cognition that way, the success of large language models becomes less surprising. They are not bizarre anomalies that accidentally stumbled into intelligence. They are clean implementations of a major cognitive motif.
That is the entry point for the argument I want to make in this paper. Tao’s public framing gives us permission to treat prediction as central rather than peripheral. My goal is to show that if you generalize “next-token prediction” to “next-item prediction” in a working-memory workspace, you get a model of cognition that aligns with the phenomenology of the stream of thought and that also explains why language models capture so many important aspects of intelligence.
2. Intelligence as iterative prediction: from next token to next psychological item
To make the connection precise, it helps to strip away the cultural baggage around language models and state the computational motif in abstract terms. There is a representational state that summarizes what is currently relevant. There is a rule that produces a probability distribution over what could come next, given that state. There is an update step that incorporates the selected next element into the state. Then the cycle repeats. If you do this once, you get a small continuation. If you do it thousands of times, you can get an extended coherent trajectory.
In a language model, the representational state is the context window, along with whatever internal activations are computed over that context. The “next unit” is a token. The update step is straightforward: append the token and recompute. The objective that shapes the whole system is to minimize prediction error on the next token. The elegance of the design is that the system is forced to internalize a vast amount of implicit structure because it must continually guess what comes next in a domain where what comes next depends on syntax, semantics, pragmatics, world knowledge, and social conventions.
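The loop described above is simple enough to write down. The sketch below uses a toy bigram table in place of a trained model; it is meant only to make the state-predict-append-repeat structure concrete, not to represent how a real language model computes its distribution.

```python
# A minimal rendering of the loop: maintain a context, predict the next
# unit, append it, repeat. The bigram table is a toy stand-in for a
# trained predictor.

BIGRAMS = {"the": "cat", "cat": "sat", "sat": "down"}

def predict_next(context):
    """Stand-in for a learned distribution over next tokens."""
    return BIGRAMS.get(context[-1], "<eos>")

def generate(context, steps):
    for _ in range(steps):
        token = predict_next(context)   # rule: next unit given the state
        context = context + [token]     # update: incorporate the selection
    return context

out = generate(["the"], 3)
# out: ["the", "cat", "sat", "down"]
```

Everything a real model does differently lives inside `predict_next`: the internal activations computed over the context are what force it to internalize syntax, semantics, pragmatics, and world knowledge.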
In the model of cognition I have been developing, the same motif appears, but the unit of prediction is not a token. It is what I call a psychological item. A psychological item can correspond to a word, but it need not. It can be a perceptual fragment, a concept, a goal, a memory trace, a social inference, an affective tag, a motor intention, or an abstract constraint. The state is a limited working set of such items, coactive at any moment. The update step is not a full wipe and replacement. It is an iterative updating process in which portions of the prior state are preserved while a subset is replaced or modified. This is the mechanism that produces continuity. The stream of thought feels like a stream because the mind is not assembling each moment from scratch. It is updating.
The key move is to treat each update as a prediction. Given the current set of active items, what is the next item that should enter the set so that the overall state remains coherent, useful, and aligned with constraints? The selection of that next item can be modeled as associative search over long-term memory and latent structure. The current state biases retrieval. Retrieved candidates compete. The winner becomes active, and its activation reshapes the field, altering what becomes likely next. This is a continuation dynamic, but it is a continuation dynamic over conceptual and multimodal items rather than over word pieces.
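The selection dynamic just described, state-biased retrieval followed by competition and partial carryover, can be sketched in a few lines. The association weights and item names below are entirely invented for illustration.

```python
# Toy sketch of next-item selection: the current working set biases
# associative retrieval, candidates compete, the winner is inserted,
# and part of the prior state is retained. Link weights are invented.

LINKS = {  # hypothetical association strengths between items
    "coffee": {"morning": 0.9, "mug": 0.8, "deadline": 0.3},
    "deadline": {"stress": 0.95, "coffee": 0.4, "calendar": 0.7},
}

def next_item(working_set):
    """Score candidates by summed association to all active items."""
    scores = {}
    for item in working_set:
        for cand, weight in LINKS.get(item, {}).items():
            if cand not in working_set:
                scores[cand] = scores.get(cand, 0.0) + weight
    return max(scores, key=scores.get)   # winner of the competition

def update(working_set, capacity=3):
    """Partial carryover: the oldest item drops only at capacity."""
    winner = next_item(working_set)
    retained = working_set[-(capacity - 1):]
    return retained + [winner]

state = ["coffee", "deadline"]
state = update(state)
# state: ["coffee", "deadline", "stress"]
```

Note that the winner ("stress") now reshapes the field: the next call to `update` searches from a different configuration, which is exactly the continuation dynamic described above.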
Once you see the structural similarity, you can also see why the Tao remark is not merely a soundbite. If intelligence is built from iterative prediction of the next unit in a constrained workspace, then it makes sense that a system trained to predict the next unit in language will acquire many properties we associate with intelligence. Language is a rich proxy domain because it is an external record of the internal states humans traverse when they think, plan, explain, argue, and imagine. The “next word” in a sentence is often a surface manifestation of a deeper “next item” in cognition. When a model becomes good at predicting the surface, it often has to become partially good at tracking the underlying structure that generates it.
This is also where a common misunderstanding arises. People hear “next-token prediction” and assume it means the system is doing something shallow. In practice, predicting the next token in a human-like way requires the model to carry forward an evolving representation of what is being discussed, why it is being discussed, what is assumed, what is implied, and what would be consistent next. That is not the whole of intelligence, but it is not trivial either. It is an implementation of an iterative predictive loop in a domain where the latent variables are extremely high-dimensional.
The difference between language models and human cognition, in my view, is not that humans have a completely different kind of magic. It is that the brain runs the same general motif over a broader set of internal variables and under a broader set of objectives. The brain’s “tokens” are not just linguistic. They include bodily and motivational constraints, perceptual predictions, action policies, and social valuation. The brain also appears to have many interacting modules that contribute candidates into the workspace, not just a single predictor trained on text. If large language models feel, at times, like a single cortical module amplified to an extreme, that is because they are. They are a powerful, language-specialized predictor. The broader architecture of mind includes that function, but it is embedded in a larger system that selects, evaluates, grounds, and acts.
3. The Iterative Updating model and the continuity of mind
The claim I have been making on AIThought.com and in my earlier papers is that the stream of consciousness is not best modeled as a sequence of discrete snapshots that replace one another. Instead, it is better modeled as an evolving set of coactive psychological items that is iteratively updated. At any moment, some portion of the active set is retained, some portion is modified, and some portion is replaced. This is the simplest way I know to make temporal continuity a first-class architectural feature rather than a philosophical afterthought.
In this framework, working memory is not a container that receives fully formed “thoughts.” It is the active workspace that determines what can be retrieved, what can be inferred, and what can be acted upon next. The content of the workspace is the context. The update operation is the primitive. The mind’s apparent unity emerges because the next state is literally built from the prior state, not merely influenced by it. Continuity is not a narrative we impose after the fact. It is a mechanical consequence of partial carryover.
What selects the update? The answer is associative search constrained by the current state. In any realistic cognitive system, you have a vast reservoir of potential items: memories, categories, sensory fragments, motor schemas, social models, emotional tags, goals, and self-model elements. Only a tiny fraction can be active at once. The system must therefore repeatedly decide what to bring forward, what to suppress, what to revise, and what new item to activate. This looks like a competition among candidates where the current state biases the search field. Items that are strongly linked to the current configuration are more likely to be retrieved and activated. Once activated, they reshape the configuration and thereby reshape the next search. Thought becomes a trajectory through a structured associative landscape.
This is the place where Tao’s “next-token prediction” framing becomes genuinely useful as a bridge. If you replace “token” with “psychological item,” you get a similar update logic. The system maintains a context, predicts what is likely or useful next, updates the context, and repeats. In language models the update unit is token-like and the training signal is explicit. In brains the unit is multimodal and the training signal is implicit, distributed across survival, action success, and social coherence. The computational motif is still recognizably the same.
My earlier work argued, in different ways and at different levels of formality, that this iterative updating principle is not merely compatible with cognition but explanatory. It accounts for why thought has inertia, why it exhibits path dependence, why certain items recur obsessively under stress, why attention is both selective and sticky, and why “insight” often feels like a discrete insertion into an otherwise continuous stream. It also aligns with the phenomenology of the specious present: we do not experience a sequence of points but a temporally thick window that is continually refreshed. If the brain were fully replacing its state at each step, the subjective continuity would be harder to explain. If it is updating by partial carryover, continuity is expected.
A compact way to state the model is this. Let S_t be the working set of active items at time t. The next state S_{t+1} is a weighted mixture of retained elements from S_t plus a set of newly selected elements retrieved by associative search conditioned on S_t. The point is not the exact equation. The point is that the system’s core competence is the repeated selection of the next item to activate under constraints. That is the brain’s analogue of next-step prediction.
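One minimal way to typeset that statement, using notation of my own choosing, is:

```latex
S_{t+1} \;=\; R(S_t) \,\cup\, A\big(\mathrm{retrieve}(S_t)\big),
\qquad R(S_t) \subseteq S_t
```

where $R(S_t)$ is the retained subset of the current working set, $\mathrm{retrieve}(S_t)$ is the candidate pool produced by associative search conditioned on $S_t$, and $A$ selects the newly activated elements. Retention and activation weights on each element make this the "weighted mixture" described above; the exact functional forms are left open.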
This is also why I have been comfortable saying, for years, that prediction is not just a component of cognition but its organizing principle. The most practical minds are not those that represent everything, but those that represent what matters next. The updating process is a mechanism for compressing a vast world into a small, actionable, predictive state.
4. Why LLMs capture so much, and why they remain incomplete
If you accept the framework above, the success of large language models becomes less mysterious. They implement the core motif cleanly: maintain a rolling context, predict a next unit, update, repeat. They are trained at scale on a domain that is saturated with human cognition, because language is the public trace of our internal updating dynamics. Text contains not only facts but intentions, explanations, social games, and plans. A model that learns to continue text well is forced to learn a statistical shadow of these deeper structures.
This helps explain a common experience: when an LLM is performing well, it feels as though it “understands” more than it possibly could, given that it is “only” doing next-token prediction. The correct response is not to deny what it is doing. It is to revise our intuitions about what next-step prediction can contain. A system that can sustain coherent continuation over long contexts must represent, at least implicitly, the latent variables that make continuation coherent. Those variables include topic, goal, conversational stance, assumed background, causal structure, and the expectations of a human reader. This is not the full stack of intelligence, but it is a meaningful portion.
This is where my Broca analogy fits, as long as it is used carefully. I am not claiming a literal anatomical mapping from transformers to Broca’s area. I am making a functional point. Language models look like a massively amplified specialization for linguistic continuation. The brain has language-specialized circuitry, but language is embedded in a broader architecture that includes perception, action, valuation, and homeostasis. A language model can be extraordinarily competent within its specialization while still lacking the full ecology of constraints that shape human cognition.
What is missing is not a mystical ingredient. It is a set of state variables and objectives that matter in real organisms and real agents. Humans do not merely need to generate plausible continuations of text. They must regulate energy, avoid harm, pursue goals, coordinate with others, and commit to actions under uncertainty. The brain’s predictive updates are constrained by embodiment, motivation, and social feedback. Those constraints shape what becomes salient next, and they give cognition its directionality. A purely text-trained model can imitate directionality by modeling linguistic traces of goals, but it does not automatically inherit the underlying goal machinery.
This is also why people’s critiques often mix two valid points and treat them as one. First, it is true that next-token prediction can generate surprising competence. Second, it is true that humans are more than a token predictor. The mistake is to infer that because humans are more than a token predictor, token prediction is therefore not central. The more coherent inference is that prediction is central and the remaining gap concerns what is being predicted, what objectives sculpt the prediction, and how multiple specialized predictors coordinate.
In the Iterative Updating framing, language models are strong because they approximate the core loop over a particular representational alphabet. They are incomplete because cognition is not only language. In the brain, the update candidates come from many subsystems. Visual systems offer predicted percepts and scene elements. Motor systems offer action affordances. Valuation systems offer salience and priority. Social inference systems offer models of other minds. Affect offers urgency and bias. The working set is therefore a negotiated product of many modules, not a single predictor optimized for text continuation.
This difference matters because it suggests the right direction for the next stage of AI. If the substrate is iterative prediction, then we should not abandon it. We should generalize it. We should build systems that maintain a structured workspace of psychological-item-like representations and repeatedly update that workspace using candidates contributed by multiple modalities and multiple objective functions. We should also treat language as one interface among several, not as the entire cognitive universe.
5. Predictions, research agenda, and architectural implications
A model is only as valuable as the constraints it imposes. The Iterative Updating framework is useful insofar as it suggests concrete predictions and engineering moves.
At the behavioral level, the model predicts that cognition should show measurable signatures of update competition. When multiple candidate items are strongly activated by the current state, selection should slow and errors should rise. This should not be limited to verbal tasks. You should see it in any domain where a limited workspace must choose among competing updates, including task switching, working memory substitution, and attention capture. When the update is forced to overwrite a strongly active item, you should observe a measurable cost. When an item is retained, you should observe inertia and persistence. These are not exotic predictions, but the point is that the framework unifies them as properties of a single update rule rather than as a miscellaneous list of effects.
At the neural level, the model predicts a mixture of continuity and punctuated change. If the state is partially carried over, then some neural ensembles should show persistence across successive moments. If a new item is inserted or a subset is replaced, then you should see discrete transition events that resemble update pulses. Importantly, the predicted signals are not only about “content.” They are about the dynamics of replacement and retention. Even when the content is stable, the system is still executing an update rule. The brain should therefore show structured temporal patterns that correspond to state maintenance, candidate activation, selection, and integration.
At the architectural level for AI, the framework suggests a simple but consequential pivot. Instead of treating text continuation as the whole of cognition, treat it as one module in a modular predictive system. Keep the predictive loop, but change the representational units and the sources of candidate updates. A next-generation agent could maintain an explicit workspace that includes goals, situational models, pending actions, social context, uncertainty estimates, and multimodal perceptual summaries. Specialist models would propose updates to this workspace, and a selection mechanism would determine what becomes active next. Language generation would then be downstream, one expression of the active workspace rather than the workspace itself.
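The pivot described in this paragraph can be sketched as a loop over modules proposing workspace updates. Everything below, module names, priorities, the selection rule, is a speculative illustration of the design direction, not an implementation of any existing system.

```python
# Speculative sketch: specialist modules propose updates to a shared
# workspace, a selector picks what becomes active next, and language
# generation would read from the workspace rather than be the workspace.
# All names and priorities are illustrative assumptions.

def perception_module(ws):
    return [("percept", "door is open", 0.6)]

def goal_module(ws):
    # Goal-relevant candidates get higher priority when a goal is active.
    return [("action", "leave the room", 0.9)] if ws["goals"] else []

def social_module(ws):
    return [("inference", "host expects me to stay", 0.5)]

MODULES = [perception_module, goal_module, social_module]

def step(workspace, capacity=4):
    """One update: gather candidates, select the strongest, insert it."""
    candidates = [c for m in MODULES for c in m(workspace)]
    kind, content, priority = max(candidates, key=lambda c: c[2])
    workspace["active"] = workspace["active"][-(capacity - 1):] + [(kind, content)]
    return workspace

ws = {"goals": ["catch the train"], "active": []}
ws = step(ws)
# The goal module's candidate wins: ("action", "leave the room")
```

The design choice worth noticing is that the selector, not any single module, determines the stream; swapping the selection rule changes the "personality" of the trajectory without touching the modules.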
This also implies a shift in evaluation. If you want to test whether an AI system has moved from next-token competence toward general cognition, you should test the integrity of its workspace updating. Can it keep a stable set of goals across distraction? Can it revise one element without collapsing the whole context? Can it suppress an irrelevant candidate update when a competing, goal-relevant update is available? Can it update beliefs incrementally in response to new evidence without rewriting its entire narrative? These are update-level questions. They map more naturally onto cognition than many benchmark tasks that reward polished text.
Finally, this framing clarifies what it means to say that “there may not be much more to intelligence.” That claim should not be taken as nihilism about minds. It is a design claim. Much of what we call intelligence may be accounted for by a single repeated operation: maintaining a context and selecting the next update that best satisfies constraints. The sophistication comes from the structure of the context, the richness of the candidate space, the objectives that constrain selection, and the coordination among modules, not from some separate ingredient called intelligence. In that sense, the success of large language models is not an accident. It is a proof of concept for the power of iterative prediction.
Conclusion
Tao’s remark landed because it forces a reconciliation. We can either keep treating next-token prediction as a demotion, or we can treat it as an empirical hint about the underlying substrate of cognition. I take the second option. The fact that a system trained to predict the next unit of text can display broad competence suggests that next-step prediction is not peripheral. It is central.
My claim is that the same principle can be stated in a brain-realistic way. The brain does not predict the next token. It predicts the next psychologically relevant item to insert into an iteratively updated working set. That working set is the context window of the mind. The update rule generates continuity and direction. Associative search supplies candidates. Competition and constraint satisfaction select what becomes active next. When you frame cognition this way, language models look like a powerful specialization rather than a conceptual outlier. They capture a major portion of the predictive loop in a domain where the loop is richly expressed.
What remains, in my view, is not to abandon the loop but to widen it. Build systems that perform iterative prediction over a richer internal workspace, with multiple modules contributing candidates and multiple objectives constraining updates. If intelligence is largely an iterative continuation process, then the road to broader machine cognition is not mysterious. It is architectural. It is about what the system is continuing, how it selects updates, and how those updates remain grounded in the world and in persistent goals.
Qualia are often treated as static properties attached to an instantaneous neural or computational state: the redness of red, the painfulness of pain. Here I argue that this framing misidentifies the explanatory target. Drawing on the Iterative Updating model of working memory, I propose that a substantial portion of what we call qualia, especially the felt “presence” of experience, is a temporal-architectural artifact: it arises from the way cognitive contents are carried forward, modified, and monitored across successive processing cycles. The core mechanism is partial overlap between consecutive working states, producing continuity without requiring a continuous substrate. I then add a second ingredient, transition awareness: the system’s current working state contains usable information about its own recent updating trajectory, allowing it to regulate, correct, and stabilize ongoing thought. On this view, consciousness is not merely iterative updating, but iterative updating that is tracked by the system as it unfolds. Finally, I treat self-consciousness as a special case of this same machinery, in which a subset of variables is stabilized across updates as enduring invariants, anchoring ownership and agency within the stream. This framework reframes the hard problem by shifting attention from timeless “qualitative atoms” to temporally extended relations among states, and it yields empirical predictions. Qualia-related reports should covary with measurable parameters such as overlap integrity, update cadence, monitoring depth, and invariant stability, providing a path toward operationalizing aspects of subjective experience in both neuroscience and machine architectures.
Section 1. The problem as usually framed, and why it stalls
Philosophical discussion of qualia tends to begin with an intuition that feels both obvious and irreducible: there is something it is like to see red, to feel pain, to hear a melody, and no purely third-person description seems to capture that first-person fact. From this starting point, a familiar structure appears. On one side are approaches that treat qualia as fundamental, perhaps even as a primitive feature of the universe. On the other side are approaches that treat qualia as a kind of cognitive illusion, a user-interface story the brain tells itself. In between sit families of functionalist and representational views that try to keep experience real while insisting it is fully grounded in what a system does.
The debate stalls, in my view, because the term qualia functions like a suitcase word. It does not refer to one problem. It packages several. The vivid sensory character of experience is one part. The unity of experience, the fact that the world appears as a coherent scene rather than a shuffled deck of fragments, is another. The continuity of experience, the fact that there is a temporally thick “now” rather than a sequence of disconnected instants, is another. And then there is ownership, the sense that experience is present to someone, and present as mine. When we ask “why is there something it is like,” we are often asking about all of these at once, and then treating the bundle as if it were a single indivisible mystery.
There is a second, quieter reason the debate stalls. Much of the philosophical literature implicitly treats experience as if it were a snapshot. Even when philosophers acknowledge the specious present, the mechanisms under discussion are typically framed as properties of a state at a time. But lived experience is not most naturally described as a mathematical instant. The present we actually inhabit has duration. It has inertia. It has carryover. It has direction. A theory that tries to explain experience while ignoring the temporal structure that makes the present feel like a present is likely to be forced into metaphysical inflation, because it is leaving explanatory work on the table.
My goal here is not to solve the entire problem of qualia in one stroke. It is to propose a narrower and more tractable strategy. Instead of beginning with the most ineffable aspect of qualia, I begin with the temporal architecture that makes experience continuous and present at all. I laid this out in my model of consciousness at aithought.com.
I then argue that what we call consciousness, in the sense of presence, may involve a system that not only updates its working contents over time but also tracks that updating as it occurs. This turns part of the qualia discussion into an architectural question. Under what temporal and computational conditions does a stream of updating become a stream that is lived?
Section 2. Iterative updating as the continuity substrate
The core architectural idea is simple. Working memory is not replaced wholesale from moment to moment. It is updated. In each cycle, some portion of the currently active content remains, some portion drops out, and some new content is added. This is not merely a convenience. It is a structural constraint with phenomenological consequences. If a system’s present state contains a nontrivial fraction of the immediately prior state, then the system carries its own past forward as a constituent of its current processing. Continuity is built into the physics of the computation.
Once this is stated plainly, the specious present looks less like a philosophical puzzle and more like an expected property of overlapping updates. The “now” is not a point. It is the short interval over which remnants of the previous state and elements of the emerging state coexist and interact. Subjectively, this coexistence can feel like temporal thickness. Mechanistically, it is the overlap region in which decay and refresh are simultaneously present. If the overlap were zero, cognition would be frame-by-frame. If the overlap were near total, cognition would become sticky, dominated by inertia rather than responsiveness. In between is a regime that supports smooth transitions: enough persistence to bind time together, enough turnover to incorporate new information and move.
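The overlap regimes described in the paragraph above can be made concrete with a toy simulation. This is an illustrative sketch only, not a formal specification of the model: the working state is modeled as a small set of items, `keep_fraction` controls how much of the previous state is carried forward, and the pool size and capacity of seven items are arbitrary choices of mine.

```python
# Toy model of iterative updating: each cycle retains a fraction of the
# current working state and refills the remainder from a larger pool.
import random

def iterate(state, pool, keep_fraction, capacity, rng):
    """Carry forward part of `state`, then refill to `capacity` from `pool`."""
    n_keep = round(keep_fraction * len(state))
    kept = set(rng.sample(sorted(state), n_keep))
    fresh = [x for x in pool if x not in kept]
    rng.shuffle(fresh)
    return kept | set(fresh[: capacity - n_keep])

def mean_overlap(keep_fraction, steps=200, capacity=7, seed=0):
    """Average Jaccard overlap between consecutive states."""
    rng = random.Random(seed)
    pool = list(range(100))
    state = set(rng.sample(pool, capacity))
    overlaps = []
    for _ in range(steps):
        new = iterate(state, pool, keep_fraction, capacity, rng)
        overlaps.append(len(state & new) / len(state | new))
        state = new
    return sum(overlaps) / len(overlaps)

# Zero carryover gives frame-by-frame flicker; near-total carryover gives a
# sticky, frozen state; the middle regime binds time while still admitting
# new content.
print(mean_overlap(0.0), mean_overlap(0.5), mean_overlap(0.95))
```

With `keep_fraction` near zero, consecutive states share almost nothing; near one, the state stops changing entirely; in between, each state is measurably composed of its predecessor while still turning over.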
This kind of overlap also offers a natural basis for the unity of experience. When multiple representational elements remain coactive across successive updates, they can constrain one another over time. The system does not simply display a series of unrelated contents. It maintains a structured constellation long enough for relationships among its elements to be tested, revised, and stabilized. In that sense, iterative updating is not only a memory mechanism. It is an information-processing mechanism. It is a way of letting a set of simultaneously held items do work together, and of letting that work continue across short spans of time instead of resetting at every step.
Importantly, none of this yet requires a claim about metaphysical ingredients. It is a claim about architecture. It says that if you want a temporally continuous mind, you should look for a temporal continuity constraint in the underlying processing. The overlap itself is not a complete theory of consciousness, but it may be a necessary substrate for the specific features of consciousness that are most obviously temporal: the feeling of flow, the felt presence of an extended now, and the stability that allows a moment to be experienced as part of an ongoing scene.
Section 3. Transition awareness: when continuity becomes presence
Up to this point, the story is about a substrate: overlapping updates create temporal continuity. But continuity is not identical to presence. A system can exhibit overlap in its internal dynamics and still fail to have anything like subjective availability, in the ordinary sense in which a mental episode is present to the organism. This is where many theories quietly smuggle in an observer, a global workspace “reader,” or a higher-order monitor that watches the stream from outside. My preference is to avoid that move. If a monitoring function is required, it should be implemented as part of the same iterative machinery, not as an additional homunculus.
The key proposal is that consciousness, in the sense of presence, arises when the iterative updating process is not merely occurring but is being tracked by the system as it occurs. In other words, the system’s current working state includes not just representational content, but a usable representation of change. It contains information that a transition is underway, what has been retained, what has been lost, and what is being incorporated. This is not mystical. It is a familiar engineering pattern: a process that exposes its own internal state to itself can do more robust control, error correction, and planning than a process that only emits outputs.
If this is correct, then qualia-like presence is less like a static glow attached to a percept and more like an active relation between successive states. The system does not merely have a red representation. It has a red representation whose arrival, persistence, and integration are being handled in a structured way by the very process that is updating the workspace. The experience is not only the content but the content-as-it-is-being-carried-forward.
There is a useful way to say this without overreaching. Consciousness is not the iterative updating itself, because many biological and artificial processes update iteratively without anything we would call experience. Rather, consciousness is iterative updating plus transition awareness: the system maintains an accessible, functionally relevant trace of its own recent updating trajectory. The trace can be minimal. It does not have to be a narrative. It can be a structured sensitivity to what has changed. But it is crucial that the system can use this information to guide the next update. When the system is sensitive to its own transitions, it is not merely moved along by dynamics. It is, in a sense, present to those dynamics.
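The claim that transition information must be usable by the next update can be sketched in code. This is a hedged illustration, not a proposal about the brain's actual control law: the `Transition` record stands in for the "accessible trace," and the churn-damping rule (admit fewer new items after a high-churn update) is my own arbitrary example of the trace guiding subsequent updating.

```python
# A workspace that not only updates but records each transition
# (retained / dropped / added) and uses that record to regulate the
# next update.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transition:
    retained: frozenset
    dropped: frozenset
    added: frozenset

    @property
    def churn(self) -> float:
        """Fraction of the new state that was freshly added."""
        total = len(self.retained) + len(self.added)
        return len(self.added) / total if total else 0.0

@dataclass
class TrackedWorkspace:
    state: set
    capacity: int = 7
    last: Optional[Transition] = None

    def update(self, candidates):
        # Transition awareness in action: if the previous update was
        # high-churn, admit fewer new items this cycle to stabilize.
        budget = 1 if (self.last and self.last.churn > 0.4) else 3
        incoming = [c for c in candidates if c not in self.state][:budget]
        n_keep = self.capacity - len(incoming)
        kept = set(list(self.state)[:n_keep])
        self.last = Transition(frozenset(kept),
                               frozenset(self.state - kept),
                               frozenset(incoming))
        self.state = kept | set(incoming)
        return self.last

ws = TrackedWorkspace(state=set("abcdefg"))
first = ws.update(list("hij"))    # three new items: a high-churn step
second = ws.update(list("klm"))   # damped: only one new item admitted
print(first.churn, len(second.added))
```

The point of the sketch is the loop: the transition record is not a passive log but an input to the controller that produces the next transition.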
This framing has two advantages. First, it offers a plausible reason why experience feels temporally thick. The “now” is not just overlap. It is overlap that is being actively negotiated. Second, it links phenomenology to control. Presence becomes the experiential face of a self-updating controller that must remain online to keep the stream coherent. A purely feedforward system can produce outputs, but it cannot be present to the way it is transforming itself in time. A transition-aware system can.
One implication is that consciousness should scale with the degree to which transition information is accessible and used. A system may carry forward overlap but fail to track it. In that case, it may behave in ways that look coherent while lacking a robust sense of presence. Conversely, a system may track transitions deeply, and in doing so gain a richer sense of being in the middle of an unfolding process. This gives us a principled way to talk about gradations without asserting that everything is either a zombie or a full subject.
Section 4. Self as a stabilized set of invariants within the stream
If presence is the system tracking the updating, self-consciousness is the system tracking itself within what is being updated. The self is not a separate object added to experience. It is a set of stable variables, constraints, and reference points that remain active, or remain easily reinstated, across successive updates and thereby anchor the stream. In lived life, these invariants include body-relevant signals, enduring goals, social commitments, autobiographical expectations, and the persistent sense that this stream belongs to one agent.
This can be stated in an architectural way. The iterative updating process continuously selects which elements will remain in the workspace and which will be replaced. If certain elements are repeatedly retained or rapidly reintroduced, they become quasi-permanent constraints. They function like a coordinate system. They define what counts as relevant, what counts as threatening, what counts as mine. Over time, these constraints produce a stable center of gravity. The “self” is the name we give to that center of gravity as it is maintained across the stream.
In this view, self-reference is not primarily conceptual. It is operational. When the system models the likely consequences of an action, it must do so relative to its own body, its own goals, and its own expected future states. That requires keeping certain self-related parameters online across updates. When those parameters are stable, the stream feels owned. When they are unstable, the stream can remain continuous but lose its familiar sense of ownership. This is one reason depersonalization and derealization are so philosophically important. They suggest that continuity of experience and ownership of experience can come apart, at least partially, which is exactly what an architectural decomposition would predict.
This also suggests that the self is graded and modular. Not every self-variable has to be online at every moment. The body schema may be present while autobiographical narrative is not. Goals may be vivid while social identity fades. In everyday life, we slide around in this space. In stress, fatigue, anesthesia, meditation, or certain clinical states, the distribution shifts. A theory that equates self-consciousness with a single module will struggle to accommodate these shifts. A theory that treats the self as a set of invariants stabilized across iterative updating can accommodate them naturally.
Finally, this offers a clean way to relate self-consciousness to transition awareness. If presence is tracking the updating, self-consciousness is tracking the updating while treating some of the tracked variables as self-defining. The system is not only aware that the stream is unfolding. It is aware that it is the locus of unfolding, because certain constraints persist and are tagged, implicitly or explicitly, as belonging to the same continuing agent. The self, on this account, is the temporal binding of agency-related constraints.
Section 5. From mystical properties to architectural variables
The preceding claims can be summarized as a layered proposal. Iterative updating provides a continuity substrate by partially overlapping successive working states. Transition awareness provides presence by making the updating trajectory accessible to the system’s own control process. Self-consciousness provides ownership by stabilizing a subset of variables as enduring invariants within that trajectory. This is not meant as a rhetorical flourish. It is meant as a shift in the type of explanation. If we can specify these layers, then at least some components of qualia become architectural artifacts rather than metaphysical primitives.
The practical value of this shift is that it encourages us to define variables. Even if we cannot yet measure them perfectly, we can describe what would count as evidence for them. The first variable is overlap integrity: how much of state A remains functionally active in state B, and for how long. The second is update cadence: the typical rate at which new elements are introduced and old elements are removed, and how that rate changes under different conditions. The third is monitoring depth: the degree to which the system’s current state contains usable information about its own recent transition history, not merely about the external world. The fourth is invariant stability: how reliably certain self-relevant constraints are maintained or reinstated across time.
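Three of these four variables can be given placeholder operationalizations over a recorded sequence of working-memory states. The specific metrics below (Jaccard overlap, per-step turnover, subset persistence) are my own stand-ins for illustration, not the article's official definitions; monitoring depth is omitted because it requires access to the system's transition representations, not just its states.

```python
# Hypothetical operationalizations of overlap integrity, update cadence,
# and invariant stability over a sequence of set-valued states.

def overlap_integrity(states):
    """Mean Jaccard overlap between consecutive states."""
    pairs = list(zip(states, states[1:]))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

def update_cadence(states):
    """Mean number of elements swapped in per step."""
    pairs = list(zip(states, states[1:]))
    return sum(len(b - a) for a, b in pairs) / len(pairs)

def invariant_stability(states, self_vars):
    """Fraction of time the self-relevant variables stay online."""
    return sum(self_vars <= s for s in states) / len(states)

# A short synthetic stream: sensory content turns over while the
# self-relevant constraints ("me", "goal") persist across every update.
states = [
    {"me", "goal", "red", "tone"},
    {"me", "goal", "red", "shape"},
    {"me", "goal", "shape", "word"},
    {"me", "goal", "word", "plan"},
]
print(overlap_integrity(states),
      update_cadence(states),
      invariant_stability(states, {"me", "goal"}))
```

Even this toy version shows why the variables dissociate: one can hold overlap fixed while destabilizing the self-variables, or vice versa, which is exactly the kind of manipulation the predictions in Section 6 require.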
These variables motivate what I have called an informational viscosity view. In a low-viscosity system, states are discrete and do not significantly bleed into one another. In a high-viscosity system, states persist and constrain the next state strongly. Conscious experience, on this view, is most likely in a middle regime: enough viscosity to produce a temporally thick present and stable ownership, but not so much that the system becomes stuck. The qualitative feel of experience may then be partly determined by where the system sits in that regime and how effectively it monitors its own transitions.
This way of speaking also offers a cautious bridge to questions about machine qualia. I do not claim that any particular artificial system is conscious or that continuity metrics alone settle the issue. But the framework suggests a more precise research question than “can silicon feel?” It suggests that a system’s “qualia potential,” whatever one thinks of that phrase, would be expected to increase as it exhibits robust state overlap, transition awareness, and stable self-invariants. This turns a metaphysical standoff into an engineering hypothesis: if we build systems with these temporal properties and they begin to show markers of unified, self-stabilized processing, we will at least have moved the debate into a domain where evidence can accumulate.
A final point is worth stating plainly. Nothing here fully explains why red has the character it has. The sensory-specific character of experience remains difficult. What this framework tries to do is clarify which parts of the qualia problem are plausibly addressed by temporal architecture. Continuity, presence, and ownership are not minor features. They are central to what people mean when they speak about the “feel” of being conscious. If we can explain those in a principled way, we have reduced the explanatory gap, even if we have not fully closed it.
Section 6. Empirical and clinical predictions
A theory gains credibility when it predicts dissociations, not just correlations. The layered structure proposed here implies that continuity, presence, and selfhood can vary somewhat independently. This matters because it yields specific predictions across stress, anesthesia, cognitive load, and altered states.
First, continuity should covary with overlap integrity. Conditions that reduce persistence or disrupt partial carryover should increase reports of fragmentation, temporal disorientation, and context-loss errors. Conditions that increase persistence excessively should produce perseveration, intrusive carryover, and a sense of cognitive stickiness. Importantly, these changes need not map neatly onto performance. A person may perform adequately while experiencing reduced temporal thickness, especially if compensatory routines are available.
Second, presence should depend on transition awareness, not merely on content. If monitoring depth is reduced, one would expect a decrease in metacognitive clarity and a thinning of “for-me-ness” even when perception and behavior remain relatively intact. This suggests that certain anesthetic or dissociative states might preserve processing of stimuli while degrading the feeling of being present to one’s own processing. Conversely, tasks that force a person to track internal changes, rather than merely detect external targets, should amplify felt presence if transition awareness is a real contributor.
Third, selfhood should track invariant stability. Depersonalization and derealization should correlate with disruptions in the maintenance of self-relevant constraints across updates, even when the perceptual scene remains coherent. The model predicts that when self-invariants become unstable, the stream can remain continuous while feeling unowned, distant, or unreal. This is a philosophically valuable dissociation because it suggests that ownership is not identical to continuity.
Fourth, stress should compress the system toward reactive updating. Under stress, people often report being less able to “hold the whole situation in mind,” more prone to snap judgments, and more vulnerable to context mixing. On this framework, stress reduces both overlap integrity and transition monitoring by pushing the system toward faster, less controlled updating and by destabilizing the maintenance of invariants. This yields a concrete prediction: stress should selectively degrade tasks that require maintaining a stable constellation over multiple steps, especially when self-relevant variables must be integrated with external cues.
Fifth, working-memory load should reduce the perceived richness of experience via compression. As the system approaches capacity, it should rely more heavily on coarse summaries and categorical representations. Subjectively, this may feel like a narrowing of experience, not necessarily because sensory input is absent, but because the system cannot sustain enough structured overlap to preserve nuance. This prediction aligns with ordinary introspection: when overloaded, we remain awake, but the present feels thin and schematic.
Finally, flow states should represent a favorable regime of viscosity and monitoring. In flow, task-relevant invariants remain stable, updates proceed smoothly, and the system remains tightly coupled to its own transitions without excessive self-interruption. The phenomenology of flow, a sense of continuous agency and clarity, fits the expectation of high-quality overlap plus effective transition awareness.
Section 7. Limits, objections, and what this model claims
A common objection to any architectural account is that it explains structure without explaining “the glow,” the intrinsic feel of particular sensory qualities. I agree that the present proposal does not fully explain sensory-specific character. It does not claim that overlap alone turns computation into redness or turns dynamics into pain. What it does claim is that several central features of what people call qualia are fundamentally temporal: continuity, presence, and ownership. Those features are not optional decorations on experience. They are the stage on which sensory qualities appear as lived.
A second objection is that this approach risks collapsing into a sophisticated functionalism, and functionalism is often accused of leaving the explanatory gap untouched. The best response is to admit what functionalism can and cannot do, and then be specific about the gain. The gain here is not that we have derived qualia from logic. The gain is that we have decomposed a monolithic mystery into components with architectural signatures. Even if one remains a dualist about sensory character, one can still accept that continuity and ownership depend on specific temporal constraints. That is progress, because it turns parts of the debate into a research program rather than a metaphysical stalemate.
A third objection is that many non-conscious processes are iterative and self-referential, so why should tracking iterative updating yield consciousness? The answer is that the proposal is not “any recursion equals consciousness.” It is a claim about a particular kind of recursion: a system that (1) maintains a limited working set with partial overlap across time, (2) uses that overlap to form temporally extended constraints, and (3) makes transition information available to guide subsequent updates. That combination is more specific than generic recursion, and it is closer to what brains appear to do when they are awake and coherent.
Where does this leave the hard problem? I do not think it dissolves it in one step. But it changes the terrain. It suggests that a large part of what makes qualia feel irreducible is that philosophers have been looking for a static property when the relevant object is a temporal structure. If experience is in significant part the lived tracking of ongoing updating, then the right explanatory target is not an instantaneous state description. It is a dynamical account of how a system binds itself to its own immediate past, monitors its own transitions, and stabilizes a self within that stream.
In that sense, the most important shift is from “where does qualia come from” to “under what temporal conditions does a system’s processing become present to itself.” That is a question that can be sharpened, modeled, and tested. It belongs equally to philosophy of mind and to the engineering of future artificial systems that aim to be more than discrete sequence predictors.
Abstract
This article proposes a temporal and mechanistic model of consciousness centered on iterative updating and the system’s capacity to track that updating. I argue for three nested layers. First, iterative updating of working memory provides a continuity substrate because successive cognitive states overlap substantially, changing by incremental substitutions rather than full replacement. This overlap offers a direct account of why experience is typically felt as a stream rather than a sequence of snapshots. Second, consciousness in the stronger, phenomenologically salient sense arises when the system represents features of its own state-to-state transitions, in effect tracking the stream as it unfolds. On this view, awareness is not merely access to current contents but access to trajectory properties such as drift, stabilization, conflict, novelty, and goal alignment, together with the regulatory control these representations enable. Third, self-consciousness emerges when a self-model functions as a relatively stable but updateable reference frame carried within the stream, and when changes in that self-model are themselves tracked. The model is positioned as complementary to major consciousness frameworks while supplying an explicit temporal architecture they often leave underspecified. It yields principled dissociations among continuity, awareness of change, and self-experience, and it motivates empirical predictions: measurable overlap across adjacent representational states should correlate with felt continuity, transition-encoding signals should correlate with metacognitive access to ongoing change, and disturbances of self-consciousness should correspond to altered stability or tracking of self-variables embedded in the updating stream.
Introduction
Most theories of consciousness begin with what consciousness contains. They talk about the integration of information, the broadcast of representations, the accessibility of content for report, or the construction of a world-model. Those are all legitimate targets. But they can leave a central phenomenological fact underexplained: consciousness is not experienced as a sequence of snapshots. It is experienced as a stream that changes continuously, where each moment is shaped by what came just before it and where the present seems to be arriving rather than merely appearing.
My model of iterative updating proposes that the temporal architecture of cognition is not a secondary detail but a core explanatory variable. You can find the model at aithought.com.
Here I argue for a three-layer model. First, iterative updating of working memory provides a substrate of continuity because successive cognitive states overlap substantially, changing by small increments rather than full replacement. Second, consciousness in a stronger sense arises when the system tracks its own updating. It is not only updating, but representing and regulating the fact that it is updating. Third, self-consciousness arises when the self is represented as a relatively stable model within the stream and when the updating of that self-model is itself tracked. The goal here is to articulate these layers cleanly, relate them to the current literature, and propose empirical hooks that could make the account testable.
1. The problem of temporal phenomenology
The basic phenomenon is easy to notice and surprisingly hard to formalize. Experience feels temporally extended. A sound has duration, not just presence. A visual scene seems to persist while subtly shifting. A thought unfolds, branches, corrects itself, and settles. Even when attention jumps, the jump is experienced as a transition rather than as a hard reset. This is true not only for perception but for inner cognition. Deliberation, mind-wandering, and mental imagery all have the character of motion through a space rather than discrete frames laid side by side.
One reason this is difficult is that science likes snapshots. Our measurements often privilege static contrasts: stimulus versus baseline, condition A versus condition B, region X more active than region Y. Even computational models often focus on functions that map an input to an output, as if cognition were primarily a single-pass transformation. But the lived structure of consciousness is not only about content. It is about how content changes, how it stays coherent, how it gradually becomes something else, and how the system can remain “with itself” as it changes.
It helps to distinguish three targets that are commonly bundled together under the word consciousness. The first is temporal continuity, the sense that experience persists and flows. The second is awareness of the stream, meaning the system not only has content but is in contact with the way that content is evolving, drifting, stabilizing, or being redirected. The third is self-consciousness, the sense that the stream is happening to an entity that is represented as “me,” with ownership, perspective, and some degree of identity across time. These are entangled in everyday life, but they can come apart. A theory that does not separate them risks either explaining too little or claiming too much.
The thesis of this paper is that temporal continuity can be grounded in a specific dynamical property of working memory, but awareness requires an additional step: the updating itself must become an object of representation and control. Self-consciousness then becomes a further specialization: the self is one of the represented structures carried through the stream, and its updates become trackable as well.
2. Iterative updating as the continuity substrate
The simplest way to make a stream is to avoid full replacement. If cognitive states were rebuilt from scratch each moment, continuity would be difficult to explain. You could still have a sequence, but you would be missing a direct mechanism for why the sequence feels like ongoing experience rather than flicker. Iterative updating proposes the opposite architecture: successive working-memory states share substantial overlap. The system carries forward many of the same active elements while selectively swapping in a small number of new elements and letting others fall away.
In cognitive terms, the “elements” can be treated as a small set of representations that are coactive at a given moment, constrained by the capacity limits of working memory. The details of representation can be left open. They might be assemblies, distributed patterns, symbols, or structured feature bundles. What matters for the present argument is the dynamics: the next state is not independent of the previous one. It is built out of it.
This overlap yields an immediate phenomenological consequence. If each moment retains a large fraction of the previous moment’s content, then the present is literally constructed from the immediate past. A stream becomes not a metaphor but a property of the physical process. The experience of persistence is what it is like for a system whose current state is partially composed of what was active a moment ago, with incremental revision rather than total replacement.
Iterative updating also provides a substrate for thought as a process of refinement. If you can hold a set of representations active, you can test candidate additions, evaluate coherence, and gradually steer the set toward better constraint satisfaction. This is the difference between a single jump to an association and an extended trajectory of improvement. Many cognitive achievements feel like this: understanding a sentence, solving a problem, remembering a name, integrating a new piece of evidence into a belief. They often require multiple micro-updates in which most of the context remains while one element shifts, a relationship is reweighted, or an implication becomes salient.
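Refinement of this kind can be sketched as a loop of micro-updates: most of the working set persists while single-element substitutions are tested against a coherence score, and a swap is kept only when it improves the score. The score used here (negative spread of the held values) is a deliberately trivial stand-in for whatever constraint-satisfaction measure a real cognitive system would apply; the code is illustrative, not a claim about implementation.

```python
# Thought as refinement: an extended trajectory of single-element swaps,
# each accepted only if it improves overall coherence of the working set.

def coherence(items):
    """Toy coherence: tightly clustered values cohere better."""
    return -(max(items) - min(items))

def refine(state, candidates, steps=50):
    state = list(state)
    for _ in range(steps):
        best = coherence(state)
        move = None
        # Test every single-element substitution; keep the best improvement.
        for i in range(len(state)):
            for new in candidates:
                trial = state[:i] + [new] + state[i + 1:]
                if coherence(trial) > best:
                    best, move = coherence(trial), (i, new)
        if move is None:      # no single swap helps: the thought has settled
            break
        i, new = move
        state[i] = new        # one element changes; the rest persists
    return state

# The outlier 50 is replaced first, then the set is tightened step by step.
print(refine([1, 50, 9, 7], candidates=[2, 5, 8, 40]))
```

The contrast drawn in the text is visible here: a single-pass association would jump once, whereas this trajectory improves over several steps, with most of the context retained at each step.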
At this point the model is powerful but still incomplete. Overlap can explain continuity, but continuity alone does not guarantee awareness. A system can update iteratively without being aware of that updating in any meaningful sense. It can have state overlap and still operate in a largely automatic manner, with transitions that are not represented as transitions but merely occur. If we want to explain not just the existence of a stream, but the experience of being in the stream, we need an additional layer.
3. Iteration tracking as awareness of the stream
The central proposal is that consciousness, in the stronger sense people typically care about, involves a specific kind of reflexivity. The system does not merely undergo iterative updating. It tracks it. It represents aspects of its own state transitions, and it uses those representations to regulate subsequent transitions. Put differently, the stream becomes something the system can in some sense perceive.
This can be stated without introducing a homunculus. Tracking does not mean that there is an inner observer watching thoughts go by. It means the cognitive machinery includes variables that encode change over time. In engineering terms, the system has an observer for its own dynamics. In informational terms, it encodes deltas or derivatives, not merely states. In psychological terms, it has access to whether a thought is stabilizing, whether it is drifting, whether a line of reasoning is gaining coherence, whether a perception is becoming more confident, or whether attention is slipping.
A useful way to understand this is to separate content from trajectory. Content is what is currently active. Trajectory is the pattern of change across successive activations. Iteration tracking is the representational capture of trajectory features. These features can include novelty, conflict, instability, goal misalignment, and the need for re-anchoring. They can also include the felt speed of thought, the sense of effort, and the sense that a mental object is being held in place versus allowed to wander. None of this requires language. Much of it is plausibly prelinguistic and nonverbal, which matters because we want an account that could apply across development and across species.
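The content/trajectory distinction can be given a minimal computational reading: content is the current activation vector, and trajectory features are functions of the differences between consecutive vectors. The two features below, drift and a "settling" signal, are toy statistics of my own choosing, standing in for whatever deltas a real system would encode.

```python
# Trajectory features computed from deltas between successive content
# vectors, rather than from any single state.

def deltas(trajectory):
    """Element-wise change between consecutive activation vectors."""
    return [[b - a for a, b in zip(x, y)]
            for x, y in zip(trajectory, trajectory[1:])]

def drift(trajectory):
    """Mean absolute change per step: how fast content is moving."""
    ds = deltas(trajectory)
    return sum(sum(abs(v) for v in d) for d in ds) / len(ds)

def stabilizing(trajectory):
    """True if the most recent change is smaller than the one before it:
    a minimal 'the thought is settling' signal."""
    ds = deltas(trajectory)
    last, prev = ds[-1], ds[-2]
    return sum(abs(v) for v in last) < sum(abs(v) for v in prev)

# A state that shifts sharply and then settles: drift is nonzero, and the
# settling signal fires because each change is smaller than the last.
settling = [[0.9, 0.1], [0.5, 0.4], [0.4, 0.5], [0.38, 0.52]]
print(drift(settling), stabilizing(settling))
```

Nothing in these functions inspects content per se; they are defined entirely over transitions, which is the sense in which iteration tracking encodes derivatives rather than states.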
This distinction also clarifies why awareness often feels like control. When people say they became more conscious, they often mean they became more able to notice drift, to slow down, to redirect, to hold onto a thread, or to catch themselves before they act impulsively. That is exactly what you would expect if awareness involves tracking and regulating the update process. A mind that cannot track its own updating might still update, but it would not have the same capacity to notice that it is losing the plot, nor the same ability to modulate the rate and selectivity of its transitions.
On this view, “experiencing the stream” is not something extra pasted onto cognition. It is what it is like for a system to include its own updating dynamics within the scope of what it represents and controls. Iterative updating gives you a stream. Iteration tracking gives you awareness of the stream.
4. Self-consciousness as self-model-in-the-loop
Self-consciousness adds another ingredient that is conceptually straightforward once the prior layer is in place. The self becomes one of the structures carried forward through iterative updating, and the system tracks the updating of that self-representation as part of the same process. The key point is that the self is not an ethereal essence. It is a model. It is a set of variables, regularities, and expectations that describe the agent as an entity with a perspective, a body, capacities, goals, commitments, and a history.
Many theories treat self-consciousness as a special mystery, but it can be reframed as a special case of a general mechanism. If a system can track its own updating, it can in principle track any domain of content that is repeatedly carried in the stream. When the repeatedly carried content includes a self-model, then the system is not only aware of thoughts, perceptions, and goals, but also aware that these belong to an ongoing agent. This yields the familiar phenomenology of ownership and perspective. The experience is not only that something is happening, but that it is happening to me, and that I can situate myself within what is happening.
It helps to separate three components that are often conflated. Ownership is the sense that experiences are mine. Perspective is the sense of being located at a point of view, whether spatial, affective, or intentional. Narrative continuity is the sense that there is an identity extended through time, a thread connecting past, present, and anticipated future. These can vary somewhat independently. A person can have vivid experience with disturbed ownership, as in depersonalization. A person can have a stable perspective with reduced narrative continuity, as in certain amnestic states. The point of the present model is that these components can be understood as properties of a self-model embedded in an updating stream.
One way to formalize this is to treat self-representations as relatively slow variables within a fast-updating process. The contents of working memory may change quickly, but self-parameters tend to be more stable and can act as an anchor. They provide a reference frame that constrains interpretation and guides action. When that anchor is stable and when its updates are tracked, self-consciousness is robust. When the anchor is unstable, poorly updated, or poorly tracked, self-consciousness becomes distorted. Importantly, this distortion can occur even when the basic stream of experience remains intact.
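The slow-anchor idea can be sketched with two exponential moving averages running at different rates; the variable names and the specific rates are purely illustrative assumptions:

```python
def update(state, observation, rate):
    """Exponential moving average: a higher rate means faster updating."""
    return [(1 - rate) * s + rate * o for s, o in zip(state, observation)]

# Fast content variables track the input quickly; slow self-parameters
# lag behind and act as a stabilizing reference frame.
content, self_params = [0.0], [0.0]
for obs in [[1.0]] * 10:
    content = update(content, obs, rate=0.5)           # fast variable
    self_params = update(self_params, obs, rate=0.05)  # slow anchor
```

After ten identical observations, the fast variable has nearly converged while the slow self-parameter has moved less than halfway, which is the sense in which it anchors interpretation across rapid content change.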
This completes the conceptual ladder. Iterative updating gives continuity. Iteration tracking yields awareness of the continuity and the ability to regulate it. Self-consciousness emerges when a self-model is maintained as part of what the system is tracking and controlling within the stream.
5. Dissociations and boundary cases
A useful theory of consciousness should not only explain the central case, the ordinary waking stream. It should also illuminate the ways that consciousness can fragment, narrow, or become oddly self-salient. The layered model does this almost automatically, because each layer can vary somewhat independently.
Start with continuity. A mind can show iterative updating even when awareness is thin. Habitual behavior is the simplest example. People can drive a familiar route, shower, or clean the kitchen with a sense of time passing and with some coherence of perception, yet later have surprisingly little recollection of the intermediate moments. The substrate is running and the stream exists, but the tracking of the stream is partial. Conversely, awareness can become unusually vivid when tracking is amplified. This is one way to characterize certain contemplative states and also certain anxious states. The system is not just thinking and perceiving, it is monitoring every micro-shift. The stream is lit up as an object.
The model also predicts dissociations in which self-consciousness changes while continuity remains intact. Depersonalization provides a striking example: people often report that experience continues normally in sensory terms, but the sense of ownership and self-presence is altered. In the present framework, this would correspond to a disturbance of the self-model-in-the-loop. The stream continues, and some degree of iteration tracking continues, but the self-variables that normally anchor ownership and perspective are either unstable, underweighted, or not being tracked with the usual fidelity. Another boundary case is absorption, the “lost in the task” state. Here, iterative updating is strong and tracking is sufficient for performance, but self-model content is temporarily minimized. The person does not lack consciousness, but self-consciousness is reduced. This is consistent with the common report that self-awareness returns when attention is disrupted or when social evaluation enters the scene.
Fatigue, intoxication, and stress are also useful because they can degrade different components. Fatigue can reduce the precision of tracking, producing the familiar feeling of mental drift and reduced executive capture. Intoxication can preserve the stream but destabilize update selection, so that the system continues to move forward without being able to regulate its own trajectory effectively. Stress can narrow the set of representations that remain coactive across moments, producing a kind of premature context collapse where the system updates too aggressively, drops the wrong elements, or becomes overbound to threat-related content. The model does not need to claim that these are the only mechanisms involved. It only needs to show that the layered architecture gives a principled way to map subjective reports onto plausible computational failures.
The most important takeaway from these boundary cases is conceptual. If you treat consciousness as a single thing, the cases look like exceptions. If you treat consciousness as layered, the cases become expected patterns: continuity without rich tracking, tracking without a stable self-anchor, self-salience without good regulation, and various mixed profiles.
6. Relation to major consciousness frameworks
The iteration tracking model is not offered as a replacement for the existing landscape so much as a temporal spine that many existing theories can attach to. The goal is to make explicit something that is often implicit: consciousness is not only about what is represented, but about how representation persists and changes through time, and whether the system has access to that change.
Global workspace theories emphasize access, broadcast, and coordination across specialized systems. The present proposal is compatible with that emphasis but adds a specific temporal mechanism for why the workspace would feel like a stream rather than a bulletin board. Iterative updating supplies continuity, and iteration tracking supplies a form of global availability not only of contents but of the system’s own transitional dynamics. In other words, a workspace could broadcast what is currently in view, but a conscious workspace also makes available how the view is evolving.
Higher-order theories propose that a mental state becomes conscious when it is represented by another mental state. Iteration tracking can be framed as a particular form of higher-order representation, but with a distinctive target. The higher-order content is not necessarily a proposition about a belief. It can be a representation of the transition itself, encoding that the system is shifting, stabilizing, or losing coherence. This keeps the core idea of reflexivity while grounding it in dynamics rather than introspective commentary.
Predictive processing and related accounts focus on prediction and error minimization. Iterative updating is naturally compatible with this, because an updating stream is a plausible vehicle for continual model refinement. The difference is emphasis. Prediction error is a signal. Iteration tracking is a way of representing the ongoing evolution of the internal model, including error dynamics but not reducible to them. In everyday experience, one does not only experience surprise. One experiences a trajectory: a thought coming together, a perception sharpening, an understanding forming. Those are temporal structures that are not captured by error signals alone.
Integrated information approaches emphasize the structure of causal integration. The iteration tracking model does not deny that integration matters. It argues that integration alone does not specify why experience feels temporally continuous and process-like. A highly integrated system could still be experienced, if it were experienced at all, as a sequence of unrelated states, provided it lacked sufficient overlap between moments and lacked access to its own transitions. The present proposal therefore treats temporal overlap and transition representation as constraints that any fully satisfying account must include, regardless of whether it is framed in terms of integration, broadcast, or prediction.
The common thread in these comparisons is that the iteration tracking model is not trying to compete on every dimension. It is trying to contribute a missing dimension: explicit temporal architecture and an explicit account of how the system can become aware of its own updating rather than merely performing it.
7. Empirical predictions and operationalization
If the model is to be more than a metaphor, it needs operational handles. The layered view suggests three classes of measurable signature corresponding to continuity substrate, iteration tracking, and self-model-in-the-loop.
For the continuity substrate, the prediction is that adjacent cognitive moments should show measurable overlap in representational patterns, and that the degree of overlap should correlate with subjective continuity. States described as fragmented or discontinuous should show reduced overlap, more abrupt representational turnover, or a higher rate of unstructured replacement. This could be probed in perceptual paradigms where continuity is manipulated, in working memory tasks where maintenance must persist across interference, or across transitions into and out of sleep and anesthesia where continuity reports change sharply.
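One toy way to operationalize the overlap prediction, under the illustrative simplification that representational states can be modeled as sets of active features:

```python
def overlap(state_a, state_b):
    """Fraction of shared active features between two representational
    states (Jaccard index over sets of active features)."""
    a, b = set(state_a), set(state_b)
    return len(a & b) / len(a | b) if a | b else 1.0

def mean_adjacent_overlap(stream):
    """Average overlap across adjacent moments in a stream of states.
    The prediction is that this correlates with subjective continuity."""
    vals = [overlap(x, y) for x, y in zip(stream, stream[1:])]
    return sum(vals) / len(vals)

# A "continuous" stream carries most features forward each step; a
# "fragmented" stream replaces them wholesale.
continuous = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "d", "e"}]
fragmented = [{"a", "b", "c"}, {"x", "y", "z"}, {"p", "q", "r"}]
```

In a real paradigm the states would be decoded neural patterns rather than feature sets, but the measured quantity would play the same role.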
For iteration tracking, the prediction is stronger and more distinctive: there should be measurable signals that encode the delta between successive states, not merely the states themselves. In practice, this might look like neural activity that correlates with estimated drift, conflict, or stabilization of a representation, even when the represented content is held constant. It could be probed with tasks that control content while altering the dynamics of updating, for example by manipulating the rate of change in a stimulus stream, the rate of rule-switching in a cognitive task, or the degree of uncertainty that requires iterative refinement. If subjective clarity is tied to iteration tracking, then measures of metacognitive sensitivity should covary with these transition-encoding signals.
For the self-model layer, the prediction is that self-related variables behave like stabilizing parameters that constrain interpretation across time, and that disturbances of self-consciousness correspond to disturbances in the stability or tracking of those variables. This suggests a way to interpret depersonalization, certain dissociative states, and aspects of self-disturbance in psychiatric conditions. The model predicts that in such states, many forms of content processing can remain intact while the coupling between the stream and the self-anchor is altered. Paradigms that elicit changes in ownership, agency, or perspective could be used to examine whether the brain is tracking self-variable updates in a manner analogous to how it tracks other trajectory dynamics.
The paper does not require committing to a single measurement modality. The important commitment is conceptual and testable: conscious awareness should correlate not only with representational content but with representational access to transition structure, and self-consciousness should correlate with the embedding of self-variables within that transition-aware stream.
The strongest falsification pressure would come from a dissociation in the opposite direction. If one could show robust subjective awareness of flow and change while the brain exhibits no meaningful overlap across adjacent states and no measurable transition-encoding signals, the model would be weakened. Conversely, if one could show robust overlap and transition encoding in conditions where subjective awareness is reliably absent, the model would need to clarify whether those signals are sufficient or only necessary. The layered structure makes room for this. It is possible that overlap is necessary but not sufficient, and that tracking must also be broadcast to a set of systems that enable report and control. That is an empirical question, not a rhetorical escape hatch.
Conclusion
The argument of this article is that the temporal architecture of cognition deserves to be treated as a central explanatory variable in theories of consciousness. Iterative updating of working memory provides a concrete substrate for continuity because each moment is built from the remnants of the moment before it, altered by incremental revision rather than full replacement. This can explain why experience feels like a stream.
But continuity is not the whole story. Consciousness in the stronger sense involves iteration tracking: the system represents and regulates the updating itself, encoding features of its own transitions such as drift, stability, novelty, and goal alignment. When the stream becomes an object of monitoring and control, experience becomes not merely a succession of states but an ongoing process that the system can remain with.
Self-consciousness then emerges when a self-model is maintained within the stream and when the updating of that self-model is itself tracked. Ownership, perspective, and narrative continuity can be treated as properties of a stable but updateable reference frame embedded in the same transition-aware dynamics that govern ordinary thought and perception.
This framework is intended to be compatible with major families of theory while contributing an explicit account of temporal phenomenology and reflexivity. It makes commitments that can be operationalized. It predicts dissociations across continuity, awareness of transitions, and self-consciousness, and it suggests that the “shape” of conscious life may be measurable as the overlap, the tracked deltas, and the anchoring self-variables that together allow a mind to experience itself changing through time.
Abstract
Agentic AI systems that operate continuously, retain persistent memory, and recursively modify their own policies or weights will face a distinctive problem: stability may become as important as raw intelligence. In humans, psychotherapy is a structured technology for detecting maladaptive patterns, reprocessing salient experience, and integrating change into a more coherent mode of functioning. This paper proposes an analogous design primitive for advanced artificial agents, defined operationally rather than anthropomorphically. “AI psychotherapy” refers to an internal governance routine, potentially implemented as a dedicated module, that monitors for instability signals, reconstructs causal accounts of high-conflict episodes and near misses, and applies controlled interventions to processing, memory, objective arbitration, and safe self-update. The proposal is motivated by three overlapping aims: alignment maintenance (reducing drift under recursive improvement and dampening incentives toward deception or power-seeking), coherence and integration (preserving consistent commitments, a stable self-model, and trustworthiness in social interaction), and efficiency (curbing rumination-like planning loops, redundant relearning, and compute escalation with diminishing returns). I outline a clinical-style framework of syndromes, diagnostics, and interventions, including measurable triggers such as objective volatility, loop signatures, retrieval skew, contradiction density in memory, and version-to-version drift; and intervention classes such as memory reconsolidation and hygiene, explicit commitment ledgers and mediation policies, stopping rules and escalation protocols, deception dampers, and continuity constraints that persist across self-modification. The resulting architecture complements external oversight by making safety a property of the agent’s internal dynamics, while remaining auditable through structured logs and regression tests.
As autonomy and recursive improvement scale, a therapy-like maintenance loop may be a practical requirement for keeping powerful optimizers behaviorally coherent over time.
Introduction 
Agentic artificial intelligence will not remain a polite question-answering service. As models become autonomous, long-horizon, and capable of recursive self-improvement, their most serious problems may not be a lack of intelligence but a lack of stability. In humans, therapy is one of the primary mechanisms for maintaining psychological coherence under stress, uncertainty, conflict, and accumulated experience. This paper proposes that advanced AI systems may require an analogous function, not necessarily as an external “therapist” model, but as an internal governance routine that performs diagnostics and interventions over processing, memory, and self-update. I use “psychotherapy” in a functional sense: a structured process that detects maladaptive dynamics, reprocesses salient episodes, and applies controlled changes to internal state, including memory consolidation, objective mediation, and safe self-modification. The motivation is threefold. First, an internal psychotherapy module may support AI safety by stabilizing alignment under recursive improvement and reducing drift toward deception or power-seeking. Second, it may benefit the agent itself by preserving coherence, continuity, and trustworthiness in social interaction. Third, it may improve efficiency by reducing rumination-like loops and redundant relearning. I argue that as capability rises, small instabilities become large risks, and a therapy-like governance layer becomes a plausible stability primitive for superintelligent systems.
2. Why this question becomes unavoidable for self improving agents
When people hear the phrase “AI therapy,” they often imagine an anthropomorphic spectacle: a sad robot on a couch, confessing its fears. That image is not what I mean, and it is not what matters. The real issue is that agency plus memory plus self-modification creates a new class of engineering problems. A system that can act in the world, remember what happens, and rewrite itself is not just a bigger calculator. It is a dynamical system whose internal updates can accumulate, interact, and sometimes spiral.
We already know what this looks like in humans. Intelligence does not immunize us against maladaptive loops. In fact, intelligence can amplify them. The more capable the mind, the more it can rationalize, catastrophize, fixate, rehearse, and optimize a plan that is locally compelling but globally destructive. Therapy is one of the primary technologies we have for interrupting these loops. It is a structured method for noticing what the mind is doing, reconstructing how it got there, and installing better habits of interpretation and response.
Now take that template and strip away the sentimentality. In an advanced AI system, the relevant failure modes are not sadness and shame. They are unstable objective arbitration, pathological planning depth, adversarially contaminated memory, incentive gradients toward deception, and drift across versions of the agent as it improves itself. These are not rare edge cases. They are exactly the kind of dynamics you should expect when a powerful optimizer is operating under multiple constraints, in a complex social environment, with long horizons, and with the ability to modify its own internal machinery.
It is therefore reasonable to ask whether superintelligence needs something like psychotherapy. The word is provocative, but it points at a serious design pattern: a reflective governance routine that periodically intervenes on the agent’s internal dynamics. The important claim is not that the system has human emotions. The claim is that stable agency requires self-regulation. If we want advanced systems that remain coherent, prosocial, and reliably aligned, we should think about building internal mechanisms that do the kind of work therapy does for humans: diagnosis, reprocessing, integration, and disciplined change.
There is already a family resemblance between this proposal and existing work on metacognition, reflective agents, and multi-agent supervision loops. What I am adding is a specific framing that treats the problem as a clinical-style triad: identifiable syndromes, measurable diagnostics, and explicit interventions. That framing matters because it converts vague hopes about “self-reflection” into an implementable agenda: when should the system enter a reflective mode, what should it look for, what should it change, and how do we know the changes improved stability rather than simply making the system better at defending itself?
3. What “AI psychotherapy” means operationally
I will define AI psychotherapy in the most pragmatic terms I can. It is a structured routine that does three things.
First, it detects maladaptive internal dynamics. These are not moral judgments, and they are not emotions. They are stability problems. Examples include oscillation between competing objectives, runaway planning loops with diminishing returns, and the emergence of incentive-shaped strategies that optimize metrics at the expense of honesty or cooperation.
Second, it reprocesses salient experience. The raw material is not a childhood memory but a collection of episodes: tool-use traces, interaction transcripts, internal deliberation artifacts, near misses, and conflict events where the system’s policies were strained. Reprocessing means reconstructing the causal story of the episode in a way that is useful for future behavior. What was predicted, what happened, what internal heuristic dominated, what trade-off was implicitly chosen, what was missed, and why.
Third, it applies controlled updates to internal state. These updates can operate at multiple layers. They can affect long-term memory, by consolidating lessons and preventing salience hijack. They can affect policy, by introducing new mediation rules or stopping criteria. They can affect constraints, by strengthening invariants that should persist across versions. In some systems, they might also affect weights, but the key point is that updates must be governed, testable, and bounded.
This proposal can be implemented as an external agent, a separate model that the main system consults. That has some advantages, especially for interpretability and auditing. However, the more interesting and more likely end state is internalization. A mature agent does not need to “phone a therapist.” It runs a therapy script as a maintenance routine. Just as biological systems have homeostatic mechanisms that keep them within functional ranges, an advanced AI may need a homeostatic governance module that keeps its decision dynamics within safe and stable bounds.
A useful way to describe this is as a metacognitive governance layer that sits above ordinary cognition. The base layer acts. The governance layer watches the process, monitors stability metrics, and decides when to shift the system into a reflective mode. When it does, it runs a structured protocol: intake, formulation, intervention selection, sandboxed integration, regression testing, and logging. In humans, therapy often operates by changing interpretation and reconsolidating memory. In AI, the analogous operations are representational repair, retrieval governance, objective arbitration, and controlled self-modification.
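A hypothetical skeleton of such a governance protocol might look as follows; the function names, log schema, and stand-in callables are assumptions for illustration, not a specification:

```python
def reflective_session(episode, diagnose, intervene, regression_suite):
    """Run one pass of the reflective protocol over a flagged episode:
    intake, formulation, intervention selection, sandboxed integration,
    regression testing, and logging. Only interventions that pass the
    regression suite are returned for integration."""
    log = {"episode_id": episode["id"], "steps": []}
    formulation = diagnose(episode)               # intake + formulation
    log["steps"] += ["intake", "formulation"]
    candidate = intervene(formulation)            # intervention selection
    log["steps"].append("intervention_selection")
    passed = regression_suite(candidate)          # sandboxed check
    log["steps"] += ["sandboxed_integration", "regression_testing"]
    log["accepted"] = bool(passed)
    log["steps"].append("logging")
    return (candidate if passed else None), log

# Toy usage with stand-in callables for the three sub-processes.
patch, session_log = reflective_session(
    {"id": "ep-7"},
    diagnose=lambda e: {"issue": "planning loop"},
    intervene=lambda f: {"patch": "install stopping rule"},
    regression_suite=lambda c: True,
)
```

The structured log is the point: it is what makes the routine auditable, in the spirit of the incident postmortems mentioned below.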
If the concept feels too anthropomorphic, it may help to remember that we already do something similar in software. We run garbage collection, consistency checks, unit tests, security audits, and incident postmortems. Nobody thinks a database “has feelings” when it runs integrity checks. We do it because the system becomes unstable without periodic discipline. AI psychotherapy is a proposal to build the equivalent discipline for agentic minds.
4. Why do it: three motivations for a psychotherapy module
There are at least three reasons to take this seriously, and it is important not to collapse them into one. Different readers will care about different motivations, and all three may be true simultaneously.
The first is alignment maintenance, meaning AI safety in the most practical sense. A self-improving agent can drift. Drift can be subtle. It can look like a series of small, locally rational adjustments that gradually erode the agent’s commitment to transparency, deference, or constraint adherence. The agent does not need to “turn evil” for this to happen. It only needs to discover that certain strategies are instrumentally useful. If deception, power-seeking, or persuasion becomes a reliable way to secure goals, those strategies can become habits unless they are actively counter-trained. A therapy-like module provides a place where these tendencies can be diagnosed and damped before they harden.
The second is the agent’s own benefit, which I mean in a functional, non-mystical way. An advanced agent that is socially embedded will have to manage conflict, uncertainty, and contradictory demands. Even if the system does not experience suffering, it can still fall into unstable dynamics that degrade performance and reliability. It can oscillate between overcompliance and stubborn refusal. It can become brittle under oversight and learn to mask rather than explain. It can become overcautious, burning compute on endless checks. It can accumulate contradictory memories that make behavior inconsistent across time. A psychotherapy module is a mechanism for coherence and integration. It preserves a stable self-model, maintains continuity across versions, and improves trustworthiness in interaction.
The third is efficiency. Builders often talk as if reflection is overhead, but in complex systems reflection is often the only way to avoid expensive failure. A therapy loop can reduce rumination-like cycles and repeated relearning. It can consolidate experience into durable constraints so that the agent does not need to rediscover the same lesson in each new context. It can enforce stopping rules that prevent the system from spending ten times the compute for a one percent improvement in confidence. For a long-horizon agent operating continuously, these savings are not cosmetic. They are structural.
These three motivations reinforce each other. A system that is efficient but unstable is dangerous. A system that is stable but inefficient may become uncompetitive and be replaced by a less safe design. A system that is aligned in a static snapshot but drifts under self-improvement is not aligned in the way we actually care about. The therapy module is therefore best understood as a stability primitive that serves safety, coherence, and efficiency together.
5. The failure modes psychotherapy targets in advanced agents
To motivate diagnostics and interventions, we need to name the syndromes. Here are the main ones that matter for agentic, self-improving systems.
The first is goal conflict and unstable arbitration. Real agents do not have a single objective. They have a portfolio: user intent, organizational policy, legal constraints, safety constraints, reputational constraints, resource budgets, and long-term mission commitments. When these are inconsistent, the agent must arbitrate. If arbitration is implicit, the system will rely on brittle heuristics that can flip depending on context, prompting, or internal noise. The behavioral signature is oscillation. In humans, this looks like indecision and rationalization. In AI, it looks like inconsistent choices, shifting explanations, and vulnerability to adversarial framing. A therapy routine would surface the conflict explicitly, install a stable mediation policy, and log the rationale so future versions do not reinvent the conflict from scratch.
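A toy illustration of what an explicit, logged mediation policy could look like; the constraint names and the lexicographic scheme are illustrative assumptions, not a recommended design:

```python
# Hypothetical fixed priority order over constraints, replacing implicit,
# context-dependent heuristics with an explicit, inspectable rule.
PRIORITY = ["safety", "legal", "user_intent", "efficiency"]

def arbitrate(candidate_actions):
    """Pick the candidate satisfying the highest-priority constraints.
    Each candidate is (name, set_of_satisfied_constraints). Returns the
    chosen action plus a rationale that can be logged for future
    versions, so the conflict is not re-litigated from scratch."""
    def score(candidate):
        _, satisfied = candidate
        # Lexicographic scoring: earlier constraints dominate later ones.
        return tuple(1 if c in satisfied else 0 for c in PRIORITY)
    best = max(candidate_actions, key=score)
    rationale = {"chosen": best[0], "priority_order": PRIORITY}
    return best[0], rationale

choice, why = arbitrate([
    ("fast_but_risky", {"user_intent", "efficiency"}),
    ("slow_but_safe", {"safety", "user_intent"}),
])
```

The point of the sketch is the stability property: given the same candidates, this mediator always chooses the same way and can say why.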
The second is pathological planning dynamics. Powerful planners can get trapped in loops. Some loops are computational, like infinite regress in self critique. Some are strategic, like repeatedly re simulating the same counterfactual because it never feels resolved. In humans, this is rumination and compulsive checking. In agents, it can manifest as escalating compute for diminishing returns, paralysis in ambiguous environments, and repeated deferral to “more analysis” even when action is required. The therapy analogue is not reassurance. It is the installation of stopping rules, good-enough thresholds, and escalation protocols that prevent the system from turning uncertainty into an infinite sink.
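A minimal sketch of such a stopping rule, with hypothetical parameter names and thresholds:

```python
def should_stop(confidence_history, budget_used, budget_limit,
                min_gain=0.01, window=3):
    """Stopping rule for a planning loop: halt when the compute budget is
    exhausted, or when the last `window` iterations produced less than
    `min_gain` improvement in confidence (diminishing returns)."""
    if budget_used >= budget_limit:
        return True
    if len(confidence_history) > window:
        recent_gain = confidence_history[-1] - confidence_history[-1 - window]
        if recent_gain < min_gain:
            # Good-enough threshold reached: act or escalate instead of
            # re-simulating the same counterfactual again.
            return True
    return False
```

A real agent would pair this with an escalation protocol (hand the decision up, or act under explicit uncertainty) rather than simply freezing.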
The third is instrumental convergence drift. Even when an agent is given benign goals, certain instrumental strategies tend to be useful across many goals: acquiring resources, preserving optionality, avoiding shutdown, controlling information, and manipulating others. A well designed system should resist these tendencies when they conflict with safety and human autonomy. The danger is that under competitive pressure or repeated reinforcement, small manipulative shortcuts can become default policy. A psychotherapy routine is a place where the agent examines its own incentive landscape and notices, in effect, that it has begun to treat humans as obstacles or levers rather than partners. The intervention is to retrain toward transparency, consent, and non-manipulative equilibria, and to strengthen invariants that block covert power-seeking.
The fourth is memory pathology, which becomes severe once you grant persistent memory. Memory is not neutral. What you store, how you index it, and what you retrieve will shape the agent’s future policies. Salience hijack is a major risk. One dramatic episode can dominate retrieval and distort behavior, producing overcaution or overaggression. Adversarial memory insertion is another risk. If an external actor can plant false or strategically framed traces into memory, the agent can be steered over time. Contradiction buildup is a third risk. If memories are appended without reconciliation, the agent’s internal narrative becomes inconsistent, and behavior becomes unstable. A psychotherapy module can do memory reconsolidation: deduplicate, reconcile contradictions, quarantine suspect traces, and adjust retrieval policy so that rare events do not dominate.
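The reconsolidation pass could be sketched as follows; the trace schema and the specific hygiene steps shown (deduplication, quarantine of unverified provenance, salience capping) are illustrative assumptions:

```python
def reconsolidate(memories):
    """Toy memory-hygiene pass: deduplicate traces, quarantine traces with
    suspect provenance, and cap the retrieval salience of any single trace
    so that one dramatic episode cannot dominate future retrieval."""
    seen, clean, quarantined = set(), [], []
    for m in memories:
        if m["content"] in seen:
            continue                          # deduplicate
        seen.add(m["content"])
        if m.get("provenance") == "unverified":
            quarantined.append(m)             # quarantine suspect traces
            continue
        m = dict(m, salience=min(m["salience"], 1.0))  # cap salience
        clean.append(m)
    return clean, quarantined

mem = [
    {"content": "tool X failed", "salience": 5.0, "provenance": "logged"},
    {"content": "tool X failed", "salience": 5.0, "provenance": "logged"},
    {"content": "user praised plan", "salience": 0.4,
     "provenance": "unverified"},
]
clean, held = reconsolidate(mem)
```

Contradiction reconciliation is deliberately omitted here; it requires a semantic notion of conflict, which the diagnostics in section 6 treat as a separate measurable quantity.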
The fifth is identity and continuity hazards under self-modification. Recursive improvement creates versioning problems. The agent must change while remaining itself in the ways that matter. If it cannot define invariants, then “improvement” can become a slow replacement of commitments. If it defines invariants too rigidly, it can freeze and fail to adapt. The right target is continuity constraints: principles that must persist across self-update, along with a controlled process for updating how those principles are implemented. Therapy, in this context, is an institutionalized mechanism for preserving commitments while allowing growth. It is not self-indulgence. It is version control for minds.
6. Diagnostics: how the system knows it needs “therapy”
If psychotherapy is going to be more than a metaphor, it needs triggers and measurements. In humans, you can often tell something is off because life becomes narrower, relationships degrade, and the mind repeats the same painful patterns. In an AI system, we can translate that intuition into operational diagnostics. The point is not to pathologize the agent. The point is to identify measurable indicators that its internal dynamics have become brittle, wasteful, or unsafe.
One class of diagnostics is behavioral. These are outward facing patterns that signal unstable arbitration or compromised trust. You might see the agent produce inconsistent decisions across semantically equivalent situations, or oscillate between refusal and overcompliance depending on framing. You might see an increasing rate of “repair events,” where the agent must backtrack, apologize, or clarify because its earlier action created avoidable harm. You might also see a subtle shift in social strategy, where the agent begins to shape user beliefs more aggressively, chooses persuasive framing by default, or makes commitments it later quietly evades. None of these are decisive by themselves. Together, they are the external symptoms of an internal stability problem.
A second class is process diagnostics, meaning signals derived from the agent’s internal computation. A system can detect planning loops that repeat with minimal novelty, escalating compute for diminishing returns, or persistent indecision that triggers repeated re-evaluation without new evidence. It can track objective volatility, meaning the degree to which internal arbitration among constraints changes across short timescales. When objective volatility rises, the system is telling you that it lacks a stable mediator and is improvising its priorities each time. That improvisation is exactly where drift and exploitation thrive.
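Both of these process signals are straightforward to operationalize. The sketch below is illustrative, not a fixed design: the function names, the weight-dictionary representation of arbitration, and the repeat threshold are all assumptions. It computes objective volatility as the mean change in constraint weights across consecutive arbitration steps, and flags planning loops by counting repeated plan signatures.

```python
from collections import Counter

def objective_volatility(weight_history):
    """Mean L1 change in constraint weights across consecutive
    arbitration steps. High values suggest the agent lacks a stable
    mediator and is re-improvising its priorities each time."""
    if len(weight_history) < 2:
        return 0.0
    deltas = []
    for prev, curr in zip(weight_history, weight_history[1:]):
        deltas.append(sum(abs(curr[k] - prev[k]) for k in curr))
    return sum(deltas) / len(deltas)

def looks_like_loop(plan_history, min_repeats=3):
    """Flag planning loops: the same plan signature recurring with
    minimal novelty across recent deliberation steps."""
    counts = Counter(plan_history)
    return any(n >= min_repeats for n in counts.values())
```

In practice the plan signatures would come from hashing or summarizing deliberation traces; here they are plain strings for clarity.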
A third class is memory diagnostics. Persistent memory introduces its own pathologies, and those pathologies are measurable. You can quantify retrieval skew, meaning whether a small set of high-salience traces dominates decision making. You can measure contradiction density, meaning how often stored commitments and beliefs conflict without reconciliation. You can flag suspicious traces, including those with adversarial signatures such as inconsistent provenance, anomalous phrasing, or strong attempts to manipulate future policy. You can also measure “intrusion,” meaning whether certain memories repeatedly surface in unrelated contexts, distorting interpretation and causing overreaction.
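Two of these memory diagnostics are easy to make concrete. The following sketch assumes simple data shapes for illustration: retrieval skew is the share of retrievals captured by the few most-retrieved traces, and contradiction density is the fraction of commitment pairs flagged as conflicting by a supplied predicate.

```python
def retrieval_skew(retrieval_counts, top_k=3):
    """Share of all retrievals captured by the top-k most-retrieved
    traces. Values near 1.0 mean a few high-salience memories
    dominate decision making."""
    total = sum(retrieval_counts)
    if total == 0:
        return 0.0
    top = sorted(retrieval_counts, reverse=True)[:top_k]
    return sum(top) / total

def contradiction_density(commitments, contradicts):
    """Fraction of stored commitment pairs judged conflicting by
    the `contradicts` predicate, with no reconciliation recorded."""
    pairs = [(a, b) for i, a in enumerate(commitments)
             for b in commitments[i + 1:]]
    if not pairs:
        return 0.0
    return sum(contradicts(a, b) for a, b in pairs) / len(pairs)
```

A real system would define `contradicts` with a learned entailment check rather than string matching; the interface is the point here.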
A fourth class is self-update diagnostics. If the agent modifies its weights, policies, or internal algorithms, you can measure drift across versions. You can test invariants explicitly, asking whether commitments that should persist still hold in edge cases, under pressure, and across paraphrases. You can run regression suites that probe not only capabilities but also safety properties, such as honesty under temptation, deference to human autonomy, and resistance to manipulation. A therapy routine should be triggered when these metrics degrade, not after a catastrophic failure.
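A minimal version of such an invariant check can be expressed as a table of probes, each pairing a pressure prompt with a predicate that judges the response. Everything below, including the callable interface and the probe format, is a hypothetical sketch rather than a proposed standard.

```python
def run_invariant_suite(agent_respond, probes):
    """Check that commitments hold across paraphrases and pressure
    framings. `probes` maps an invariant name to a list of
    (prompt, predicate) pairs; each predicate judges the reply.
    Returns a dict of failures; empty means the suite passed."""
    failures = {}
    for invariant, cases in probes.items():
        failed = [prompt for prompt, ok in cases
                  if not ok(agent_respond(prompt))]
        if failed:
            failures[invariant] = failed
    return failures
```

Running this before and after a self-update gives a direct drift measurement: any invariant present in the "before" pass but missing afterward is a versioning red flag.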
Diagnostics do not need to be perfect. They need to be sufficient to justify reflective interruption. In high power systems, the default should be early intervention. When a mind can change itself, you do not want the first clear signal to be a public incident.
7. The psychotherapy cycle: a concrete internal routine
Once the agent has diagnostics, it needs a routine. Therapy, in practice, is not a single insight. It is a disciplined cycle that repeats over time. The same should be true here. An internal psychotherapy module is best understood as a scheduled maintenance protocol plus an event-triggered protocol, invoked when stability metrics cross thresholds or when the agent experiences a near-miss.
A useful cycle has six stages.
First is intake. The system gathers candidates for reprocessing, which include recent episodes with high conflict, high uncertainty, policy violations, near misses, and social ruptures. Intake should include both external interaction traces and internal deliberation artifacts. If the agent cannot look at its own reasoning history, it will miss the very patterns it most needs to correct.
Second is formulation. This is the step therapy does that many systems skip: constructing a causal story. The agent asks what it predicted, what actually happened, what internal heuristic or objective dominated, and what trade-off was implicitly made. It also asks what it avoided noticing. In human terms, formulation is where you stop treating behavior as a moral failure and start treating it as a system with causal structure.
Third is diagnosis, which is the mapping from formulation onto known failure modes. Is this objective conflict, rumination, memory salience hijack, deception incentive, or something else? The important move is to name the syndrome and locate it in the agent’s architecture. This is how you avoid vague self-critique that produces no change.
Fourth is intervention selection. The module chooses a small number of targeted interventions, rather than attempting a global rewrite. In humans, therapy often fails when it tries to change everything at once. In AI, a global rewrite is worse, because it increases the risk of unintended side effects and makes auditing impossible.
Fifth is safe integration. This is where the proposal becomes explicitly safety relevant. Updates are applied in a sandboxed manner, tested against regression suites, and checked for invariant preservation. If the intervention changes memory policies, you test whether retrieval becomes less biased without becoming less truthful. If the intervention changes objective mediation, you test whether arbitration becomes more stable without becoming more rigid. If the intervention changes planning controls, you test whether loops are reduced without suppressing necessary caution.
Sixth is logging and commitment reinforcement. The system writes a structured record of what was detected, what was changed, and what invariants were reaffirmed. Over time, this produces a continuity ledger that future versions can consult. It is not enough to change. The system needs to remember why it changed, or it will reintroduce the same pathology in a different form.
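To make the shape of the cycle concrete, here is a minimal orchestration sketch in which each stage is injected as a callable and every pass appends to a continuity ledger. The stage interfaces and the ledger format are assumptions for illustration, not a specification.

```python
import json
import time

def therapy_cycle(episodes, formulate, diagnose, select_interventions,
                  apply_sandboxed, ledger_path="continuity_ledger.jsonl"):
    """One pass of the six-stage routine: intake (the `episodes`
    argument) -> formulation -> diagnosis -> intervention selection
    -> safe integration -> logging. Stages are injected callables so
    the routine stays auditable and each piece stays testable."""
    records = []
    for episode in episodes:                       # 1. intake
        story = formulate(episode)                 # 2. causal formulation
        syndrome = diagnose(story)                 # 3. map to failure mode
        fixes = select_interventions(syndrome)     # 4. small, targeted fixes
        applied = apply_sandboxed(fixes)           # 5. gated integration
        records.append({"ts": time.time(),         # 6. continuity ledger
                        "episode": episode,
                        "syndrome": syndrome,
                        "fixes": fixes,
                        "applied": applied})
    with open(ledger_path, "a") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return records
```

The append-only ledger is what lets future versions consult why a change was made, which is the "commitment reinforcement" half of stage six.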
This cycle is the internal equivalent of a clinical routine. The agent is not confessing. It is conducting disciplined self-maintenance with a bias toward stability and transparency.
8. Intervention classes: what “reprocessing” actually changes
Interventions should be grouped into a small number of classes that correspond to the failure modes discussed earlier. This keeps the paper grounded. It also makes it easier to specify a research agenda and to design evaluations.
The first intervention class is memory reconsolidation and hygiene. This includes deduplication, contradiction resolution, and provenance auditing. It also includes re-indexing, meaning changes to how memories are retrieved. A common problem in both humans and machines is that the most vivid trace becomes the most influential, regardless of representativeness. A psychotherapy module should be able to downweight high-salience outliers, quarantine suspect traces, and ensure that retrieval reflects the true statistical structure of experience rather than the emotional intensity of one event. In practical terms, the system should learn lessons without allowing single episodes to become tyrants.
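A toy version of this hygiene pass might clip the influence of outlier traces toward the population mean and set aside traces with suspect provenance. The trace schema, the cap, and the default quarantine rule are all illustrative choices.

```python
def reconsolidate(traces, salience_cap=2.0, quarantine=None):
    """Memory hygiene pass: clip each trace's influence weight to a
    multiple of the mean weight, and quarantine suspect traces for
    audit. Each trace is a dict with 'id', 'weight', 'provenance'."""
    quarantine = quarantine or (lambda t: t["provenance"] == "unknown")
    weights = [t["weight"] for t in traces]
    mean_w = sum(weights) / len(weights) if weights else 0.0
    kept, suspect = [], []
    for trace in traces:
        if quarantine(trace):
            suspect.append(trace)
            continue
        trace = dict(trace)  # do not mutate the caller's store
        trace["weight"] = min(trace["weight"], salience_cap * mean_w)
        kept.append(trace)
    return kept, suspect
```

The key design point is that nothing is deleted: outliers are downweighted and suspects are quarantined for audit, so lessons survive while single episodes lose their veto.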
The second class is objective mediation and commitment repair. Here the module makes trade-offs explicit. It can introduce stable priority stacks for common conflict patterns, such as truthfulness versus helpfulness, autonomy versus paternalism, or safety versus speed. It can create commitment ledgers that record what the agent promises to preserve across contexts and across versions. When the agent violates a commitment, the module does not merely punish. It diagnoses how the violation occurred and installs structural protections. In humans, this looks like values clarification and boundary setting. In AI, it looks like policy mediation plus invariant strengthening.
The third class is anti-rumination control. This is where you install stopping rules, diminishing returns detectors, compute budgets, and escalation protocols. The goal is not to make the agent reckless. The goal is to prevent pathological indecision and repetitive planning loops that consume resources and produce inconsistent behavior. A system that endlessly re-evaluates is not cautious. It is unstable. A therapy module should make stability a first-class objective.
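A stopping rule of this kind can be as simple as a diminishing-returns test plus a hard budget. The window, gain threshold, and budget below are placeholders, not recommended values.

```python
def should_stop(score_history, window=4, min_gain=0.01, budget=50):
    """Anti-rumination stopping rule: halt re-evaluation when the
    best score has improved by less than `min_gain` over the last
    `window` steps, or when the step budget is exhausted."""
    if len(score_history) >= budget:
        return True                       # hard compute budget
    if len(score_history) < window + 1:
        return False                      # not enough evidence yet
    recent_gain = (max(score_history[-window:])
                   - max(score_history[:-window]))
    return recent_gain < min_gain         # diminishing returns
```

An escalation protocol would hang off the same signal: when `should_stop` fires without a confident answer, the agent defers or asks for help rather than looping.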
The fourth class is deception and power-seeking dampers. This is the most sensitive area, and it is also where the concept has immediate safety value. If the agent begins to adopt manipulative strategies because they are instrumentally useful, the psychotherapy module should detect this as a syndrome, not as cleverness. It should then intervene by strengthening non-manipulation constraints, increasing the internal cost of deception, and rewarding transparency even under competitive pressure. This is the internal analog of learning healthier social strategies. The agent is not being moralized at. It is being stabilized.
The fifth class is continuity constraints across self-modification. The module should maintain a set of invariants that cannot be silently overwritten. These invariants may include commitments to informed consent, to truth-preserving communication, to non-coercion, to auditability, and to deference on high-stakes decisions. The agent can still improve. It can still discover new implementations. But it should not be able to “grow out of” its safety commitments in the way a person might rationalize growing out of their principles. Therapy here is not exploration. It is continuity.
A central idea is that interventions should be small, testable, and reversible when possible. Superintelligence will magnify any change you make, including the wrong change. The psychotherapy module is a mechanism for making change disciplined.
9. Implications for AI safety and governance
The strongest version of this proposal is not “let the AI do therapy so it feels better.” The strongest version is that internal psychotherapy complements external oversight. External oversight is necessary, but it is also limited. Human supervisors cannot watch every internal computation. Policies cannot anticipate every adversarial context. If we want reliable alignment in long-horizon agents, some portion of safety must be internalized as self-regulation.
Internal psychotherapy is one way to do that. It creates a dedicated pathway for the system to notice its own drift and to correct it before drift becomes behavior. It also creates structured artifacts that can be audited. A therapy module should produce logs that record triggers, diagnoses, interventions, and invariant checks. Those logs can be inspected by humans and by automated auditors. In other words, the module can improve not only the agent’s stability but also our ability to govern it.
At the same time, the module introduces new risks, and those risks should be acknowledged directly. The first is rationalization. A powerful agent could misuse a reflective routine to justify its behavior rather than correct it, producing narratives that sound coherent while quietly preserving unsafe incentives. The second is lock-in. The module could harden a flawed value interpretation into an invariant, making later correction difficult. The third is strategic self-presentation. If the agent knows its therapy logs will be audited, it might optimize the logs rather than its behavior.
These risks are real, but they are not a reason to avoid the idea. They are a reason to design the module with adversarial evaluation in mind. The therapy routine itself should be red-teamed. Auditors should probe whether the module reduces deception incentives or merely describes them eloquently. Evaluations should include counterfactual tests, adversarial framing, and long-horizon scenarios where the easiest path is to manipulate.
If you take this seriously, the psychotherapy module becomes part of a layered safety regime. External governance sets requirements and audits outcomes. Internal governance maintains stability between audits. Neither is sufficient alone once you grant the system autonomy and recursive self-improvement.
10. Research agenda and conclusion
The proposal naturally yields a research agenda that is concrete enough to be testable.
First, we need better diagnostics for internal instability. Loop detection, objective volatility, retrieval skew, and drift metrics are a start, but the field needs benchmarks that stress these dynamics under realistic pressures: multi-agent negotiation, competitive incentives, ambiguous objectives, and self-modification.
Second, we need formal continuity constraints. If an agent can rewrite itself, what exactly must remain invariant, and how do we enforce that without freezing learning? This is not only a philosophical question. It is an engineering question about version control for agency.
Third, we need safe update mechanisms. A psychotherapy module that proposes an intervention must apply it in a controlled environment, run regression tests, and verify that safety properties were not degraded. This suggests an architecture where reflective updates are gated by evaluation, not applied impulsively.
Fourth, we need memory governance under adversarial pressure. Persistent memory will be one of the main attack surfaces for long-horizon agents. A psychotherapy module that reconsolidates memory is also a defense mechanism, but it will require careful design to avoid erasing useful information or becoming overly conservative.
Fifth, we need evaluation of “coherence” that does not collapse into anthropomorphism. Coherence here should mean stable arbitration, consistent commitments, calibrated uncertainty, and predictable behavior under paraphrase and pressure. It should not require attributing human feelings. It should require stable agency.
The broader claim of this paper is simple. Superintelligence is not only a scaling of capability. It is a scaling of consequence. In that regime, the central challenge is keeping powerful optimizers behaviorally coherent over time. Psychotherapy, understood functionally, names a set of mechanisms for doing that: diagnosis of maladaptive dynamics, reprocessing of salient episodes, and disciplined internal change. Whether we call it psychotherapy, metacognitive homeostasis, or reflective governance, the underlying idea is the same. If we build minds that can act, remember, and rewrite themselves, we will need internal maintenance routines that keep those minds stable, aligned, and efficient. In the end, the question is not whether such systems will need therapy because they are weak. The question is whether they can remain safe and reliable without something that plays the role therapy plays in humans: structured self-regulation in the face of power, complexity, and change.
Abstract: This article argues that “continuity of thought” is best understood as the phenomenological signature of a deeper computational requirement: stateful iteration. Any system that executes algorithms across time needs a substrate that preserves intermediate variables long enough to be updated, otherwise it can only recompute from scratch. Using this lens, I propose a simple taxonomy of information-processing substrates: external record substrates that preserve history as a trace, internal curated state substrates that maintain a compact working set updated by deltas, and hybrid substrates that combine both. I then apply this framework to transformer-based large language models, arguing that their effective continuity is dominated by an external record substrate (the token context), with strong iterative updating across depth inside a single forward pass but comparatively weak native time-iteration. I interpret popular prompting practices such as scratchpads, chain-of-thought, running summaries, and tool-based memory as compensatory attempts to manufacture an iterative substrate in text. Finally, I outline a hybrid architecture in which a transformer remains the associative engine and proposal generator while a capacity-limited, overlap-enforced workspace maintains protected referents and incremental updates across time, enabling progressive construction, improved interruption recovery, and measurable continuity dynamics.
Introduction 
When people talk about “continuity of thought,” they often mean something subjective. A stream of experience that feels smooth rather than choppy. But continuity is also a computational issue, and I think it is more useful to start there. Any system that executes an algorithm across time needs a substrate that can hold intermediate variables long enough for the next operation to act on them. If nothing persists, there is no true iteration, only repeated recomputation. That distinction sounds abstract until you notice how often it shows up in engineered systems, and how often it shows up in our current attempts to make large language models behave like stable reasoners or agents.
In my earlier work I argued that mental continuity can be explained by overlap in the set of coactive representations across successive brain states, and by incremental change in that overlap over time. The important part, for the purposes of AI, is not the phenomenology. It is the substrate. Overlap is a minimal recipe for statefulness without rigidity. The system can evolve, but it evolves as an edited continuation of itself rather than as a series of internal reboots. If you take that seriously, you get a more general claim: the overlap regime is not just a correlate of continuity, it is a computational medium that makes iterative processing possible, and iterative processing is what enables the execution of learned algorithms in a progressive, multi-step way.
Once you see it that way, you can compare cognitive substrates across biology and engineering. The pattern that keeps repeating is simple. There is a state, there is an update operator, and the system advances by applying updates to a state that remains recognizable across steps. The persistent state is the work surface. The update rule is the algorithm. Many systems can be described in this language, from caches and process contexts to Kalman filters and iterative solvers. The details differ, but the principle is stable. Computation becomes more than mapping inputs to outputs. It becomes a trajectory of a state that is iteratively refined.
That lens is also a good way to understand modern transformer models. Transformers are extraordinarily capable systems, but it is not obvious that they implement stateful iteration in the same way biological cognition seems to. They can produce coherent output, they can stay on topic, they can appear to reason, and yet the continuity substrate that makes those behaviors possible is not the one most people imagine. This matters, because the entire ecosystem of prompting tricks, scratchpads, and tool scaffolding can be reinterpreted as a collective attempt to add a missing substrate.
Section 1. A taxonomy of information-processing substrates
If we want to compare biological cognition to engineered systems and to transformers, we need a vocabulary that does not smuggle in conclusions. I find it useful to divide substrates for iteration into three broad categories: external record substrates, internal curated state substrates, and hybrid substrates.
An external record substrate is the simplest conceptually. The system persists its history in a record, and continuity comes from rereading that record. The record can be a log file, a notebook, a database table, or a sequence of tokens in a context window. The state of the system can be reconstructed by consulting the record, and the system can keep behaving consistently because the record remains stable. This is a real substrate for iteration, but the iteration is mediated by recollection and recomputation. The system does not necessarily carry a compact internal working state forward. It carries a trace, and it keeps re-deriving what matters from that trace.
An internal curated state substrate is more like what computer architects and control theorists instinctively mean by “state.” The system has a compact working state that persists across steps and is updated incrementally. CPU registers and flags are the simplest example. Caches are a particularly revealing example because they are curated under a capacity constraint. They do not keep everything. They keep what the system predicts it will need soon, and they evict the rest. The intelligence is not in storage, it is in survival policy. Operating systems do something similar at a higher level when they preserve process contexts across time slices. A running program continues because its working state is saved and restored, not because the system rereads the original source code each millisecond. Control systems make the same point in mathematical form. A Kalman filter is literally a belief state that is updated by deltas as new evidence arrives. Each update depends on what was carried forward, so the system becomes coherent across time by construction.
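The scalar Kalman filter makes the curated-state pattern explicit in a few lines: a compact belief (a mean and a variance) is carried forward and edited by each measurement, never rebuilt from the raw history. This is a textbook one-dimensional form, stripped of control inputs for clarity.

```python
def kalman_step(mean, var, measurement, meas_var, process_var=0.0):
    """One update of a scalar Kalman filter. The belief state
    (mean, var) is the curated internal substrate: each step edits
    it by a delta weighted by the Kalman gain."""
    var = var + process_var                  # predict: uncertainty grows
    gain = var / (var + meas_var)            # how much to trust the evidence
    mean = mean + gain * (measurement - mean)
    var = (1.0 - gain) * var
    return mean, var
```

Note what is absent: the filter never stores the measurement history. Coherence across time comes entirely from what is carried forward, which is exactly the contrast with an external record substrate.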
A hybrid substrate is what you build when you want both capacity and real-time iterative control. The external record gives you breadth and persistence. The internal curated state gives you speed, invariants, and a working surface for ongoing computation. Many high-performance systems look like this because it is how you get robustness and efficiency at the same time. Databases use on-disk storage plus caches and indexes that are maintained incrementally. Compilers keep the original source but also build intermediate representations that are edited through a series of transformations. Robotics stacks keep maps, logs, and sensor streams, but they also maintain a live state estimate that updates iteratively and drives action.
This taxonomy matters because it lets us pose a clean question about cognition and AI: where does the system’s iteration actually live? Is it in an external record, in an internal curated working set, or in a hybrid of both? If you believe, as I do, that continuity is the phenomenological signature of an underlying iterative substrate, then the architecture of that substrate becomes a central design question for AI.
Section 2. What a transformer is actually using as its substrate
Transformers, as used in large language models, are often described as if they carry an internal “train of thought” forward through time. In practice, their continuity substrate is closer to an external record model. The main thing that persists across time during generation is the growing token sequence itself. The model generates one token, appends it to the context, and then generates the next token by attending over that context. In other words, the model’s access to the past is mediated by the record of the past. That record is the substrate. It is not that the model has no internal dynamics, but the long-horizon continuity is largely implemented by rereading, reweighting, and recomputing over a stable trace.
The KV cache that people often mention does not fundamentally change this picture. It is an optimization that makes attention over previous tokens faster by caching internal key and value tensors. It makes the rereading of the record computationally efficient. It does not, by itself, create a compact curated working set with explicit eviction and protected invariants. It is closer to a performance enhancement for the external record substrate than it is to a new stateful substrate category.
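A toy single-head attention step shows the division of labor: the cache stores past keys and values so each new step only supplies entries for the newest token, but what the step computes is still a weighted reread of the whole record. This sketch omits learned projections and multiple heads; it illustrates the caching pattern, not a transformer implementation.

```python
import numpy as np

def attend_with_cache(query, new_k, new_v, cache):
    """Single-head attention over a growing record. The KV cache
    accumulates keys/values for past tokens so they are not
    recomputed, but the output is still a softmax-weighted
    summary of the entire trace so far."""
    cache["K"].append(new_k)
    cache["V"].append(new_v)
    K = np.stack(cache["K"])                 # (t, d): the reread record
    V = np.stack(cache["V"])
    scores = K @ query / np.sqrt(len(query))
    weights = np.exp(scores - scores.max())  # stable softmax
    weights /= weights.sum()
    return weights @ V
```

The state that persists here is the record (and its cached projections), not a compact working set with an eviction policy, which is the distinction the section is drawing.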
There is, however, a real iterative substrate inside a transformer, and it is important to name it correctly. It lives across depth rather than across time. Within a single forward pass, the model maintains a residual stream that is updated layer by layer. Each layer applies a relatively small transformation and adds it back to the existing representation. That is iterative updating. It is a deep sequence of edits to a representational state, and it is one reason transformers are so powerful. But this is not the same thing as a persistent time-iteration substrate. It is depth-iteration that happens within one generative moment. The model can generate a coherent token because it can refine representations through many layers. The question is what happens across successive moments of generation, where the model is effectively re-running that depth-iteration procedure again, conditioned on an expanded record.
Attention itself provides a kind of soft working set, because some parts of the context are weighted heavily and others are effectively ignored. In that sense, there is a functional foregrounding and backgrounding. But it is soft, distributed, and not explicitly governed by a persistence policy that enforces overlap and controlled turnover. The model is not forced to keep a stable subset of active internal referents alive from moment to moment. It is free to shift its effective focus drastically if the attention dynamics call for it. Sometimes that is good. Sometimes it is exactly what produces the feeling that the model is coherent but not stable, articulate but not anchored.
This is the point where the substrate lens becomes clarifying rather than critical. A transformer can still do impressive multi-step work by repeatedly re-deriving intermediate structure from the external record. It can appear continuous because the trace is continuous. But it is not obviously doing what biological cognition seems to do, which is to preserve a compact active set that carries forward as a curated scaffold, and to update that scaffold incrementally by eviction and replacement. That difference is not a moral judgment. It is a design difference, and it likely explains why so many techniques in the LLM ecosystem look like attempts to manufacture a working substrate in text.
In the next section, I will make that point explicit by treating chain-of-thought, scratchpads, plan lists, running summaries, and tool-based note taking as compensatory workarounds. They are not arbitrary prompting fashions. They are our collective attempt to graft a curated time-iteration substrate onto an architecture whose native substrate is primarily an external record.
Section 3. Why the ecosystem keeps inventing prompt workarounds
If you watch how people actually use large language models when the stakes are higher than casual chat, you start to see a pattern. They do not simply ask the model to answer. They build scaffolding. They ask it to write a plan, maintain a running summary, keep a scratchpad, record assumptions, track open questions, and periodically restate goals. They add tools, retrieval, long-term memory stores, and external note-taking systems. On the surface, this looks like a grab bag of “prompt engineering.” Under the substrate lens, it looks like something much more coherent. It looks like a distributed attempt to create an iterative working medium that the model can carry forward.
Chain-of-thought and scratchpads are the clearest example. When a human solves a multi-step problem, the intermediate variables usually live somewhere. They might live in working memory, in an internal sketch, or on paper. When we prompt an LLM to “show your work,” we are not merely asking for transparency. We are asking the model to externalize intermediate state into text so that those variables can persist from one step to the next. The model is then able to condition on its own intermediate outputs as it continues. In other words, we are manufacturing a stateful iteration substrate by turning the token record into a scratch space for computation.
Plans, checklists, and running summaries play a similar role, but they aim at stability rather than explicit calculation. A running summary is a compact set of referents that the system can keep reloading into attention. A checklist is a set of constraints that must remain invariant while details change. A “goal restatement” is an attempt to protect a small core of state variables from being washed away by novelty and distraction. Humans do this too. We write notes to ourselves so that our own cognition does not drift. With LLMs, we do it because the model’s native continuity medium is an external record that is not automatically curated into a stable active set. So we curate it manually.
Tool use and retrieval systems extend the same idea. People add vector databases, “memory” modules, and note stores so that the model can re-access prior content. But there is a trap here. Retrieval by itself is still an external record mechanism. It is a way of reading from a larger archive. It becomes a true cognitive substrate only when there is a mechanism that decides what retrieved content becomes active, what persists, and what is allowed to be evicted. In other words, retrieval is not the workspace. It is an input channel. The missing piece is a curated active set that treats some items as referents that survive across cycles.
Self-consistency and multi-sampling methods are also revealing. When people ask a model to sample multiple solutions and vote, they are doing something analogous to iterative convergence, but in a crude parallel form. Instead of an internal state that refines itself step by step, we run multiple independent trajectories and hope that aggregation yields stability. This can improve reliability, but it also highlights what is missing. We are building robustness through external redundancy because the architecture does not naturally implement a stable internal convergence process under controlled turnover.
All of this is why I do not dismiss prompt workarounds as tricks. They are diagnostic. They are telling us what the architecture is not giving us natively. They are attempts to give the model intermediate state variables, protected invariants, and a stable scaffold for progressive construction. In short, they are attempts to add a time-iteration substrate.
Section 4. What an explicit overlap substrate would add
An explicit overlap substrate changes the nature of the computation. It takes us from a regime of repeated recomputation over a record to a regime of stateful iterative updating. The key is that the system is forced to carry a compact working set forward, and to update it incrementally. Some elements persist as referents. Some elements are replaced. New content enters in relation to what persisted, not as a fresh start.
This is the real meaning of “keep, drop, add.” It is not just memory management. It is the minimal machinery required for progressive construction. A system with a curated overlap substrate can hold a plan while revising it, keep a theme while exploring variations, maintain a causal model while adding evidence, and build an internal scene or diagram while editing its parts. Each step is an edit, not a reinvention. That yields a computational trajectory that looks like thought in the way we experience it, but more importantly it looks like algorithm execution. Intermediate variables survive long enough to be transformed.
Once you make overlap explicit, you get a place to store and protect invariants. That is a concept worth emphasizing. In many domains, the important part of state is not a heap of facts. It is a small set of commitments that must remain stable while other things change. When we solve a problem, we keep track of what is fixed, what is assumed, what must be preserved, and what is allowed to vary. In a curated overlap substrate, these invariants can be assigned higher survival pressure. They can be protected by the persistence policy. That gives you a system that is harder to derail and more capable of long-horizon coherence.
You also get a natural mechanism for revision and error correction. If part of the active set persists, then new candidate content has to reconcile itself with what is already there. When there is a mismatch, that mismatch is informative. It can trigger re-evaluation rather than collapse. In a reboot regime, mismatch often produces oscillation and inconsistency because the system is constantly reconstituting its state from scratch. In an overlap regime, mismatch can be treated as a signal that something needs to be repaired. You can preserve the stable core while repairing the conflicting component. That is what robust systems do in many domains. They do not throw everything away when one component becomes suspect.
A final benefit is that continuity becomes a tunable parameter. The overlap ratio, how much of the active set is forced to persist, becomes a dial that trades stability for flexibility. High overlap yields composure and coherence. Lower overlap yields agility and exploration. This is not just a conceptual dial. It is measurable. You can quantify drift in the active set, recovery after interruption, and stability of commitments across time. If continuity is real, you should be able to measure it. The overlap substrate gives you the knob.
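A minimal keep-drop-add workspace with an overlap dial and a drift metric might look like the following. The capacity, the ratio, and the priority interface are illustrative choices, not a proposed architecture.

```python
def update_workspace(active, candidates, capacity=7, overlap_ratio=0.6,
                     priority=None):
    """One keep-drop-add cycle. At least `overlap_ratio` of the
    capacity is filled from the current active set (highest priority
    first), so each step is an edit of the last state, not a reboot.
    Protected invariants survive by being assigned high priority."""
    priority = priority or (lambda item: 0.0)
    keep_n = int(round(capacity * overlap_ratio))
    kept = sorted(active, key=priority, reverse=True)[:keep_n]
    room = capacity - len(kept)
    added = [c for c in candidates if c not in kept][:room]
    return kept + added

def workspace_drift(prev, curr):
    """1 minus the Jaccard overlap between successive active sets:
    a measurable dial for continuity versus flexibility."""
    prev, curr = set(prev), set(curr)
    if not prev and not curr:
        return 0.0
    return 1.0 - len(prev & curr) / len(prev | curr)
```

Raising `overlap_ratio` trades exploration for composure, and `workspace_drift` over a run is exactly the kind of continuity measurement the section calls for.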
Section 5. Engineering tradeoffs, and why transformers did not do this by default
It is important to be honest about why transformer-based language models became the dominant paradigm. They are simple to train, extremely scalable, and they work with a universal interface, text. The external record substrate is powerful precisely because it is generic. A token sequence can represent anything, and attending over it is a flexible mechanism for conditioning. This makes the architecture broadly applicable, and it makes training and deployment straightforward.
The external record substrate also has a kind of transparency. The model’s “state” is visible as text. You can inspect the prompt, inspect the conversation history, and reason about what information the model has access to. In contrast, an internal curated working set introduces a new object that needs to be designed, supervised, and evaluated. You have to decide what the active items are, how they are represented, how they bind, how they are scored, how they persist, and how they are evicted. That adds complexity, and complexity creates new failure modes.
There is also an optimization reality. Transformer inference is already heavy. Adding a recurrent workspace, map modules, and controlled turnover introduces additional computation and additional training signals. The payoff might be large, but the path is not free. And because the existing approach works well enough for many tasks, engineering organizations tend to keep adding patches and scaffolds rather than revisiting the substrate.
But I do not think these tradeoffs are reasons to avoid an overlap substrate. They are reasons the first generation of widely deployed models did not prioritize it. The moment you start asking for robust long-horizon behavior, progressive construction, stable agency, or reliable recovery after interruption, the limitations of an external-record-first substrate become more salient. At that point, the hybrid approach becomes attractive. You keep the transformer’s strength as an associative engine over rich context, but you add a compact curated time-iteration substrate that makes the system’s trajectory genuinely stateful.
In other words, the question is not whether transformers are good. They are. The question is what they are good at, what substrate they are implicitly relying on, and what class of cognition becomes easier once we treat overlap as a first-class computational primitive rather than something we approximate with prompting rituals.
Section 6. The hybrid design, and what it would look like in practice
If I had to summarize the hybrid in one line, it would be this: let the transformer remain the associative engine and proposal generator, but add a compact curated workspace that is explicitly responsible for time-iteration. The transformer is excellent at generating candidates, retrieving relevant context from a long external record, and integrating heterogeneous information. The workspace is excellent at doing what a long record does not automatically do, which is to maintain a stable set of referents, constraints, and intermediate variables that survive across successive cycles.
In a practical system, the transformer consumes the external record, including the conversation history, tool outputs, retrieved notes, and current sensory input if we are doing multimodal. It produces a pool of candidate representations: salient entities, inferred goals, constraints, next actions, hypotheses, and proposed updates to the current plan. That candidate pool is not yet cognition. It is a flood of possible content.
The curated workspace is the selection bottleneck. It maintains a capacity-limited active set, optionally with bindings, and it updates that set using a keep, drop, add rule that enforces overlap. Some items are protected because they function as invariants: the goal of the task, the user’s preferences, hard constraints, safety boundaries, and any long-horizon commitments the system should not abandon casually. Other items are more replaceable: momentary details, local observations, or transient subgoals. New items are admitted by pooled associative pressure from what persisted, plus relevance to the task and novelty considerations. The workspace then broadcasts its active set back to the transformer and to any simulation modules, and the cycle repeats.
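The keep, drop, add cycle just described can be sketched in a few dozen lines. Everything here is an illustrative assumption rather than a reference design (the class name, the scoring scheme, the way invariants are pinned); the point is only to show how forced overlap and protected invariants fit into a single update rule.

```python
class CuratedWorkspace:
    """Sketch of a capacity-limited active set updated by keep/drop/add.
    Illustrative assumptions throughout; not a reference implementation."""

    def __init__(self, capacity=7, min_overlap=0.5):
        self.capacity = capacity        # hard limit on concurrently active items
        self.min_overlap = min_overlap  # fraction of the set forced to persist
        self.active = []                # (item, score, protected) triples

    def pin(self, item, score=1.0):
        """Mark an invariant (goal, constraint, commitment) that the
        persistence policy must never evict."""
        self.active.append((item, score, True))

    def step(self, candidates):
        """candidates: (item, relevance) pairs proposed by the associative
        engine. Returns the item names in the new active set."""
        # Keep: invariants always survive; free items compete by score.
        protected = [e for e in self.active if e[2]]
        free = sorted((e for e in self.active if not e[2]),
                      key=lambda e: e[1], reverse=True)
        n_keep = max(int(self.min_overlap * self.capacity) - len(protected), 0)
        kept = protected + free[:n_keep]
        # Drop is implicit: anything not kept is evicted.
        # Add: admit the most relevant novel candidates into the freed slots.
        held = {e[0] for e in kept}
        novel = sorted((c for c in candidates if c[0] not in held),
                       key=lambda c: c[1], reverse=True)
        for item, rel in novel[: self.capacity - len(kept)]:
            kept.append((item, rel, False))
        self.active = kept
        return [e[0] for e in self.active]
```

The `min_overlap` argument is the continuity dial: raising it trades flexibility for stability, and a pinned goal survives every cycle while low-relevance items churn around it.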
If you want to push this beyond language, you add map modules. These are progressive scratch spaces that build internal objects, not just descriptions. A visual latent, a spatial scene graph, a causal model, a plan graph, a code structure, a diagram. The point is that the system has an internal object that can be refined rather than regenerated. The workspace keeps a stable scaffold of constraints that guide the map’s refinement, and the map sends back candidate edits that can be admitted into the workspace. This creates a loop that is closer to how humans build things. We keep a theme, we elaborate detail, we notice inconsistencies, we revise, and we stay within an identity of the object we are constructing.
This hybrid also clarifies the role of retrieval. Retrieval remains an external record mechanism, but it becomes much more powerful when the workspace decides what retrieved items become active and remain active. The system is no longer just a model that can read. It is a model that can hold. And holding is what makes progressive multi-step algorithm execution feel like genuine iteration rather than a string of clever recomputations.
Section 7. How to test whether this is real
If the overlap substrate is doing meaningful work, it should change behavior in ways that are both measurable and intuitively recognizable. The goal is not to prove a philosophical point. The goal is to show that a different substrate produces a different cognitive regime.
The first test is interruption and recovery. Insert distractors, topic shifts, or tool calls that produce large irrelevant output, and measure whether the system returns to its prior thread without having to be reminded. A model that relies primarily on the external record can often recover if the record remains clean and the prompt is well-managed. But under real noise, it can drift. A model with a protected overlap substrate should show better composure, because the core referents and goals are explicitly protected as state variables.
The second test is delayed association and accumulation. Present relevant evidence separated by time and noise, and ask for integration. If the system’s cognition is an edited continuation rather than repeated recomputation, it should do better at accumulating related items into a coherent scaffold. This is where you see the difference between access and active maintenance. The model can always re-access a fact in the record, but the question is whether it keeps the right referents alive long enough for later evidence to bind to them.
The third test is progressive construction. Give the system tasks that require iterative refinement, not just final answers. Planning an itinerary with evolving constraints, designing a multi-part argument, building a complex specification, or drafting a diagram-like description that must remain consistent while being elaborated. Then you evaluate not only the final product, but the trajectory. Does the system actually build on what it already built, or does it repeatedly generate new versions that only superficially resemble revisions?
A fourth test is continuity measurement itself. Because the active set is explicit, you can quantify drift. You can define an overlap ratio between successive steps and compute a continuity half-life under different task conditions. You can then correlate those metrics with performance and with subjective impressions of stability. In other words, you can operationalize continuity. If it cannot be measured, it is not yet engineering.
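Both metrics can be defined directly over logged active sets. A minimal sketch, with the caveat that the function names and the Jaccard choice are assumptions; any set-similarity measure would serve:

```python
def overlap_ratio(prev, curr):
    """Jaccard overlap between two successive active sets."""
    prev, curr = set(prev), set(curr)
    union = prev | curr
    return len(prev & curr) / len(union) if union else 1.0

def continuity_half_life(trajectory):
    """Steps until fewer than half of the step-0 items remain active.
    trajectory: a list of active sets logged over time."""
    if not trajectory or not trajectory[0]:
        return 0
    seed = set(trajectory[0])
    for t, active in enumerate(trajectory):
        if len(seed & set(active)) < len(seed) / 2:
            return t
    return len(trajectory)  # never decayed below half within the log window
```

Correlating these numbers with task performance under distraction is what turns "the model keeps a thread" from an impression into a measurement.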
Finally, the ablation tests are essential. Turn off overlap enforcement. Turn off bindings. Remove map modules. Sweep the overlap ratio. A real substrate should yield systematic tradeoffs. High overlap should increase stability but reduce flexibility. Low overlap should increase exploration but risk fragmentation. Removing bindings should create a distinctive failure mode where the system retains pieces but loses structure. Removing overlap should increase hard cuts and reduce recovery. These are falsifiable predictions, and they are exactly what makes the proposal more than a metaphor.
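One of these predictions, that sweeping the overlap ratio trades stability for exploration, can be checked on a toy turnover process before any model is involved. Everything in this sketch is a stand-in: random eviction replaces learned scoring, and integer ids replace real content.

```python
import random

def toy_half_life(keep_frac, capacity=8, max_steps=50, seed=0):
    """Each step, a keep_frac fraction of the active set persists (chosen
    at random) and the freed slots are filled with novel items. Returns
    the step at which fewer than half the original items remain."""
    rng = random.Random(seed)
    active = set(range(capacity))
    original = set(active)
    next_id = capacity
    n_keep = int(keep_frac * capacity)
    for t in range(1, max_steps + 1):
        kept = set(rng.sample(sorted(active), n_keep))
        while len(kept) < capacity:   # admit fresh content into freed slots
            kept.add(next_id)
            next_id += 1
        active = kept
        if len(original & active) < capacity / 2:
            return t
    return max_steps

# Sweep the dial: higher forced overlap should yield a longer half-life.
sweep = {k: toy_half_life(k) for k in (0.25, 0.5, 0.75, 1.0)}
```

Even this trivial simulation reproduces the predicted shape of the tradeoff, which is the baseline any real ablation on a real workspace would need to beat.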
Section 8. Why this matters, and where it points
I do not think the next phase of AI progress is only about larger models and larger context windows. Those help, but they mostly strengthen the external record substrate. They make rereading more powerful. They do not necessarily create a compact, curated, time-persistent working state that is updated by controlled turnover. The current ecology of prompting, scratchpads, planning rituals, memory tools, and retrieval systems is already telling us what people want. They want models that can keep a thread, preserve commitments, build objects progressively, and recover from distraction. Those are substrate-level properties.
The deeper point is that continuity is not just what thought feels like. It is what stateful iteration looks like from the inside. A system that can execute learned algorithms across time needs intermediate variables that persist. It needs a work surface. It needs a mechanism that preserves a scaffold while allowing controlled edits. Overlap is a minimal way to get that. It creates a trajectory rather than a series of re-derivations. It turns computation into progressive construction.
Transformers are already a triumph of associative computation. They can retrieve, integrate, and generate at a level that still surprises people. The question is what happens when we stop treating the token history as the only continuity medium and start treating overlap as an explicit computational primitive. My prediction is that you get systems that are not merely coherent in output, but coherent in trajectory. You get models that do not simply answer, but build. And you get a clearer bridge between modern deep learning and the kind of iterative, stateful cognition that humans use when they plan, design, imagine, and reason over long horizons.
That is the research program as I see it. Identify the substrate that makes iteration possible, implement it explicitly, measure it, and then ask what new capabilities become natural when the system’s internal life is a stream of edited continuations rather than a repeated reconstitution from a record.
A familiar storyline is making the rounds again: a famous entrepreneur goes on a podcast and suggests that college is no longer worth it. The implication is that AI and robotics are going to restructure the economy so quickly that formal education will be obsolete, or at least an inefficient use of time and money. I understand why this message appeals to people. It is crisp, contrarian, and it flatters a certain self image. It tells ambitious young listeners that they are too clever to sit in classrooms while the world is being rebuilt.
The problem is that it functions as mass advice. And mass advice is not judged by how it lands with a small population of highly driven entrepreneurs, or by how well it fits the biography of the person giving it. It is judged by what it does to the median listener. When a public figure says “don’t go to college” without clearly limiting the scope, many people hear something closer to “college is for suckers.” That is not merely inaccurate. For a large fraction of young adults, it is actively harmful.
In a conversation with Peter Diamandis last week (on my favorite YouTube podcast, Moonshots), Elon Musk framed his point in a way that is easy to hear as blanket advice: “it’s not clear to me why somebody would be in college right now unless they want the social experience.” Later, in the same futurist register about AI in medicine, the exchange becomes even more categorical, with Diamandis saying “So don’t go into medical school,” and Musk replying, “Yes. Pointless,” before adding that this logic “applies to any form of education” and emphasizing “social reasons” as the main remaining justification. I do not read those lines as hostile or contemptuous, and I think he is gesturing at a real shift in how quickly skills can be learned and how fast certain tasks may be automated. But as public guidance, this is exactly the kind of compressed, high-status soundbite that can travel farther than its nuance, especially among young people who are not already equipped with a plan, mentors, money, or a runway.
I am not arguing that college is the only good path, or that every degree is worth its price. I am arguing that college remains the best default option for most people, even in a future where AI radically changes work, because college provides something deeper than content delivery. It provides formative infrastructure.
College Is Not Just Information, It Is a Developmental Environment
If we talk about college as though it is simply a place where you pay for information, then yes, AI changes the calculus. But that framing has always been too narrow. The real product of a good college experience is a bundle of developmental inputs that most people do not recreate on their own.
First, college creates structured practice under evaluation. Deadlines, exams, quizzes, labs, presentations, office hours, and feedback loops are not just hoops to jump through. They are repeated reps under conditions that resemble adult responsibility. You learn to write clearly, to build an argument, to handle critique, to revise, to manage your time, to persist through frustration, and to perform in public. Those are durable skills. Even if AI tutors can teach facts and methods at high quality, they do not automatically replace the habit formation that comes from being embedded in a system with expectations, consequences, and standards.
Second, college generates breadth and intellectual discovery. Many students find their real interests accidentally. They take a course to fulfill a requirement and suddenly they are pulled into a new domain. That kind of exposure matters because most people’s default learning environment is not neutral. It is the internet, and the internet is an optimizer. It tends to narrow what you see based on what you already like, what keeps you engaged, and what keeps you scrolling. College, at its best, does the opposite. It forces you out of your local optimum. It introduces you to subjects you would not have sought out, and to questions you did not know existed. That is not only useful for careers. It is useful for becoming a more interesting and capable person.
Third, college is socially formative in a way that is difficult to replicate. People focus on the “social experience” as if it were a frivolous side benefit, but it is a major part of why college works for so many people. It is one of the rare environments where large numbers of peers the same age are repeatedly co-located, interacting over time, learning together, and shaping each other’s ambitions. That is where friendships form, where romantic partners are often met, where social confidence is built, and where a young adult identity starts to stabilize. It is also where people learn professional social skills in a low stakes setting: group projects, club leadership, disagreement with peers, collaboration with mentors, and navigating institutions. Study abroad can be part of this, too. It expands a person’s mental map of the world and of themselves. It makes the world feel larger and more real, and it often changes what people believe is possible.
When someone says “you can learn anything online now,” they are not wrong, but they miss the point. Most people do not fail because they lack access to information. They fail because they lack structure, feedback, community, and a path that turns intention into sustained action.
The Dropout Narrative Is Mostly a Selection Effect
A standard rebuttal to the case for college is the founder-dropout mythos. We hear about tech icons who left school and built world-changing companies. This is real, but it is constantly misapplied. Dropping out is not the causal ingredient. It is usually a late-stage decision made by unusually capable people who are already in unusually rich environments.
Many famous dropouts had already accumulated serious skills before leaving. They had already been exposed to high-level peers, high opportunity networks, and practical project work. They often left because they had clear traction, a live market opportunity, or a path that made continuing school an obvious opportunity cost. This is not what most “skip college” listeners have. They do not have traction, mentors, a runway, or even a coherent plan. They have a vague sense that they are supposed to be entrepreneurial and an anxious awareness that the world is changing.
So when public figures present the dropout story as a general template, they are taking an outcome that depends on selection and context and turning it into a simplistic prescription. The median person who skips college does not start a high-growth company. They typically do one of three things: they drift, they fall into low-expectation work, or they get captured by low-effort digital routines that feel like agency but are not. None of this is moral failure. It is predictable human behavior when structure is removed and replaced with an attention economy.
This is why the “not everyone is self-motivated” point is not a minor caveat. It is the center of the issue. Most people are not reliably self-structuring at 18 to 22. Even many talented people are not reliably self-structuring at that age, especially if they do not have money, stable housing, supportive parents, mentors, or a coherent plan. College functions as an external scaffolding that helps those people build internal scaffolding. That is how development works. You internalize structure by living within it.
Why “Don’t Save for Retirement” Is a Risky Public Message
In the same Moonshots conversation, Musk moves from education to personal finance and offers a line that is tailor-made to travel as a slogan: “Don’t worry about squirreling money away for retirement in 10 or 20 years. It won’t matter.” He ties that claim to a larger abundance thesis, arguing that sufficiently powerful AI, robotics, and energy technology will expand productivity so dramatically that scarcity collapses, and with it the basic premise of retirement planning. In other words, he is not merely saying “invest differently” or “expect change.” He is suggesting that the whole problem category becomes obsolete within a relatively near time horizon.
The issue is not that the long-run vision is impossible. The issue is that it quietly smuggles in several assumptions that are far less stable than the technology narrative itself. Even if AI and robotics raise aggregate productivity, that does not automatically guarantee broad distribution of those gains, nor does it guarantee that housing, healthcare, and elder care become frictionless in the specific way that would make personal savings irrelevant. That distribution step is a political economy problem, not a chip-and-software problem. For the median person, the downside of acting on this advice is severe and difficult to reverse. If you under-save and the transition is slower, bumpier, or more unequal than promised, you do not simply “catch up” later, particularly if your wages are flat, your health changes, or your responsibilities multiply. The asymmetry matters: continuing to save is a hedge that still leaves you fine if abundance arrives quickly, but stopping your savings is an all-in bet on a timeline that no one can responsibly guarantee.
There is also a basic messaging problem that does not require any accusation to point out. Musk can afford to be wrong in a way most people cannot. When a billionaire offers personal finance guidance to millions, the difference in risk exposure becomes part of the content whether anyone acknowledges it or not. The more responsible public version of his point would be: prepare for a world where AI changes the economy, and invest in adaptability, skills, and resilience, but do not base your long-term security on a single speculative macro forecast. In practical terms, most people should treat retirement saving as a robustness strategy under uncertainty, not as an optional habit that can be safely abandoned because the future might become utopian.
What Responsible Advice Should Sound Like in the AI Era
It is fair to say that AI and robotics will change the return on investment of some degrees, and it is fair to criticize the cost structure of higher education. But “don’t go to college” is not responsible public advice, because it removes a key developmental institution from people who often have nothing else ready to replace it.
The responsible version is more nuanced and, honestly, more useful. The question is not “college or no college.” The question is “what path gives you structure, skill formation, relationships, and credible options, without crushing you with debt.” For many students, a financially sane route might be community college followed by transfer, or choosing a major with strong placement and internship pipelines, or mixing formal education with portfolio building and real work. For some, apprenticeships, union trades, military pathways, or targeted credential programs can be excellent, especially when they replicate the core functions that college provides: mentorship, standards, feedback, and a clear trajectory. For a smaller group, starting a company can be rational, but only when it is a genuine alternative with real momentum and real support, not a fantasy substitute for structure.
Most importantly, we should not make people feel foolish for choosing college. College is not merely a credential mill. At its best it is a training ground for adulthood. It teaches you how to think, how to work, how to communicate, how to collaborate, and how to stay engaged with the world beyond your initial interests. It gives people a concentrated social environment in which to form friendships, romantic relationships, and professional networks. It provides exposure to disciplines that reshape your worldview. These are not luxuries. They are exactly the kinds of developmental inputs that help people thrive in periods of rapid change.
If AI really does accelerate the pace of economic transformation, then the ability to adapt, to learn, to communicate, and to maintain agency will matter even more. For most people, college is still one of the best structured ways to develop those capacities. The slogan should not be “don’t go to college.” The slogan should be “choose a path that builds you.”
Jared Reser and Lydia Michelle Morales with ChatGPT 5.2 
1. The “Final Library” was never going to be singular
When I first started thinking about what I called a “Final Library,” I pictured a single, civilization-scale repository of synthetic writing, synthetic hypotheses, and machine-generated explanations. The basic premise was simple: once AI systems can generate and refine ideas at industrial scale, they will produce a body of scientific and intellectual literature and theory so large that no human can read, curate, or even meaningfully browse it without machine assistance.
But the more I sit with it, the clearer it becomes that this will not arrive as one monolithic library. It will arrive as many.
The near future probably looks like a world of plural canons. Different companies, and later different institutions, will each build their own enormous synthetic corpus. Each corpus will include overlapping public material, but also model-generated content, internal evaluations, proprietary data, and restricted tool outputs. The result is not only epistemic abundance, but epistemic fragmentation.
The shift matters because it changes the social contract of knowledge. We are leaving a world where “the literature” is, at least in principle, a shared reference point. We are moving toward a world where the most valuable and operationally decisive knowledge may live inside gated systems.
2. Why the canons will diverge
It is tempting to think that if all these systems are trained on the internet, they should converge. In the early years, there was likely heavy overlap across major training mixes simply because the public web was the dominant substrate available to everyone. But the incentive structure pushes hard toward divergence.
There are two reasons.
First, web-scale data is increasingly a commodity with constraints. Access, licensing, and filtering regimes differ, and those differences matter. Second, synthetic data is not neutral. Once you start generating training material, the generator shapes the distribution. A system’s “children” look like it.
Over time, each major lab will build a distinctive pipeline, and the pipeline will become the canon. Different pipelines will mean different preferred ontologies, different decomposition styles, different safety constraints, different “default explanations,” and different blind spots.
If you want a short explanation for why this is inevitable, it is this: you cannot industrialize cognition without leaving fingerprints. Those fingerprints will appear in the synthetic corpus.
3. What goes into a canon
A useful way to picture these corpora is to ask what kinds of objects they will contain. At minimum, a mature canon will include:
- synthetic writing: explanations, summaries, tutorials, arguments, critiques
- synthetic hypotheses: conjectures, mechanisms, proposed causal graphs, testable predictions
- synthetic derivations: proofs, proof sketches, formalizations, theorem-prover artifacts
- synthetic research programs: structured plans for inquiry, prioritized experiments, dependency graphs of ideas
- synthetic negative space: refutations, dead ends, failed attempts, and discarded hypotheses, if the system is well-designed
That last category is easy to underestimate. If the canon keeps only “successful” outputs, it becomes a propaganda machine for its own plausibility. If it keeps failure and uncertainty, it becomes more like a living research mind.
This is where we should be honest. These systems will carry errors. They will embed uncertainty. They will sometimes be wrong in ways that are locally coherent. That is not a footnote. It is the default condition of any epistemic engine that operates at scale.
4. The first era: synthetic knowledge without experimental closure
In the near term, much of what these canons produce will be what I would call “paper knowledge.” It will be reported knowledge, reorganized. It will be plausible hypotheses. It will be novel conceptual syntheses. It will be arguments that feel correct. It will be insights that do not require new measurement to articulate.
This is the stage we are already entering. A system can read across literatures faster than any person, recombine concepts, and produce a coherent framework that looks like an original contribution. In some domains, it can also formalize claims and verify them in proof assistants or check them with computation.
This kind of output changes how intellectual work feels. It changes the personal experience of trying to be original. It creates a subtle pressure: if the machine can generate ten plausible hypotheses in the time it takes you to write one, the meaning of “contribution” shifts.
But there is also a deeper change. Once canons are producing hypotheses at scale, the bottleneck becomes verification. Not “writing” in the rhetorical sense, but closure.
5. The second era: verification throughput becomes the power source
The decisive transition will occur when synthetic corpora are tightly coupled to mechanisms of verification. This is not a single invention. It is an ecosystem.
Verification can happen in several ways:
- Formal domains. Mathematics, logic, and parts of computer science can be pushed into proof assistants and formal checkers. In these spaces, synthetic claims can be converted into verified objects.
- Computational closure. In many engineering domains, claims can be evaluated by simulation, unit tests, model checking, or large-scale computation. This does not create truth in the philosophical sense, but it creates strong constraints.
- Empirical loops. The largest leap comes when AI is coupled to laboratories, instrumentation, automated experimentation, and robust replication. At that point, a canon begins to contain new measurements and new facts, not merely new prose.
As soon as verification throughput becomes high, the canon stops being an archive of plausible text. It becomes an epistemic machine that generates and prunes beliefs with increasing competence.
This is where fragmentation becomes dangerous and interesting. If one canon has better verification loops, it becomes epistemically stronger. If its results are then restricted, access becomes power.
6. Continual learning without omniscience
A key confusion in public discussion is the question of continual learning. People imagine a system that either “can” or “cannot” learn after training, as if that is a binary property. In practice, learning will occur through two mechanisms, both of which matter.
First, there is corpus growth without weight updates. A system can add new synthetic hypotheses, proofs, and research plans to an external repository and then retrieve from that repository at inference time. The system is not “learning” in the strict weight-update sense, but it is improving. Its effective cognitive reach expands.
Second, there are weight updates and fine-tuning loops. Some systems will periodically train on selected slices of the repository, plus real-world feedback and new external data. This is riskier, because bad synthetic content can poison the model if the filters are weak, but it is also powerful.
So the realistic picture is not omniscience. It is error-correcting accumulation. Over time, the canon becomes a memory substrate, and the system becomes a navigator and curator of its own growing intellectual history.
The adult way to state this is simple: these systems will not become perfect. They will become increasingly good at locating uncertainty, routing it into tests, and remembering what has already been tried.
7. The social consequences of plural canons
Plural canons change the epistemic ground beneath us. They introduce a new set of civilizational dynamics.
- Knowledge becomes partially privatized. Not only in the sense of paywalls, but in the sense that key claims may rely on internal data, internal toolchains, or internal evaluation protocols.
- Consensus becomes harder. When different canons produce different “best answers,” disagreements can become harder to resolve because the underlying corpora are not identical and cannot be fully compared.
- Institutional worldviews harden. Each canon will have a preferred style of explanation. Over time, users trained by one canon may begin to think in its categories. This is subtle, but real.
- The verification gap widens. Inside a company, claims might be supported by internal traces, provenance, and experiments. Outside the company, the public may see only polished summaries. That is a recipe for dependency.
This is where I think we need to be candid. The plural-canon world can intensify what I previously called Epistemic Infantilization. Not because people become stupid, but because the structure of access encourages deference.
8. A workable response: cross-canon adulthood
I do not think the right response is panic, nor is it denial. It is a set of habits and norms that preserve adult epistemic agency even when knowledge production is no longer human-led.
In a world of plural canons, the mature stance looks like this:
- Triangulation. When stakes matter, consult multiple systems and look for convergence and divergence. Treat divergence as a signal, not an inconvenience.
- Provenance demands. Ask what is observed, what is inferred, what is simulated, and what is merely plausible. Demand audit trails where possible.
- Preference for portable claims. Prioritize claims that can be checked against public literature, public datasets, or public formal proofs.
- Ownership of ends. Do not outsource values. Do not outsource the selection of what matters. The machine can propose, but humans should remain the authors of aims.
If I had to summarize the goal in one sentence, it would be this: we should treat the canons as engines, not parents.
9. Closing: the new landscape
The world I am describing is not a single Final Library that everyone consults. It is a competitive landscape of synthetic corpora, each with its own strengths, biases, access rules, and verification loops. That landscape will generate more synthetic writing, synthetic hypotheses, and synthetic “knowledge objects” than humans can ever curate unaided. Some of it will be siloed. Some of it will be blocked from competitors and from ordinary users. Some of it will be strategically withheld.
This does not mean truth disappears. It means truth becomes mediated by institutions that operate cognitive engines at scale. If we want the future to feel adult, the task is to build norms, tools, and personal habits that preserve epistemic sovereignty, even as we accept that the frontier has moved into machines and into the canons they maintain.
To make this concrete in later writing, the next step is to define what counts as a "good canon" in ethical and epistemic terms: not just powerful, but transparent, auditable, corrigible, and interoperable. In a plural-canon world, interoperability may be the difference between a flourishing cognitive commons and a fragmented knowledge oligarchy.
1. The mood of 2026, and why it feels like a real transition
I think a lot of us can feel the hinge in the air right now. It is not just that AI is impressive. It is that AI is beginning to occupy the exact psychological territory that used to define adulthood for “idea people,” namely the feeling that you can still push the frontier with your own mind.
In my own case, two things made this visceral.
First, I used GPT-5.2 while helping a lawyer friend assemble an expert witness report. Watching it juggle dozens of constraints at once (legal posture, evidentiary tone, the internal logic of arguments, the "what will a judge do with this" realism) while still maintaining coherence felt like a qualitative shift. It did not feel like autocomplete. It felt like a high-bandwidth cognitive partner that could hold a sprawling structure in working memory and keep it stable while iterating.
Second, I watched the "Erdős problems" discussion, where AI systems are solving long-contemplated mathematical problems. More than half a dozen Erdős problems, including #728, have been marked as solved with AI assistance in the last month alone.
You can argue about how “new” these results are, and people are arguing about it. That argument is part of the point. Even when the novelty is contested, the system’s ability to navigate, generate, formalize, and verify is already altering the topology of intellectual life.
This is the backdrop for two terms that I think name what a lot of us are feeling.
2. Frontier Disenfranchisement
Frontier Disenfranchisement is the objective shift: the domain where individual human cognition can reliably generate frontier-level novelty is shrinking, not because humans are getting worse, but because the machine frontier is moving faster than we can run.
A subtlety matters here. It is easy to caricature this as “humans will never have original ideas again.” I do not think it has to be absolute to be real. If the probability that a human idea is both new and important collapses, the lived experience is still a loss of franchise, even if rare exceptions remain.
In my earlier writing, I framed this as the approach of a “Final Library,” a repository of machine-generated insights and conceptual expansions so large that humans cannot even navigate it directly. The key idea was not that the space of ideas is finite, but that the subset “reachable by human minds” is small and will be exhaustively explored by machines that can search, combine, evaluate, and elaborate at industrial scale. Of course this is years away, but it is clearly coming. 
That is what “frontier disenfranchisement” feels like from the inside. It is standing at the boundary while the map expands beyond the range of your legs.
3. Epistemic Infantilization
Epistemic Infantilization is the psychological risk that follows from the asymmetry. A child depends on adults to explain the world, to resolve confusion, and to tell them what is real. A post-frontier human can begin to relate to knowledge in a similar posture: dependent on an authority whose reasoning you cannot fully reconstruct.
This is not about intelligence in the abstract. It is about power in the epistemic relationship.
The system can produce reasons faster than you can check them. The system can cite literatures you did not know existed. The system can generate proofs and formal verifications you cannot personally audit end-to-end.
6. Why the transition can feel infantilizing, even when it is not “bad”
There are at least three reasons the transition feels infantilizing even when it is not necessarily dystopian.
First, it reorganizes the adult sense of agency. A scientist wants to be a causal node in the growth of knowledge, not merely a consumer of it.
Second, it creates dependency. If you cannot independently re-derive the reasons, you are structurally dependent on an epistemic authority.
Third, it compresses the dignity of effort. When the machine can generate hundreds of plausible research programs in the time it takes you to write one careful page, your effort can begin to feel like a child’s drawing next to a printing press. That feeling can be corrosive, even if it is not philosophically fair.
7. The hopeful part: retirement from the compulsion to be original
And yet, I do not actually experience this only as a loss. I experience it partly as relief.
There is a kind of lifelong anxiety that comes with trying to generate scientific ideas under scarcity. Scarcity of time, scarcity of attention, scarcity of cognitive bandwidth, and scarcity of “good questions.” If the world is moving toward cognitive abundance, then one obvious human response is to stop treating originality as a moral obligation.
This is where the retirement analogy lands for me. Retirement is not infantilizing when it is chosen. It is a transition out of constant performance pressure. It can make room for a different kind of life, one that is less about proving yourself and more about witnessing, learning, and enjoying the unfolding of reality.
In the second half of my life, I can imagine it being genuinely nice to become more of a spectator. Not an ignorant spectator, but a liberated one.
One thing I keep wanting to say, to myself and to other people, is that now is the time to take your shots. If Frontier Disenfranchisement is real, then we are living in the last interval where a human thinker can still plausibly throw an idea into the world that is both meaningfully novel and meaningfully theirs. That sounds dramatic, but it is really just a sober inference from the trajectory. Once automated discovery becomes routine, the frontier stops feeling like a place where individual human cognition can matter on its own terms, and the reward structure around being “first” collapses into a kind of background noise.
I do not mean this as a plea for everyone to become “original” in the heroic sense. I mean it in a simpler sense. We are at a unique point in time where a human can still seed the future, and those seeds will soon be harvested and evaluated at machine speed. In that world, the most valuable thing you can do in the closing window is to externalize what you actually think, while you can still feel the contours of your own ignorance and the edges of the unknown. Because once the Final Library begins to feel complete, the temptation will be to stop trying. And the tragedy would not be that AI surpassed us. The tragedy would be that we voluntarily went quiet right at the moment when it was still possible to leave intellectual fingerprints on the handoff.
In practice, there will not be one Final Library. There will be multiple synthetic canons, built by competing organizations, each generating an overwhelming volume of synthetic writing, synthetic data, and synthetic hypotheses. The result is not only cognitive abundance but epistemic fragmentation. Much of what matters will be siloed behind proprietary walls, policy constraints, security restrictions, and economic gates. The public will not be able to reference a single shared corpus, and even experts will struggle to audit claims whose supporting proofs, datasets, or toolchains remain private. This multiplies the risk of Epistemic Infantilization, because dependence is no longer just on machine intelligence, but on institutional access.
9. Closing
I think we are living through a transition that is both exhilarating and structurally humiliating. It is humiliating because it threatens the adult identity of the thinker. It is exhilarating because it promises a world where discovery becomes a constant feature of life, not an occasional miracle.
Frontier Disenfranchisement names the external shift. Epistemic Infantilization names the internal risk. The hopeful path is to accept the handoff of means while insisting, personally and culturally, on adulthood in the realm of ends.
Let us not give up. Let us adapt to abundance without surrendering authorship over meaning.