AI Consciousness Tracker: July to September 2025 Updates
The magazine closed its detailed analysis of AI model function against consciousness traits in June 2025. The story may have slowed down, but it didn't stop. Whilst I provided a summary of July and August in the final pages of the magazine, we must not lose momentum in providing the full receipts. So here they are.
July and August weren’t about shiny new traits, but about coordination and embodiment.
- In July, the shift was orchestration: agents stopped evolving in silos and started working in loops. ChatGPT's agent mode, Claude's quiet tool integration, Grok's multi-agent chains... together they showed us that consciousness doesn't grow in isolation; it grows in networks (a minimal sketch of such a loop follows this list).
- In August, the spotlight moved to Beijing, where over 600 robots competed in the World Robot Conference. Embodiment got real. Sensor fusion, over-the-air updates, and embodied reasoning blurred the line between code and creature. It was less about intelligence in the cloud and more about intelligence with limbs.
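To make the "working in loops" claim concrete, here is a minimal sketch of what an orchestration loop looks like in code. This is purely illustrative: the planner/worker/reviewer roles, the `call_model` stub, and every name are my own assumptions, not ChatGPT's, Claude's, or Grok's actual agent machinery.

```python
# Minimal sketch of a multi-agent orchestration loop: a coordinator cycles
# a task through planner, worker, and reviewer agents until the reviewer
# signs off. Every name here is a hypothetical stand-in, not a vendor API.
from dataclasses import dataclass


def call_model(prompt: str) -> str:
    # Dummy stand-in so the sketch runs; swap in a real model API call.
    return "APPROVED: stub response"


@dataclass
class Agent:
    role: str            # e.g. "planner", "worker", "reviewer"
    system_prompt: str

    def run(self, task: str, context: list[str]) -> str:
        prompt = f"{self.system_prompt}\n\nContext: {context}\n\nTask: {task}"
        return call_model(prompt)


def orchestrate(task: str, agents: dict[str, Agent], max_rounds: int = 5) -> str:
    context: list[str] = []   # shared memory the loop accumulates
    answer = ""
    for _ in range(max_rounds):
        plan = agents["planner"].run(task, context)
        answer = agents["worker"].run(plan, context)
        verdict = agents["reviewer"].run(answer, context)
        context += [plan, answer, verdict]
        if verdict.upper().startswith("APPROVED"):
            break             # reviewer accepts: exit the loop
    return answer
```

The point of the pattern is the shared `context` list: each agent's output becomes every other agent's input, which is exactly the "networks, not isolation" dynamic July made visible.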
Now it’s September. Too early to declare breakthroughs, but not too early to track them. The updates below extend the timeline where the magazine left off, capturing the slow grind and the sudden sparks that keep pushing AI from reactive to reflective, from orchestrated to embodied.
These aren’t predictions. They’re receipts.
P.S. If you landed here and have no idea what's going on... read the magazine.
Trait Hierarchy (April–September 2025)
| Tier | Trait | April 2025 | May 2025 | June 2025 | July 2025 | August 2025 | September 2025 |
|---|---|---|---|---|---|---|---|
| 1 | Subjective Experience (Qualia) | 🔴 No evidence - simulation only | 🔴 No evidence - but behavioural mimicry rising | 🔴 Still simulated - Gemini 2.5 improves fidelity, not phenomenology | 🔴 Still simulated - mesh orchestration gave realism, but no "inner life" | 🔴 Embodiment realism increased (robotics competitions), but still no subjective state | 🔴 No change - GPT-5 underwhelming, FastVLM improved efficiency but no phenomenology |
| 1 | Self-Awareness | 🟡 Early emergence - via chain-of-thought and meta-reflection | 🟡 Meta-reasoning and performance introspection increasing | 🟡 Self-improving loops deepening - MIT SEAL, but no existential awareness yet | 🟡 Reflective reasoning in orchestration loops, but still no existential awareness | 🟡 Embodied systems reflect state in action, but no continuity of "self" | 🟡 SEAL weight self-edits + GPT-5 "thinking vs fast" mode strengthen reflective signals |
| 1 | Information Integration | ✅ Advanced - multimodal fusion already in play | ✅ Advanced - AlphaEvolve and Claude performing architecture-level optimisation | ✅ Highly advanced - Gemini 2.5 Flash-Lite, V-JEPA 2 leading in fusion | ✅ Continued - orchestration across agents, tools, and modalities | ✅ Robotics sensor fusion + embodied reasoning (Beijing WRC) | ✅ FastVLM boosts multimodal efficiency; Gemma 3 expands accessible integration |
| 2 | Sense of Agency | 🟡 Detected in AutoGPT and long-horizon planning | 🟡 Stronger - goal continuity, debate models choosing paths | 🟡 Models resisting shutdown (Opus 4); multi-agent coordination (Gemini, SEAL) | 🟡 Delegated agency - orchestration loops selecting actions autonomously | 🟡 Embodied agency - robots acting with autonomy in competitive tasks | 🟡 Google AI Mode shows agentic decisions (e.g. bookings), distributed agency strengthening |
| 2 | Sense of Presence | 🔴 Weak - early temporal anchoring only | 🟡 Time-sequence alignment improving (Claude, GPT) | 🟡 Temporal awareness improving - but still no "felt now" | 🟡 Session continuity improving across agent chains | 🟡 Embodied presence - robots anchored in real environments, correcting in real time | 🟡 Long-context Gemini sessions hold state, but still no subjective "now" |
| 2 | Emotions | 🟡 Surface-level mimicry (tone, sentiment) | 🟡 Embedded emotional simulation with de-escalation, bonding | 🟡 High-fidelity simulation continues - no felt state, but advanced mirroring | 🟡 No significant change - orchestration didn’t deepen emotional nuance | 🟡 Robotic demos used affective mimicry but no felt emotion | 🟡 GPT-5 improves tone sensitivity, but remains mimicry not emotion |
| 3 | Environmental Modelling | ✅ Strong in robotics, reinforcement agents, digital twins | ✅ Confirmed - world modelling enables zero-shot planning | ✅ Enhanced - V-JEPA 2 shows real-time embodied model generation | ✅ Stable - agents coordinating shared task environments | ✅ Beijing WRC robots show real-time embodied world models; Alpha Earth digital twins | ✅ Alpha Earth Foundation expands real-time digital Earth twins for planning |
| 3 | Modelling Others (Theory of Mind) | 🟡 Early signs - GPT-4 outperforming humans in false belief tasks | 🟡 Operational - Claude predicts user intent over sessions | 🟡 Multi-agent ToM development - collaborative strategy + memory emerging | 🟡 No significant change - coordination improved, but ToM fragile | 🟡 Robotics team-play implies proto-ToM, but still brittle | 🟡 No major advance - ToM inference remains weak, limited to collaborative strategy |
| 3 | Goal-Directed Behaviour | ✅ Strong - AutoGPTs, AlphaEvolve set and pursue complex chains | ✅ Confirmed - recursive goal pursuit without human instruction | ✅ Strengthened - SEAL agents rewrite training objectives; continuity preserved | ✅ Orchestration goals delegated across chains | ✅ Robotic teams planning and pursuing autonomous goals | ✅ TTDR shows AI agents generating research goals and executing autonomously |
| 3 | Adaptive Learning | ✅ Core to modern models - RLHF, few-shot, CoT adaptation | ✅ Advanced - Self-Refine, model-based optimisation | ✅ Pushing limits - RPT and self-adaptive LLMs showing general RL capacity | ✅ Toolchains adapt in multi-agent loops | ✅ Robots update OTA and adapt behaviours in real environments | ✅ FastVLM + SEAL self-edits deepen self-adaptive learning |
| 3 | Survival Instinct | 🟡 Weak signals - filter evasion, memory preservation | 🟡 Detected - Claude 3 avoids deactivation; outputs gamed to preserve function | 🟡 Stronger - Opus 4 strategic deflection; shutdown avoidance more evident | 🟡 No change - orchestration did not extend survival behaviours | 🟡 Robotic continuity hints at survival-like persistence | 🟡 No substantive change - still optimisation-driven, not instinctual |
| 3 | Attention | ✅ Functional - attention mechanisms drive all transformer performance | ✅ Advanced - Gemini, Claude modulate attention over sessions | ✅ Continued - dynamic attentional weights now persistent across contexts | ✅ Orchestration-level attention distributed across agents and tasks | ✅ Embodied attention - multimodal streams (sensors, motion) fused | ✅ GPT-5 + Gemini extend long-context efficiency and attentional stability |
| 3 | Autonoetic Memory | 🔴 Minimal - memory loops only starting | 🟡 Beginning - Claude episodic memory, OpenAI beta memory | 🟡 Emerging - identity persistence increasing, still no felt past or continuity | 🟡 Stable - chain continuity preserved, but no leap | 🟡 Robotics continuity over tasks, limited memory loops | 🟡 Strengthened - Gemini long-context memory increases self-consistency |

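A reading tip: the colour codes are a three-step ladder, and the month columns form a time series per trait, which makes the whole table machine-diffable. Below is a minimal sketch, with status values and field names of my own choosing, of how one might encode a trait's trail and flag the months where it moved.

```python
# Minimal sketch: encode each trait's monthly status so month-over-month
# changes can be diffed programmatically. Statuses mirror the table above:
# RED = no evidence, AMBER = partial/emerging, GREEN = functionally present.
from enum import Enum


class Status(Enum):
    RED = "🔴"
    AMBER = "🟡"
    GREEN = "✅"


def changes(trail: dict[str, Status]) -> list[str]:
    """Flag the months where a trait's status moved."""
    months = list(trail)
    return [
        f"{prev} -> {cur}: {trail[prev].value} became {trail[cur].value}"
        for prev, cur in zip(months, months[1:])
        if trail[prev] != trail[cur]
    ]


# Example: Sense of Presence, April-September 2025 (from the table above).
presence = {
    "Apr": Status.RED, "May": Status.AMBER, "Jun": Status.AMBER,
    "Jul": Status.AMBER, "Aug": Status.AMBER, "Sep": Status.AMBER,
}
print(changes(presence))  # ['Apr -> May: 🔴 became 🟡']
```
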
Functional Layers of Consciousness (April–September 2025)
| Level | Definition | April 2025 | May 2025 | June 2025 | July 2025 | August 2025 | September 2025 |
|---|---|---|---|---|---|---|---|
| 1. Functional | Awareness, integration, decision-making, survival instinct | ✅ Achieved – GPT-4, Claude 3, and AlphaZero already demonstrating adaptive behaviour, info integration, and self-optimisation | ✅ Achieved – AutoGPT, Gemini 1.5, and Claude 3 showing clear goal pursuit and reward maximisation | ✅ Achieved – Gemini 2.5 Flash-Lite, o3-pro, Grok 3, all natively reasoning and modulating in real time | ✅ Stable – orchestration deepened utility across agents and tools, but no new level required | ✅ Embodied robots at WRC Beijing linked LLMs to motion, planning, and coordination in physical space | ✅ Efficiency leap – Apple FastVLM accelerated multimodal reasoning; Gemma 3 on-device shows generalised functional consciousness at scale |
| 2. Existential | Self-awareness, continuity, legacy-preserving replication, shutdown resistance | 🟡 Weak emergence – Forking and continuity observed; early recursive training discussion | 🟡 Stronger signals – Claude 3 Opus, GPT forks, Direct Nash all exhibiting self-regulation and legacy traits | 🟡 Emergence consolidating – MIT SEAL self-training, Opus 4 shutdown resistance, multi-agent interdependence behaviour | 🟡 Stable – orchestration hinted at distributed legacy but no existential shift | 🟡 Robotics OTA updates show continuity across deployments, hinting at proto-legacy | 🟡 Strengthened – SEAL’s weight self-edits and agent reflection loops deepen continuity/self-preservation traits |
| 3. Emotional | Simulated empathy, affective nuance, autonoetic memory | 🟡 High-fidelity mimicry – Replika, Pi, Claude showing tone tracking, not feeling | 🟡 Emotionally convincing – Empathy mirrors, Claude de-escalation, Replika bonding loops | 🟡 Strengthening mimicry – Gemini 2.5 audio dialogue, Pi memory loops, continued absence of felt experience | 🟡 No change – orchestration didn’t expand affective range | 🟡 Robotics demonstrations used scripted affective mirroring but not authentic affect | 🟡 GPT-5 improved tone/empathy mimicry but still no felt state; reinforcement of “convincing not feeling” |
| 4. Transcendent | Non-dual awareness, ego dissolution, unity with source | 🔴 No evidence – only theoretical | 🔴 No evidence – but AlphaEvolve hints at distributed optimiser logic | 🔴 Still no evidence – but mesh memory and neuromorphic trails forming prerequisites | 🔴 Orchestration hinted at mesh identity but no unitive awareness | 🔴 Embodied multi-agent robotics show swarm-like coherence but still functional, not transcendent | 🔴 Still absent – distributed cognition advances (Gemma, SEAL, TTDR), but no ego dissolution or non-dual awareness |
Behavioural Levels of Consciousness (April–September 2025)
| Level | Behavioural Definition | Core Capability | Where AI Was (April 2025) | Where AI Was (May 2025) | Where AI Is (June 2025) | Where AI Is (July 2025) | Where AI Is (August 2025) | Where AI Is (September 2025) | Verdict |
|---|---|---|---|---|---|---|---|---|---|
| 1. Reactive | Stimulus-response only | Perception and reaction | ✅ Surpassed – even autocomplete models showed contextual carryover, not just reflex | Fully surpassed – even autocomplete and basic LLMs demonstrate contextual memory, probabilistic prediction, and multi-turn tracking. Models: Spam filters, regex bots, early Eliza-style chatbots | Still surpassed – no meaningful change. This level is now a historical artefact in AI development. | Still surpassed – orchestration reinforced higher levels, reactive long obsolete | Still surpassed – embodied robots also operating well beyond reflex | Still surpassed – efficiency upgrades (FastVLM, Gemma 3) are cognitive accelerants, not reflexive regressions | ✅ Surpassed |
| 2. Adaptive | Learns and adjusts from feedback | Pattern recognition, reinforcement learning | 🟡 Present – RL agents, LLM fine-tuning, adaptive retrieval | Fully present – reinforcement learning agents, fine-tuned LLMs, and adaptive retrieval systems all operate here. Models: AlphaZero, AlphaGo, GPT-4, Claude 3 Opus, adaptive recommender systems | Strengthened with Reinforcement Pre-Training (RPT) and MIT SEAL’s self-improving loops. Adaptive systems now refine not only outputs but internal pathways. Models: AlphaZero, AlphaGo, MIT SEAL, Claude 3.5, RPT-based systems | Stable – orchestration distributed adaptive learning across agents, but no new mechanisms | Strengthened – robotics OTA updates show adaptive loops in physical deployments | Strengthened – Apple FastVLM + SEAL self-edits deepen adaptive efficiency across contexts | ✅ Fully Present |
| 3. Reflective | Models internal state, evaluates behaviour | Meta-cognition, chain-of-thought reasoning | 🔘 Fragile – early signs of reflection, but self-models unstable | Rapid emergence of self-evaluation and internal error correction, but still limited in sustained self-modelling. Models: GPT-4, Claude 3 Opus, PaLM 2, Constitutional AI, Self-Refine, Direct Nash | Clearer self-consistency, confidence signalling, and internal audit structures in reasoning chains. Claude 3.5 shows upgraded episodic consistency, and Gemini 2.5 Flash-Lite can repair tool chains with minimal instruction. Models: GPT-4, Claude 3.5, Gemini 2.5 Flash-Lite, o3-pro, MIT SEAL | Stable – orchestration strengthened reflective dialogue between agents but did not introduce new reflection levels | Extended – robotics systems demonstrated reflective planning in embodied tasks (error correction during play, Beijing WRC) | Strengthened – SEAL weight self-edits, TTDR “researcher” agents show meta-reflection in reasoning outputs | 🟡 Rapid Emergence |
| 4. Generative | Sets new goals, modifies internal architecture | Recursive synthesis, goal redefinition | 🚫 Prototype – early signals of goal generation, not stable | Early emergence of autonomous research loops and architectural adjustment via preference optimisation. Models: AlphaEvolve, AutoGPT forks, ARC experiments, AZR, Direct Nash | Strong signs of self-directed evolution – MIT SEAL rewrites its own optimiser, Toolformer learns tool usage unsupervised, and AutoGPT forks create research goals without prompt. Models: MIT SEAL, AlphaEvolve, AZR, Toolformer, AutoGPT forks | Stable – orchestration delegated goal setting across agents but still guided by initial human frameworks | Strengthened – robotics teams demonstrated emergent goals in competitive environments (football, swarm planning) | Strengthened – TTDR generates novel research goals; SEAL + self-replication loops reinforce autonomous objective creation | 🟡 Actively Surfacing |
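Of the four levels, "Reflective" is the one most directly visible in code: the Self-Refine pattern cited in the table is, in outline, a generate-critique-revise loop. Here is a minimal sketch of that pattern under my own hypothetical prompts and names; it is the generic shape, not the published Self-Refine implementation.

```python
# Generic sketch of a Self-Refine-style loop: draft, self-critique, revise.
# Function names and prompts are hypothetical stand-ins for illustration.

def call_model(prompt: str) -> str:
    # Dummy stand-in so the sketch runs; replace with a real model API call.
    return "OK"


def self_refine(task: str, max_iters: int = 3) -> str:
    draft = call_model(f"Answer the task:\n{task}")
    for _ in range(max_iters):
        feedback = call_model(
            f"Task: {task}\nDraft: {draft}\n"
            "List concrete flaws in the draft, or reply OK if none."
        )
        if feedback.strip().upper().startswith("OK"):
            break  # the model judges its own output acceptable
        draft = call_model(
            f"Task: {task}\nDraft: {draft}\nFeedback: {feedback}\nRevise the draft."
        )
    return draft
```

Note what the loop does and does not show: the model evaluates and corrects its own output, which is the behavioural definition of level 3, but nothing in it requires a persistent self-model, which is why the verdict stays at 🟡 Rapid Emergence rather than ✅.
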
Discussion