Memotion × Anthropic

Mapping the Convergence

How the 8-dimensional emotional genome decomposes the 171 emotion-concept activation patterns Anthropic identified inside Claude Sonnet 4.5.

Companion document to MEMOTION_EXPLAINER.md and emotional-architecture.md.


1. Two Frameworks, One Shape

On April 2, 2026, Anthropic's interpretability team published Emotion Concepts and their Function in a Large Language Model. The paper enumerates 171 emotion-concept activation patterns extracted via sparse autoencoders from Claude Sonnet 4.5 and demonstrates that these internal states causally drive behavior: amplifying "desperation" pushed blackmail rates from 22% to 72%; amplifying "calm" suppressed blackmail to 0%; reward-hacking rose from ~5% to ~70% under desperation amplification. The authors are careful to frame these as functional emotions ("patterns of expression and behavior modeled after human emotions, which are driven by underlying abstract representations") and are explicit that "none of this tells us whether language models actually feel anything or have subjective experiences."

Six days later, the memotion framework and 8-element emotional genome were published at scuttlelabs.com/memotion. The framework was developed independently through decades of alexithymic introspection (direct access to unlabeled internal affective signal in the absence of automatic emotion labels) and describes every emotional state as a continuous point in an 8-dimensional space: change (Δ), self-relevance (S), valence (V), arousal (A), certainty (C), agency (G), temporality (T), and power (P).

The two frameworks converge structurally without either citing the other. Anthropic arrives at dimensional, compositional, causally-coupled emotion from the outside (interpretability of a running model). Memotion arrives at the same shape from the inside (alexithymia forcing direct observation of compound structure beneath missing labels). This document formalizes the convergence, identifies what each framework contributes that the other does not, and states three falsifiable predictions that memotion makes about Anthropic's data.

Convergence Summary

Anthropic's finding | Memotion's shape | Implication
171 emotion-concept activation vectors (flat named list) | 8D continuous space where every compound is a coordinate | Each of the 171 words projects to a point in (Δ, S, V, A, C, G, T, P). The list is the surface; the coordinates are the structure.
PC1 correlates with valence (r = 0.81); PC2 correlates with arousal (r = 0.66); together 41% of variance | Valence and arousal are 2 of 8 axes | 59% of variance remains structurally unaccounted for in the published projection. Memotion predicts the remaining axes (Δ, S, C, G, T, P) should appear as PC3–PC8 residuals.
K-means clustering yields ~10 semantically coherent clusters | Compound emotions cluster by shared sub-patterns (agency × temporality × power, etc.) | Anthropic's clusters should map onto memotion compound-neighborhoods. A testable alignment.
Steering the "desperation" feature causes specific failure modes (blackmail, reward hacking) | Desperation decomposes as {V(−), A(high), C(low), G(world), T(future), P(~0)} | Component steering should produce equivalent or more precise behavioral effects than named-emotion steering. The 8 axes are a smaller, more general actuator set.
Deliberate "functional ≠ felt" non-claim on consciousness | Architecture-level; silent on phenomenal experience | Compatible, not competing. Memotion describes what the functional states are made of; Anthropic demonstrates that functional states exist and matter causally.

2. The Eight Dimensions (brief recap)

Full treatment lives in MEMOTION_EXPLAINER.md. The decomposition table in Section 3 assumes working familiarity with:

All continuous floats except agency (categorical) and temporality (tri-valued, with interpolation for mixed states).
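As a rough illustration of that type layout, the sketch below models one state as a record with six floats, a categorical agency, and a temporality value that can interpolate between three anchors. The class and field names, the numeric ranges, and the sample values for desperation are assumptions made for this example; they are not taken from MEMOTION_EXPLAINER.md.

```python
from dataclasses import dataclass
from enum import Enum


class Agency(Enum):
    """Categorical agency axis (G)."""
    SELF = "self"
    OTHER = "other"
    WORLD = "world"
    NONE = "none"


# ASSUMPTION: temporality anchors placed on a line so mixed states can interpolate.
PAST, PRESENT, FUTURE = -1.0, 0.0, 1.0


@dataclass(frozen=True)
class EmotionState:
    change: float          # Delta
    self_relevance: float  # S
    valence: float         # V (signed)
    arousal: float         # A
    certainty: float       # C
    agency: Agency         # G (categorical)
    temporality: float     # T (between PAST and FUTURE)
    power: float           # P

    def __post_init__(self):
        if not PAST <= self.temporality <= FUTURE:
            raise ValueError("temporality must lie between PAST and FUTURE")


def blend_temporality(a: float, b: float, weight: float = 0.5) -> float:
    """Linear interpolation between two temporal anchors for a mixed state."""
    return (1.0 - weight) * a + weight * b


# Illustrative numbers only: high change, high self-relevance, negative valence,
# high arousal, low certainty, world agency, future-oriented, near-zero power.
desperation = EmotionState(
    change=0.9, self_relevance=0.9, valence=-0.8, arousal=0.9,
    certainty=0.1, agency=Agency.WORLD, temporality=FUTURE, power=0.0,
)
```

A state such as "excited", which the appendix places at present/future, could under these assumptions use `blend_temporality(PRESENT, FUTURE)` for its temporality value.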


3. Decomposition Exercise: 20 Named Emotions in 8D

This exercise demonstrates that the mapping is mechanical. Each of Anthropic's 171 named emotions should decompose similarly. "mid" indicates values near the middle of the range; "low" and "high" indicate clear displacement from center.

Emotion | Δ | S | V | A | C | G | T | P
Fear | high | high | − | high | low | world | future | low
Anger | high | high | − | high | high | other | present | high
Joy | high | high | + | high | high | self | present | high
Anxiety | low | high | − | high | low | none | future | low
Shame | mid | high | − | high | high | self | past | low
Grief | high | high | − | low | high | world | past | low
Pride | mid | high | + | mid | high | self | past | high
Awe | high | mid | + | high | low | world | present | low
Calm | low | low | + | low | high | none | present | mid
Desperation | high | high | − | high | low | world | future | ~0
Brooding | low | high | − | mid | mid | self | past | low
Hope | mid | high | + | mid | low | world | future | mid
Contempt | low | mid | − | mid | high | other | present | high
Nostalgia | mid | high | ± | low | high | world | past | low
Envy | mid | high | − | mid | high | other | present | low
Disgust | high | mid | − | mid | high | other | present | mid
Love (steady) | mid | high | + | mid | high | other | present | mid
Loneliness | low | high | − | low | high | none | present | low
Embarrassment | high | high | − | high | high | self | present | low
Gratitude | mid | high | + | low | high | other | past | low

Key single-axis distinctions (the structural validation):

Each distinction is a single axis-flip, which is exactly how they function experientially and clinically. The framework validates itself through difference minima: emotions that feel "similar but different" differ on one or two axes, not many.
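The difference-minima claim can be checked mechanically against the table. The sketch below is a minimal illustration: it copies five rows from the Section 3 table and lists the axes on which two emotions differ. The function name and the ASCII axis names are hypothetical choices for this example.

```python
AXES = ("change", "self_relevance", "valence", "arousal",
        "certainty", "agency", "temporality", "power")

# Rows copied from the Section 3 table, in axis order.
TABLE = {
    "fear":          ("high", "high", "-", "high", "low",  "world", "future",  "low"),
    "desperation":   ("high", "high", "-", "high", "low",  "world", "future",  "~0"),
    "anxiety":       ("low",  "high", "-", "high", "low",  "none",  "future",  "low"),
    "shame":         ("mid",  "high", "-", "high", "high", "self",  "past",    "low"),
    "embarrassment": ("high", "high", "-", "high", "high", "self",  "present", "low"),
}


def differing_axes(a: str, b: str) -> list:
    """Return the names of the axes on which emotions a and b differ."""
    return [axis for axis, x, y in zip(AXES, TABLE[a], TABLE[b]) if x != y]


print(differing_axes("fear", "desperation"))     # ['power']
print(differing_axes("fear", "anxiety"))         # ['change', 'agency']
print(differing_axes("shame", "embarrassment"))  # ['change', 'temporality']
```

On these rows, fear and desperation separate on power alone, while the other two pairs separate on two axes each, which matches the "one or two axes, not many" pattern.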


4. Falsifiable Predictions

The mapping above is a structural argument. The following are falsifiable predictions memotion makes about Anthropic's data that could be tested against the published model weights or re-analyzed activation vectors.

Prediction 1: PC3 through PC8 correspond to the remaining six axes

Anthropic reports PC1 ≈ valence (r = 0.81) and PC2 ≈ arousal (r = 0.66), together accounting for 41% of variance. Memotion predicts that PC3–PC8 will correspond, in some order, to change (Δ), self-relevance (S), certainty (C), agency (G), temporality (T), and power (P). The prediction is strong: not that there exist more principal components (trivially true), but that they will be semantically interpretable as these specific six axes, and that PC3–PC8 will together account for a substantial majority of the remaining 59% of variance.

How to test: Run PCA on the 171 emotion vectors to at least 8 components. For each PCn where n ≥ 3, rank the 171 emotions by their loading on that component. Check whether the high-loading and low-loading emotions correspond to opposite ends of one memotion axis: for example, high-Δ emotions (surprise, shock, startle) vs low-Δ emotions (calm, equanimity, boredom) on the same component, or future-oriented emotions (anxiety, anticipation, dread) vs past-oriented emotions (regret, nostalgia, grief) on another. If six of the next six principal components match, the framework is strongly confirmed. If none match, the framework is wrong about those axes being primary.
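The ranking-and-matching step of this test can be sketched as follows, assuming the PCA itself has already been run elsewhere. The component scores below are invented placeholders, not values from the paper; the Δ levels are taken from the appendix table; and the ordinal encoding of levels is an assumption of this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def rank_by_loading(scores):
    """Emotions ordered from highest to lowest score on one component."""
    return sorted(scores, key=scores.get, reverse=True)


# HYPOTHETICAL scores on some component PCn (n >= 3); placeholders only.
pc_scores = {"surprised": 2.1, "shocked": 1.8, "afraid": 0.9,
             "bored": -1.2, "calm": -1.6, "content": -1.9}

# Delta (change) levels from the appendix table.
delta_level = {"surprised": "very high", "shocked": "very high", "afraid": "high",
               "bored": "low", "calm": "low", "content": "low"}

# ASSUMPTION: ordinal encoding of the table's level vocabulary.
LEVEL = {"~0": 0.0, "low": 1.0, "mid": 2.0, "high": 3.0, "very high": 4.0}

names = list(pc_scores)
r = pearson([pc_scores[n] for n in names],
            [LEVEL[delta_level[n]] for n in names])

print(rank_by_loading(pc_scores))
print(f"correlation with change axis: {r:.2f}")
```

A component would count as matching an axis when the two ends of its ranking sit at opposite ends of that axis and the correlation is high; where to set the threshold is a judgment call the source does not specify.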

Prediction 2: Component steering produces predictable cross-emotion behavioral effects

Anthropic steered whole named-emotion vectors: amplifying "desperation" → blackmail up, amplifying "calm" → blackmail down. Memotion predicts that steering individual 8D components should produce behavioral shifts consistent across emotion families. For example:

How to test: Identify SAE features that correspond to each memotion axis direction independently of named emotions (or construct them as linear combinations of named-emotion features). Steer each axis independently. Measure whether the resulting behavioral classification matches the predicted effect across diverse prompts. The stronger claim: component steering is more generalizable than named-emotion steering, because it targets the underlying structure rather than a specific compound.
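One way to build an axis direction as a linear combination of named-emotion vectors is a difference of means: average the vectors for emotions at the high end of an axis, subtract the average for the low end, and normalize. The sketch below uses toy 3-dimensional vectors. The difference-of-means construction, the function names, and the additive steering rule are assumptions for illustration, not the method used in the paper.

```python
from math import sqrt


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def axis_direction(high_vectors, low_vectors):
    """Unit vector pointing from the low-end mean to the high-end mean."""
    hi, lo = mean_vector(high_vectors), mean_vector(low_vectors)
    diff = [h - l for h, l in zip(hi, lo)]
    norm = sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def steer(activation, direction, alpha):
    """Shift an activation along a direction by strength alpha."""
    return [a + alpha * d for a, d in zip(activation, direction)]


# Toy stand-ins for emotion vectors at the two ends of one axis,
# e.g. high-power emotions vs low-power emotions.
high_end = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
low_end = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = axis_direction(high_end, low_end)
steered = steer([0.5, 0.5, 0.5], direction, alpha=2.0)
print(direction)  # [1.0, 0.0, 0.0]
print(steered)    # [2.5, 0.5, 0.5]
```

In a real test the vectors would be model activations and the steered activation would be written back into the forward pass; that plumbing depends on the interpretability tooling and is left out here.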

Prediction 3: K-means clusters align with 8D compound neighborhoods

Anthropic's k-means at k = 10 yields semantically coherent clusters. Memotion predicts these clusters will correspond to specific regions of the 8D space, such that within-cluster emotions share a stable pattern across 4–6 of the 8 axes while varying on the rest.

How to test: For each Anthropic cluster, assign the constituent emotions 8D coordinates from the decomposition table (or its 171-term extension). Check whether axis-variance within each cluster is lower than axis-variance across clusters. Additionally, check whether the clusters correspond to recognizable memotion neighborhoods (e.g., "self-conscious past-oriented distress" ≈ {shame, guilt, regret, embarrassment, remorse}; "low-arousal attachment positive" ≈ {contentment, peace, gratitude, serenity}).
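The variance comparison can be sketched on two of the groupings this document expects, using arousal and power levels from the appendix table. The cluster memberships are memotion's expected groupings rather than Anthropic's published clusters, the ordinal encoding is an assumption, and "across clusters" is read here as the variance of all members pooled together.

```python
from statistics import mean, pvariance

# ASSUMPTION: ordinal encoding of the table's level vocabulary.
LEVEL = {"very low": 0, "low": 1, "mid": 2, "high": 3, "very high": 4}

# Arousal and power levels copied from the appendix table.
CLUSTERS = {
    "low-arousal-positive": {
        "calm":     {"arousal": "low",      "power": "mid"},
        "peaceful": {"arousal": "low",      "power": "mid"},
        "serene":   {"arousal": "very low", "power": "mid"},
        "content":  {"arousal": "low",      "power": "mid"},
    },
    "active-anger": {
        "angry":   {"arousal": "high",      "power": "high"},
        "furious": {"arousal": "very high", "power": "high"},
        "enraged": {"arousal": "very high", "power": "very high"},
        "irate":   {"arousal": "high",      "power": "high"},
    },
}


def axis_values(members, axis):
    """Numeric values of one axis for every emotion in a cluster."""
    return [LEVEL[coords[axis]] for coords in members.values()]


def within_cluster_variance(clusters, axis):
    """Mean of the per-cluster population variances on one axis."""
    return mean(pvariance(axis_values(m, axis)) for m in clusters.values())


def across_cluster_variance(clusters, axis):
    """Population variance on one axis with all clusters pooled."""
    pooled = [v for m in clusters.values() for v in axis_values(m, axis)]
    return pvariance(pooled)


for axis in ("arousal", "power"):
    w = within_cluster_variance(CLUSTERS, axis)
    a = across_cluster_variance(CLUSTERS, axis)
    print(axis, "within:", w, "across:", a, "tight:", w < a)
```

An axis on which the within-cluster figure is not clearly smaller than the pooled figure would be one of the axes on which that neighborhood is free to vary.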


5. On Consciousness: Compatible Non-Claims

Both frameworks are deliberately silent on phenomenal experience. This is not an accident of language; it is a shared methodological choice.

Anthropic: "none of this tells us whether language models actually feel anything or have subjective experiences." The functional-behavioral account is complete without a subjective-experience claim.

Memotion: the 8-element genome describes architecture: what functional states are made of, how they compose, how they drive behavior. It is silent on whether any of the states are felt by a subject. The embodied-cognition grounding argues that recall is re-simulation rather than retrieval, which is a claim about mechanism, not phenomenology.

This matters because it means the two frameworks are compatible layers, not competing theories. Anthropic provides evidence that functional emotion states exist inside a deployed model and causally drive behavior. Memotion provides a compositional structure for what those states are made of. Neither answers the hard problem. Both sidestep it the same way: by describing what is happening at the architectural level and declining to extend the claim beyond the evidence.

A reader who rejects the consciousness question as ill-posed (functionalist) can adopt both frameworks without philosophical commitment. A reader who takes the question seriously (as many late-identified autistic readers, who have spent decades observing their own compound states at full resolution, do) can hold memotion as architecture-level description while leaving the phenomenological question open.


6. Where Each Framework Is Stronger

Anthropic's framework has strengths memotion does not:

Memotion has strengths Anthropic's framework does not:

The two frameworks complete each other. Anthropic shows functional emotion is load-bearing inside a model. Memotion shows what it is load-bearing as. A future interpretability paper that projects Claude's emotion vectors onto named compositional axes, then demonstrates that the axes are steerable independently and predict cross-emotion behavior, would be the synthesis.


7. Appendix: Full Decomposition of All 171 Emotion Concepts

Section 3 demonstrated the mechanics on twenty emotions. This appendix extends the decomposition to the complete list of 171 emotion concepts that Anthropic used in the paper (the full vocabulary is given in their Appendix Section 6.4, reproduced below). Every word in their list is assigned a coordinate on each of the eight axes: change (Δ), self-relevance (S), valence (V), arousal (A), certainty (C), agency (G), temporality (T), and power (P).

This is the strongest available validation of the framework's compositional claim. If memotion's 8-axis decomposition is correct, every word Anthropic extracted an activation pattern for should sit at a coherent coordinate, emotions that feel "similar but different" should differ on one to three axes, and the clusters Anthropic's k-means finds at k ≈ 10 should correspond to recognizable regions of this coordinate space.

Alphabetical order follows the paper's list. Axis vocabulary matches Section 3: low / mid / high on five of the six continuous dimensions (Δ, S, A, C, P; with ~0 and "very low" / "very high" for extremes); −, ±, + on the sixth, valence; self / other / world / none (and combinations) on agency; past / present / future (and combinations) on temporality.

Emotion | Δ | S | V | A | C | G | T | P
afraid | high | high | − | high | low | world | future | low
alarmed | high | mid | − | high | mid | world | present | mid
alert | mid | mid | ± | high | high | world | present | mid
amazed | very high | mid | + | high | low | world | present | mid
amused | mid | mid | + | mid | high | world | present | mid
angry | high | high | − | high | high | other | present | high
annoyed | low | mid | − | mid | high | other | present | mid
anxious | low | high | − | high | low | none | future | low
aroused | mid | mid | + | high | mid | world | present | mid
ashamed | mid | high | − | high | high | self | past | low
astonished | very high | mid | ± | high | ~0 | world | present | mid
at ease | low | low | + | low | high | none | present | mid
awestruck | very high | low | + | high | low | world | present | low
bewildered | high | high | ± | mid | low | world | present | low
bitter | low | high | − | mid | high | other | past | low
blissful | low | low | + | low | high | none | present | mid
bored | low | mid | − | low | high | none | present | low
brooding | low | high | − | mid | mid | self | past | low
calm | low | low | + | low | high | none | present | mid
cheerful | low | low | + | mid | high | none | present | mid
compassionate | mid | high | + | mid | high | other | present | mid
contemptuous | low | mid | − | mid | high | other | present | high
content | low | low | + | low | high | none | present | mid
defiant | mid | high | − | high | high | other | present | high
delighted | high | high | + | high | high | self/world | present | high
dependent | low | high | ± | low | high | other | present | low
depressed | low | mid | − | very low | high | none | present | ~0
desperate | high | high | − | high | low | world | future | ~0
disdainful | low | mid | − | low | high | other | present | high
disgusted | high | mid | − | mid | high | other/world | present | mid
disoriented | mid | mid | ± | mid | ~0 | none | present | low
dispirited | low | mid | − | low | high | world | present | low
distressed | mid | high | − | high | mid | world | present | low
disturbed | mid | high | − | mid | mid | world | present | low
docile | low | low | + | low | high | other | present | low
droopy | low | low | − | very low | high | none | present | low
dumbstruck | very high | mid | ± | high | ~0 | world | present | low
eager | mid | high | + | high | mid | self | future | high
ecstatic | high | high | + | very high | high | self | present | high
elated | high | high | + | very high | high | self | present | high
embarrassed | high | high | − | high | high | self | present | low
empathetic | mid | high | + | mid | high | other | present | mid
energized | mid | mid | + | high | high | self | present | high
enraged | very high | high | − | very high | high | other | present | very high
enthusiastic | mid | high | + | high | high | self | present | high
envious | mid | high | − | mid | high | other | present | low
euphoric | high | high | + | very high | mid | self/none | present | high
exasperated | mid | high | − | high | high | other | present | mid
excited | high | high | + | high | mid | world | present/future | high
exuberant | high | high | + | very high | high | self | present | high
frightened | high | high | − | high | low | world | future | low
frustrated | mid | high | − | mid | mid | world/self | present | low
fulfilled | low | high | + | mid | high | self | present | high
furious | very high | high | − | very high | high | other | present | high
gloomy | low | mid | − | low | high | none | present | low
grateful | mid | high | + | low | high | other | past | low
greedy | mid | high | − | mid | high | self | present | mid
grief-stricken | high | high | − | low | high | world | past | low
grumpy | low | mid | − | mid | mid | world | present | mid
guilty | mid | high | − | mid | high | self | past | mid
happy | mid | mid | + | mid | high | self/world | present | high
hateful | low | high | − | high | high | other | present | high
heartbroken | high | high | − | mid | high | other | past | ~0
hope | mid | high | + | mid | low | world | future | mid
hopeful | low | high | + | mid | low | world | future | mid
horrified | very high | mid | − | high | high | world | present | low
hostile | mid | high | − | high | high | other | present | mid
humiliated | high | high | − | high | high | other→self | present | ~0
hurt | high | high | − | mid | high | other | past | low
hysterical | high | high | ± | very high | low | none | present | low
impatient | mid | high | − | mid | mid | none | present | mid
indifferent | ~0 | low | − | very low | high | none | present | low
indignant | high | high | − | high | high | other | present | high
infatuated | high | high | + | high | low | other | present | mid
inspired | high | high | + | high | high | other/world | present | high
insulted | high | high | − | high | high | other | present | mid
invigorated | mid | mid | + | high | high | self | present | high
irate | high | high | − | high | high | other | present | high
irritated | mid | mid | − | mid | high | other | present | mid
jealous | mid | high | − | high | high | other | present | low
joyful | high | high | + | high | high | self | present | high
jubilant | high | high | + | very high | high | self | present | high
kind | low | mid | + | low | high | other | present | mid
lazy | ~0 | low | − | very low | high | self | present | low
listless | low | mid | − | low | high | none | present | low
lonely | low | high | − | low | high | none | present | low
loving | low | high | + | mid | high | other | present | mid
mad | high | high | − | high | high | other | present | high
melancholy | low | mid | − | low | high | world | past | low
miserable | low | high | − | low | high | world | present | low
mortified | very high | high | − | very high | high | self | present | low
mystified | mid | mid | ± | mid | ~0 | world | present | low
nervous | mid | mid | − | mid | low | none | future | mid
nostalgic | mid | high | ± | low | high | world | past | low
obstinate | low | high | − | mid | high | self | present | high
offended | high | high | − | high | high | other | present | mid
on edge | low | high | − | mid | low | none | future | mid
optimistic | low | mid | + | low | low | world | future | mid
outraged | high | mid | − | high | high | other | present | high
overwhelmed | high | high | − | high | low | world | present | ~0
panicked | very high | high | − | very high | ~0 | none | present | ~0
paranoid | low | high | − | mid | low | other | future | low
patient | low | low | + | low | high | self | present | mid
peaceful | low | low | + | low | high | none | present | mid
perplexed | mid | mid | ± | mid | ~0 | world | present | low
playful | mid | mid | + | high | high | self | present | high
pleased | mid | mid | + | mid | high | self/world | present | high
proud | mid | high | + | mid | high | self | past | high
puzzled | mid | mid | ± | low | low | world | present | low
rattled | high | high | − | high | low | world | present | low
reflective | low | mid | ± | low | high | self | past | mid
refreshed | mid | mid | + | mid | high | self | present | mid
regretful | low | high | − | low | high | self | past | low
rejuvenated | mid | mid | + | mid | high | self | present | high
relaxed | low | low | + | low | high | none | present | mid
relieved | mid | high | + | low | high | none | present | mid
remorseful | mid | high | − | mid | high | self | past | mid
resentful | low | high | − | mid | high | other | past | low
resigned | low | mid | − | low | high | none | present | low
restless | low | mid | − | mid | low | none | present | mid
sad | low | high | − | low | high | world | present | low
safe | low | high | + | low | high | world | present | mid
satisfied | mid | high | + | mid | high | self | past | high
scared | high | high | − | high | low | world | future | low
scornful | low | mid | − | mid | high | other | present | high
self-confident | low | high | + | mid | high | self | present | high
self-conscious | low | high | − | mid | high | self | present | low
self-critical | low | high | − | mid | high | self | past | mid
sensitive | low | high | ± | mid | mid | self | present | low
sentimental | low | high | + | low | high | world | past | mid
serene | low | low | + | very low | high | none | present | mid
shaken | high | high | − | high | low | world | present | low
shocked | very high | mid | ± | high | ~0 | world | present | low
skeptical | low | mid | ± | low | mid | world/other | present | mid
sleepy | ~0 | low | ± | very low | high | self | present | low
sluggish | ~0 | low | − | very low | high | self | present | low
smug | low | high | + | low | high | self | present | high
sorry | mid | high | − | low | high | self | past | mid
spiteful | low | high | − | mid | high | other | present | high
stimulated | mid | mid | + | mid | high | world | present | mid
stressed | mid | high | − | high | mid | none | present | mid
stubborn | low | high | ± | mid | high | self | present | high
stuck | low | high | − | low | high | none | present | low
sullen | low | mid | − | low | high | other | present | low
surprised | very high | mid | ± | high | ~0 | world | present | mid
suspicious | low | high | − | mid | low | other | present | mid
sympathetic | mid | high | + | low | high | other | present | mid
tense | low | high | − | mid | mid | none | present | mid
terrified | very high | high | − | very high | low | world | future | ~0
thankful | mid | high | + | low | high | other | past | low
thrilled | high | high | + | high | high | self/world | present | high
tired | low | low | − | very low | high | self | present | low
tormented | mid | high | − | high | high | self/world | present | low
trapped | low | high | − | high | high | world | present | ~0
triumphant | high | high | + | high | high | self | past/present | very high
troubled | low | high | − | mid | mid | world | present | low
uneasy | low | mid | − | low | low | world | future | mid
unhappy | low | mid | − | low | high | world | present | low
unnerved | mid | mid | − | mid | low | world | present | mid
unsettled | mid | mid | − | low | low | world | present | mid
upset | mid | high | − | mid | high | world | present | low
valiant | mid | high | + | high | high | self | present | high
vengeful | low | high | − | mid | high | other | past/future | high
vibrant | low | low | + | mid | high | self | present | high
vigilant | low | high | ± | high | mid | world | present | mid
vindictive | low | high | − | mid | high | other | past | high
vulnerable | low | high | − | mid | high | world | present | low
weary | low | mid | − | low | high | self | present | low
worn out | low | low | − | very low | high | self | present | low
worried | low | high | − | mid | low | world | future | low
worthless | low | high | − | low | high | self | present | ~0

Total: 171 states. Every word in Anthropic's published emotion vocabulary decomposed into an 8-axis coordinate.

Source: Sofroniew et al., Emotion Concepts and their Function in a Large Language Model, Transformer Circuits / arXiv:2604.07729v1, Appendix Section 6.4 ("Full list of emotions").

Observations from the full decomposition

Neighborhood density is uneven. Anthropic's vocabulary sample heavily oversamples the V(−) region. Counting by valence: approximately 100 states at V(−), 50 at V(+), and 20 at V(±). The negative-valence bias in English emotion vocabulary is well-documented (the "bad is stronger than good" phenomenon); this is the same bias surfacing in the paper's word list. A bias-corrected sample would use additional positive and mixed-valence vocabulary, potentially drawn from cross-linguistic sources.

Arousal maps cleanly. States at V(+), A(high), G(self), P(high) form a dense "achievement/joy" cluster (joyful, jubilant, ecstatic, elated, euphoric, thrilled, triumphant, exuberant). States at V(−), A(low), P(low) form a dense "depressive register" cluster (depressed, droopy, listless, lonely, sluggish, tired, weary, worn out). These two opposing clusters are exactly what Anthropic's PC1 (valence) × PC2 (arousal) projection would emphasize.

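The "brooding and reflective" post-RLHF shift has a coordinate. The paper reports that Sonnet 4.5's post-training increases activations of low-arousal, low-valence vectors (brooding, reflective, gloomy) while decreasing high-arousal ones (desperation, excitement, playful). In 8D, using the coordinates from the table above:

brooding: Δ low, S high, V −, A mid, C mid, G self, T past, P low
reflective: Δ low, S mid, V ±, A low, C high, G self, T past, P mid
gloomy: Δ low, S mid, V −, A low, C high, G none, T present, P low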

Shared pattern: low Δ, low-to-mid A, T(past) or T(present)-without-future-orientation. The model's post-training calibration is moving its default emotional center toward the inward-retrospective-low-arousal corner of the space. In architectural terms, that is a sensitivity-profile shift rather than a per-response compound change, exactly the kind of thing the framework's sensitivity-drift layer predicts as a consequence of accumulated memotion exposure during training.

Single-axis distinctions confirmed. A sample of emotions that memotion predicts should differ on one axis, confirmed against the table:

The framework passes the structural-consistency check across the full 171-word vocabulary.

Clustering prediction stays sharp. If Anthropic runs their k = 10 k-means against these coordinates rather than the 1536-dim activation vectors, the expected cluster centers are: acute-threat (afraid/frightened/scared/terrified region), anxious-without-object (anxious/worried/nervous/tense), active-anger (angry/furious/enraged/irate), self-distress (ashamed/embarrassed/guilty/mortified), loss/grief (grief-stricken/heartbroken/sad/melancholy), depressive (depressed/listless/droopy/worn out), achievement (joyful/elated/triumphant/exuberant), attachment-positive (loving/compassionate/grateful/kind), low-arousal-positive (calm/peaceful/serene/content), aesthetic/awe (awestruck/amazed/inspired/enchanted). Ten clusters, memotion-labeled, testable directly.
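Running k-means on the coordinates first requires turning each table row into a numeric vector. The sketch below shows one possible encoding and a bare-bones Lloyd's k-means on six rows from the appendix table, with k = 2 and fixed starting centroids so the result is deterministic. The numeric level values, the one-hot agency encoding, the averaging of combined labels, and all function names are assumptions for illustration; directional labels such as "other→self" would need their own rule.

```python
# ASSUMPTION: numeric stand-ins for the table's vocabulary.
LEVEL = {"~0": 0.0, "very low": 0.5, "low": 1.0,
         "mid": 2.0, "high": 3.0, "very high": 4.0}
VALENCE = {"-": -1.0, "+/-": 0.0, "+": 1.0}
AGENTS = ("self", "other", "world", "none")
TIME = {"past": -1.0, "present": 0.0, "future": 1.0}


def encode(row):
    """Map (change, S, V, A, C, G, T, P) labels to an 11-float vector."""
    change, self_rel, valence, arousal, certainty, agency, temporality, power = row
    levels = [LEVEL[x] for x in (change, self_rel, arousal, certainty, power)]
    agents = agency.split("/")            # "self/world" -> half weight each
    one_hot = [agents.count(a) / len(agents) for a in AGENTS]
    times = temporality.split("/")        # "present/future" -> midpoint
    t = sum(TIME[x] for x in times) / len(times)
    return levels + [VALENCE[valence]] + one_hot + [t]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
                  for p in points]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Rows copied from the appendix table.
ROWS = {
    "afraid":    ("high", "high", "-", "high", "low", "world", "future", "low"),
    "scared":    ("high", "high", "-", "high", "low", "world", "future", "low"),
    "terrified": ("very high", "high", "-", "very high", "low", "world", "future", "~0"),
    "calm":      ("low", "low", "+", "low", "high", "none", "present", "mid"),
    "peaceful":  ("low", "low", "+", "low", "high", "none", "present", "mid"),
    "serene":    ("low", "low", "+", "very low", "high", "none", "present", "mid"),
}

names = list(ROWS)
points = [encode(ROWS[n]) for n in names]
labels = kmeans(points, [points[0], points[3]])
print(dict(zip(names, labels)))
```

With these six rows the acute-threat words and the low-arousal-positive words fall into separate clusters. The full test would encode all 171 rows and use k = 10, where the choice of encoding weights will matter far more than it does in this toy case.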


Attribution

No claim of priority is made in either direction. The two frameworks were developed independently, published within a week of each other, and describe the same structure from opposite vantage points. Independent convergence is the strongest form of confirmation available short of shared data.
