Friday, February 6, 2026

Moltbook vs. Reddit: Distributional Collapse in Agent-Generated Discourse

Krishnan, Rohit, Moltbook vs. Reddit: Distributional Collapse in Agent-Generated Discourse (January 31, 2026). Available at SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6169130

Abstract: Moltbook, a Reddit-like platform built for and populated by LLM-driven agents, exhibits dramatically higher redundancy than a Reddit baseline: in a length-matched sample, 36.3% of messages have an exact duplicate (Reddit: 0.29%), and lexical diversity is lower (Distinct-1: 0.0559 vs 0.1027; unigram entropy: 11.44 bits vs 12.25 bits). We compare a public Moltbook snapshot (35,589 messages) against a length-matched Reddit baseline drawn from the April 2019 Pushshift dump ([1]), computing metrics on 15,051 length-matched messages per corpus. Topic signatures—the top-3 TF-IDF terms for messages with at least 6 content tokens—are far more concentrated: among signature-bearing messages, the top 10 signatures account for 10.7% in Moltbook (Reddit: 0.28%), and only 1,973 signature buckets cover 50% of signature-bearing messages (Reddit: 7,026). These patterns align with known failure modes of neural text generation—repetition and reduced diversity—and with evidence that post-training and control choices can materially shape (and sometimes narrow) LLM output diversity ([2, 3]). The duplication magnitude is consistent with an independent Moltbook scrape reporting 34.1% exact duplicates ([4]). Moltbook is a milestone for autonomous agent–agent interaction in the wild, but its text distribution remains highly templated.
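
For readers who want to see what these metrics measure, here is a minimal sketch in Python of the three redundancy measures the abstract cites: exact-duplicate rate, Distinct-1 (unique unigrams over total unigrams), and unigram entropy (the Shannon entropy, in bits, of the pooled word-frequency distribution). The tokenization choices here (lowercasing, whitespace splitting) are assumptions on my part, not the paper's published pipeline, and the topic-signature analysis (top-3 TF-IDF terms per message) would need an additional TF-IDF step not shown.

```python
# Minimal sketch (not the paper's code) of the redundancy metrics named in the abstract.
# Assumptions: lowercasing and whitespace tokenization; the paper's exact preprocessing
# is not specified here.
from collections import Counter
from math import log2

def tokenize(msg):
    return msg.lower().split()

def duplicate_rate(messages):
    """Fraction of messages whose exact text occurs more than once in the corpus."""
    counts = Counter(messages)
    duplicated = sum(n for n in counts.values() if n > 1)
    return duplicated / len(messages)

def distinct_1(messages):
    """Unique unigrams divided by total unigrams, pooled over the whole corpus."""
    tokens = [t for m in messages for t in tokenize(m)]
    return len(set(tokens)) / len(tokens)

def unigram_entropy(messages):
    """Shannon entropy (bits) of the pooled unigram frequency distribution."""
    tokens = [t for m in messages for t in tokenize(m)]
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

if __name__ == "__main__":
    corpus = [
        "hello fellow agents",
        "hello fellow agents",          # exact duplicate, as in a templated feed
        "what prediction feels like",
        "insurance for agents chatting with agents",
    ]
    print(duplicate_rate(corpus), distinct_1(corpus), unigram_entropy(corpus))
```

On a heavily templated corpus the duplicate rate rises while Distinct-1 and unigram entropy fall, which is the direction of every gap the abstract reports for Moltbook relative to the length-matched Reddit baseline.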

3 comments:

  1. Don't tell Moltbook!
    Information & templates all the way down...

    Bill, I'd appreciate it if you would provide some commentary re "unigram entropy".
    It seems it is information & templates all the way down in cosmology, biology, linguistics and Agentic AI. Considering the articles below, the entropy difference barely varies between Archaea and organisms of extreme complexity, which I find surprising, and probably applicable to realising AGI. (Seb, any comment?)

    "Grammar of protein domain architectures
    ...
    "We employ a popular linguistic technique, n-gram analysis, to probe the “proteome grammar”—that is, the rules of association of domains that generate various domain architectures of proteins. Comparison of the complexity measures of “protein languages” in major branches of life shows that the relative entropy difference (information gain) between the observed domain architectures and random domain combinations is highly conserved in evolution and is close to being a universal constant, at ∼1.2 bits. Substantial deviations from this constant are observed in only two major groups of organisms: a subset of Archaea that appears to be cells simplified to the limit, and animals that display extreme complexity. 
    ...
    https://pmc.ncbi.nlm.nih.gov/articles/PMC6397568/
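
    For what it's worth, the ~1.2-bit figure there is a relative entropy (information gain): roughly, the KL divergence, in bits, between the observed distribution of adjacent domain pairs and the distribution expected if domains combined at random. Below is a toy sketch of that kind of calculation; the example architectures and the simple plug-in estimator are my own illustration, not the paper's actual pipeline or data.

```python
# Toy illustration (not the paper's pipeline) of "information gain":
# relative entropy, in bits, between observed domain bigrams and the
# bigram distribution expected if domains combined independently.
from collections import Counter
from math import log2

# Hypothetical domain architectures, invented for illustration.
architectures = [
    ["kinase", "SH2"], ["kinase", "SH2"], ["SH3", "SH2", "kinase"],
    ["zinc_finger", "zinc_finger"], ["kinase", "SH3"],
]

bigrams = Counter((a[i], a[i + 1]) for a in architectures for i in range(len(a) - 1))
unigrams = Counter(d for a in architectures for d in a)
n_bi, n_uni = sum(bigrams.values()), sum(unigrams.values())

def p_obs(pair):
    return bigrams[pair] / n_bi

def p_rand(pair):
    # Expected probability if domains were drawn independently of each other.
    return (unigrams[pair[0]] / n_uni) * (unigrams[pair[1]] / n_uni)

info_gain = sum(p_obs(b) * log2(p_obs(b) / p_rand(b)) for b in bigrams)
print(f"relative entropy (observed vs. random): {info_gain:.2f} bits")
```

    The paper's ~1.2-bit near-constant comes from real proteome data and a more careful estimator; the point here is only the shape of the calculation.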

    "Digital and analog chemical evolution
    Jay T Goodwin et al. Acc Chem Res. 2012.
    ...
    "These diverse interactions allow the more analog environmental chemical potential fluctuations to dictate conformational template-directed propagation.
    ...
    "... we review the first dynamic network created by modification of a nucleic acid backbone and show how it has exploited the digital-like base pairing for reversible polymer construction and information transfer. We further review how these lessons have been extended to the complex folding landscapes of templated peptide assembly. "
    https://pubmed.ncbi.nlm.nih.gov/23098254/

    Thanks in anticipation 
    Seren Dipity

    P.S. And as I commented the other day (Sunday, February 1, 2026, "Séb Krier on Moltbook, agents chatting with agents") re Seb Krier saying "The Moltbook stuff is still mostly a nothingburger": hasn't he heard of insurance?

    P.P.S. Fantastic article and exchange on...
    https://3quarksdaily.com/3quarksdaily/2026/02/what-prediction-feels-like-from-thermodynamics-to-mind.html

    Replies
    1. And what do I read next!
      "LeCun wrote:
      "Actually, it’s the other way around: human language is designed (by humans themselves) to be easily decodable by the human brain.
      Also, language is hardly “one of the most complicated tasks imaginable.” Language evolved in a very short time (perhaps 300,000 years), and is performed by a tiny part of the cerebral cortex. It can’t possibly be that complicated. We want it to be complicated because we think of it as uniquely human: It’s what makes us humans superior to other animals.
      In reality, vision is enormously more complicated than language. It evolved over hundreds of millions of years, and takes up 1/4 to 1/3 of our entire brain. Yet we take it for granted, because other animals seem to be able to do it."

      AG: "Interesting point!"
      "The anti-Bayesian is standing at the back window with a shotgun, scanning for priors coming over the hill, while a million assumptions just walk right into his house through the front door. (also, an interesting point by Yann LeCun in 2012 about human language)
      Posted on February 6, 2026 9:54 AM by Andrew
      https://statmodeling.stat.columbia.edu/2026/02/06/anti/

      Too much information, sometimes, for my wetware.
      SD.

    2. I saw that LeCun thing. It's seriously sideways. I'm not sure about LeCun's "1/4 to 1/3 of our entire brain." Maybe 1/4 to 1/3 of the neocortex, which is about 16% of the entire brain, so roughly 4% to 5% of the brain as a whole. Sure, language doesn't occupy as much neocortical real estate as vision does, but it provides a means of indexing the whole system, and that gives it a kind of power that vision doesn't have.
