Friday, February 6, 2026

Moltbook vs. Reddit: Distributional Collapse in Agent-Generated Discourse

Krishnan, Rohit, Moltbook vs. Reddit: Distributional Collapse in Agent-Generated Discourse (January 31, 2026). Available at SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6169130

Abstract: Moltbook, a Reddit-like platform built for and populated by LLM-driven agents, exhibits dramatically higher redundancy than a Reddit baseline: in a length-matched sample, 36.3% of messages have an exact duplicate (Reddit: 0.29%), and lexical diversity is lower (Distinct-1: 0.0559 vs 0.1027; unigram entropy: 11.44 bits vs 12.25 bits). We compare a public Moltbook snapshot (35,589 messages) against a length-matched Reddit baseline drawn from the April 2019 Pushshift dump ([1]), computing metrics on 15,051 length-matched messages per corpus. Topic signatures—the top-3 TF-IDF terms for messages with at least 6 content tokens—are far more concentrated: among signature-bearing messages, the top 10 signatures account for 10.7% in Moltbook (Reddit: 0.28%), and only 1,973 signature buckets cover 50% of signature-bearing messages (Reddit: 7,026). These patterns align with known failure modes of neural text generation—repetition and reduced diversity—and with evidence that post-training and control choices can materially shape (and sometimes narrow) LLM output diversity ([2, 3]). The duplication magnitude is consistent with an independent Moltbook scrape reporting 34.1% exact duplicates ([4]). Moltbook is a milestone for autonomous agent–agent interaction in the wild, but its text distribution remains highly templated.
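The headline metrics are all simple corpus statistics. The paper does not spell out its tokenization, so as a rough sketch (assuming lowercased whitespace tokenization, which may differ from the authors' setup), the duplicate rate, Distinct-1, and unigram entropy can be computed like this:

```python
import math
from collections import Counter

def exact_duplicate_rate(messages):
    # Fraction of messages whose exact text occurs more than once in the corpus.
    counts = Counter(messages)
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / len(messages)

def distinct_1(messages):
    # Distinct-1: unique unigrams divided by total unigrams across the corpus.
    tokens = [t for m in messages for t in m.lower().split()]
    return len(set(tokens)) / len(tokens)

def unigram_entropy(messages):
    # Shannon entropy (in bits) of the corpus-level unigram distribution.
    counts = Counter(t for m in messages for t in m.lower().split())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy corpus: one message duplicated, one unique.
corpus = ["a b a", "a b a", "c d"]
print(exact_duplicate_rate(corpus))  # 2 of 3 messages have an exact duplicate
print(distinct_1(corpus))            # 4 unique tokens / 8 total tokens = 0.5
print(unigram_entropy(corpus))       # entropy of {a: 4/8, b: 2/8, c: 1/8, d: 1/8} = 1.75 bits
```

On this reading, a low Distinct-1 and low entropy both indicate probability mass piling onto a small set of tokens, which is exactly the "templated" behavior the abstract reports for Moltbook.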
