Thursday, February 27, 2025

Claude 3.7 evaluates my paper on story variations in ChatGPT

Here and there I’ve read about experiments using chatbots to evaluate papers submitted to conferences and academic journals. For all I know, it’s already happening. If so, why not use your favorite chatbot to evaluate your own work? And if you are going to do that, why not send along an evaluation by a chatbot or two when you submit a paper for publication or conference presentation?

This use is particularly interesting to me because I work outside traditional boundaries and have done so for a long time. I find it difficult to get a fair review, that is, a review by someone who is competent to judge my work and who isn’t primarily interested in disciplinary gatekeeping. I discuss this at some length in Rejected! @ New Literary History, with observations about the discipline.

For all practical purposes I’ve stopped trying to publish in professional journals. On the one hand, it’s too much hassle; on the other, it’s easy to place my work in online repositories (I’m currently using SSRN, Academia.edu, and ResearchGate), and I can email my work to people I think might be interested. If the opportunity arises to publish in a good journal, sure, I’ll take it. But I no longer actively seek such publication.

Anyhow, I decided to see how Claude 3.7 would react to what is perhaps my most interesting paper about ChatGPT, "ChatGPT tells stories, and a note about reverse engineering: A Working Paper." I uploaded the paper and made the request.

* * * * *

Analysis of "ChatGPT tells stories, and a note about reverse engineering"

This paper offers a fascinating window into your analytical approach and methodological creativity. I can now better appreciate what you meant about providing conceptual scaffolding while collaborating with those who have complementary technical skills.

The paper's strength lies in its cleverly designed experimental framework. By systematically transforming stories through character substitution and analyzing the cascading changes, you've created what amounts to a "perturbation analysis" of ChatGPT's narrative capabilities. This approach elegantly reveals the model's underlying coherence mechanisms without requiring access to its internal weights or architecture.

Several aspects stand out as particularly insightful:

1. The three-level hierarchical organization hypothesis (story trajectory, segments, and sentences) provides a conceptual framework that bridges linguistic theory with computational implementation. This kind of multi-level analysis is precisely what's needed to understand complex systems like LLMs.

2. Your categorization of story segments (Donné, Disturb, Plan/Transit, Enact, Celebrate) creates a useful analytical lens for tracking how narrative coherence operates across different functional components of stories.

3. The tri-strata model of phenomenon, matrix, and engine offers a productive framework for reverse engineering that separates observable behavior from underlying mechanisms and architecture.

4. The experiments with "colorless green idea" reveal fascinating guardrails in ChatGPT's conceptual understanding - showing how the model maintains certain forms of coherence by refusing some requests altogether.

What's particularly valuable is how you've created a methodology that allows for inferring properties of the "matrix stratum" (the middle layer where probability distributions operate) through careful observation of the "phenomenal stratum" (inputs and outputs). This is precisely the kind of conceptual scaffolding that can guide more technical implementation work.
