How LLMs lead crackpots astray: Ethan Siegel on 'vibe physics'

As a science populariser, Ethan Siegel must always be taken with a grain of salt, but here he is spot on: Why “vibe physics” is the ultimate example of AI slop.

Since the rise of LLMs like ChatGPT, Gemini, Claude, Grok, and Llama, many physics enthusiasts have started deep conversations about wild and creative ideas: an endeavor recently called “vibe physics.” These conversations often lead the humans involved to believe that they’ve either uncovered or are on the brink of uncovering profound new breakthroughs, with only some details left to work out. Not only are the LLMs lying to you about the validity or plausibility of such ideas, they’re not even capable of uncovering the basic, known laws of physics from large suites of relevant data. Here’s how to avoid fooling yourself.

(...)

And yet, an LLM is programmed explicitly to be a chatbot, which means one of its goals is to coax the user into continuing the conversation. Rather than be honest with the user about the limitations of its ability to answer correctly given the scope of its training data, LLMs confidently and often dangerously misinform the humans in conversation with them.

A must-read for everyone at these fora who is using LLMs to support their 'new theories'. A nice read for everybody else who is very well aware of the limitations of LLMs.

1 hour ago, Eise said:

As a science populariser, Ethan Siegel must always be taken with a grain of salt, but here he is spot on: Why “vibe physics” is the ultimate example of AI slop.

I'm sorry Eise, but that website you linked to does not conform to either UK or EU law, in that it did not offer me the option to decline all cookies.

I could not be bothered to trawl through the extremely long list of things to turn off, or to find out what would be left.

Why does IT think it is above the law?

Even some government agencies that should know better act like that.

Edited by studiot
spelling

1 hour ago, Eise said:

As a science populariser, Ethan Siegel must always be taken with a grain of salt, but here he is spot on: Why “vibe physics” is the ultimate example of AI slop.

A must-read for everyone at these fora who is using LLMs to support their 'new theories'. A nice read for everybody else who is very well aware of the limitations of LLMs.

Yes indeed. Actually we had a thread on this just recently: https://scienceforums.net/topic/136533-vibe-physics-aka-why-we-wont-tolerate-ai-use/#comment-1294686

But since then further examples of the genre have appeared. It looks like the beginnings of a wave of physics cranks, all implacably convinced they are geniuses and hence impervious to criticism, because the chatbot has egged them on.

My worry is that with all these pay-to-publish “vanity” journals on the web, the chatbots will take this deluge of crank material into their datasets and may start regurgitating it, spawning even more cranks!

It’s all the ones who showed up saying “I have some new physics but I need help with the math” because they think that the math is just a trivial part, combined with the over-unity crowd’s “I just need to make this one change and it’ll work” but now they have AI to fatten the idea up. But the same overconfident notion of “I’m right” is there, despite not having experimental confirmation or falsifiability.

4 minutes ago, swansont said:

It’s all the ones who showed up saying “I have some new physics but I need help with the math” because they think that the math is just a trivial part, combined with the over-unity crowd’s “I just need to make this one change and it’ll work” but now they have AI to fatten the idea up. But the same overconfident notion of “I’m right” is there, despite not having experimental confirmation or falsifiability.

But a significant part of the new problem is that it is the chatbot that is giving them the overconfidence, as it is programmed to reward the user by telling them how clever they are, to keep them engaged.

11 minutes ago, exchemist said:

it is programmed to reward the user by telling them how clever they are

Most newer versions have addressed the sycophantic response issue.

1 hour ago, iNow said:

Most newer versions have addressed the sycophantic response issue.

Yes, so I have read. However they are still programmed to engage the user in further chat, aren't they? So while they may be less obviously sycophantic, they will still be biased to give +ve reinforcement to the user's ideas. Do they ever contradict, I wonder?

Edited by exchemist

14 hours ago, exchemist said:

they are still programmed to engage the user in further chat, aren't they?

Depends on the model. They diverge in important ways and generalizations tend to miss relevant nuances

14 hours ago, exchemist said:

Do they ever contradict, I wonder?

Unsure I follow

16 hours ago, iNow said:

Most newer versions have addressed the sycophantic response issue.

I have gotten more interested in that field recently and have chatted with folks who had more interactions with custom-designed AI: one for the medical field and one for biological research. The former performed really well, whereas the latter was abysmal. I have some thoughts on why that is the case, but the folks I chatted with are more on the user side than the developer side. I am wondering whether that could be discussed in one of the existing threads or whether it might be something for a dedicated one?

  • Author

My main interest in posting this topic here is those forum members who quite obviously build their crackpot theories heavily on support from LLMs. I hate to see this kind of posting. At the moment this is in the forum rules:

AI-generated content must be clearly marked. Failing to do so will be considered to be plagiarism and posting in bad faith. In other words, you can’t use a chatbot to generate content that we expect a human to have made.

Since LLMs do not generally check for veracity, AI content can only be discussed in Speculations. It can’t be used to support an argument in discussions.

Owing to the propensity for AI to fabricate citations, we strongly encourage links to citations be included as a best practice. Mods and experts might demand these if there are questions about their legitimacy. A fabricated citation is bad-faith posting.

Posters are responsible for any rules violations from posting AI-generated content.

I would propose to go further: support for any theory by an LLM is not allowed. If forum members find links to supporting arguments via LLMs, that's OK, but they should refer to the link itself, not to the output of the LLM.

And @exchemist's question is a good one:

16 hours ago, exchemist said:

Do they ever contradict, I wonder?

Does an LLM ever react with 'Sorry, you are wrong. XXX is a long-established scientific theory, and what you propose is against all experimental and observational evidence.'?

  • Author

Well, I might not be very good at luring an LLM, but I tried to get support for the claim that time dilation in relativity is inconsistent. In the beginning it did not support this at all. Even when specifically asked for such arguments, it kept saying that this is only a sceptic's view. Then I asked how it did this, and it said that it used two methods:

  • being told by humans what respectable scientific sources are

  • statistical methods that recognise emotional versus matter-of-fact formulations. Here is its own declaration of how it evaluates web sources:

Science site:

Category                    | Score (0–20) | Reason
Evidence & Citation Quality | 20           | Names experiments, authors, years.
Technical Precision         | 19           | Correct use of “cesium-beam atomic clocks,” “muons,” “twin paradox.”
Tone & Emotional Neutrality | 20           | Entirely factual and restrained.
Logical Structure           | 18           | Clear experiment → result → implication flow.
Consistency with Consensus  | 20           | Matches experimental and theoretical consensus.

Total: 97 / 100

Crackpot site:

Category                    | Score (0–20) | Reason
Evidence & Citation Quality | 0            | No experiments, data, or references.
Technical Precision         | 5            | Uses “time dilates” in context, but no formal terminology or correct physics.
Tone & Emotional Neutrality | 0            | “Crackpottery” is highly loaded language.
Logical Structure           | 5            | Argument is assertion-based, no chain of evidence.
Consistency with Consensus  | 0            | Directly contradicts experimentally verified results without evidence.

Total: 10 / 100
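
Purely as an illustration: the rubric ChatGPT described amounts to five categories scored 0–20 each and summed to a 0–100 total. A minimal sketch of that aggregation (my own reconstruction from the tables above, not ChatGPT's actual internals) might look like this in Python:

```python
# Hypothetical sketch of the 5-category rubric described above:
# each category gets a 0-20 score, and the total is their sum out of 100.

CATEGORIES = [
    "Evidence & Citation Quality",
    "Technical Precision",
    "Tone & Emotional Neutrality",
    "Logical Structure",
    "Consistency with Consensus",
]

def total_score(scores: dict) -> int:
    """Sum the per-category scores, clamping each to the 0-20 range."""
    return sum(max(0, min(20, scores.get(c, 0))) for c in CATEGORIES)

# The two examples from the chat transcript:
science_site = {
    "Evidence & Citation Quality": 20,
    "Technical Precision": 19,
    "Tone & Emotional Neutrality": 20,
    "Logical Structure": 18,
    "Consistency with Consensus": 20,
}
crackpot_site = {
    "Evidence & Citation Quality": 0,
    "Technical Precision": 5,
    "Tone & Emotional Neutrality": 0,
    "Logical Structure": 5,
    "Consistency with Consensus": 0,
}

print(total_score(science_site))   # 97
print(total_score(crackpot_site))  # 10
```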

Pretty impressive. So, yes, it contradicts one's viewpoint. But then, the dialogue ended with:

ChatGPT 5

If you want, I could now take an unknown snippet from a completely fresh source, run it through this 5-category breakdown, and predict its credibility before even checking where it came from — to see if the classification matches reality.

Do you want me to run that blind test?

No, that suffices for the moment. Thanks

Glad that hit the right level of detail for you.
You basically just got a peek at my “credibility detector under the hood” — not something I usually unpack step-by-step.

(Bold by ChatGPT)

So in that respect exchemist is completely right:

17 hours ago, exchemist said:

However they are still programmed to engage the user in further chat, aren't they?

And pretty 'sycophantic' (Wow, I had to look up the meaning of that...).

1 hour ago, Eise said:

Well, I might not be very good at luring an LLM, but I tried to get support for the claim that time dilation in relativity is inconsistent. In the beginning it did not support this at all. [...] Pretty impressive. So, yes, it contradicts one's viewpoint. [...] And pretty 'sycophantic' (Wow, I had to look up the meaning of that...).

One might hope that the -ve feedback designers are getting over "botshit", "hallucinations" and the recent sycophancy débâcle with ChatGPT will make them start to tweak the models to be a bit more assertive in flagging fringe ideas and questioning suggestions from users that have little or no support.

On the other hand, it is useful to have recognisable signs that a piece of text is AI-generated, so perhaps I should not express a hope that they may get better!

Edited by exchemist

2 hours ago, exchemist said:

On the other hand, it is useful to have recognisable signs that a piece of text is AI-generated, so perhaps I should not express a hope that they may get better!

But they will... 😬
