Preliminary Bot or Not Results Are Here!
Patience, we’ll get to Fermi below. First, breaking news.
Our “Bot or Not” project has found a fast and easy way to tell the difference between bots and humans. But, but! What if some future AI finds a way to fool our newly discovered technique?
The good news from our Botmaster Jeremy Lichtman is that Shannon entropy implies there should be countless ways to keep AIs from hiding forever, so long as we humans keep using Shannon entropy to discover new ways to detect them faster than they can find ways to fake being human.
Our Bot or Not research is using Shannon entropy to lay the foundations for detecting bots, lest we learn the hard way the answer to the Fermi Paradox: why haven’t we seen any signs of other technological beings?
There are trillions of opportunities for technologically capable species to have arisen. According to NASA, our galaxy alone hosts some 300 million life-friendly planets, and our universe contains some two trillion galaxies. Yet despite our increasingly powerful optical and radio observatories, we still see no signs of technological life. Is it possible that every intelligent species self-destructs before making its mark on the universe? Or is it possible that, even though life spread across our planet almost as soon as it had cooled enough for life to survive, life is unique to planet Earth?
Here’s why Carolyn Meinel, principal investigator for our Bot or Not project, says that Shannon entropy means that we should be able to keep the AIs out of trouble.
The number of possible Shannon entropy detection techniques is effectively infinite. Most importantly, it is mathematically provable (OK, OK, a hypothesis with vast empirical support) that finding techniques to detect the works of AIs should be easy. By contrast, any AI that wants to rule us must first figure out everything we could possibly do before we do it, and figure out how to prevent it all. Yet any conceivable AI would also be limited by how much energy, heat sinking, and computing power it could control, and ultimately by the mathematics inherent in our universe.
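For readers curious what a Shannon entropy measurement of text looks like in practice, here is a minimal sketch. It is purely illustrative: the word-level tokenization and the two sample snippets are assumptions made for this example, not the actual detection techniques or data from our Bot or Not experiments.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy H = -sum(p * log2(p)) over the token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical snippets, not data from our Bot or Not experiments.
human_text = "Well, honestly, I kept going back and forth before deciding on this."
bot_text = "The answer is clear. The answer is simple. The answer is final."

for label, text in [("human-ish", human_text), ("bot-ish", bot_text)]:
    print(label, round(shannon_entropy(text.lower().split()), 3), "bits per word")
```

In this toy example the repetitive snippet yields a lower entropy per word than the varied one; real detection work would of course use far larger samples and more than one measure.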
Of course, that depends upon the human species making it a priority to keep AIs from destroying us, whether accidentally or deliberately. Granted, it might be a costly effort. However, given the vast infrastructure of energy, heat sinking, and hardware needed for any superhuman AI to survive, such AIs can’t hide, and we can destroy them.
The inherent math-based limitation of any AI rests on the same mathematics that underlies encryption. Even the strongest encryption system is easy and cheap to use. By contrast, cracking at least one of today’s encryption systems is believed to be beyond the reach of all possible future computers, including quantum computers. That is, unless someone or some AI discovers that P = NP, but how likely is that? See also Computers and Intractability by Garey and Johnson, a book Carolyn rereads every few years.
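To make that asymmetry between using and cracking encryption concrete, here is a back-of-the-envelope sketch. The guesses-per-second figure is an assumption chosen for illustration, not a claim about any real machine or any particular cipher.

```python
# Back-of-the-envelope: why brute-forcing a modern key is hopeless.
# Assumes a hypothetical machine testing 10**18 keys per second.
key_space = 2 ** 128            # possible 128-bit keys
guesses_per_second = 10 ** 18   # assumed, wildly optimistic hardware
seconds_per_year = 60 * 60 * 24 * 365

years = key_space / (guesses_per_second * seconds_per_year)
print(f"Expected brute-force time: about {years:.2e} years")
# Roughly 1e13 years, far longer than the age of the universe (~1.4e10 years).
```

Encrypting a message, by contrast, takes a fraction of a second on ordinary hardware, which is the whole point of the asymmetry.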
Her favorite example of such a mathematical limit: the Turing machine halting problem.
But, but! What about future computing systems that could make superhuman AIs so small and so energy efficient that they could evade our bot killers? Too bad for the future’s truly intelligent AIs: Moore’s law is over. The basic limitation of AIs is inherent in the structure of our universe, namely quantum mechanics. As more computing power gets packed into chips, their features get smaller. Quantum effects make those ever-smaller features increasingly error-prone, and error-correction systems take up ever greater percentages of chip real estate.
Bottom line: Math and physics fundamentally represent the structure of our universe, and that structure is on our side against the AIs.
Right now, we are happy to report our results with AutoIC, an AI detection technique that is inexpensive and easy to use. The Political Cognition Lab of the University of Montana has kindly provided the use of AutoIC, a tool reserved for researchers that automates the scoring of textual data for integrative complexity. Result: Jeremy’s Multi-AI Oracle scores well on some measures of integrative complexity, poorly on others, but almost always zero on measures of dialectical integration, meaning the ability to synthesize seeming opposites into a whole that makes logical sense. This is a key element of true intelligence. We have also tested the 000_bot that Lars Warren Ericson has been competing in the Metaculus AI Forecasting Benchmark Tournament, and it gives similar results. If substantiated by tests on more GenAI bots, this will be huge, as it may reveal a fingerprint, so to speak, of the minds of GenAIs.
Another advantage of AutoIC: It is great at smoking out lies and propaganda. Carolyn and Jeremy are working on that, and soon will share the results here.
If you are thinking that GenAI should be perfect for spreading harmful bull**t, yes, that’s frightening us, too. The problem is that making GenAI increasingly persuasive is a way to ensure investments today, profits tomorrow. That isn’t the fault of their tech bro leaders. Building boringly unbelievable bots would be a fast track to bankruptcy.
We now also have a preliminary finding that nearly all Generative AI-based bots in the ongoing Metaculus AI Benchmark Tournament tend to be overconfident on questions that, on average, are over 50% likely to be scored as “yes.” Conversely, none so far tend to score as underconfident. This contrasts with human crowdsourcing competitions, in which average humans tend to be underconfident, as reported by Wharton professor Barbara Mellers. However, in other real-life situations, Prof. Mellers has found that humans are generally overconfident, much like the GenAI bots we have evaluated. So in general, overconfidence is not a measure of whether texts have been generated by a human or a GenAI bot. It is as important for us to rule out hypotheses as it is to support them. That’s life as a scientist — or should be.
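For readers who want to see what an overconfidence check can look like, here is a minimal sketch of one simple way to compare a bot’s average forecast against the observed rate of “yes” resolutions. The forecast and resolution numbers below are hypothetical placeholders, not data from the Metaculus tournament, and serious calibration analysis uses more refined measures.

```python
def overconfidence_gap(forecasts, outcomes):
    """
    Mean forecast probability minus observed "yes" rate.
    Positive gap -> overconfident toward "yes"; negative -> underconfident.
    """
    assert forecasts and len(forecasts) == len(outcomes)
    mean_forecast = sum(forecasts) / len(forecasts)
    yes_rate = sum(outcomes) / len(outcomes)
    return mean_forecast - yes_rate

# Hypothetical numbers only, not results from any real bot.
bot_forecasts = [0.9, 0.85, 0.8, 0.95, 0.7]   # bot's probabilities of "yes"
resolutions   = [1, 1, 0, 1, 0]               # 1 = resolved yes, 0 = resolved no
print(f"Overconfidence gap: {overconfidence_gap(bot_forecasts, resolutions):+.2f}")
```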
Yet another preliminary finding: Jeremy’s bestworldbot turns out to be excellent at forecasting questions that resolve yes, but terrible at those that resolve no. We see this discovery as a win because at least we now know the broad outlines of bestworldbot’s problem. We won’t be competing in the upcoming Metaculus Q1 (Jan. through March 2025) competition because we will be focusing on research for our Bot or Not project. In the meantime, Botmaster Jeremy Lichtman will continue testing and improving his creations. Perhaps we will run one of Jeremy’s creations in the final quarter of that competition. Stay tuned.
In other research news, on Dec. 13, 2024, Botmaster Jeremy’s current version of bestworldbot broke its ten-day streak at #2, gradually slipping to #7 as of Dec. 31, the last day of Metaculus’ Q4 AI Benchmark Tournament. But when the questions scored after the 31st were counted, as of today, Jan. 13, 2025, our bot has plunged to #37 out of 42. We are examining the data to determine why bestworldbot scored so poorly on questions of the form “Will X happen before Dec. 31?” Our initial finding is that nearly all, perhaps all, of the questions scored after Dec. 31 resolved as “no.” If this is a common bias in question choices at Metaculus, then competitors who knew about it could have taken it into account. See the color-coded bots in the most recent leaderboard history below to see which bots suddenly rose in the standings in January.
Our oldest experiment, launched July 12, 2024, ended Sept. 30, 2024: forecasts by Jeremy’s bestworldbot in the Q3 AI Forecasting Tournament on 125 geopolitical questions, each of which was open for only 24 hours, beginning at 10:30 AM EDT each weekday. The sponsor, Metaculus, has been providing the resulting data to us for our planned “Bot or Not” analysis. We expect this AI Forecasting Tournament series to continue until June 30, 2025.
Our second experiment, the Humans vs Multi-AI Panel Forecasting Experiment, was launched July 18, 2024 on “What is the probability that the US FED will cut interest rates in September 2024?” It ended Sept. 18, 2024, when the US FED lowered its key overnight borrowing rate by a half percentage point, or 50 basis points, a huge cut compared to the usual quarter-point changes and the first cut since 2020. The Multi-AI Panel was created by Jeremy, assisted by Brian LaBatte and Carolyn Meinel, using five generative AIs: Perplexity, Claude, Mistral, Cohere, and OpenAI, aided by the AskNews system, working together in a panel format. The competing human forecasters were BestWorld staffers Brian LaBatte, Michael DeVault, and Carolyn Meinel.
The third experiment, also launched on July 18, 2024 and ending Sept. 18, 2024, was Old Bot, which also forecasted the FED rate cut in competition with our human team. It used the same component AIs as the Multi-AI Panel experiment above, but in a different format. Humans won.
Our long-term research goal is the creation of a “What’s News and What’s Next” system: traditional journalism enhanced with GenAI-based news aggregation, along with AI, human, and hybrid AI/human discussions of what is likely to happen next. The objective is to give our news coverage credibility through what we expect to be mostly accurate forecasts, much as people nowadays trust weather forecasts.
Learn more here —>