OpenAI’s lead over other AI companies has largely vanished, ‘State of AI’ report finds
Air Street Capital’s annual snapshot of the AI industry also highlights AI’s falling costs and robotics’ resurgence
Hello and welcome to Eye on AI. In this edition…AI’s fast-falling cost…Google goes nuclear…LLMs may be dumber than you think…and a filmmaker burned by genAI backlash.
Every year for the past seven, Nathan Benaich, the founder and solo general partner at the early-stage AI investment firm Air Street Capital, has produced a magisterial “State of AI” report. Benaich and his collaborators marshal an impressive array of data to provide a great snapshot of the technology’s evolving capabilities, a map of the companies developing it, a survey of how AI is being deployed, and a critical examination of the challenges still facing the field.
OpenAI’s lead mostly vanishes
One of the big takeaways from this year’s report, which was published late last week, is that OpenAI’s lead over other AI labs has largely eroded. Anthropic’s Claude 3.5 Sonnet, Google’s Gemini 1.5, X’s Grok 2, and even Meta’s open-source Llama 3.1 405B model have equaled, or on some benchmarks narrowly surpassed, OpenAI’s GPT-4o.
On the other hand, OpenAI retains an edge for the moment on reasoning tasks with the release of its o1 “Strawberry” model—which Air Street’s report rightly characterized as a weird mix of incredibly strong logical abilities on some tasks and surprisingly weak ones on others. (For more on the fragility of o1’s reasoning abilities, see the “Research” section below.)
Inference costs fall rapidly
Another big takeaway, Benaich told me, is the extent to which the cost of using a trained AI model—an activity known as “inference”—is falling rapidly. There are several reasons for this. One is linked to that first big takeaway: With models less differentiated from one another on capabilities and performance, companies are forced to compete on price.
Another reason is that engineers at companies such as OpenAI and Anthropic—and their hyperscaler partners Microsoft and AWS, respectively—are discovering ways to optimize how the largest models run on big GPU clusters. The cost of outputs from OpenAI’s GPT-4o today is 100 times less per token (a token is roughly three-quarters of an English word) than it was for GPT-4 when that model debuted in March 2023. Google’s Gemini 1.5 Pro now costs 76% less per output token than it did when that model launched in February 2024.
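To put that in concrete terms, here is a minimal back-of-the-envelope sketch in Python. The per-million-token prices are hypothetical placeholders chosen only to illustrate what a 100x drop means for a workload, not actual OpenAI rates, and the tokens-per-word figure is the common rough rule of thumb.

```python
# Back-of-the-envelope inference-cost math (illustrative only).
# The prices below are hypothetical placeholders, not actual OpenAI rates;
# they are chosen solely to show what a 100x per-token price drop means.

def inference_cost(num_output_tokens: int, price_per_million_tokens: float) -> float:
    """Dollar cost of generating num_output_tokens at a per-million-token rate."""
    return num_output_tokens / 1_000_000 * price_per_million_tokens

TOKENS_PER_WORD = 4 / 3          # rule of thumb: one token is ~3/4 of a word
words_generated = 10_000_000     # e.g., a month of generated support replies

tokens = int(words_generated * TOKENS_PER_WORD)
old_cost = inference_cost(tokens, price_per_million_tokens=60.0)   # hypothetical 2023-era rate
new_cost = inference_cost(tokens, price_per_million_tokens=0.60)   # 100x cheaper, per the report

print(f"{tokens:,} tokens: ${old_cost:,.2f} before vs. ${new_cost:,.2f} after")
# 13,333,333 tokens: $800.00 before vs. $8.00 after
```

At that kind of ratio, a workload that once cost hundreds of dollars a month drops to single digits, which is why the report argues the economics of deployment have shifted so sharply.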
AI researchers have also become adept at creating small AI models that can equal the performance of larger LLMs on dialogue, summarization, or even coding, while being much cheaper to run. Taken together, these trends mean that the economics of implementing AI-based solutions are starting to look much more attractive than they did a year ago. That may ultimately help businesses find the return on investment from generative AI that many have complained has been elusive so far.
Robotics makes a comeback
Another key trend Benaich picks up on is robotics coming back into vogue, as robotics companies marry LLMs and new “world models” with existing tech, making robots more capable and easier (as well as cheaper) to deploy and customize.
Benaich’s State of AI report always ends with some bold predictions for the year ahead (and Benaich grades himself each year on how he’s done). Among the things he got right last year: that a Hollywood production would make use of genAI models for visual effects, and that there would be limited progress on international AI governance efforts. Among those he got wrong: that a company would spend more than $1 billion training a single LLM.
Among this year’s predictions: that an open-source alternative to OpenAI’s o1 will surpass it across a range of benchmarks, and that a $10 billion investment from a sovereign state into a U.S. AI company will cause the U.S. government to institute a national security review. We’ll check back next year to see how Benaich did.
Fortune Brainstorm AI takes the pulse of a fast-changing industry
The State of AI report is not the only place to find a fantastic overview of what’s happening in AI. Another great way to get a vantage point on AI’s rapidly evolving landscape, and to find out how AI is impacting business, is Fortune’s upcoming Brainstorm AI conference in San Francisco. This must-attend annual event takes place December 9 and 10 at the St. Regis Hotel.
This year’s conference will include conversations with, among many others: Amazon’s head scientist for artificial general intelligence, Rohit Prasad, who will update us on how the Everything Store is trying to ensure it doesn’t get left behind in the race to build superpowerful—and super useful—AI; Liz Reid, Google’s vice president of search, who will discuss the future of Google’s signature product in an AI world; Christopher Young, Microsoft’s executive vice president of business development, strategy, and ventures, who will discuss how the tech giant is trying to see around corners to what is coming next for AI; Daniela Braga, the founder and CEO of Defined.ai, who will tell us what it really takes to build AI models that work for customers; and Colin Kaepernick, former Super Bowl quarterback for the San Francisco 49ers and current founder and CEO of Lumi, a company that builds AI-powered tools for content creators, who will speak about his own transformation from professional athlete to entrepreneur, and what AI may mean for influencers, brands, and beyond.
I’ll be there, of course, helping to cochair the discussion with a gaggle of ultra-talented colleagues. I hope you will all consider joining me! And I’m very excited to be able to offer Eye on AI readers a special discounted rate—20% off the regular price of attendance! Just write the code KAHN20 in the Additional Comments section of the application to secure your discount. You can click here to find out more. Follow the link on that page to apply to attend. Remember to use the discount code!
With that, here’s more AI news.
Jeremy Kahn
[email protected]
@jeremyakahn
AI IN THE NEWS
India’s central bank chief says AI creates financial stability risk. Shaktikanta Das, the governor of the Reserve Bank of India, became the latest central bank head to warn that the growing use of AI in financial services presents potential risks, especially if banks and hedge funds largely use the same handful of technology vendors, Reuters reported.
New York Times takes aim at generative AI search startup Perplexity. The newspaper’s lawyers have sent Perplexity a “cease and desist” letter asking it to stop accessing and using the publication’s content without permission, the Wall Street Journal reported. Perplexity CEO Aravind Srinivas told the Journal that the company isn’t ignoring the Times’ requests and would respond to its letter by the end of the month. “We have no interest in being anyone’s antagonist here,” Srinivas told the paper. The New York Times is already embroiled in a lawsuit with OpenAI, alleging that the AI company violates copyright law by ingesting the Times’ content. (Full disclosure: Fortune has a licensing deal with Perplexity.)
Google orders small nuclear reactors to power data centers as energy demands of AI increase. The Guardian reports that the tech giant has struck a deal with California-based Kairos Power for a fleet of six to seven mini nuclear reactors to generate power for data centers where it will train and run AI models. The first reactor is scheduled to be up and running by 2030. Large cloud providers are increasingly looking at nuclear energy to power data centers without expanding their carbon footprints. Amazon and Microsoft have both struck nuclear power deals in recent months.
OpenAI’s former CTO Mira Murati is trying to poach staff for a new project as staff turmoil continues. That’s according to reporting in The Information, which cited two unnamed sources familiar with Murati’s outreach. Murati has not told staff whether she is launching her own startup or trying to entice OpenAI employees to an existing company that she’s joining, according to the publication. It also said OpenAI’s post-training team, which helps make AI models safer and more customer-friendly, is in upheaval following the departure of its former head, Barret Zoph—whose departure was announced the same day as Murati’s—and his replacement with Liam Fedus. Some researchers have, according to the publication, requested transfers to other teams rather than work under Fedus.
OpenAI hires key researcher from Microsoft. The Information reports that Sébastien Bubeck, who led Microsoft’s efforts to develop a family of highly capable, open-source small language models called Phi, has been lured away to OpenAI. This may signal that OpenAI wants to train similar kinds of models. It may also signal further tension between OpenAI and its major backer and partner, Microsoft.
EYE ON AI RESEARCH
Do LLMs really reason? A provocative study from six researchers at Apple suggests the answer is no—or, at least, that they don’t reason particularly well, and nothing like the way humans do.
The researchers found that subtle changes in the phrasing of questions, or the addition of irrelevant information to them, resulted in significant degradations in how LLMs performed on benchmark tests. (In one example from the paper, adding the inconsequential detail that a few of the kiwis a character picked were smaller than average led models to wrongly subtract those kiwis from the total.) Even the most recent, powerful AI models, including OpenAI’s o1-preview, which was specifically designed to perform better on reasoning tasks, experienced a drop-off in performance on the altered dataset the researchers created. This suggests the reasoning abilities of all of these models are overstated, and that the models mostly just memorize the answers to questions they encounter during training.
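For a sense of what such a perturbation looks like in practice, here is a minimal sketch of this style of test harness (my illustration, not the Apple team’s actual code or data): it generates variants of one word problem by swapping names and numbers and optionally appending an irrelevant clause, so the correct answer never changes.

```python
import random

# A minimal sketch of a benchmark-perturbation harness (illustrative only,
# not the Apple researchers' code): generate variants of one word problem
# by swapping names/numbers and optionally adding an irrelevant clause.

TEMPLATE = ("{name} picks {a} apples on Monday and {b} apples on Tuesday. "
            "{distractor}How many apples does {name} have?")

NAMES = ["Oliver", "Mia", "Ravi", "Chen"]
DISTRACTORS = ["", "Five of the apples were a bit smaller than average. "]

def make_variant(rng: random.Random) -> tuple[str, int]:
    a, b = rng.randint(10, 50), rng.randint(10, 50)
    question = TEMPLATE.format(name=rng.choice(NAMES), a=a, b=b,
                               distractor=rng.choice(DISTRACTORS))
    return question, a + b  # the distractor clause never affects the true answer

rng = random.Random(0)
for _ in range(3):
    question, answer = make_variant(rng)
    print(question, "->", answer)
```

Because every variant has the same ground-truth structure, any spread in a model’s accuracy across variants is evidence of pattern-matching on surface wording rather than robust reasoning.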
At the same time, the research showed that the performance of the latest, most powerful LLMs degraded less than that of smaller models. So it may be that the largest models perform something closer to human reasoning, while smaller models don’t.
You can read the full research paper on arxiv.org here.
FORTUNE ON AI
Why Elon Musk’s Cybercab robotaxi vision is likely still several years away—by Jessica Mathews
The U.S. defense and homeland security departments have paid $700 million for AI projects since ChatGPT’s launch—by Kali Hays
Inside Wendy’s drive-thru AI that makes ordering fast food even faster—by John Kell
AI CALENDAR
Oct. 22-23: TedAI, San Francisco
Oct. 28-30: Voice & AI, Arlington, Va.
Nov. 19-22: Microsoft Ignite, Chicago
Dec. 2-6: AWS re:Invent, Las Vegas
Dec. 8-12: Neural Information Processing Systems (NeurIPS) 2024, Vancouver, British Columbia
Dec. 9-10: Fortune Brainstorm AI, San Francisco (register here)
BRAIN FOOD
Will audience backlash against genAI slow its widespread adoption by creators? Quite possibly. Filmmaker Morgan Neville told Wired that he will never use AI again in his films after facing widespread criticism from fans over his use of AI to recreate the voice of the late chef and travel journalist Anthony Bourdain in Roadrunner, his 2021 documentary about Bourdain’s life. Even though Neville only used the AI-generated voice to read text Bourdain had actually written, the use of AI confused viewers, Neville told Wired. Many assumed those aspects of the film were entirely fictionalized, he lamented. Overall, Neville said, the use of AI damaged Roadrunner’s credibility with audiences.
Neville is not the only creator to discover that AI can undermine a hard-won reputation for authenticity. Toymaker Lego—which is, coincidentally, the central medium in Neville’s innovative new documentary about musician Pharrell Williams, Piece by Piece—has forsworn using generative AI to create catalogues and advertisements after an early experiment with the tech generated significant blowback from Lego aficionados.