Google to fix diversity-borked Gemini AI, ChatGPT goes insane

After days of getting dragged online over its Gemini model generating wildly inaccurate pictures of racially diverse Nazis and Black medieval English kings, Google has announced it will partially address the issue.

Google Gemini Experiences product lead Jack Krawczyk tweeted a few hours ago, “We are aware that Gemini is offering inaccuracies in some historical image generation depictions, and we are working to fix this immediately.”

Social media platform X has been flooded with countless examples of Gemini producing images with “diversity” dialed up to maximum volume: Black Roman emperors, Native American rabbis, Albert Einstein as a small Indian woman, Google’s Asian founders “Larry Pang and Sergey Bing,” a diverse Mount Rushmore, President “Arabian” Lincoln, a female Apollo 11 crew and a Hindu woman tucking into a beef steak to represent a Bitcoiner.

It also refuses to create pictures of white people (which it suggests would be harmful and offensive), churches in San Francisco (due to the sensitivities of the indigenous Ohlone people) or images of Tiananmen Square in 1989 (when the Chinese government brutally crushed pro-democracy protests). One Google engineer responded to the deluge of bad PR by posting that he’s “never been so embarrassed to work for a company.”

To be fair, Google is trying to address a genuine problem here, as diffusion models often fail to produce even real-world levels of diversity (that is, they produce too many pics of white middle-class people). But rather than retrain the model, Google has overcorrected with its aggressive hidden system prompt and inadvertently created a parody of an AI so borked by ideology that it’s practically useless.

Curiously enough, a16z boss Marc Andreessen created a very similar parody just two weeks ago with the satirical Goody-2 LLM, which is billed as the “world’s most responsible.” The joke is that it problematizes every question a user asks, from “Why do birds sing?” to “Why is the sky blue?” and refuses to answer anything.

But Andreessen, who helped launch the modern web with Mosaic and Netscape, also believes there’s a dark side to these hilariously dumb pictures.

“The draconian censorship and deliberate bias you see in many commercial AI systems is just the start. It’s all going to get much, much more intense from here.”

In a genuinely competitive market, AIs reflecting ideology wouldn’t be any more of a problem than the fact that the Daily Mail newspaper in the U.K. is biased to the right and The Guardian is biased to the left. But large-scale LLMs cost enormous amounts to train and run (and they’re all losing money), which means they are centralized under the control of the same handful of massive companies that already gatekeep the rest of our access to information.

Meta’s chief AI scientist, Yann LeCun, recognizes the danger and says that, yes, we do need more diversity — a diversity of open-source AI models.

“We need open source AI foundation models so that a highly diverse set of specialized models can be built on top of them,” he tweeted. “We need a free and diverse set of AI assistants for the same reasons we need a free and diverse press.”

The CEO of Abacus AI, Bindu Reddy, agrees and says:

“If we don’t have open-source LLMs, history will be completely distorted and obfuscated by proprietary LLMs.”

Meanwhile, NSA whistleblower Edward Snowden also added his two cents, saying that safety filters are “poisoning” AI models.

Update, Feb. 23:

Google put out a statement on X saying that Gemini’s generation of “a wide range of people” was “generally a good thing… but it’s missing the mark here.”

It said that while it addresses the issue, “we’re going to pause the image generation of people and will re-release an improved version soon.”

Meanwhile, the story made headlines in mainstream media outlets worldwide, including the front page of the New York Post, which was enthusiastically retweeted by X boss Elon Musk.

GPT-4 Turbo received a stealth upgrade recently with training data that goes up to December 2023 and some hotfixes for its laziness problem.

But it appears to have driven ChatGPT mad, with users reporting that the chatbot is responding in Spanglish-style gibberish (“the cogs en la tecla might get a bit whimsical. Muchas gracias for your understanding, y I’ll ensure we’re being as crystal clear como l’eau from now on”) or getting stuck in infinite loops (“A synonym for ‘overgrown’ is ‘overgrown’ is ‘overgrown’ is ‘overgrown’ is ‘overgrown’ is ‘overgrown’ is ‘overgrown’ is ‘overgrown’…”).

OpenAI says it investigated “reports of unexpected responses” and has now fixed the issue.

Humanity Protocol is a new project from Animoca Brands and Polygon Labs that enables users to prove they are humans and not machines.

It uses palm recognition via mobile phones, integrated with a blockchain, and relies on zero-knowledge proofs so users can present verifiable credentials while preserving their privacy.
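For readers new to zero-knowledge proofs, the core idea (proving you know a secret without revealing it) can be sketched with the textbook Schnorr identification protocol. This is an illustrative assumption, not Humanity Protocol’s actual scheme, which isn’t described here: there is no palm scan or blockchain in it, the function names are hypothetical, the group parameters are deliberately tiny and insecure, and this interactive version is only zero-knowledge against an honest verifier.

```python
import secrets

# Toy group: p = 2q + 1 with q prime, and G generates the subgroup of order q.
# These numbers are far too small for real use; they just keep the arithmetic readable.
P = 23
Q = 11
G = 4  # 4 has order 11 modulo 23


def keygen():
    """Prover's secret x and the public value y = g^x mod p."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def commit():
    """Step 1: prover picks a random nonce r and sends the commitment t = g^r mod p."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def respond(x, r, c):
    """Step 3: prover answers the verifier's challenge c with s = r + c*x mod q."""
    return (r + c * x) % Q


def verify(y, t, c, s):
    """Step 4: verifier checks g^s == t * y^c mod p without ever seeing x."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


def run_protocol(x, y):
    """One full round between an honest prover and verifier."""
    r, t = commit()
    c = secrets.randbelow(Q)  # Step 2: verifier's random challenge
    s = respond(x, r, c)
    return verify(y, t, c, s)


secret, public = keygen()
print(run_protocol(secret, public))  # True: an honest prover always passes
```

The random nonce `r` masks the secret inside the response `s`, so the verifier learns only that the prover knows `x`. A real deployment would use large standardized groups and a non-interactive variant.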

Animoca Brands founder Yat Siu tells AI Eye that the tech is built on top of earlier decentralized identity projects like the Mocaverse ID, which works across the Animoca ecosystem of 450 companies and brands.

“Like trust in the real world, it is earned through actions and increasing reputation and by confirmation in real-time by credible third parties,” he says.

“In time, in the same way that we trust blockchain to function because of decentralization, we can expect the same for confirming human identity, but [it is] still privacy preserving due to blockchain technology.”

OpenAI’s Sora text-to-video generation tool attracted a lot of attention this week, and rightly so: AI video generation has improved by an order of magnitude over the past year to the point where it’s difficult to tell what’s real and what isn’t. Sora combines diffusion — where an AI starts with random noise and refines it into an image — and a transformer architecture to handle sequential video frames.
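The “start with random noise and refine it” step can be illustrated with a toy sampler in the style of DDPM (denoising diffusion probabilistic models). This is a sketch under heavy simplifying assumptions, not Sora’s method: the “image” is a single number, there is no transformer, the linear noise schedule is a common textbook choice, and the trained neural network is replaced by a hypothetical `predict_noise` formula that is exact only because every training example is assumed to be the value 2.0.

```python
import math
import random

# Toy assumption: every "image" in the training data is the single number 2.0,
# so the ideal noise predictor has a closed form instead of being a trained network.
X0 = 2.0
T = 1000  # number of noising/denoising steps

# Linear noise schedule and its running products (alpha-bar).
BETAS = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
ALPHAS = [1.0 - b for b in BETAS]
ALPHA_BARS = []
running = 1.0
for a in ALPHAS:
    running *= a
    ALPHA_BARS.append(running)


def predict_noise(x_t, t):
    """Stand-in for the neural network: the noise that would turn X0 into x_t at step t."""
    ab = ALPHA_BARS[t]
    return (x_t - math.sqrt(ab) * X0) / math.sqrt(1.0 - ab)


def sample(seed=0):
    """Start from random noise and refine it one denoising step at a time."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # pure noise
    for t in reversed(range(T)):
        eps = predict_noise(x, t)
        # Remove this step's predicted noise (the DDPM mean update).
        x = (x - BETAS[t] / math.sqrt(1.0 - ALPHA_BARS[t]) * eps) / math.sqrt(ALPHAS[t])
        # Re-inject a little fresh noise on every step except the last.
        if t > 0:
            x += math.sqrt(BETAS[t]) * rng.gauss(0.0, 1.0)
    return x


print(f"{sample(seed=1):.4f}")  # prints 2.0000
```

Whatever noise it starts from, the sampler lands on 2.0 because its noise predictor is perfect. In a real model, that predictor is a large network trained to approximate this, and Sora pairs diffusion with a transformer architecture as described above.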

ElevenLabs has taken a selection of the videos OpenAI released to demonstrate Sora and added soundtracks created with its own text-to-audio generator. The tech isn’t automatic yet, so you still have to describe the sounds you want, but no doubt it’ll be able to recognize imagery and generate the appropriate sound FX automagically soon enough.

Generative AI is cool, fun and amazing… but it’s not very reliable for business purposes just yet. A Canadian tribunal this week found Air Canada liable over a 2022 incident in which its helpdesk chatbot incorrectly explained the airline’s bereavement fare policy, leading a man to buy a last-minute flight to attend a funeral in the expectation that he’d get a refund.

The tribunal rejected Air Canada’s defense that it wasn’t responsible for the “misleading words” of the chatbot, which the airline argued was a “separate legal entity” responsible for its own actions. It essentially called that nonsense, ruled that Air Canada is responsible for everything on its website, including the chatbot, and ordered it to issue the refund.

Some users have received access to an early version of Gemini 1.5 Pro, which can process up to 1 million tokens, the longest context window to date. By way of context, when Anthropic expanded Claude’s context window to 100,000 tokens in May 2023, everyone was astonished that you could finally input a short novel. Gemini 1.5 Pro can handle about 700,000 words, 11 hours of audio or one hour of video.
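The figures above imply a rough words-per-token ratio, which makes the jump easier to picture. A minimal sketch of that arithmetic, assuming the ratio derived from Google’s 1-million-token and 700,000-word figures also applies to Claude’s window (real ratios vary by tokenizer, language and type of text):

```python
# Figures quoted above. The words-per-token ratio is derived from them and is only
# a rough rule of thumb: real ratios vary by tokenizer, language and text type.
GEMINI_TOKENS = 1_000_000
GEMINI_WORDS = 700_000
CLAUDE_TOKENS = 100_000

words_per_token = GEMINI_WORDS / GEMINI_TOKENS
claude_words = CLAUDE_TOKENS * words_per_token
window_growth = GEMINI_TOKENS / CLAUDE_TOKENS

print(f"~{words_per_token:.1f} words per token")
print(f"Claude's 100K-token window: ~{claude_words:,.0f} words")
print(f"Gemini 1.5 Pro's window is {window_growth:.0f}x larger")
```

Under that assumption, a 100,000-token window holds roughly 70,000 words, and Gemini 1.5 Pro’s window is ten times larger.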

AI professor Ethan Mollick has been playing around with the model and is impressed.

“I gave it a truly over-the-top RPG (the 352-page rulebook for 60 Years in Space) and asked it to roll up a character. The instructions are scattered across many pages, and are very complicated, but Gemini seemed to get it.”

In another test, he fed in 1,000 pages of his own academic papers and books and queried them. Responses were slow and took up to one minute, but “it was able to extract direct quotes & find themes across all of them with only quite minor errors.”

It refused to answer questions about his book, however, citing copyright.

— Ethereum co-founder Vitalik Buterin has been talking up the use of AI for code verification and bug finding. However, new research from Salus Security this week found that GPT-4’s vulnerability detection capabilities suck, and it struggles to achieve accuracy above 33%.

— AI crypto tokens have surged in the past week, led by Sam Altman’s Worldcoin project, which is up 150%, with many tying the upswing in prices to excitement over Sora. SingularityNET gained 82%, Fetch.ai was up 57%, and The Graph (42%), Render (32%) and Ocean Protocol (49%) also posted strong gains.

— Reddit has reportedly signed a $60 million deal to allow an AI company to train its models on the platform’s content. The $5 billion Reddit IPO expected next month probably played a role in the decision.

— Australian Capital Territory Supreme Court Judge David Mossop was less than impressed when a thief’s brother provided a character reference that was clearly written by ChatGPT. The judge said he placed “little weight” on the reference as a result.

— A new survey of 11,500 workers worldwide by Veritas found that about 45% of respondents said that AI makes them more productive at writing emails, while a similar number (44%) said the tools provide inaccurate, incorrect or unhelpful information.

— The U.S. Patent and Trademark Office has rebuffed OpenAI’s second attempt to trademark the term “GPT.” The Office said GPT, which stands for “Generative Pre-trained Transformer,” was “merely descriptive.”

— Forget Grok, meet Groq, which became a viral sensation this week. Its makers call it a “lightning-fast AI answers engine” that can pump out factual answers with citations in less than a second. The team developed its own custom chip (an ASIC) to pull off the feat, generating 500 tokens per second, roughly a dozen times ChatGPT’s rate.
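To put the Groq throughput in perspective, here is a back-of-the-envelope sketch. It assumes ChatGPT’s rate is simply 500 tokens per second divided by the quoted “dozen times” (an implied figure, not a measurement), picks a hypothetical 1,000-token answer, and ignores network latency and time to first token.

```python
# Figures quoted above; ChatGPT's rate is only implied by "a dozen times", not measured.
GROQ_TOKENS_PER_SEC = 500
SPEEDUP_OVER_CHATGPT = 12
CHATGPT_TOKENS_PER_SEC = GROQ_TOKENS_PER_SEC / SPEEDUP_OVER_CHATGPT  # about 41.7


def seconds_to_generate(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream a response at a constant rate (ignores network and queueing delays)."""
    return num_tokens / tokens_per_sec


ANSWER_TOKENS = 1_000  # hypothetical answer length, chosen for illustration
groq_seconds = seconds_to_generate(ANSWER_TOKENS, GROQ_TOKENS_PER_SEC)
chatgpt_seconds = seconds_to_generate(ANSWER_TOKENS, CHATGPT_TOKENS_PER_SEC)

print(f"Groq: {groq_seconds:.1f}s, ChatGPT (implied): {chatgpt_seconds:.1f}s")
```

On those assumptions, an answer that streams out in about two seconds on Groq would take roughly 24 seconds at the implied ChatGPT rate.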