
Spot the AI Voice Quiz

Deepfakes, Synthesia, ElevenLabs: can you tell real voices from AI?

Spot the AI Voice Quiz: Deepfakes, Cloning, and the Synthetic Speech Era

An AI can clone your voice from just 3 seconds of audio, and ElevenLabs' system generates speech with only 300 milliseconds of latency. This 50-question deep dive covers the history of text-to-speech, from 1980s Kurzweil systems to WaveNet and Transformers; today's cloning tools, including ElevenLabs, OpenAI Voice Engine, and Meta Voicebox; infamous scams like the $25M Hong Kong deepfake and the fake Biden robocall; detection techniques; watermark systems like SynthID; and the response from SAG-AFTRA and lawmakers, including the NO FAKES Act.

How It Works

Each round presents 10 randomized questions from a pool of 50, with four multiple-choice options and instant feedback after every answer. Your final score comes with a performance tier and shareable results.

What You'll Learn

You'll explore concatenative, parametric, and neural TTS; landmark systems like WaveNet and Tacotron; ElevenLabs' founding and funding; voice-cloning scams that cost victims more than $4 billion in 2023; detection tools like Hive and Resemble Detect; Google SynthID watermarking; the Scarlett Johansson vs. OpenAI 'Sky' controversy; Tennessee's ELVIS Act; and the telltale signs that distinguish synthetic speech from a real human voice.

Frequently Asked Questions

How much audio is needed to clone a voice?

Modern tools like ElevenLabs can produce a recognizable voice clone from just 3 to 10 seconds of reference audio. OpenAI's Voice Engine, announced in March 2024, uses a 15-second reference sample.

What is a deepfake?

A deepfake is AI-generated or AI-manipulated audio, image, or video that convincingly imitates a real person. Audio deepfakes use voice cloning to impersonate politicians, executives, or family members in scams.

How can you spot AI-generated speech?

Listen for an unnaturally consistent tone, missing breath and swallow sounds, mispronounced proper nouns, mechanical-sounding pauses or filler words ('um,' 'uh'), and emotional inflection that doesn't fit complex sentences. Detection tools like Hive and Resemble Detect claim accuracy above 90%.

Last updated: April 2026