Technology companies including Apple and Nvidia allegedly collected data from YouTube channels such as Khan Academy, BBC and PewDiePie.

Proof News has published a new audit showing that major tech companies such as Apple, Nvidia, Anthropic, and Salesforce used subtitle data from 173,536 YouTube videos to train their artificial intelligence (AI) tools.

The data comes from the "YouTube Subtitles" dataset, created by EleutherAI. It contains transcripts from educational and news channels such as Khan Academy, MIT, Harvard, The Wall Street Journal, NPR and BBC, as well as entertainment shows such as The Late Show with Stephen Colbert, Last Week Tonight with John Oliver and Jimmy Kimmel Live. The dataset also contains subtitles for videos belonging to big YouTube stars such as MrBeast, the Swedish YouTuber PewDiePie, and Jacksepticeye.

According to YouTube's rules, companies are not allowed to harvest material from the platform without permission. EleutherAI has so far not commented on the Proof News audit.