Viktor Eriksson
Writer

Big tech firms have reportedly used thousands of YouTube videos to train AI

news brief
Jul 17, 2024 | 2 mins
Data Privacy, Generative AI

Technology companies including Apple and Nvidia allegedly collected data from YouTube channels such as Khan Academy, BBC and PewDiePie.


Proof News has published a new audit showing that major tech companies such as Apple, Nvidia, Anthropic, and Salesforce used subtitle data from 173,536 YouTube videos to train their artificial intelligence (AI) tools.

The companies reportedly used the “YouTube Subtitles” dataset, created by EleutherAI. It contains transcripts from educational channels such as Khan Academy, MIT and Harvard, news outlets such as The Wall Street Journal, NPR and BBC, and entertainment shows such as The Late Show with Stephen Colbert, Last Week Tonight with John Oliver and Jimmy Kimmel Live.

The dataset also contains subtitles for videos belonging to big YouTube stars such as MrBeast, the Swedish creator PewDiePie, and Jacksepticeye. According to YouTube’s rules, companies are not allowed to harvest material from the platform without permission.