Anonymous individuals are claiming that OpenAI stole ‘vast amounts of data’ to train ChatGPT, in what they hope will become a class action lawsuit. It’s a nebulous claim about the nebulous data that OpenAI has used to train ChatGPT.
“…‘Despite established protocols for the purchase and use of personal information, Defendants took a different approach: theft,’ they allege. The company’s popular chatbot program ChatGPT and other products are trained on private information taken from what the plaintiffs described as hundreds of millions of internet users, including children, without their permission.
Microsoft Corp., which plans to invest a reported $13 billion in OpenAI, was also named as a defendant…”
“Creator of buzzy ChatGPT is sued for vacuuming up ‘vast amounts’ of private data to win the ‘A.I. arms race’”, Fortune.com, Teresa Xie, Isaiah Poritz and Bloomberg, June 28th 2023.
I’ve had suspicions myself about where OpenAI’s training data came from, but with no insight into how the model was trained, how is anyone to know? That’s what makes this case interesting.
“…Misappropriating personal data on a vast scale to win an ‘AI arms race,’ OpenAI illegally accesses private information from individuals’ interactions with its products and from applications that have integrated ChatGPT, the plaintiffs claim. Such integrations allow the company to gather image and location data from Snapchat, music preferences on Spotify, financial information from Stripe and private conversations on Slack and Microsoft Teams, according to the suit.
Chasing profits, OpenAI abandoned its original principle of advancing artificial intelligence “in the way that is most likely to benefit humanity as a whole,” the plaintiffs allege. The suit puts ChatGPT’s expected revenue for 2023 at $200 million…”
Ibid. (same article as quoted above).
This would run contrary to what Sam Altman, CEO of OpenAI, told the US Congress in his written testimony:
“…Our models are trained on a broad range of data that includes publicly available content, licensed content, and content generated by human reviewers.[3] Creating these models requires not just advanced algorithmic design and significant amounts of training data, but also substantial computing infrastructure to train models and then operate them for millions of users…” [Reference 3: “Our Approach to AI Safety.” OpenAI, 5 Apr. 2023, https://openai.com/blog/our-approach-to-ai-safety.]
“Written Testimony of Sam Altman, Chief Executive Officer, OpenAI, Before the U.S. Senate Committee on the Judiciary Subcommittee on Privacy, Technology, & the Law”, Senate.Gov, Sam Altman, CEO of OpenAI, May 16th 2023.
I would love to know who the anonymous plaintiffs are, and how they got enough information to make these allegations. I suppose we’ll find out more as the case progresses.
I, for one, am curious where OpenAI got its training data.