AI dataset licensing companies form trade group


Seven companies that license music, image, video and other datasets for use in training artificial intelligence systems have formed the sector's first trade group, they said on Wednesday.

The Dataset Providers Alliance (DPA) will advocate for "ethical data sourcing" in the training of AI systems, including rights for people depicted in datasets and the protection of content owners' intellectual property rights, the companies said in a statement.

Founding members include U.S. music dataset company Rightsify, image licensing service vAIsual, Japanese stock photo provider Pixta, and Germany-based data marketplace Datarade.

The emergence in recent years of generative AI technologies that can mimic human creativity has triggered an outcry from content creators and a string of copyright lawsuits against tech companies like Google (NASDAQ: GOOGL), Meta (NASDAQ: META), and ChatGPT maker OpenAI, which is backed by Microsoft (NASDAQ: MSFT).

Developers have been training models by feeding them vast quantities of content, much of it scraped from the internet for free without the consent of those who created the works or own rights to them.

Tech companies, which claim the usage is legal, are also quietly paying for access to private collections of content to fulfill needs for particular types of data and hedge against legal and regulatory risks.

The prospect that demand for licensed data will grow if copyright owners prevail in their legal fights has given rise to a nascent industry of companies that package content and sell access to it for use by AI systems.

As a result, groups have been formed to establish ethical standards for that trade, like Fairly Trained, a non-profit founded this year that certifies models trained without the use of unlicensed copyrighted materials.

The DPA targets the content of those transactions, requiring, for example, that its members agree not to sell text data obtained by crawling the web or audio that features people's voices without their explicit consent.