uniKpopAI

News

Report Says AI Training Datasets Contain Thousands of K-Pop Songs—From BTS to Blackpink

June 24, 2026

Thousands of K-pop tracks, potentially including music from major artists such as BTS and BLACKPINK, are reportedly being used as data to train generative AI systems, according to a Koreaboo report citing an investigation by The Atlantic into AI training datasets. The findings raise fresh concerns about copyright, consent, and the risk that AI systems could imitate or plagiarize existing songs.

How the reported discovery works

The underlying issue centers on how generative AI models learn. In broad terms, AI systems are trained by ingesting large volumes of data—text, images, audio, and other content—so that they can identify patterns and produce new outputs when prompted. When music is part of that training, critics argue, the model may learn enough about melodies, structure, and timbre to produce material that sounds extremely close to existing tracks.

In the investigation described by Koreaboo, The Atlantic examined several AI training datasets and reportedly found that they contained millions of songs. According to the article, the key technique was searching the datasets directly: by looking up an artist’s name, reporters could determine whether that artist’s tracks were present in the underlying material.

Scope: thousands of tracks across large catalogs

According to the Koreaboo summary, searching for BTS in one specific dataset surfaced more than 200 songs. It further claims that BTS music appeared across several different datasets, suggesting the exposure could be widespread rather than limited to a single archive.

The article describes similar results for other K-pop acts, particularly those with large discographies. It states that across a sample of around 11 artists, reporters found more than 2,000 songs within the datasets. From this, it extrapolates that the number of K-pop songs used for training could reach the hundreds of thousands, given the genre’s total output and the likelihood that the datasets include many more artists beyond that initial sample.

The reported search counts come from dataset lookups, while the larger figure is an extrapolation from them rather than a confirmed total. Even so, the scale is notable: the difference between “a few tracks” and “thousands to hundreds of thousands” matters greatly for whether rights holders can realistically track what was used, how it was licensed, and whether opt-outs exist.

Why it matters: from “learning” to remixing existing works

One practical worry highlighted in the coverage is the possibility of AI-generated plagiarism. The article explains that if an AI model has been trained on a particular song—or enough songs from a given artist—it could generate outputs that closely resemble existing music, potentially with only slight changes.

To illustrate that risk, the article references an example from The Atlantic’s reporting: a figure skating performance set to an AI-generated song described as a near rip-off of an existing track. Although that case involves a performance rather than a commercial release, the underlying point is the same: AI can replicate recognizable audio patterns, and doing so without permission can create legal and ethical conflicts.

Industry and fan pushback over consent and environmental impact

The Koreaboo piece situates the dataset findings within a wider controversy around AI in music, where fans and artists have criticized AI use on multiple fronts. Beyond copyright and consent, it notes an additional argument often raised by critics: the environmental footprint of AI training and operation.

Large-scale AI training typically requires substantial computing resources, which in turn involves significant electricity use and can affect local water systems used for cooling. The article frames these concerns as part of a larger debate about whether AI deployment is socially and ecologically responsible—especially when training data includes creative works without clear authorization.

For K-pop specifically, the stakes are amplified by the genre’s highly engineered production practices and global distribution strategies. Artists and labels may also rely on strong brand protections, making the prospect of AI outputs that mimic copyrighted songs more than a theoretical concern.

What happens next: transparency, licensing, and enforcement

If the reported dataset presence is accurate, the next phase is likely to focus on accountability: whether the songs found in training sets were licensed, whether rights holders were informed, and what mechanisms exist for removing content or limiting future training. In practical terms, researchers and lawmakers have been pushing for greater transparency around training data provenance—often summarized as the question of who provided the data and under what permission.

For the music industry, rights management will likely intensify. Labels and collecting organizations may seek clearer standards for AI training exceptions, opt-in or opt-out registries, and enforcement pathways against outputs that are demonstrably derivative. For consumers, the challenge will be distinguishing legitimate remixes or AI-assisted creations from outputs that effectively substitute for original works.

As generative AI grows more capable, dataset disputes like this one could become a central battleground—one that determines not only what AI can learn, but also who gets to decide.

#AI #Copyright #k-pop #Machine Learning #Music Industry
