OpenAI CTO dodges questions around training data for text-to-video generator Sora

A video clip from a WSJ interview with OpenAI CTO Mira Murati has gone viral on social media for the wrong reasons. Murati, who sat down earlier in the week with the publication’s Joanna Stern to discuss OpenAI’s new text-to-video tool, Sora, offered little clarity when asked about the datasets the tool had been trained on.

When asked what kind of data the company had used in Sora, Murati responded by saying they stuck to “publicly available data and licensed data.”

Stern then went on to specifically ask where this was from. “So, videos on YouTube?”

Murati looked confused by the question and said she didn’t know.

Stern persisted with the same line of questioning, asking, “Videos from Facebook, Instagram? What about Shutterstock? I know you guys have a deal with them.”

Murati replied that she wasn’t “actually sure about that,” adding that if the videos were publicly available, they might have been used, but that she wasn’t “confident about it.”

She concluded her answer by saying, “I’m not going to go into the details of the data that was used, but it was publicly available or licensed data.”

Murati’s reaction has drawn flak on X for her apparent confusion around what publicly available data actually meant, her refusal to answer the questions clearly, and possible ignorance.

The sourcing of training data for AI tools has become a legal flashpoint. Several authors and media publishers have already filed lawsuits against OpenAI, alleging that it used their writing without permission to train its AI chatbot, ChatGPT.
