OpenAI launches Sora: instant videos out of text prompts

The company behind ChatGPT has opened a floodgate of possibilities with the launch of Sora. New AI-based tools now arrive on an almost weekly basis, Google's Gemini being a recent example, but this launch is different.

Sora is in a league of its own compared to other tools, as it gives users the power to make quality videos from text alone.

Without further ado, let's get into the meat of the topic.

Introducing Sora, a remarkable text-to-video tool

The makers of ChatGPT have done it again with a breakthrough tool that will captivate people for a long time. Sora is another massive win for generative artificial intelligence, as anybody can now create a video out of thin air simply by typing a written prompt.

Sora's speed was demonstrated on X, formerly known as Twitter, when MrBeast (a popular YouTuber) tweeted at Sam Altman (CEO of OpenAI).

MrBeast wrote, 'Sam please don't make me homeless'. Sam replied, 'Will generate you a video, what would you like?'. MrBeast then asked for a video of a monkey playing chess in the park, and in a matter of seconds Sam posted a high-quality video on X of a monkey playing chess in the park.

Why is Sora a leap forward in the AI world?

We have seen plenty of text-to-video tools before. Tech giants such as Google and Meta have shared similar technology, but none of them matches the sheer quality of Sora. In terms of producing quality content, Sora seems to be not just a few steps but miles ahead of what was considered normal or possible.

OpenAI has taken the quality of AI-generated content to a height that was deemed impossible. They did it for text with ChatGPT, and now it is video's turn.

Everything you see in the video is crafted just from words. Every little detail, from the snow, trees, clothes, and hair to the buildings and sky, is rendered in high definition. The potential Sora has to offer is mind-boggling. The video has garnered 35 million views in just 11 hours, which shows how Sora is breaking new ground in the AI world.

Here is why Sora is a massive deal for the AI community. The following is a list of things Sora can do, almost instantly:

It can create high-quality videos from text prompts.
The videos are a few seconds long, but the emphasis is on quality.
Feedback is being taken from filmmakers, artists, and designers to improve the current Sora model.
The complexity of the video reflects the text input.
It can create complex scenes with multiple characters.
It can not only add characters but also determine their movement, motion, and direction.
Users can add accurate details to the video by describing the subject and background of the footage.
Sora uses a Transformer architecture to allow superior scaling performance.
Sora can also create videos out of still images.
It can also take a video as input and extend its duration by accurately continuing the scenes in it.
Everything here is done almost instantly, which is brand new among the text-to-video generative AI tools on the market.

The Sora model has an in-depth understanding of language, which lets it interpret a prompt accurately and create the video almost instantly. The range of interpretation is what sets Sora apart from others, as it can also take the characters' expressions and motion described in the prompt into account. Sora takes nearly every word of the text seriously in order to craft a video that doesn't miss out on anything.

Limitations and safety of Sora

If there is one thing we know about AI, it is that the road to perfection is long, and Sora is no different. The tool is nowhere near perfect and still has a long way to go to reach that point. Some errors are still present in Sora; a statement released by OpenAI said that Sora might confuse the spatial details of a prompt and may have difficulty following a camera trajectory.

The AI might struggle to simulate physics in more complex scenes and might have a difficult time understanding cause and effect. A longer text input might not lead to good results. Sora can definitely create videos out of text, but as the text describes more and more complex scenarios, Sora might fail to deliver them on screen.

The company is also building safety into the AI by working on issues such as misinformation, hateful content, and bias. It is planning to release a detector tool that can tell an AI-generated video from a real one. OpenAI also said that output videos will have to adhere to its policies, and that it can review every frame of a video to ensure it stays within them.

How to use Sora? Or when can I use Sora?

The tool is not out for the public yet. It is currently open for red teaming, a phase in which flaws in the system are identified and corrected. It is also available to visual artists, designers, and filmmakers, so that they can give feedback on the model going forward.

Santosh Kumar

Santosh Kumar is a writer covering tech, entertainment, gaming, and some philosophy. His other interests include gaming, reviewing Renaissance paintings, and playing a sport he is not good at.

