Adopting AI-Based Closed-Captioning for New Streaming Platform Requirements


The client is a large U.S.-based producer of content for mobile video platforms, headquartered in California.

Over the past few years, content consumption over streaming media has increased rapidly, and a host of new players have launched streaming services. Recent examples include the launches of Quibi and Peacock in April 2020 and of HBO Max in May 2020. Content owners are scrambling to get new or repurposed content onto these platforms, which requires them to meet each platform’s standards for video as well as for the corresponding metadata. This critical metadata includes closed captions and, as is the case with video, closed captions must meet the standards and style guides mandated by individual streaming platforms.

The exponential growth of content consumption and the widespread acceptance of closed captions beyond the hearing-impaired community have not only driven up the output volume required of captioners but also introduced the need to provide captions that can be used on different platforms.

Being a content giant that broadcasts and streams content over various platforms, the client faces the challenge of adhering to the captioning style guide of a new streaming platform while working with its conventional closed-caption-generation applications. Adding to this challenge, the production schedules offer a very limited time frame for in-house captioning teams to concurrently create closed captions, verify they conform with style guides from streaming media platforms, and create translations following the same guidelines. Existing desktop applications failed to support the company in meeting this challenge because they required the involvement of more staff members, not fewer, and added redundant processes to the captioning workflow.

Requirements continue to evolve. Repurposing existing captions, or creating new closed captions, in conformance with style guides from streaming platforms adds another layer of checks for captioners. In addition, as content goes global, text localization requires captions to be created in multiple languages, which adds further complexity. Text localization is a process that requires close attention to ensure it conveys the exact message of the content. This creates a growing need for complete solutions that address all of these aspects while remaining flexible enough to accommodate the various style-guide requirements of streaming platforms.

To find a solution to its captioning challenges, the client evaluated Digital Nirvana’s Trance, an AI-based, enterprise-grade cloud platform for transcription, closed captioning, and translation. The platform uses various AI modules to handle transcription, caption generation, and translation according to the target streaming platform’s style guide, which in turn lets users automatically confirm compliance with output requirements. Once media is ingested, a speech-to-text output is automatically generated and then displayed alongside the video in the user interface as a time-synced transcript. The operator can easily review and correct the transcript, then convert it to closed captions based on the selected profile (e.g., Netflix, Quibi, or Prime).

This process enables adherence to parameters such as character count, line count, text frame gaps, maximum words per minute, and more. Once the initial review is performed, the content is displayed in a captioning professional window, where users can review it along with the video and confirm how the content appears on platforms.
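To make these parameters concrete, the following is a minimal sketch of how a style-guide check of this kind could work. It is not Trance’s implementation: the `Cue`, `StyleProfile`, and `Alert` types, the `check_conformance` function, and the rule names are all hypothetical names introduced for illustration, and any limits passed in are placeholders rather than the published values of a real platform’s style guide.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float   # cue start time, in seconds
    end: float     # cue end time, in seconds
    lines: list    # caption text, one string per displayed line


@dataclass
class StyleProfile:
    # Hypothetical profile fields; real style guides define their own limits.
    max_chars_per_line: int
    max_lines: int
    min_gap_seconds: float
    max_words_per_minute: float


@dataclass
class Alert:
    time: float    # start time of the offending cue, for navigation
    rule: str      # which parameter was violated
    detail: str    # measured value, for the reviewer


def check_conformance(cues, profile):
    """Return one time-stamped alert per rule violation, in cue order."""
    alerts = []
    previous_end = None
    for cue in cues:
        # Line count: too many lines displayed at once.
        if len(cue.lines) > profile.max_lines:
            alerts.append(Alert(cue.start, "line_count", f"{len(cue.lines)} lines"))
        # Character count: any single line longer than the limit.
        for line in cue.lines:
            if len(line) > profile.max_chars_per_line:
                alerts.append(Alert(cue.start, "character_count", f"{len(line)} characters"))
        # Reading speed: words shown per minute of display time.
        duration = cue.end - cue.start
        words = sum(len(line.split()) for line in cue.lines)
        if duration > 0:
            wpm = words * 60 / duration
            if wpm > profile.max_words_per_minute:
                alerts.append(Alert(cue.start, "words_per_minute", f"{wpm:.0f} wpm"))
        # Gap: time between the previous cue ending and this one starting.
        if previous_end is not None:
            gap = cue.start - previous_end
            if gap < profile.min_gap_seconds:
                alerts.append(Alert(cue.start, "frame_gap", f"{gap:.3f} s gap"))
        previous_end = cue.end
    return alerts
```

Keeping the limits in a profile object, separate from the checking logic, is what would allow the same captions to be validated against several platforms’ style guides by swapping the profile.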

Once the caption review is complete, the user can automatically generate translations in the same window alongside the video and source-language closed captions. This feature eliminates the need to recheck conformance on style-guide based parameters and gives users the opportunity to review automatic translations in line with the source language captions.

Trance also has a built-in caption conformance module that helps users to repurpose existing captions, correcting and reformatting them to comply with new streaming media requirements. This feature generates time-synced alerts on any non-conformance so the user can easily navigate to the occurrence and review. After completion of caption generation or repurposing using caption conformance, users can download caption output formats based on the profile set, including customized WebVTT or TTML formats suitable for various streaming platforms. Users can also choose to download multiple formats that are in conformance with various broadcast and streaming platforms.
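As an illustration of the export step, here is a small sketch that serializes reviewed cues to a basic WebVTT file. It is an assumption-laden example, not Trance’s exporter: the function names `format_timestamp` and `to_webvtt` and the tuple layout `(start, end, lines)` are hypothetical, and it emits only the plain cue structure (header, timing line, text) without the positioning or styling settings a platform-specific profile might add.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    total_s, ms = divmod(total_ms, 1000)
    total_min, s = divmod(total_s, 60)
    h, m = divmod(total_min, 60)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def to_webvtt(cues):
    """Serialize cues, given as (start, end, lines) tuples, to WebVTT text."""
    blocks = ["WEBVTT"]  # required file header
    for start, end, lines in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        # Each cue is a timing line followed by its text lines.
        blocks.append("\n".join([timing, *lines]))
    # Blocks are separated by a blank line.
    return "\n\n".join(blocks) + "\n"
```

For example, `to_webvtt([(0.0, 2.0, ["Hello there."])])` produces a file with the `WEBVTT` header, a blank line, the timing line `00:00:00.000 --> 00:00:02.000`, and the caption text.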

There are many impressive closed-captioning applications on the market. What makes Digital Nirvana’s Trance different? While the basic functionality of other applications may be similar, Trance is unique in offering a collection of simple yet sophisticated AI modules that simplify captioning for the user and support an evolving captioning and processing workflow. Whether it is automatic speech-to-text (STT), automatic caption generation based on style guides, or translation, each aspect has been designed with a focus on reducing the effort required to create the output.

Because it is an enterprise-grade application, Trance comes with an orchestration layer that enables easy project management, automatic assignment of tasks to users, and a holistic view of day-to-day operations. Combining these future-ready functionalities with superior ease of use, Trance was the customer’s top choice.
