Transcription for Closed Captioning – Automated or Manual?

When we talk about the closed captioning or subtitling process, one question immediately comes to mind: is an automated transcription process good enough, or do we really need human involvement in the process?

As we all know, there are quite a few software tools and devices on the market that can convert speech to text automatically. However, these tools have limitations when it comes to transcribing different accents and audio of varying quality. For instance, in the US, people from different regions have different styles of speaking. Converting speech to text automatically for a television show is often challenging because TV shows feature people from different regions and backgrounds who pronounce English words in completely different ways.

A study conducted by Google and Harvard University found that over the last century the English language has doubled in size; it continues to expand by around 8,500 new words a year and now stands at 1,022,000 words. This growing vocabulary makes automated transcription challenging, because automated tools are prone to misinterpreting new terms and words. Although the internet helps software tools and equipment get updated regularly, automatic transcription tools still have some difficulty interpreting a growing vocabulary.

Human transcriptionists still have advantages over speech recognition tools, although the manual process is more time-consuming and requires transcriptionists to make an effort to keep up with new words and phrases. Voice recognition tools are trained on the patterns and styles of specific voices, whereas human transcriptionists have experience listening to and communicating with people who speak a variety of dialects. New phrases and words also spread quickly among people across different regions. New technologies are undoubtedly making the transcription process easier and quicker; however, most broadcasters and content owners still prefer human intervention when creating transcripts for closed captions. They do not want to take chances with the quality of closed captions and are not yet ready to rely completely on automated processes. For them, quality closed captions are all the more important because they add value to their content.

Digital Nirvana’s closed captioning process uses a hybrid transcription process in which a quick preliminary transcript is created by an automated process and then edited by qualified and highly experienced transcriptionists for error-free delivery. The combination of automation and human intervention makes Digital Nirvana’s captioning process quick and foolproof.



Audio fingerprinting is an audio-retrieval technique built on content-based identification. With an audio fingerprinting system, it is possible to detect a particular audio segment in a huge audio library.

How does the audio fingerprinting technique work?

The audio fingerprinting technique generates a database of compressed acoustic structures from a large audio library. It creates a virtual plot of anchor points for the attributes of each recording, using parameters such as frequency, time, signal interference and intensity. Every audio fingerprint contains its own unique combination of metadata, which makes it easy to retrieve.

When an unknown audio fragment is ingested, the system scans the database where the audio fingerprints are stored and tries to match the features of the fragment against the metadata available there. Once the fingerprint of the ingested fragment matches data in the database, the two can be confirmed as the same audio content and the content can be retrieved.
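The indexing and lookup steps described above can be sketched roughly as follows. This is a minimal illustration, not the method of any particular product: it assumes the anchor points have already been extracted as (time, frequency) pairs, and the names fingerprint, build_database, identify and FAN_OUT are hypothetical. Each anchor is paired with a few later anchors to form a compact hash, and a query is matched by counting hashes that agree on a single time offset.

```python
from collections import Counter, defaultdict

# Hypothetical setting: how many later anchor points each anchor is paired with.
FAN_OUT = 3


def fingerprint(peaks):
    """Turn (time, frequency) anchor points into (hash, anchor_time) pairs."""
    peaks = sorted(peaks)
    prints = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + FAN_OUT]:
            # The hash combines both frequencies and the time gap between them.
            prints.append(((f1, f2, t2 - t1), t1))
    return prints


def build_database(library):
    """Map each hash to the (track_id, anchor_time) pairs where it occurs."""
    db = defaultdict(list)
    for track_id, peaks in library.items():
        for h, t in fingerprint(peaks):
            db[h].append((track_id, t))
    return db


def identify(db, query_peaks):
    """Return the best-matching track id, or None if no hash matches."""
    votes = Counter()
    for h, t_query in fingerprint(query_peaks):
        for track_id, t_track in db.get(h, []):
            # Hashes from the same recording agree on one time offset.
            votes[(track_id, t_track - t_query)] += 1
    if not votes:
        return None
    (track_id, _offset), _count = votes.most_common(1)[0]
    return track_id


# Made-up anchor points for two recordings in the library.
library = {
    "news_clip": [(0, 40), (1, 52), (2, 47), (3, 60), (4, 44), (5, 58)],
    "sports_clip": [(0, 30), (1, 35), (2, 70), (3, 33), (4, 66), (5, 31)],
}
db = build_database(library)

# A fragment cut from the middle of "sports_clip", re-timed to start at zero.
fragment = [(0, 70), (1, 33), (2, 66), (3, 31)]
result = identify(db, fragment)
print(result)
```

Voting on the time offset, rather than simply counting matching hashes, is what lets a short fragment from the middle of a recording still point to the right entry.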

For an audio fingerprinting system to be robust, it has to meet certain requirements.

  • It should be resistant to audio distortions.
  • The database should scale with the growing volume of digital audio.
  • The fingerprint database should be kept as small as possible by using compressed, compact fingerprints.
  • Fingerprints should be distinct enough that even a short audio fragment matches only the corresponding data.
  • The system should use an efficient strategy when it looks up metadata.

The two main parts of an audio fingerprinting framework are extracting a fingerprint from the audio query and matching it against the fingerprints available in the database. Audio fingerprinting-based content tracking has seen exponential growth in its applications.

The audio fingerprinting technique helps in a big way to look up lost closed captions. Broadcasters and content owners do not have to generate closed captions again from scratch if captions were originally created. Digital Nirvana’s closed captioning processes make effective use of audio fingerprinting technology to retrieve closed captions.

At next week’s NAB Show, the latest innovations in broadcast and media technology will be showcased. With automation and improved operational efficiency in mind, Digital Nirvana will introduce Metadator – a software application that makes the editing process more efficient for broadcasters and content creators using the AVID Interplay media asset management platform.

With its ability to generate locators for media assets outside of AVID’s Interplay MAM system, the application makes it easy for AVID users to export media to external sources for generating transcripts or metadata that can be automatically ingested back into the AVID Interplay MAM platform. The Metadator application automatically extracts video footage and audio, generates content metadata or transcripts using speech-to-text technology, then ingests the information back into AVID Interplay. It simplifies metadata and transcript generation by automating it with proven technology, improving turnaround time while reducing costs and improving overall operational efficiency.

The application automates what has been a manual process of combing through footage and creating scene summaries. Metadator communicates with AVID Interplay using web service APIs to access content. Users can export media to Digital Nirvana’s cloud service to generate transcripts or metadata with locators for the media assets, and automatically ingest them back into AVID Interplay, so content creators can easily locate metadata when editing footage down to a single show.

Discover how Digital Nirvana’s new Metadator software can improve your operations. Visit us at NAB in Booth SU10121! 

Best practices for closed captioning service providers

Closed captioning, as we all know, is the textual representation of speech and non-speech elements presented on visual display screens. Closed captions help reach a wider audience, including people who are hard of hearing and people with different language abilities. There are many closed caption service providers operating in the market. Let us look at some best practices that service providers can adopt when creating closed captions.

  • Closed captions should be displayed on the screen in synchronization with the visuals.
  • Captions should fade away from the screen once the corresponding visuals disappear.
  • It is essential that the captions stay on screen long enough for the viewers to read them.
  • A minimum display time of 1.5 seconds can be set for dialog as short as a word or two; however, this cannot be applied to rapid dialog.
  • Closed captions should be placed so that the visuals are not obstructed. Viewers should be able to read the captions and follow the visuals at the same time.
  • There should not be more than two lines of text on screen at any given time.
  • Try not to end one sentence and begin another on the same line, and retain all the words as they are spoken.
  • Do not omit words like “so”, “because”, “but”, “too”, etc. These words are essential to convey the exact meaning of what is spoken.
  • Wherever speech is inaudible, place a label explaining the cause, for example: crowd noise drowns out speech, noisy market, etc.
  • Display closed captions describing sound effects in lowercase italics inside brackets/parentheses, for example: (child crying) (car screeching).
  • Identify speakers and display their names with the captions, for example: (Joe) How are you? (Mary) I am doing great.
  • Inserting a music icon is one way to indicate that a song is being played on screen. A hash sign (#) can also be used to indicate songs.
  • Closed captions for movies and TV content do not generally use full stops/periods, but this should be left to the content owner’s discretion. However, question marks and exclamation marks should be used to give clarity to a phrase.
  • It is always good to start sentences with a capital letter. Capitalize an entire word only if it indicates screaming.
  • Spell out numbers from one to ten and use numerals for numbers higher than ten. For technical and sports terms, use numerals, for example: (scored 5 goals out of 6 penalties).

These are general closed captioning styles in practice; however, these rules can be tweaked or altered as per specific customer requirements.
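A few of the rules in the list can be checked or applied automatically. The following is a rough sketch under stated assumptions, not a complete style checker: the function names and the cue format (text plus start and end times in seconds) are hypothetical, only the 1.5-second minimum, the two-line limit and the number rule are covered, and it is left to the caller to say whether a caption is technical or sports content.

```python
import re

MIN_DISPLAY_SECONDS = 1.5  # minimum display time for very short dialog
MAX_LINES = 2              # no more than two lines on screen at once

NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]

# Whole integers only: skips decimals such as 5.5 and tokens such as 4K.
INTEGER = re.compile(r"(?<![\w.,])\d+(?!\w|[.,]\d)")


def spell_small_numbers(text, technical=False):
    """Spell out integers from one to ten; keep numerals for technical text."""
    if technical:
        return text

    def replace(match):
        value = int(match.group())
        return NUMBER_WORDS[value] if 1 <= value <= 10 else match.group()

    return INTEGER.sub(replace, text)


def check_caption(text, start, end):
    """Return a list of rule violations for one caption cue."""
    problems = []
    # The source notes this minimum cannot always be met for rapid dialog,
    # so treat it as a warning rather than a hard failure.
    if end - start < MIN_DISPLAY_SECONDS:
        problems.append("displayed for less than 1.5 seconds")
    if len(text.splitlines()) > MAX_LINES:
        problems.append("more than two lines on screen")
    return problems


print(spell_small_numbers("She bought 3 apples and 12 pears"))
print(check_caption("(Joe) How are you?", 10.0, 10.8))
```

Because customer requirements can override these defaults, the thresholds are kept as named constants that are easy to change.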

Transcription and closed captioning help video marketing

Over recent years, video marketing has gained wide popularity as more and more people have started consuming videos rather than images and text. How can your marketing videos gain more traction and some serious attention? Your videos need to be more engaging and should connect with as large an audience as possible. Keep in mind that your audience includes people who are deaf or hard of hearing as well as people with language difficulties. Attaching closed captions and transcripts to your videos is the quickest and easiest way to reach and engage these audiences.

There are significant SEO benefits to integrating closed captions and transcripts into your marketing videos. How do they help your video SEO? Google doesn’t search the video itself; instead, it depends on transcripts, captions or metadata to comprehend the story of your video and determine whether it is connected to a particular search term. The textual representation therefore makes it easy for the search engine to correlate the video with the search term.

How do you get your videos captioned or transcribed? There are quite a few automatic speech recognition tools on the market that can transcribe and caption your videos, but can they give you the desired result? Not really. It is very important that your marketing videos communicate clearly to your target audience, and you wouldn’t want your transcripts or captions to contain errors that could spoil your brand image. So it is sensible to engage professional transcription and caption providers to do the job for you. Though many companies provide quality transcripts and captions, it is essential to identify the one that can respond to your specific requirements.

Digital Nirvana is one of the leading transcription and closed caption providers and can deliver error-free jobs with a quicker turnaround.

Closed captioning for hearing-impaired sports enthusiasts

Closed captions are one of the most effective ways for people who are hearing impaired or deaf to fully experience and enjoy entertainment or broadcast events. Nowadays, almost all broadcast events have closed captioning, allowing deaf or hard-of-hearing viewers to fully enjoy their favorite programs. Sports stadiums used to be a place where these individuals were often neglected. Deaf or hard-of-hearing fans often found it disappointing that they were not able to follow commentary, announcements or music played over the loudspeakers. Closed captioning on the giant screens is the only way to make sports more accessible for these individuals who enjoy watching sports in stadiums.

The Washington Redskins are a popular professional American football team. In 2006, three hearing-impaired sports enthusiasts filed a lawsuit against the Washington Redskins because they were not able to follow announcements, public service spots and advertisements played on the midfield giant screens. They demanded closed captioning on the screens every time something was played, including music lyrics and announcements.

Under the Americans with Disabilities Act, District Judge Alexander Williams Jr. ruled that running closed captions on the stadium screen is no longer an option, but an obligation.

The hearing-impaired sports fans said that they struggled to follow the action when they first started going to games many years ago. It was difficult for them to enjoy the game because they could not follow why penalties were awarded or who the players were. With the help of closed captions, they are able to experience the games the same way as other fans.

CEA-608 Closed Captions Pave the Way for CEA-708 Closed Captions

CEA-608 and CEA-708 captions are the two closed caption standards for broadcast television. CEA-608 captions are the old standard used in analog television while CEA-708 captions are the new standard format used in current digital television broadcasts. (CEA – Consumer Electronics Association)

In the US, with the signing of the DTV Delay Act in 2009, analog television was officially replaced by digital television. This required captioning providers to switch from CEA-608 captions to CEA-708 captions. Although CEA-608 captions are still supported by digital television, CEA-708 captions are the preferred choice because the CEA-708 caption standard complies with the FCC closed caption regulations.

With the DTV Delay Act, analog television has slowly moved out of modern use, and CEA-608 captions will also fade from use.

CEA-608 Closed Captions (Line 21 Captions)

Analog broadcast television used 608 captions as its standard; however, the 608 standard can also be embedded in digital television. These closed captions are displayed in conventional uppercase on a black-box background. 608 captions can be viewed only with a decoder, because they are concealed in the Line 21 data area of the analog television signal. This is why 608 captions are also known as Line 21 captions. There are two fields in Line 21: the first field is normally used to transmit English captions, while the second is used for Spanish, French, Portuguese, Italian, German and Dutch closed captions. The main problem with the 608 caption standard is that it does not adhere to many of the FCC closed caption regulations.

CEA-708 Closed Captions

708 captions are a low-bandwidth, text-based format and the standard used by all digital television broadcasts. The 708 caption standard is much more advanced than CEA-608. It is used in both standard-definition and high-definition digital broadcasts; it is a misconception that 708 captions are used only on high-definition channels. With the 708 caption standard, viewers can control the appearance of the captions: they can select font options, text sizes, text colors and background colors. CEA-708 offers 8 font options, 3 text sizes, 64 text colors and 64 background colors. CEA-708 captions do not work in analog television broadcasts. 708 captions support almost all languages used across the world, as well as any special character or symbol. This multilingual capability allows broadcasters to reach a wider audience across the world.
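To get a feel for how much viewer control those options add up to, the counts quoted above can simply be multiplied. This is a quick arithmetic sketch that assumes the four settings can be chosen independently of one another, which the text does not state explicitly.

```python
# Option counts as quoted in the text for CEA-708 captions.
FONTS = 8
TEXT_SIZES = 3
TEXT_COLORS = 64
BACKGROUND_COLORS = 64

# Assumption: every setting can be combined freely with every other setting.
combinations = FONTS * TEXT_SIZES * TEXT_COLORS * BACKGROUND_COLORS
print(combinations)  # 98304
```

Under that assumption, a viewer has close to a hundred thousand possible caption appearances to choose from.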

The following article was written by Digital Nirvana CEO Hiren Hindocha and published in TV Technology.

Captioning for television has come a long way since it was first introduced in 1972, when the most popular cooking show of the time, “The French Chef” with Julia Child, was captioned.

The idea of captioning was quickly embraced by deaf and hard-of-hearing viewers and grew in popularity with general audiences as well since it helped viewers clearly interpret their favorite programs. Closed captioning steadily evolved from conventional methods to voice writing, to what is currently a far more automated process. The application of closed captioning has also evolved as it now improves the discoverability of video content and cognitive modeling (simulating human problem solving in a computerized model) for automated analysis of broadcast content.

Creators of news and sports content face new challenges with the latest FCC regulations, which allow delays of up to 12 hours and 8 hours, respectively, in posting closed captions on online video clips of live and near-live TV programming after the programming has appeared on TV. Existing FCC closed-captioning quality rules also require non-live programming captions to be accurate, complete and in sync with the dialogue. While content producers may view it as a challenge to stream video content that complies with this law, there are multiple captioning services that can be used to ensure regulatory compliance while simultaneously improving the user experience.


5 Quick Facts on the Commercial Advertisement Loudness Mitigation Act

What is the Commercial Advertisement Loudness Mitigation (CALM) Act? If you are a broadcaster, you have to keep your commercial audio at about the same level as your actual programming audio. For example, you can’t have the programming audio at one level and then have your commercial screaming at the audience. In earlier days this used to happen all the time: the programming audio would be at a certain level and then the commercials would come up right in your face. The regulation that addresses this in the United States is called the CALM (Commercial Advertisement Loudness Mitigation) Act. Many other jurisdictions, such as Canada and the EU, have very similar laws. The permitted decibel (dB) levels differ, but the rule is very similar.
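The basic idea of keeping commercials at about the same level as the programming can be sketched as a simple comparison. This is only an illustration: the function names are hypothetical, the measurements are made up, and the 2.0 tolerance is an assumed placeholder, since the actual measurement method and permitted levels come from the applicable regulation and are not given here.

```python
# Hypothetical allowance in dB; the real limit is set by the regulation.
TOLERANCE = 2.0


def commercial_too_loud(program_level, commercial_level, tolerance=TOLERANCE):
    """Return True when the commercial exceeds the program level by more than the tolerance."""
    return (commercial_level - program_level) > tolerance


def flag_loud_commercials(program_level, commercials):
    """Return the names of commercials that are too loud relative to the program."""
    return [name for name, level in commercials.items()
            if commercial_too_loud(program_level, level)]


# Made-up loudness measurements; less negative means louder.
commercials = {"car_ad": -18.0, "soap_ad": -23.5, "bank_ad": -21.0}
flagged = flag_loud_commercials(-24.0, commercials)
print(flagged)
```

A monitoring workflow would run a check like this on every commercial break and raise an alert for anything flagged.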

5 Quick Facts:
1. When was the CALM Act adopted, and when did it come into effect?
The rule was adopted by the FCC on 13th December 2011 and came into effect one year after its adoption, i.e. on 13th December 2012. The one-year interval between adoption and effectiveness was given so that television stations and pay TV providers could make the adjustments needed to be in complete compliance with the Act.

2. Does the CALM Act require the FCC to regulate loud commercials?
Yes. The CALM Act requires the Federal Communications Commission (FCC) to impose financial penalties on broadcasters who fail to meet this responsibility.

3. Does this rule pertain only to commercials?
Yes. The CALM Act addresses only the loudness of commercials and doesn’t address loudness variations between programs or channels.

4. Does the CALM Act apply to radio and internet commercials as well?
No. The Act doesn’t apply to radio and internet commercials; only television commercials fall under it.

5. What will the FCC do when it receives customer complaints?
The FCC evaluates the commercial in question and tracks it to see whether there are patterns or trends that suggest a need for enforcement action.

How are broadcasters tackling this? With technological advancements, broadcasters adopt different ways to monitor their content and check that it complies with the regulations. Digital Nirvana’s Monitor IQ is one such broadcast monitoring product that can seamlessly monitor live broadcast content 24/7, do compliance recording and send out alerts. Monitor IQ also helps large broadcasters capture content from multiple sources and publish to multiple digital platforms while monitoring for quality and compliance.

Languages and compliance challenges in subtitling and closed captioning from an Asia Pacific perspective

If you are a broadcaster or a media company that delivers content globally on multiple platforms, one of the most challenging broadcast functions to execute is incorporating subtitles or closed captions into your media content. Let us look at the factors that contribute to this concern in the Asia Pacific region.

From an Asia Pacific perspective, the major challenges in closed captioning/subtitling come from language and regulation.

English captioning or subtitling remains a challenge because of the multitude of languages in which content is created across this region. One of the major challenges for captioning service providers is hiring people with the language skills to translate local languages into English captions. The same challenge arises when English-language content has to be captioned or subtitled in local languages: direct translations of English subtitles or captions are used, but they often don’t convey the intended message.

Nowadays, caption providers are excessively dependent on machine translation tools, which have their own limitations. For instance, they often fail to capture elements such as word emphasis, sarcasm and the subtle delivery of a line of dialogue, which are very important for viewers to fully understand the essence of their favorite programs. This calls for manual edits to machine-translated subtitles/captions, and it is important for the editors to have thorough knowledge of the cultural nuances and context of the region.

For an automated tool to deliver good accuracy, it needs really powerful artificial intelligence (AI) and machine learning (ML) capabilities. It also takes a considerable amount of time for these tools to leverage AI and ML to improve accuracy, because large amounts of data are needed to train the system. For captioning service providers, a workflow that connects multiple locations and home users without losing focus on content security could be one way to tackle this challenge effectively.

Compliance is another tricky aspect that caption providers find difficult to navigate across the Asia Pacific region. Regulations around captioning and subtitling vary between geographies and industries, and it is highly challenging for caption providers to deliver captions or subtitles that comply with all of them. Tools intended to help with compliance differences across regions and industries are under development. We will have to wait and see how these tools shape up to tackle compliance variations effectively.