How to Create Transcripts and Closed Captions for Your Online Content

Credit: iStock.com/PeopleImages

Universal Design for Learning guidelines recommend that institutions provide a text alternative for videos, podcasts, and other media that use sound. Traditionally, this has meant the laborious process of manually transcribing content after it has been recorded. Some faculty instead script their audio ahead of time, but reading from a script usually produces a stilted, robotic delivery.

Happily, transcription and closed captioning software has advanced to the point where faculty no longer need to script or manually transcribe their audio. The software is not perfect, but in most cases its output needs only minor editing. Many systems draw on Google’s speech recognition service, which has been refined through millions of hours of use and yields remarkably accurate results. Bear in mind that because the system has been trained on a wide range of voices, it tends to be most accurate with speech close to an “average” accent, such as the neutral, middle-American voice common on network news shows. That said, it can also reliably generate closed captions in many languages other than English.
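
One way to put a number on “minor editing” is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the automatic transcript into a corrected one, divided by the number of words in the corrected version. The Python sketch below is illustrative only; the function names are hypothetical, and it assumes that a vendor’s “accuracy” figure roughly corresponds to 1 − WER, which individual services may define differently.

```python
import string


def normalize(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, and split into words."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions + insertions + deletions)
    divided by the number of words in the reference (corrected) text."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # prev holds the previous row of the edit-distance table:
    # prev[j] = edits to turn the reference words seen so far into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a reference word
                curr[j - 1] + 1,     # insert an extra hypothesis word
                prev[j - 1] + cost,  # substitute (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


corrected = "Closed captions help every student follow the lecture."
automatic = "Close captions help every student follow the lecture"
wer = word_error_rate(corrected, automatic)
print(f"WER: {wer:.1%}, accuracy: {1 - wer:.1%}")
```

Here one misheard word out of eight gives a WER of 12.5 percent (87.5 percent accuracy). Punctuation is stripped before comparing, so only word choices count as errors.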

It is important to distinguish between transcripts and closed captions. Transcripts used to be the default method for ADA compliance, but they are not ideal in every situation. A viewer who relies on a transcript for a video must switch back and forth between the transcript and the video to follow along. Closed captions are far better for video: the viewer can read the text without leaving the video, and each line appears at the point in the video where the words are spoken. Transcripts are best suited to audio-only content such as podcasts, where there is nothing visual to follow and the listener can use the transcript by itself.
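
The difference between the two comes down to timing. A caption file pairs each snippet of text with a start and end time, while a transcript is the same words with the timing dropped. The Python sketch below illustrates this using SubRip (SRT), a common caption file format that many video platforms accept; the `Cue` type and the helper functions are hypothetical names made up for illustration, not part of any captioning tool.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption: text shown on screen between start and end (in seconds)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Numbered, timed cues separated by blank lines: a caption file."""
    blocks = [
        f"{n}\n{srt_timestamp(c.start)} --> {srt_timestamp(c.end)}\n{c.text}\n"
        for n, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


def to_transcript(cues: list[Cue]) -> str:
    """The same words with the timing dropped: a plain transcript."""
    return " ".join(c.text for c in cues)


cues = [
    Cue(0.0, 2.5, "Welcome to week one."),
    Cue(2.5, 6.0, "Today we cover photosynthesis."),
]
print(to_srt(cues))
print(to_transcript(cues))
```

The timestamps are what let a video player show each line at the moment it is spoken; the transcript keeps only the text, which is all a podcast listener needs.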

Closed captioning systems for video

Kaltura: If your LMS has the Kaltura video creator plug-in, you can use the automatic closed captioning built into the system. The captions can take an hour or more to become available, but they are simple to edit, and students can turn them on or off while viewing the video.

YouTube: YouTube has a built-in closed captioning system that uses Google’s speech recognition. As with Kaltura, the person uploading a YouTube video needs to tell the system to create closed captions, but once that is done, viewers can choose whether to watch the video with captions. YouTube can also generate closed captions in real time during live events it hosts, which makes it a good option for videoconferencing. YouTube’s support site explains the steps for creating closed captions.

PowerPoint: Recent versions of PowerPoint have a little-known feature that transcribes your speech into subtitles as you present in presentation mode. It can even translate your spoken words into subtitles in another language on the fly, making it a good option for captioning live presentations. See Richard Byrne’s tutorial on how to use the feature.

Google Slides: Google Slides, a popular alternative to PowerPoint, also offers closed captioning for live presentations. Click the “Present” button in Google Slides, then click the “CC” icon, and Slides will begin captioning your presentation. Note that, at the time of writing, Slides captions only in English. As with all captioning systems, it is important to speak clearly and minimize ambient noise.

Transcription systems for audio

Rev and Temi are currently the two industry leaders in transcription services. Both are simple to use and cost 25 cents per minute of audio for automatic transcription. Rev also offers human transcription for $1.25 per minute. Rev claims 80 percent accuracy for its automatic transcription, though in my experience it is closer to 90–95 percent, at least for my voice, and it claims 99 percent accuracy for human transcription. Temi offers only automatic transcription, with a claimed accuracy of 90–95 percent, and it offers one free transcription of up to 30 minutes. With both services, the user uploads an audio file, the system calculates the cost, and after the user pays by credit card, the system delivers a transcript that can be edited and downloaded. Both guarantee delivery within 12 hours, though I have always received my Rev transcriptions within two hours.
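
Because both services charge per minute of audio, budgeting is simple arithmetic. The Python sketch below uses the per-minute rates cited above; the function names are hypothetical, and it assumes that any partial minute is billed as a full minute, which may not match either service’s actual rounding rule.

```python
import math

AUTOMATIC_CENTS_PER_MIN = 25  # Rev or Temi automatic transcription, as cited above
HUMAN_CENTS_PER_MIN = 125     # Rev human transcription, as cited above


def estimate_cost_cents(duration_seconds: float, cents_per_minute: int) -> int:
    """Estimate a transcription bill in whole cents.

    Assumption (not confirmed by either service): a partial minute
    is rounded up and billed as a full minute.
    """
    billable_minutes = math.ceil(duration_seconds / 60)
    return billable_minutes * cents_per_minute


def dollars(cents: int) -> str:
    """Format whole cents as a dollar amount, e.g. 1250 -> $12.50."""
    return f"${cents // 100}.{cents % 100:02d}"


lecture_seconds = 50 * 60  # a 50-minute recorded lecture
print("automatic:", dollars(estimate_cost_cents(lecture_seconds, AUTOMATIC_CENTS_PER_MIN)))
print("human:", dollars(estimate_cost_cents(lecture_seconds, HUMAN_CENTS_PER_MIN)))
```

For a 50-minute lecture this prints $12.50 for automatic transcription and $62.50 for human transcription. Working in whole cents avoids floating-point rounding errors in the totals.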

Vocalmatic, another automatic transcription service, is somewhat cheaper than Rev or Temi. The first 30 minutes are free, and the hourly cost drops as you purchase more hours, which makes it a good fit for departments or institutions buying transcription services in bulk. It also lets users record their audio by phoning in, and it claims to process a file in roughly the same amount of time as the length of the audio itself, which would make it much faster than Rev or Temi.
