Most people are familiar with subtitles from television. A less well-known practice is transcribing, which means converting recorded speech into text. According to the requirements of the Accessibility Act, the information provided in recordings must also be available in a text format, which in practice means that both subtitling and transcribing are becoming more common.
When is transcribing needed?
Transcribing is used, for example, to put court, meeting and interview recordings into text format. In addition, the Act on the Provision of Digital Services, which is based on the Accessibility Act that is in force in Finland, requires that all audio and digital content published by public operators or publicly funded projects be accessible, i.e. it must also be provided in text form. Therefore, text equivalents are nowadays transcribed for videos and podcasts, for example.
Transcription means more than just typing something into a text format. Sometimes what is said matters more than how it is said. This is why it is important to agree on the objective and intended use of the transcription before starting a project and ordering a transcription.
Accuracy of transcription
The most common form of transcription is clean verbatim transcription. In clean verbatim transcription, everything that is irrelevant to the end result, such as filler words (like, uh), unnecessary sounds or extra repetitions, is omitted from the result of the transcription, i.e. the transcript. Of course, these must be included if they are important for the objective of the transcription, such as a laugh that indicates the speaker is not being entirely serious. An example of clean verbatim transcription could be the conversion of a standard recorded interview into a text format.
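As a rough illustration of the clean verbatim idea, the following Python sketch drops filler words and immediate repetitions from one line of raw transcript. The filler list and the function name `clean_verbatim` are assumptions made for this example, not part of any transcription standard. A human transcriber would still decide case by case what is relevant, since a repetition or a filler can sometimes carry meaning.

```python
import re

# Hypothetical filler list for illustration; "like" is left out on purpose,
# because it is often a meaningful word and needs human judgement.
FILLERS = {"uh", "um", "er"}


def _core(token: str) -> str:
    """Lower-case a token and strip punctuation so words can be compared."""
    return re.sub(r"[^\w']", "", token).lower()


def clean_verbatim(line: str) -> str:
    """Drop filler words and immediately repeated words from one line."""
    kept = []
    for token in line.split():
        core = _core(token)
        if core in FILLERS:
            continue  # filler word such as "uh"
        if kept and core == _core(kept[-1]):
            continue  # immediate repetition such as "I I"
        kept.append(token)
    return " ".join(kept)


print(clean_verbatim("Well, uh, I I think the the meeting went um fine."))
# prints: Well, I think the meeting went fine.
```

A full verbatim transcript would simply skip this step and keep every token.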
Full verbatim transcription differs from clean verbatim transcription. In full verbatim transcription, all filler words, background noises (doorbell, coughing) and non-verbal communication are recorded in the transcript. Full verbatim transcription is especially necessary when, in addition to speech, one wants to record all possible reactions and expressions. Examples of such situations include psychological studies and market research.
Transcription into spoken or standard language?
If spoken language is used in the original audio recording, the transcription is usually also done in spoken language. Even in clean verbatim transcription, where the filler words in the speech are omitted, the words and features typical of spoken language are retained. For example, podcasts are often transcribed in spoken language.
Transcription in standard language means that the speech is changed into a standard, literary form. The speech is tidied up by, for example, replacing dialect and slang words with their standard language forms and by omitting sounds and anything else that is not related to the subject matter. Attention is also paid to grammatical correctness, such as punctuation. This makes the text easier to work with. Transcriptions in standard language are used especially in the minutes of meetings and in research interviews. What these transcriptions have in common is that the content matters most, not the way of speaking.
Speech recognition and transcription
Speech recognition technology is constantly evolving, and the software understands Finnish better all the time. Therefore, if there is a lot of material to be transcribed, it is probably worth using speech recognition as part of the transcription process.
First, the speech is converted into raw text by the speech recognition software and after that it is edited. Editing is usually necessary because speech recognition may not recognise all words, especially those in spoken language. Similarly, the speech recognition software encounters difficulties if the speakers are speaking simultaneously, articulating poorly or speaking very fast, and words spoken in a language other than the main language of the recording do not usually convert correctly.
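The two-step workflow described above (raw recognition, then human editing) could be supported by marking the words the software was unsure about, so that the editor knows where to look first. The sketch below assumes a hypothetical recogniser that returns (word, confidence) pairs and uses an arbitrary threshold of 0.6. Neither the data format nor the threshold reflects any particular speech recognition product.

```python
from typing import List, Tuple


def mark_uncertain(words: List[Tuple[str, float]], threshold: float = 0.6) -> str:
    """Wrap words below the confidence threshold in [?...] for the editor."""
    out = []
    for word, confidence in words:
        out.append(f"[?{word}]" if confidence < threshold else word)
    return " ".join(out)


# Hypothetical raw output; the confidence values are made up for illustration.
# "kello" stands in for a word spoken in a language other than the main one.
raw = [("the", 0.98), ("meeting", 0.95), ("starts", 0.91), ("kello", 0.32), ("nine", 0.88)]
print(mark_uncertain(raw))
# prints: the meeting starts [?kello] nine
```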
Sometimes the transcription is needed in a language other than the original language. The transcribed material can be translated into another language or, if only a translation is required, the recording can be translated directly while it is being transcribed. In a case like this, the term translated transcription is usually used.
Translated transcription is especially necessary when the transcribed material is processed in a different language. An example of this could be a foreign parent company processing a market research discussion conducted in Finland. The parent company is unlikely to be interested in listening to a discussion in Finnish or reading a text in Finnish; it wants information about the participants' reactions and opinions translated into a language it understands.
How much does transcribing cost?
The pricing of transcribing is influenced by a number of factors, such as the method of transcribing, the quality and format of the recording and the number of speakers. A cost estimate is drawn up from these factors and is based either on the time spent on the work or on the amount of material (the duration of the recording). How quickly the transcription is needed naturally also plays a part.
According to the Web Content Accessibility Guidelines, audio and video recordings must be made accessible. Sometimes transcribing is probably the only way to do this (a podcast), and sometimes the text equivalent can be provided with subtitles, for example (a video). Because transcriptions are not timed to videos like subtitles, producing a transcription is often cheaper than subtitling a video, although usability may then suffer.