Section 4 Exercise 1: Introducing Text Processing

EXERCISE TIME


Form small groups or pairs and open the transcript called ex1_transcript.pdf (you can find it in the exercise folder inside the data folder).
The aim is to find out how many words the transcript contains, counted both as tokens (every occurrence of a word) and as types (distinct words). Write down how you would instruct someone to remove everything that is not a word, and then how you would instruct them to count the words in the cleaned document.
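
Once you have written your instructions, you may want to compare them with a small sketch like the one below. It is only one possible approach, not the expected answer, and it rests on several assumptions that are not taken from the transcript itself: the text has already been copied out of the PDF into a string, speakers are marked as "NAME:" at the start of a line, non-speech annotations sit in round or square brackets, and upper and lower case count as the same word. The function names and the sample text are made up for illustration.

```python
import re
from collections import Counter


def clean_transcript(text):
    """Remove material that is not spoken words (assumed conventions only)."""
    # Assumption: a speaker label looks like "A:" or "Anna:" at the start of a line.
    text = re.sub(r"(?m)^\s*\w+:", " ", text)
    # Assumption: annotations such as (laughs) or [pause] are in brackets.
    text = re.sub(r"\([^)]*\)|\[[^\]]*\]", " ", text)
    return text


def tokenize(text):
    """Lowercase the text and keep runs of letters, allowing inner apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def count_words(text):
    """Return (number of tokens, number of types, per-type frequencies)."""
    tokens = tokenize(clean_transcript(text))
    frequencies = Counter(tokens)
    return len(tokens), len(frequencies), frequencies


# Hypothetical two-line sample, not taken from ex1_transcript.pdf.
sample = "A: Well, the cat sat.\nB: The cat? (laughs) Yes, the cat!"
n_tokens, n_types, frequencies = count_words(sample)
print(n_tokens, "tokens,", n_types, "types")
```

Notice how many decisions the sketch has to make for you: whether "The" and "the" are one type, whether "don't" is one word or two, and what happens to numbers (here they are dropped). Your written instructions should settle the same questions.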