Open-access materials
Resources
Free tutorials, R scripts, data sets, and course materials · All open-access via LADAL and GitHub
Tutorials
Below are links to tutorials I created for the Language Technology and Data Analysis Laboratory (LADAL).
DATA SCIENCE BASICS
- Computer tips and tricks — useful tips about working with computers, e.g. how to keep your computer running smoothly
- Introduction to data science — basic concepts of data science
- Introduction to quantitative reasoning
- Basic concepts of quantitative research (methodology)
INTRODUCTION TO R
- Introduction to R for beginners
- String processing with R
- Regular expressions in R
- Working with tabulated data in R — how to create, manipulate, and process tabular data
DATA VISUALIZATION
- Introduction to data visualization with R
- Common visualization types in R — scatter plots, line graphs, bar plots, box plots, etc.
- Geo-spatial data visualization (mapping) with R
STATISTICS
- Descriptive statistics
- Basic inferential statistics
- Fixed- and mixed-effects regression
- Tree-based models
- Cluster and correspondence analysis
- Semantic Vector Space models and other grouping procedures
TEXT ANALYTICS / TEXT MINING / CORPUS LINGUISTICS
- Text analysis and distant reading
- Keyword-in-context concordances in R
- Network Analysis in R
- Co-occurrence and Collocation Analysis in R
- Topic Modeling with R
- Sentiment Analysis with R
- Part-of-speech annotation and syntactic parsing in R — for English, German, Spanish, Italian, and Dutch
CASE STUDIES / FOCUS TUTORIALS
- Creating vowel charts with Praat and R: This LADAL tutorial shows how to extract formant values in Praat and use these to create a vowel chart in R.
- Corpus Linguistics: Gender and Age Differences in Swearing: This LADAL tutorial exemplifies how to perform a simple corpus analysis focusing on gender and age differences in swear word use in Irish English.
- PDF to txt: This LADAL tutorial shows how to extract text from PDF files into txt files for further processing.
For Students
General Notes for Students Attending my Courses (information sheet for seminars)
You will find a document with general information about my seminars here. Please read this document if you are attending or plan to attend one of my seminars! (last updated 2015/02/16)
Model term paper
You will find a model term paper here. It illustrates the structure, content, and formatting expected of term papers. You can also use it as a template and adopt its formatting for your own paper. (last updated 2015/04/08)
Programming / Software Development / Corpus Linguistics
Below are scripts and data sets that you may find useful.
R scripts
- Chi-squared test for subtables of 2×k tables (R script)
- Configural Frequency Analysis for data with only two-level configurations (R script)
- Function for downloading text from websites to create web corpora (R script)
- Function providing nice summaries of simple linear regressions (R script)
- Function providing nice summaries of multiple linear regressions (R script)
- Function providing nice summaries of fixed-effects binomial logistic regressions (R script)
- Step-wise step-up model fitting of fixed-effects binomial logistic regressions (R script)
- Step-wise step-up model fitting of mixed-effects binomial logistic regressions (R script)
- Step-wise step-down model fitting of mixed-effects binomial logistic regressions (R script)
Biodata scripts & data sets (last updated 2015/02/09)
If you find any bugs in the code or mistakes in the results, please let me know.
- ICE Canada: word counts and biodata (R script, result)
- ICE GB-R2: word counts and biodata (R script, result)
- ICE India: word counts and biodata (R script, result)
- ICE Ireland 1.2.2: word counts and biodata (R script, result)
- ICE Jamaica: word counts and biodata (R script, result)
- ICE New Zealand: word counts and biodata (R script, result)
- ICE Philippines: word counts and biodata (R script, result)
- ICE Singapore: word counts (R script, result)
- ICE Hong Kong: word counts (R script, result)
- SBCAE: word counts and biodata (R script, result)
TestCorpus
A small sample corpus for testing functions.
(last updated 2024/06)