Part 8 Generating Interactive Notebooks

Markdown documents can also serve as the basis of interactive notebooks. These notebooks are generated from a GitHub repository using a template provided by the Australian Text Analytics Platform (ATAP) and the Language Data Commons of Australia (LDaCA).

The transformation from a static to an interactive notebook involves Dockerizing the GitHub repository, essentially creating a virtual machine based on the repository’s specified parameters. Subsequently, the interactive notebook is loaded into a Binder workspace, enabling others to interact with copies of your notebook. This sounds very fancy but is, fortunately, quite easy to do. One should be aware though that once a session concludes, the copy of the notebook is closed, although you have the option to save a copy at any point.

Interactive notebooks are very versatile and serve various purposes, including the creation of teaching materials for statistics, text analytics, or programming-focused courses or classes, rendering one’s work transparent by sharing analyses with colleagues or reviewers, creating notebook-based online tools, or showcasing the implementation of computational methods. They accommodate both quantitative statistical and qualitative interpretative work.

Notably, interactive notebooks empower others to execute and edit your code, facilitating data and analysis validation.

Before generating an interactive notebook, ensure you have three essential components:

  1. An R project containing an Rmd file.
  2. An Rmd file named rmd2jupyter.Rmd.
  3. A GitHub account.

The Rmd file should utilize only one programming language, such as R, Python, or Julia, without mixing languages. Additionally, all necessary resources, such as data, should be accessible within the GitHub repository. It’s highly recommended to generate interactive notebooks only once a project or analysis is complete.

Now, let’s proceed with the process of generating an interactive notebook.

Preparation of the Rmd

  1. Begin by duplicating the Rmd file. A simple convention is to append “_cb” to the filename, indicating that this version will be converted into an interactive notebook.

  2. Unlike R Notebooks, interactive Jupyter notebooks do not utilize a YAML metadata header. Consequently, remove the YAML metadata header from the duplicated Rmd file and replace it with a first-level header to denote the notebook’s title. It’s advisable to adjust all headers in the Rmd file to maintain the original structure.

  3. Create a separate script file to list all required packages for installation. Remove any code chunks responsible for installing packages, as Jupyter notebooks have limited options for hiding code chunks.

Converting your Rmd into a Jupyter notebook

  1. Open the “rmd2jupyter.Rmd” file and proceed to install and activate the “rmd2jupyter” package, which contains the necessary function for Rmd to Jupyter conversion. Since this package isn’t available via CRAN, utilize the “devtools” package for installation.
library(devtools)
devtools::install_github("mkearney/rmd2jupyter")
library(rmd2jupyter)
  1. Once the “rmd2jupyter” package is activated, use the “rmd2jupyter” function to convert the Rmd file to a Jupyter notebook.
# Convert one notebook
rmd2jupyter::rmd2jupyter(here::here("acvqainter_cb.Rmd"))

Creating a GitHub repository that connects to Binder

  1. Log in to your GitHub account and visit: https://github.com/Australian-Text-Analytics-Platform/r-binder-template.

  2. On the GitHub page, click on Use this template and select Create a new repository from the dropdown menu. Assign a name to the repository and optionally provide a brief description.

  3. Open the “install.R” file and specify the packages to be pre-installed by adding commands like install.packages(“dplyr”) on separate lines.

  4. Upload your Jupyter notebook by selecting Upload file from the drop-down menu. Additionally, if you need to create folders, create a dummy file first and specify its location as a subfolder.

Once all required packages, the Jupyter notebook, and any necessary data files are uploaded, initiate the notebook by following the provided link. Adapt the URL to point to your interactive notebook by adjusting the username, repository name, and notebook filename.

Alternatively, consider uploading your Jupyter notebook to Google Colab for sharing. However, note that Google Colab may terminate sessions if computations take too long or exceed resource limits.

Please anticipate an extended setup time for the interactive notebook’s first launch, as Docker image creation from the GitHub repository may take up to an hour for complex notebooks with numerous dependencies. Simpler notebooks typically start faster but may still require at least 10 minutes for the initial setup.