Bookdown



  • bookdown::render_book('index.Rmd', 'bookdown::gitbook') 4. Hosting the book. Now that the book is ready for the world to see, you simply need to tell GitHub where to find the rendered book. In your GitHub repository, open the Settings tab at the top and enable GitHub Pages on the /docs folder.
  • I’m currently working on three different books: Mastering Software Development in R, Developing Data Products, and The Unix Workbench. My increased level of productivity has been made possible in part by bookdown, an R package by the incredible and prolific Yihui Xie which transforms R Markdown documents into a book that looks beautiful online, with EPUB and PDF versions included.
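
The render-and-host step from the first bullet can be sketched in R as follows. This is a minimal sketch, assuming bookdown is installed and that index.Rmd lives in the book's root folder; the helper name render_to_docs is made up for illustration, and the .nojekyll file is an extra step commonly suggested for GitHub Pages rather than something the text above requires.

```r
# Hypothetical helper: render the book into docs/ so GitHub Pages can serve it.
render_to_docs <- function(book_dir = ".") {
  if (!file.exists(file.path(book_dir, "index.Rmd"))) {
    stop("No index.Rmd found in ", book_dir)
  }
  if (!requireNamespace("bookdown", quietly = TRUE)) {
    stop("Install bookdown first: install.packages('bookdown')")
  }
  old_wd <- setwd(book_dir)
  on.exit(setwd(old_wd), add = TRUE)

  # output_dir sends the rendered HTML to docs/ instead of the default _book/.
  out <- bookdown::render_book("index.Rmd", "bookdown::gitbook",
                               output_dir = "docs")

  # An empty .nojekyll file asks GitHub Pages not to run the site through Jekyll.
  file.create(file.path("docs", ".nojekyll"))
  invisible(out)
}
```

Calling render_to_docs() from the book's folder and then committing and pushing docs/ should leave the repository ready for the /docs setting described above.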

tl;dr

I just self-published a book-length version of my project Statistical Rethinking with brms, ggplot2, and the tidyverse. By using Yihui Xie’s bookdown package, I was able to do it for free. If you’ve never heard of it, bookdown enables R users to write books and other long-form articles with R Markdown. You can save your bookdown products in a variety of formats (e.g., PDF, HTML) and publish them in several ways, too. The purpose of this post is to give readers a sense of how I used bookdown to make my project. I propose there are three fundamental skill sets you need basic fluency in before playing with bookdown. Those three are

  • R and R Studio,
  • Scripts and R Markdown files, and
  • Git and GitHub.

Start with R

First things first. Since bookdown is a package for use in the R environment, you’re going to have to use R. If you’re unfamiliar with it, R is a freely-available programming language particularly well-suited for data analysis. If you’ve not used R before, learning how to self-publish books is a great incentive to start learning. But unless you already have a background in programming, I think bookdown is poorly-suited for novices. R newbies should check out Roger Peng’s R Programming for Data Science or Grolemund and Wickham’s R for Data Science. Both are freely available online and, as it would turn out, made with bookdown. Also, new users should be aware that although you can interact with R directly, there are a variety of other ways to interface with R. I recommend using R Studio. You can find some nice reasons, here. For basic instructions on how to install R and R Studio, you might start here. And if you prefer video tutorials to help you with the installation, just do a simple search in your favorite video-sharing website and several should pop up.

Personally, I started using R (via R Studio) during the 2015/2016 winter break before taking a spring semester statistics course based around an R package. [In case you’re curious, it was a structural equation modeling course based around a text by Beaujean which featured the lavaan package.] At the time, I was already familiar with structural equation modeling, so the course was a nice opportunity to learn R. In addition, I was concurrently enrolled in a course on multilevel modeling based on Singer and Willett’s classic text. The professor of that course primarily used SAS to teach the material, but he was flexible and allowed me to do the work with R instead. So that was my introduction to R: a semester of immersion in #rstats. Here are some other tips on how to learn R.

bookdown uses Markdown

If you work with R through R Studio, you can do a handful of things through dropdowns. But really, if you’re going to be using R, you’re going to be coding. As it turns out, there are a variety of ways to code in R. One of the most basic ways is via the console, which I’m not going to cover in any detail.

The console is fine for quick operations, but you’re going to want to do most of your coding in some kind of a script. R Studio allows users to save and execute code in script files, which you can learn more about here. Basic script files are nice in that they allow you to both save and annotate your code.

However, the annotation options in R Studio script files are limited. After using R Studio scripts for about a year, I learned about R Notebooks. These are special files that allow you to intermingle your R code with prose and the results of the code. R Notebooks also allow users to transform the working documents into professional-looking reports in various formats (e.g., PDF, HTML). And unlike the primitive annotation options with simple script files, R Notebooks use Markdown to let users format their prose with headers and italicized font, insert hyperlinks, and even embed images. So Markdown, then, is a simple language that allows for many of those functions.
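
To make those Markdown features concrete, here is a small sketch that writes a toy R Markdown file from R. The file name, title, link, and image path are all placeholders I made up; the point is only to show what a header, italics, a hyperlink, an embedded image, and a code chunk look like in the source file.

```r
# Three backticks, built with strrep() so the chunk fence is easy to spot.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  'title: "A first R Markdown file"',
  "output: html_document",
  "---",
  "",
  "# A header",
  "",
  "Some *italicized* text and a [hyperlink](https://bookdown.org/).",
  "",
  "![An embedded image](figure.png)",  # figure.png is a hypothetical file
  "",
  paste0(fence, "{r}"),                # start of an R code chunk
  "summary(cars)",
  fence                                # end of the chunk
)

# Write the file to a temporary folder and show what it contains.
rmd_path <- file.path(tempdir(), "first-notebook.Rmd")
writeLines(rmd_lines, rmd_path)
cat(readLines(rmd_path), sep = "\n")
```

Opening a file like this in R Studio and knitting it would produce a report with the prose formatted and the chunk's output placed below the code, provided the image file actually exists.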

Within the R Studio environment, you can use Markdown with two basic file types: R Markdown files and R Notebook files. R Notebook files are just special kinds of R Markdown files that have, IMO, a better interface. That is, R Notebooks are the newer, nicer version of R Markdown files. The main point here is that when I say “bookdown uses Markdown”, I’m pointing out that one of the important skills you’ll want to develop before making content with bookdown is how to use Markdown within R Studio. It’s not terribly complicated to learn, and you can get an overview of the basics here or here or here, or an exhaustive treatment here.

If you’re a novice, it’ll take you a few days, weeks, or months to get a firm grasp of R. Not so with R Markdown files. You’ll have the basics of those down in an afternoon. That said, I had been an R Notebook user for more than a year before trying my hand at bookdown.

The first big edition of my Statistical Rethinking with brms, ggplot2, and the tidyverse project came in the form of R Notebook files and their HTML counterparts stored in one of my projects on the Open Science Framework. I don’t update it very often, but you can still find it here. If you’re not familiar with it, the OSF is a “free, open source web application that connects and supports the research workflow, enabling scientists to increase the efficiency and effectiveness of their research.” In addition to their wiki, you might check out some of their video tutorials.

You’ll need GitHub, too

I’m actually not sure whether you need to know how to use Git and GitHub to use bookdown. In his authoritative book, bookdown: Authoring Books and Technical Documents with R Markdown, Yihui Xie mentioned GitHub in every chapter. If you go to your favorite video-sharing website to look for instructional videos on bookdown, you’ll see the instructors take GitHub as a given, too. If you’re stubborn and have enough ingenuity, you might find a way to successfully use bookdown without GitHub, but you may as well go with the flow on this one.

If you’ve never heard of it before, Git is a system for version control. By version control, I mean a system by which you can keep track of changes to your code over time. Even if you don’t have a background in programming, consider a scenario where you had to keep track of many versions of a writing project, perhaps saving your files as first_draft.docx, second_draft.docx, final_draft.docx, final_draft_2.docx… This was your own makeshift attempt at version control for writing. I’ve seen a lot of introductory material recommend Git and GitHub by leading with version control. And indeed, they do serve that purpose. But IMO, leading with version control is a rhetorical mistake when talking to non-programmers. I haven’t found Git and GitHub the most intuitive, and if version control were the only benefit, they wouldn’t be worth the effort. But there are other good reasons to learn.

IMO, the best reason to learn Git and GitHub is that they allow you to make your work publicly available. When you just use Git, the work stays on your computer. But GitHub allows you to save your files online, too. This makes it easy for others to review them and give you feedback. GitHub also allows you to save things like data files online. So if you’re a working scientist, Git and GitHub might allow you to make a site (a repository) to house the de-identified data and statistical code for one of your projects. It’s another way to do open science. In addition, you can repurpose GitHub to work as a blog or an analytic portfolio. And if you’d like to use bookdown, Git and GitHub will be a part of how you manage the files for your projects and make your work more accessible to others.

If you’re new to all this, you could probably blindly follow along with the steps in Yihui Xie’s bookdown manual or any of the online video tutorials. But I suspect that’d be pretty confusing. Before attempting a bookdown project, spend some time getting comfortable with Git and GitHub, first. The best introduction to the topic I’ve seen is Jenny Bryan’s Happy Git and GitHub for the useR, which, you guessed it, is also freely available and powered by bookdown.

As I hinted, I found Git and GitHub baffling at first. I checked out a few online video tutorials, but found them of little help. It really was Bryan’s book that finally got me going. And I’m glad I did. I’ve been slowly working with GitHub for about a year (here’s my profile), and my first major project was putting together the files for the individual chapters in the Statistical_Rethinking_with_brms_ggplot2_and_the_tidyverse project. They originally lived as R Notebook files, eventually rendered in a GitHub-friendly .md file format. After a while, I started playing around with README-only projects, which are basically a poor man’s GitHub version of blog posts (e.g., check out this one). For me, and probably for your future bookdown projects, the most important GitHub skills to learn are commits, pushes, and forks.

I’d fooled around with GitHub a tiny bit before launching my Statistical Rethinking with brms, ggplot2, and the tidyverse project on the OSF. But it was confusing and after an hour or two of trying to make sense of it, I gave up and just figured the OSF would be good enough. After folks started noticing the project, I got a few comments that it’d be more accessible on GitHub. That was what finally influenced me to buckle down and learn it in earnest. I’m still a little clunky with it, but I’m functional enough to do things like make this blog. With a little patience and practice, you can get there, too.

Let Yihui Xie guide you

So far we’ve covered

  • R and R Studio
  • Scripts and R Markdown files
  • Git and GitHub

You don’t have to become an expert, but you’ll need to become roughly fluent in all three to make good use of bookdown. Basically, if you are able to load data into R, document a rudimentary analysis in an R Notebook file, and then share the project in a non-embarrassing way on GitHub, you’re ready to use bookdown.

I’ve already mentioned it, but the authoritative work on bookdown is Yihui Xie’s bookdown: Authoring Books and Technical Documents with R Markdown. Yihui Xie, of course, is the author of the package. It’s probably best to just start there, going bit by bit. He also gave an RStudio webinar, Authoring Books with R Markdown, which I found to be a helpful supplement.

The complete version of my Statistical Rethinking with brms, ggplot2, and the tidyverse project has 15 chapters and several preamble sections. Almost all the chapter files include a lot of computationally-intensive code, with the simulations for chapter 6 taking multiple hours to compute. I do not recommend starting off with a project like that, at least not all at once. If you follow along with Yihui Xie’s guide, you’ll practice stitching together simple files, first. After learning those basics, I then picked up other helpful tricks, like caching analyses.
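
For readers wondering what caching looks like in practice, one common approach is knitr's cache chunk option. The sketch below turns it on for all chunks from a setup chunk. It assumes knitr is installed, and it is only one way to do it; the text above doesn't say whether caching was set globally or chunk by chunk.

```r
# Run this in an early setup chunk so later chunks reuse saved results
# instead of recomputing them on every render (assumes knitr is installed).
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(cache = TRUE)

  # Confirm the option took effect.
  print(knitr::opts_chunk$get("cache"))
}
```

The same option can be set for a single slow chunk by adding cache = TRUE to that chunk's header, which keeps quick chunks from cluttering the cache folder.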

Although I didn’t use these resources while I was learning bookdown, you might also benefit from checking out

  • Sean Kross’s How to Start a Bookdown Book,
  • Karl Broman’s omg, bookdown!,
  • Rachael Lappan’s Using Bookdown for tidy documentation, or
  • Pablo Casas’s How to self-publish a book: A handy list of resources.

Here’s where I park little examples for myself about bookdown mechanics that I keep forgetting.

The bookdown book: https://bookdown.org/yihui/bookdown/

41.2 About labelling things

You can label chapter and section titles using {#label} after them, e.g., we can reference Section 41.2. If you do not manually label them, there will be automatic labels anyway, e.g., this reference to the unlabelled heading 41.1 uses the automatically generated label \@ref(heading-blah-blah).

41.3 Cross-references

Add an explicit label by adding {#label} to the end of the section header. If you know you’re going to refer to something, this is probably a good idea.

To refer to it in a chapter- or section-number-y way, use \@ref(label).

  • \@ref(install-git) example: In chapter 6 we explain how to install Git.

If you are happy with the section header as the link text, use it inside a single set of square brackets:

  • [A picture is worth a thousand words]: example “A picture is worth a thousand words” via A picture is worth a thousand words

There are two ways to specify custom link text:

  • [link text][Section header text], e.g., “pic = 1000 words” via pic = 1000 words
  • [link text](#label), e.g., “RStudio, meet Git” via RStudio, meet Git

The Pandoc documentation provides more details on automatic section IDs and implicit header references.
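
Here is a small sketch that puts the reference styles from this section side by side in one toy chapter file, written from R. The header text and the install-git label are borrowed from the examples above; the file name and the custom link texts are made up. Note the doubled backslash in the R strings, which becomes a single backslash in the written file.

```r
chapter <- c(
  "# Install Git {#install-git}",
  "",
  "## A picture is worth a thousand words",
  "",
  # Numbered reference: renders as the chapter number.
  "In Chapter \\@ref(install-git) we explain how to install Git.",
  "",
  # Section header used as the link text.
  "See [A picture is worth a thousand words].",
  "",
  # Custom link text via an implicit header reference.
  "See [pic = 1000 words][A picture is worth a thousand words].",
  "",
  # Custom link text via an explicit label.
  "See [how to get Git](#install-git)."
)

chapter_path <- file.path(tempdir(), "01-install-git.Rmd")
writeLines(chapter, chapter_path)
cat(readLines(chapter_path), sep = "\n")
```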

41.4 Figures, tables, citations

Figures and tables with captions will be placed in figure and table environments, respectively.

Figure 41.1: Here is a nice figure!

Reference a figure by its code chunk label with the fig: prefix, e.g., see Figure 41.1. Similarly, you can reference tables generated from knitr::kable(), e.g., see Table 41.1.

Table 41.1: Here is a nice table!
Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
         5.1          3.5           1.4          0.2  setosa
         4.9          3.0           1.4          0.2  setosa
         4.7          3.2           1.3          0.2  setosa
         4.6          3.1           1.5          0.2  setosa
         5.0          3.6           1.4          0.2  setosa
         5.4          3.9           1.7          0.4  setosa
         4.6          3.4           1.4          0.3  setosa
         5.0          3.4           1.5          0.2  setosa
         4.4          2.9           1.4          0.2  setosa
         4.9          3.1           1.5          0.1  setosa
         5.4          3.7           1.5          0.2  setosa
         4.8          3.4           1.6          0.2  setosa
         4.8          3.0           1.4          0.1  setosa
         4.3          3.0           1.1          0.1  setosa
         5.8          4.0           1.2          0.2  setosa
         5.7          4.4           1.5          0.4  setosa
         5.4          3.9           1.3          0.4  setosa
         5.1          3.5           1.4          0.3  setosa
         5.7          3.8           1.7          0.3  setosa
         5.1          3.8           1.5          0.3  setosa
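
The figure and table references in this section can be sketched like so. The chunk labels nice-fig and nice-tab and the plot call are my assumptions, since the original source chunks aren't shown here; the table values above do match the first 20 rows of the built-in iris data. The chapter text is assembled as an R character vector so the chunk headers are visible.

```r
# Three backticks, built with strrep() so the chunk fences are easy to spot.
fence <- strrep("`", 3)

fig_tab <- c(
  # The chunk label (nice-fig) plus fig.cap makes the figure referenceable.
  paste0(fence, "{r nice-fig, fig.cap = 'Here is a nice figure!'}"),
  "plot(pressure)",
  fence,
  "",
  # fig: and tab: prefixes go in front of the chunk label.
  "See Figure \\@ref(fig:nice-fig) and Table \\@ref(tab:nice-tab).",
  "",
  # kable() needs a caption for the table to get a number.
  paste0(fence, "{r nice-tab}"),
  "knitr::kable(head(iris, 20), caption = 'Here is a nice table!')",
  fence
)

cat(fig_tab, sep = "\n")
```

Hyphens in chunk labels are the safer choice here, since labels with underscores or spaces tend to cause trouble with cross-references.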

You can write citations, too. For example, we are using the bookdown package (Xie 2021) in this sample book, which was built on top of R Markdown and knitr (Xie 2015).
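
As a rough sketch of how such citations can be wired up, knitr::write_bib() can generate BibTeX entries for installed packages, which you then cite by key in square brackets. The file name packages.bib is an assumption, and the keys in the original book may differ from the auto-generated R-prefixed ones shown here.

```r
bib_path <- file.path(tempdir(), "packages.bib")

# Write BibTeX entries for the named packages (assumes knitr is installed;
# packages that are not installed are skipped with a warning).
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::write_bib(c("bookdown", "knitr"), bib_path)
}

# write_bib() names its keys R-<package>, so the prose cites them like this:
citing <- "We are using the bookdown package [@R-bookdown], built on knitr [@R-knitr]."
cat(citing, "\n")
```

For the keys to resolve, the .bib file also has to be listed under the bibliography field in the YAML header of index.Rmd.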

41.5 How the square bracket links work

Context: you prefer to link with text, not a chapter or section number.

  • GOOD! Here’s a link to Contributors.
  • BAD. You can see contributors in 2.

Facts and vocabulary

  • Each chapter is a file. These files should begin with the chapter title using a level-one header, e.g., # Chapter Title.
  • A chapter can be made up of sections, indicated by lower-level headers, e.g., ## A section within the chapter.
  • There are three ways to address a section when creating links within your book:
    • Explicit identifier: In # My header {#foo} the explicit identifier is foo.
    • Automatically generated identifier: my-header is the auto-identifier for # My header. Pandoc creates auto-identifiers according to rules laid out in Extension: auto_identifiers.
    • The header text, e.g., My header can be used verbatim as an implicit header reference. See Extension: implicit_header_references for more.
  • All 3 forms can be used to create cross-references but you build the links differently.
  • Advantage of explicit identification: You are less likely to update the section header and then forget to make matching edits to references elsewhere in the book.
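
To get a feel for the automatically generated identifiers, here is a rough base R approximation of the rules in Pandoc's auto_identifiers extension. The function name pandoc_auto_id is made up, and this is only a sketch: Pandoc itself works on the parsed header, so edge cases (footnotes, non-ASCII text, duplicate headers) may come out differently.

```r
# Approximate Pandoc's auto-identifier for a single header string.
pandoc_auto_id <- function(header) {
  # Drop everything except letters, digits, underscores, periods,
  # hyphens, and whitespace.
  id <- gsub("[^[:alnum:]_.[:space:]-]", "", header)
  # Replace runs of spaces and newlines with a single hyphen.
  id <- gsub("[[:space:]]+", "-", trimws(id))
  # Lowercase everything.
  id <- tolower(id)
  # Remove everything up to the first letter.
  id <- sub("^[^[:alpha:]]+", "", id)
  # Fall back to "section" if nothing is left.
  if (nchar(id) == 0) "section" else id
}

pandoc_auto_id("My header")
pandoc_auto_id("Recommended Git clients")
pandoc_auto_id("41.2 About labelling things")
```

A helper like this is handy for guessing the #identifier to use in a link before you render, though checking the rendered HTML remains the reliable test.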

How to make text-based links using explicit identifiers, automatic identifiers, and implicit references:

  • Use implicit reference alone to get a link where the text is exactly the section header:
    • [Introduce yourself to Git] renders as “Introduce yourself to Git”
    • [Success and operating systems] renders as “Success and operating systems”
  • You can provide custom text for the link with all 3 methods of addressing a section:
    • Implicit header reference: [link text][Recommended Git clients] renders as “link text”
    • Explicit identifier: [hello git! I'm Jenny](#hello-git) renders as “hello git! I’m Jenny”
    • Automatic identifier: [Any text you want](#recommended-git-clients) renders as “Any text you want”