Runar Ovesen Hjerpbakk

Software Philosopher

Use the cache action to speed up GitHub Actions

My Norwegian site, Pappaperm.com, is built using Jekyll, and I run a simple CI workflow on every pull request using GitHub Actions. The workflow checks both my writing and technical matters, such as that no links return 404 and that every image has alternate text.

name: CI

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build:
    runs-on: ubuntu-latest
    env:
      JEKYLL_VERSION: 3.8
    steps:
    - uses: actions/checkout@v2
    - name: Check formatting, build and test site
      run: |
        docker pull andmos/markdownlint
        docker run --rm -v $PWD:/usr/src/app/files andmos/markdownlint **/*.md -i ./_drafts -i ./_my_tags
        docker pull jekyll/jekyll:$JEKYLL_VERSION
        docker run --rm --volume="$PWD:/srv/jekyll:delegated" --volume="$PWD/tmp:/usr/local/bundle:delegated" jekyll/jekyll:$JEKYLL_VERSION /bin/bash -c "chmod a+wx . && bundle check || bundle install && rake ci"

The build first pulls a Docker image containing a linter for Markdown and uses it to check my Markdown files for style. Then it pulls a Docker image with Ruby correctly configured for Jekyll 3.8 before running tests on the actual website. The tests are run using a simple Rake task within the Jekyll-powered container instance and could do with a bit of explaining.

Since Jekyll is a Ruby tool, Gemfile.lock specifies the site’s dependency graph, and bundle check is run first to check whether the dependencies are already satisfied. If they’re not, they’ll be installed into the tmp directory using bundle install. This directory is mounted within the container instance as /usr/local/bundle, the place where Bundler stores the installed gems.
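The check-then-install chain described above relies on how the shell groups `||` and `&&`: both have equal precedence and are evaluated left to right, so the install step runs only when the check fails, and the tests run only if whichever of the two ran last succeeded. Here is a minimal sketch of that control flow. The functions `deps_satisfied`, `install_deps` and `run_tests` are hypothetical stand-ins for `bundle check`, `bundle install` and `rake ci`, so it can be run without Ruby installed.

```shell
# Hypothetical stand-ins for the real commands; each one records that it ran.
deps_satisfied() { LOG="$LOG check"; [ "$DEPS_OK" = "yes" ]; }   # bundle check
install_deps()   { LOG="$LOG install"; return 0; }               # bundle install
run_tests()      { LOG="$LOG test"; return 0; }                  # rake ci

# Same shape as: bundle check || bundle install && rake ci
# The shell groups this as: (deps_satisfied || install_deps) && run_tests
run_chain() {
  LOG=""
  deps_satisfied || install_deps && run_tests
}

DEPS_OK=yes; run_chain; echo "dependencies present:$LOG"
DEPS_OK=no;  run_chain; echo "dependencies missing:$LOG"
```

When the dependencies are already present the install step is skipped entirely, which is exactly the path a populated tmp directory makes possible.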

All well and good, and the build clocks in at under 2 minutes. Sadly though, the majority of this time is spent doing unnecessary work. The site’s dependency graph remains unchanged between runs more often than not. Thus, the script above wastes a lot of time downloading already known dependencies.

Enter GitHub Actions’ cache action.

The top run utilizes caching and the bottom one doesn’t.

The difference is massive: caching cuts the build time nearly in half, and it’s easy to implement.

name: CI

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build:
    runs-on: ubuntu-latest
    env:
      JEKYLL_VERSION: 3.8
    steps:
    - uses: actions/checkout@v2
    - name: Check formatting
      run: |
        docker pull andmos/markdownlint
        docker run --rm -v $PWD:/usr/src/app/files andmos/markdownlint **/*.md -i ./_drafts -i ./_my_tags
    - name: Pull latest Jekyll image
      run: docker pull jekyll/jekyll:$JEKYLL_VERSION
    - uses: actions/cache@v1
      with:
        path: tmp
        key: rubygems-v2-${{ hashFiles('Gemfile.lock') }}
    - name: Build and test site
      run: docker run --rm --volume="$PWD:/srv/jekyll:delegated" --volume="$PWD/tmp:/usr/local/bundle:delegated" jekyll/jekyll:$JEKYLL_VERSION /bin/bash -c "chmod a+wx . && bundle check || bundle install && rake ci"

The only real change is this new step:

- uses: actions/cache@v1
  with:
    path: tmp
    key: rubygems-v2-${{ hashFiles('Gemfile.lock') }}

The new build step uses the cache action to cache the files under the tmp directory, using a prefix followed by the SHA-256 hash of Gemfile.lock as the key. As long as the contents of Gemfile.lock are the same as in the previous run, the hashed value will be unchanged and the contents of tmp will be fetched from the cache. Since the contents of Gemfile.lock change whenever dependencies are added or updated, the key changes with them and the old cache entry is no longer matched. This makes the cache perfectly safe to use.
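To make the key behaviour concrete, here is a rough local sketch of a content-derived key. It assumes `sha256sum` from GNU coreutils is available, and `cache_key` is a hypothetical helper, not part of the cache action. The digest is not expected to match what `hashFiles` produces on GitHub, which may combine per-file hashes in its own way; the point is only that the same contents give the same key and changed contents give a new one. The lockfile contents are dummy data.

```shell
# cache_key FILE: print a key of the form rubygems-v2-<sha256 of FILE>.
# Hypothetical helper; assumes sha256sum (GNU coreutils) is installed.
cache_key() {
  printf 'rubygems-v2-%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
}

workdir=$(mktemp -d)

# Dummy lockfile contents, only here to have something to hash.
printf 'dependency-a (1.0.0)\n' > "$workdir/Gemfile.lock"
key_before=$(cache_key "$workdir/Gemfile.lock")
key_same=$(cache_key "$workdir/Gemfile.lock")

# Simulate a dependency update.
printf 'dependency-a (1.0.1)\n' > "$workdir/Gemfile.lock"
key_after=$(cache_key "$workdir/Gemfile.lock")

[ "$key_before" = "$key_same" ]  && echo "unchanged lockfile: same key"
[ "$key_before" != "$key_after" ] && echo "changed lockfile: new key"

rm -rf "$workdir"
```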

On a clean run, the cache action will not find any matching key, and all dependencies will be downloaded as normal. At the end of the job, provided the other steps were successful, the dependencies saved under tmp will be cached using the specified key and are ready to be used as-is on subsequent runs.

On the next build, provided Gemfile.lock remains unchanged, the cache action will restore the dependencies in less than a second.
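The restore-then-save lifecycle described above can be simulated locally. This is only a sketch under stated assumptions: a temporary directory stands in for GitHub's cache storage, and `restore_cache`, `save_cache` and `build` are hypothetical functions, not the cache action's real implementation. The keys are shortened placeholders.

```shell
CACHE_STORE=$(mktemp -d)   # hypothetical stand-in for GitHub's cache storage

# restore_cache KEY DIR: copy the entry for KEY into DIR; return 1 on a miss.
restore_cache() {
  [ -d "$CACHE_STORE/$1" ] || return 1
  mkdir -p "$2" && cp -R "$CACHE_STORE/$1/." "$2/"
}

# save_cache KEY DIR: store DIR under KEY unless an entry already exists.
save_cache() {
  [ -d "$CACHE_STORE/$1" ] && return 0
  mkdir -p "$CACHE_STORE/$1" && cp -R "$2/." "$CACHE_STORE/$1/"
}

# build KEY: one simulated run in a fresh workspace; prints "hit" or "miss".
build() {
  work=$(mktemp -d)
  if restore_cache "$1" "$work/tmp"; then
    result=hit
  else
    result=miss
    # Stands in for bundle install filling the tmp directory.
    mkdir -p "$work/tmp" && echo "gems" > "$work/tmp/installed"
  fi
  save_cache "$1" "$work/tmp"   # end of job: save under the key
  rm -rf "$work"
  echo "$result"
}

first=$(build rubygems-v2-abc)    # clean run, nothing cached yet
second=$(build rubygems-v2-abc)   # same key, restored from the store
third=$(build rubygems-v2-def)    # lockfile changed, so a new key
echo "$first $second $third"      # prints "miss hit miss"

rm -rf "$CACHE_STORE"
```

Each simulated run starts from an empty workspace, just like a fresh runner, so the only thing carried over between runs is what was saved under the key.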

All in all, the cache action was easy to use, improved the build speed significantly, and did not add any unneeded complexity. A good win.