Runar Ovesen Hjerpbakk

Programmer. Software Architect. Technical Manager.

HTMLProofer encoding woes

Pappaperm.com is my Norwegian rosablogg, a blog with recipes and other homely matters. Pappaperm is built using Jekyll and I use the HTMLProofer gem during the build to verify that all links are still reachable.

A good idea in theory, and it works perfectly on this site. On Pappaperm, however, HTMLProofer failed with:

Encoding::UndefinedConversionError: "\xC3" from ASCII-8BIT to UTF-8
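For context, this is the error Ruby raises when a string tagged as binary (ASCII-8BIT) contains non-ASCII bytes and is converted to UTF-8. The byte 0xC3 is the first byte of æ, ø and å in UTF-8. The snippet below is a minimal sketch in plain Ruby that reproduces the same class of error. It is not the actual code path inside HTMLProofer, and the variable names are made up for illustration.

```ruby
# "æ" written as an escape so the literal is UTF-8 regardless of file encoding.
# In UTF-8 it is the two bytes 0xC3 0xA6.
utf8 = "\u00E6"

# Same bytes, but tagged as ASCII-8BIT (binary), as data read from disk
# or from the network often is.
binary = utf8.b

# Converting (transcoding) binary data to UTF-8 fails on the first byte >= 0x80.
error = nil
begin
  binary.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  error = e
end

# Relabelling the bytes as UTF-8, without converting them, works because
# the bytes were valid UTF-8 all along.
repaired = binary.dup.force_encoding("UTF-8")
```

The difference between `encode` (convert the bytes) and `force_encoding` (only change the label) is usually at the heart of this kind of failure.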

HTMLProofer can be configured with a host of options, and mine looked like this:

require 'html-proofer'

options = {
    :assume_extension => true,
    :check_favicon => true,
    :check_opengraph => true,
    :check_html => true,
    :check_img_http => true,
    :only_4xx => true,
    :http_status_ignore => [ 403 ],
    :cache => { :timeframe => '12h' },
    :parallel => { :in_processes => 8 }
}
HTMLProofer.check_directory("./_site", options).run

Turns out the caching was the problem.

Pappaperm has a couple of internal links written in the beautiful Norwegian language, full of æ, ø and å, and these characters proved too much for the cache. Removing :cache => { :timeframe => '12h' } from the options made the checks run without error.
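One way to keep the cache setting around for later, once the underlying problem is fixed, is to make it opt-in. This is only a sketch: `proofer_options` and its `use_cache` argument are hypothetical names of my own, not part of HTMLProofer, and the option values are the ones shown above.

```ruby
# Hypothetical helper (not part of HTMLProofer): returns the options from
# this post, adding the :cache entry only when explicitly requested.
def proofer_options(use_cache: false)
  options = {
    :assume_extension => true,
    :check_favicon => true,
    :check_opengraph => true,
    :check_html => true,
    :check_img_http => true,
    :only_4xx => true,
    :http_status_ignore => [ 403 ],
    :parallel => { :in_processes => 8 }
  }
  # Caching is left out by default, since it triggered the encoding error.
  options[:cache] = { :timeframe => '12h' } if use_cache
  options
end

# With the gem loaded, the check from this post would then be run as:
# HTMLProofer.check_directory("./_site", proofer_options).run
```

Calling `proofer_options(use_cache: true)` brings back the original configuration, which makes it easy to retest after an upgrade of the gem.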

My Ruby is not that great, so I’ve filed an issue in the repo.