The Why
What do you do when you care about the functionality of your products? You write, of course, unit and E2E tests. What about performance? You write benchmarks. Security? You perform security audits. So, what about SEO? You write a library for SEO tests!
The Available Tools
While there are plenty of libraries and tools available, they mostly focus on general analysis, and very little attention is paid to using them in a CI environment. Lighthouse, for example, is best known as a performance audit tool, but it can run basic SEO checks too. However, even when it runs in a CI environment, Lighthouse, much like the rest of the available tools, lacks the ability to prevent specific regressions that might be detrimental to a website’s SEO.
How We Got Here
We maintain dozens of sites and periodically launch new ones, so keeping SEO regressions to a minimum is no easy task. Using a monorepo proved to be a great decision because it enabled code and feature reuse and limited the amount of testing required during a release cycle: basically, write once, test once, and deploy multiple times. But this doesn’t eliminate the risk of fixing a bug on a site serving one country while breaking something on a site serving another. SEO bugs are also typically hard to notice because they are neither functional nor visual. As such, we needed a dedicated tool: one that checks each site for regressions.
Initially, Lighthouse was added to the CI to detect major degradations, mostly in performance. Then the first version of seo-slip was introduced to prevent embarrassing 404s or unexpected status codes on high-traffic pages. These kinds of errors were easy to run into for users as well, not only for Googlebot. Eventually, seo-slip evolved and allowed us to catch other site-specific regressions, such as incorrect URLs, canonical inconsistencies, redundant redirects, broken internal links, and even broken CDN configurations.
Details
Seo-slip was built with flexibility in mind. It can be used in any CI environment and with any preferred JS unit test framework. The built-in checker list is also flexible: it’s up to the test writer to select which checkers a site under test needs, and they can even write new checkers by implementing a fairly simple interface.
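To make the idea of a pluggable checker more concrete, here is a minimal TypeScript sketch. It is not seo-slip’s actual interface: the names `PageSnapshot`, `Checker`, `withRule`, `runCheckers` and `robotsMetaChecker` are hypothetical, and the sketch assumes pages have already been fetched into plain objects. It only illustrates the general shape: a checker receives a page plus an expectation and returns a list of errors, and the test writer decides which checkers to run.

```typescript
// Minimal sketch, NOT seo-slip's real API: PageSnapshot, Checker, withRule,
// runCheckers and robotsMetaChecker are hypothetical names for illustration.

// One already-fetched page, as a checker would see it (assumed shape).
interface PageSnapshot {
  url: string;
  status: number;
  html: string;
}

// A checker inspects one page against one rule and returns error messages;
// an empty array means the page passed.
interface Checker<Rule> {
  name: string;
  check(page: PageSnapshot, rule: Rule): string[];
}

// A checker paired with the rule it should enforce for the site under test.
type BoundCheck = (page: PageSnapshot) => string[];

function withRule<Rule>(checker: Checker<Rule>, rule: Rule): BoundCheck {
  // Prefix every message with the checker name so failures are easy to trace.
  return (page) => checker.check(page, rule).map((e) => `[${checker.name}] ${e}`);
}

// Runs only the checkers the test writer selected and collects all errors.
function runCheckers(page: PageSnapshot, selected: BoundCheck[]): string[] {
  const errors: string[] = [];
  for (const check of selected) {
    errors.push(...check(page));
  }
  return errors;
}

// A custom checker: verifies whether the robots meta tag contains "noindex".
// The regex is deliberately simple and expects name before content.
const robotsMetaChecker: Checker<{ noindex: boolean }> = {
  name: "robots-meta",
  check(page, rule) {
    const m = page.html.match(/<meta\s+name=["']robots["']\s+content=["']([^"']*)["']/i);
    const hasNoindex = m !== null && /noindex/i.test(m[1]);
    return hasNoindex === rule.noindex
      ? []
      : [`${page.url}: expected noindex=${rule.noindex}, got noindex=${hasNoindex}`];
  },
};

// Usage: a product page that accidentally ships with noindex.
const page: PageSnapshot = {
  url: "https://www.example.com/product/1",
  status: 200,
  html: '<head><meta name="robots" content="noindex, follow"></head>',
};
const errors = runCheckers(page, [withRule(robotsMetaChecker, { noindex: false })]);
console.log(errors);
```

Returning a list of messages instead of throwing keeps a checker like this independent of any particular test framework: a test in whichever JS framework the team prefers could simply assert that the list is empty.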
But the real power of this library comes from the separation between checkers and rules. Each checker is a piece of JS code focused on a very specific task: for example, one checker verifies the status code, another the canonical URL, another the hreflang URLs, and so on. And each of these checkers is blind without the rules that tell it what status code, what kind of canonical URL, or what kind of hreflang URLs to expect.
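The checker/rule split can be sketched in the same way. Again, this is an illustration under assumptions rather than seo-slip’s API: `Page`, `statusCodeChecker`, `canonicalChecker`, `SiteRules`, `checkPage` and the example.com URLs are all hypothetical. The two checkers only know how to read a status code or a canonical link; the per-site rules decide what the correct values are, so the same checkers can be reused across sites with different conventions.

```typescript
// Minimal sketch of the checker/rule split, NOT seo-slip's real API:
// Page, statusCodeChecker, canonicalChecker, SiteRules and checkPage are
// hypothetical names, and the example.com URLs are made up.

// One already-fetched page (assumed shape).
interface Page {
  url: string;
  status: number;
  html: string;
}

// Checkers know HOW to inspect a page, but not WHAT to expect.
function statusCodeChecker(page: Page, expectedStatus: number): string[] {
  return page.status === expectedStatus
    ? []
    : [`${page.url}: expected status ${expectedStatus}, got ${page.status}`];
}

function canonicalChecker(page: Page, expectedCanonical: string): string[] {
  // Simplified: expects rel before href in the <link> tag.
  const m = page.html.match(/<link\s+rel=["']canonical["']\s+href=["']([^"']*)["']/i);
  const actual = m !== null ? m[1] : "(missing)";
  return actual === expectedCanonical
    ? []
    : [`${page.url}: expected canonical ${expectedCanonical}, got ${actual}`];
}

// Rules supply the expectations, so the same checkers can serve many sites.
interface SiteRules {
  expectedStatus: number;
  canonicalFor(url: string): string;
}

function checkPage(page: Page, rules: SiteRules): string[] {
  return statusCodeChecker(page, rules.expectedStatus).concat(
    canonicalChecker(page, rules.canonicalFor(page.url)),
  );
}

// Site A: the canonical URL is the page URL without its query string.
const siteA: SiteRules = {
  expectedStatus: 200,
  canonicalFor: (url) => url.split("?")[0],
};

// Site B: same rule, but canonicals must also point at the www host.
const siteB: SiteRules = {
  expectedStatus: 200,
  canonicalFor: (url) => url.split("?")[0].replace("://shop.", "://www."),
};

// Usage: one page checked against two different sets of rules.
const page: Page = {
  url: "https://shop.example.com/shoes?color=red",
  status: 200,
  html: '<link rel="canonical" href="https://shop.example.com/shoes">',
};
console.log(checkPage(page, siteA)); // [] (meets site A's rules)
console.log(checkPage(page, siteB)); // one canonical error
```

Keeping the expectations in data rather than inside the checkers means a new site mostly needs its own rules, while the checkers themselves can stay shared.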