Comparing visual artifacts can be a powerful, if fickle, approach to automated testing. Playwright makes this seem simple for websites, but the details might take a little finessing.
Recent downtime prompted me to scratch an itch that had been plaguing me for a while: The style sheet of a website I maintain has grown just a little unwieldy as we’ve been adding code while exploring new features. Now that we have a better idea of the requirements, it’s time for internal CSS refactoring to pay down some of our technical debt, taking advantage of modern CSS features (like using CSS nesting for more obvious structure). More importantly, a cleaner foundation should make it easier to introduce that dark mode feature we’re sorely lacking so we can finally respect users’ preferred color scheme.
However, being of the apprehensive persuasion, I was reluctant to make large changes for fear of unwittingly introducing bugs. I needed something to guard against visual regressions while refactoring; except that means snapshot testing, which is notoriously slow and brittle.
In this context, snapshot testing means taking screenshots to establish a reliable baseline against which we can compare future results. As we’ll see, those artifacts are influenced by a multitude of factors that might not always be fully controllable (e.g. timing, variable hardware resources, or randomized content). We also have to maintain state between test runs, i.e. save those screenshots, which complicates the setup and means our test code alone doesn’t fully describe expectations.
Having procrastinated without a more agreeable solution revealing itself, I finally set out to create what I assumed would be a quick spike. After all, this wouldn’t be part of the regular test suite; just a one-off utility for this particular refactoring task.
Fortunately, I had vague recollections of past research and quickly rediscovered Playwright’s built-in visual comparison feature. Because I try to select dependencies carefully, I was glad to see that Playwright seems not to rely on many external packages.
Setup
The recommended setup with npm init playwright@latest does a decent job, but my minimalist taste had me set everything up from scratch instead. This do-it-yourself approach also helped me understand how the different pieces fit together.
Given that I expect snapshot testing to only be used on rare occasions, I wanted to isolate everything in a dedicated subdirectory, called test/visual; that will be our working directory from here on out. We’ll start with package.json to declare our dependencies, adding a few helper scripts (spoiler!) while we’re at it:
{
"scripts": {
"test": "playwright test",
"report": "playwright show-report",
"update": "playwright test --update-snapshots",
"reset": "rm -r ./playwright-report ./test-results ./viz.test.js-snapshots || true"
},
"devDependencies": {
"@playwright/test": "^1.49.1"
}
}
If you donât want node_modules hidden in some subdirectory but also donât want to burden the root project with this rarely-used dependency, you might resort to manually invoking npm install --no-save @playwright/test in the root directory when needed.
With that in place, npm install downloads Playwright. Afterwards, npx playwright install downloads a range of headless browsers. (We’ll use npm here, but you might prefer a different package manager and task runner.)
We define our test environment via playwright.config.js with about a dozen basic Playwright settings:
import { defineConfig, devices } from "@playwright/test";
let BROWSERS = ["Desktop Firefox", "Desktop Chrome", "Desktop Safari"];
let BASE_URL = "http://localhost:8000";
let SERVER = "cd ../../dist && python3 -m http.server";
let IS_CI = !!process.env.CI;
export default defineConfig({
testDir: "./",
fullyParallel: true,
forbidOnly: IS_CI,
retries: 2,
workers: IS_CI ? 1 : undefined,
reporter: "html",
webServer: {
command: SERVER,
url: BASE_URL,
reuseExistingServer: !IS_CI
},
use: {
baseURL: BASE_URL,
trace: "on-first-retry"
},
projects: BROWSERS.map(ua => ({
name: ua.toLowerCase().replaceAll(" ", "-"),
use: { ...devices[ua] }
}))
});
Here we expect our static website to already reside within the root directory’s dist folder and to be served at localhost:8000 (see SERVER; I prefer Python there because it’s widely available). I’ve included multiple browsers for illustration purposes. Still, we might reduce that number to speed things up (thus our simple BROWSERS list, which we then map to Playwright’s more elaborate projects data structure). Similarly, continuous integration is YAGNI for my particular scenario, so that whole IS_CI dance could be discarded.
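That BROWSERS-to-projects mapping can be run in isolation to see what it produces; this is just the name-derivation half of the sketch, since spreading in the device presets requires Playwright itself:

```javascript
// Sketch: deriving project names from the plain BROWSERS list
// (the real config additionally spreads in `devices[ua]`)
let BROWSERS = ["Desktop Firefox", "Desktop Chrome", "Desktop Safari"];

let projects = BROWSERS.map(ua => ({
    // "Desktop Firefox" becomes the project name "desktop-firefox"
    name: ua.toLowerCase().replaceAll(" ", "-")
}));

console.log(projects.map(p => p.name));
// [ 'desktop-firefox', 'desktop-chrome', 'desktop-safari' ]
```

Those lowercased, hyphenated names will resurface later as part of the snapshot file names.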
Capture and compare
Let’s turn to the actual tests, starting with a minimal sample.test.js file:
import { test, expect } from "@playwright/test";
test("home page", async ({ page }) => {
await page.goto("/");
await expect(page).toHaveScreenshot();
});
npm test executes this little test suite (based on file-name conventions). The initial run always fails because it first needs to create baseline snapshots against which subsequent runs compare their results. Invoking npm test once more should report a passing test.
Changing our site, e.g. by recklessly messing with build artifacts in dist, should make the test fail again. Such failures will offer various options to compare expected and actual visuals:
We can also inspect those baseline snapshots directly: Playwright creates a folder for screenshots named after the test file (sample.test.js-snapshots in this case), with file names derived from the respective test’s title (e.g. home-page-desktop-firefox.png).
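That naming pattern can be approximated with a tiny slug function; note this is merely illustrative, not Playwright’s actual implementation:

```javascript
// Rough approximation of how Playwright derives snapshot file names:
// a slugified test title with the project name appended.
// (Illustrative only; the exact scheme is Playwright's internal detail.)
function snapshotName(testTitle, projectName) {
    let slug = testTitle.toLowerCase().replaceAll(" ", "-");
    return `${slug}-${projectName}.png`;
}

console.log(snapshotName("home page", "desktop-firefox"));
// home-page-desktop-firefox.png
```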
Generating tests
Getting back to our original motivation, what we want is a test for every page. Instead of arduously writing and maintaining repetitive tests, we’ll create a simple web crawler for our website and have tests generated automatically; one for each URL we’ve identified.
Playwright’s global setup enables us to perform preparatory work before test discovery begins: Determine those URLs and write them to a file. Afterward, we can dynamically generate our tests at runtime.
While there are other ways to pass data between the setup and test-discovery phases, having a file on disk makes it easy to modify the list of URLs before test runs (e.g. temporarily ignoring irrelevant pages).
Site map
The first step is to extend playwright.config.js by inserting globalSetup and exporting two of our configuration values:
export let BROWSERS = ["Desktop Firefox", "Desktop Chrome", "Desktop Safari"];
export let BASE_URL = "http://localhost:8000";
// etc.
export default defineConfig({
// etc.
globalSetup: require.resolve("./setup.js")
});
Although we’re using ES modules here, we can still rely on CommonJS-specific APIs like require.resolve and __dirname. It appears there’s some Babel transpilation happening in the background, so what’s actually being executed is probably CommonJS? Such nuances sometimes confuse me because it isn’t always obvious what’s being executed where.
We can now reuse those exported values within a newly created setup.js, which spins up a headless browser to crawl our site (just because thatâs easier here than using a separate HTML parser):
import { BASE_URL, BROWSERS } from "./playwright.config.js";
import { createSiteMap, readSiteMap } from "./sitemap.js";
import playwright from "@playwright/test";
export default async function globalSetup(config) {
// only create site map if it doesn't already exist
try {
readSiteMap();
return;
} catch(err) {}
// launch browser and initiate crawler
let browser = playwright.devices[BROWSERS[0]].defaultBrowserType;
browser = await playwright[browser].launch();
let page = await browser.newPage();
await createSiteMap(BASE_URL, page);
await browser.close();
}
This is fairly boring glue code; the actual crawling is happening within sitemap.js:
createSiteMap determines URLs and writes them to disk, while readSiteMap merely reads any previously created site map from disk. The latter will be our foundation for dynamically generating tests. (We’ll see later why this needs to be synchronous.)
Fortunately, the website in question provides a comprehensive index of all pages, so my crawler only needs to collect unique local URLs from that index page:
function extractLocalLinks(baseURL) {
let urls = new Set();
let offset = baseURL.length;
for(let { href } of document.links) {
if(href.startsWith(baseURL)) {
let path = href.slice(offset);
urls.add(path);
}
}
return Array.from(urls);
}
Wrapping that in more boring glue code gives us our sitemap.js:
import { readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";
let ENTRY_POINT = "/topics";
let SITEMAP = join(__dirname, "./sitemap.json");
export async function createSiteMap(baseURL, page) {
await page.goto(baseURL + ENTRY_POINT);
let urls = await page.evaluate(extractLocalLinks, baseURL);
let data = JSON.stringify(urls, null, 4);
writeFileSync(SITEMAP, data, { encoding: "utf-8" });
}
export function readSiteMap() {
try {
var data = readFileSync(SITEMAP, { encoding: "utf-8" });
} catch(err) {
if(err.code === "ENOENT") {
throw new Error("missing site map");
}
throw err;
}
return JSON.parse(data);
}
function extractLocalLinks(baseURL) {
// etc.
}
The interesting bit here is that extractLocalLinks is evaluated within the browser context, so we can rely on DOM APIs there, notably document.links, while the rest is executed within the Playwright environment (i.e. Node).
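To see that extraction logic in isolation, we can exercise the same algorithm outside the browser against a mocked stand-in for document.links (plain objects exposing an href, just like DOM anchors do):

```javascript
// Same logic as extractLocalLinks, but taking the links collection as an
// argument so it can run without a DOM
function extractLocalLinks(links, baseURL) {
    let urls = new Set();
    let offset = baseURL.length;
    for(let { href } of links) {
        if(href.startsWith(baseURL)) {
            urls.add(href.slice(offset));
        }
    }
    return Array.from(urls);
}

let links = [
    { href: "http://localhost:8000/topics" },
    { href: "http://localhost:8000/about" },
    { href: "http://localhost:8000/about" }, // duplicates collapse via the Set
    { href: "https://example.org/external" } // external links are skipped
];
console.log(extractLocalLinks(links, "http://localhost:8000"));
// [ '/topics', '/about' ]
```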
Tests
Now that we have our list of URLs, we basically just need a test file with a simple loop to dynamically generate corresponding tests:
for(let url of readSiteMap()) {
test(`page at ${url}`, async ({ page }) => {
await page.goto(url);
await expect(page).toHaveScreenshot();
});
}
This is why readSiteMap had to be synchronous above: Playwright doesn’t currently support top-level await within test files.
In practice, we’ll want better error reporting for when the site map doesn’t exist yet. Let’s call our actual test file viz.test.js:
import { readSiteMap } from "./sitemap.js";
import { test, expect } from "@playwright/test";
let sitemap = [];
try {
sitemap = readSiteMap();
} catch(err) {
test("site map", ({ page }) => {
throw new Error("missing site map");
});
}
for(let url of sitemap) {
test(`page at ${url}`, async ({ page }) => {
await page.goto(url);
await expect(page).toHaveScreenshot();
});
}
Getting here was a bit of a journey, but we’re pretty much done… unless we have to deal with reality, which typically takes a bit more tweaking.
Exceptions
Because visual testing is inherently flaky, we sometimes need to compensate via special casing. Playwright lets us inject custom CSS, which is often the easiest and most effective approach. Tweaking viz.test.js…
// etc.
import { join } from "node:path";
let OPTIONS = {
stylePath: join(__dirname, "./viz.tweaks.css")
};
// etc.
await expect(page).toHaveScreenshot(OPTIONS);
// etc.
… allows us to define exceptions in viz.tweaks.css:
/* suppress state */
main a:visited {
color: var(--color-link);
}
/* suppress randomness */
iframe[src$="/articles/signals-reactivity/demo.html"] {
visibility: hidden;
}
/* suppress flakiness */
body:has(h1 a[href="…"]) {
main tbody > tr:last-child > td:first-child {
font-size: 0;
visibility: hidden;
}
}
:has() strikes again!
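When CSS tweaks alone can’t stamp out a diff, .toHaveScreenshot also accepts tolerance options; the values below are hypothetical and would need tuning per project:

```javascript
import { join } from "node:path";

let OPTIONS = {
    stylePath: join(__dirname, "./viz.tweaks.css"),
    maxDiffPixels: 100, // tolerate up to 100 differing pixels (hypothetical value)
    threshold: 0.2 // per-pixel color-difference tolerance, between 0 and 1
};
```

Loosening these thresholds trades sensitivity for stability, so it pays to keep them as strict as the pages allow.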
Page vs. viewport
At this point, everything seemed hunky-dory to me, until I realized that my tests didn’t actually fail after I had changed some styling. That’s not good! What I hadn’t taken into account is that .toHaveScreenshot only captures the viewport rather than the entire page. We can rectify that by further extending playwright.config.js:
export let WIDTH = 800;
export let HEIGHT = WIDTH;
// etc.
projects: BROWSERS.map(ua => ({
name: ua.toLowerCase().replaceAll(" ", "-"),
use: {
...devices[ua],
viewport: {
width: WIDTH,
height: HEIGHT
}
}
}))
… and then by adjusting viz.test.js’s test-generating loop:
import { WIDTH, HEIGHT } from "./playwright.config.js";
// etc.
for(let url of sitemap) {
test(`page at ${url}`, async ({ page }) => {
await checkSnapshot(url, page);
});
}
async function checkSnapshot(url, page) {
// determine page height with default viewport
await page.setViewportSize({
width: WIDTH,
height: HEIGHT
});
await page.goto(url);
await page.waitForLoadState("networkidle");
let height = await page.evaluate(getFullHeight);
// resize viewport before snapshotting
await page.setViewportSize({
width: WIDTH,
height: Math.ceil(height)
});
await page.waitForLoadState("networkidle");
await expect(page).toHaveScreenshot(OPTIONS);
}
function getFullHeight() {
return document.documentElement.getBoundingClientRect().height;
}
Note that we’ve also introduced a waiting condition, holding until there’s no network traffic for a while in a crude attempt to account for stuff like lazy-loading images.
Be aware that capturing the entire page is more resource-intensive and doesnât always work reliably: You might have to deal with layout shifts or run into timeouts for long or asset-heavy pages. In other words: This risks exacerbating flakiness.
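For comparison, Playwright also provides a built-in fullPage option for .toHaveScreenshot, which scrolls and stitches the whole page itself; a hypothetical variant of checkSnapshot using it is sketched below. Whether it behaves more reliably than manual viewport resizing is worth verifying against your own pages:

```javascript
// Hypothetical alternative: instead of measuring the page and resizing the
// viewport ourselves, let toHaveScreenshot capture the full page directly.
async function checkSnapshot(url, page) {
    await page.goto(url);
    await page.waitForLoadState("networkidle");
    await expect(page).toHaveScreenshot({ ...OPTIONS, fullPage: true });
}
```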
Conclusion
So much for that quick spike. While it took more effort than expected (I believe that’s called “software development”), this might actually solve my original problem now (not a common feature of software these days). Of course, shaving this yak still leaves me itchy, as I have yet to do the actual work of scratching CSS without breaking anything. Then comes the real challenge: Retrofitting dark mode to an existing website. I just might need more downtime.