Serializable transactions with Django and PostgreSQL

October 11, 2024 / Swen Kooij

PostgreSQL's transaction isolation level SERIALIZABLE can provide much stronger guarantees than the default READ COMMITTED level. We explore an example where a SERIALIZABLE transaction can be used to solve a real-world problem and how we can integrate this with the Django ORM.


categories / Tech
tags / postgres, django, transactions, serializable

Catch SEO Regressions Before Googlebot Does It

February 16, 2022 / Cristi Ingineru

The Why

What do you do when you care about the functionality of your products? You write, of course, unit and E2E tests. What about performance? You write benchmarks. Security? You perform security audits. So, what about SEO? You write a library for SEO tests!

The Available Tools

While there are plenty of libraries and tools available, they mostly focus on general analysis, and there’s very little on using them in a CI environment. Lighthouse, for example, is mostly known as a performance audit tool, but it can run basic SEO checks too. However, even when used in a CI environment, and much like the rest of the available tools, it lacks the ability to prevent specific regressions that might be detrimental to a website’s SEO.

How We Got Here

We maintain dozens of sites and launch new ones periodically, so keeping SEO regressions to a minimum is no easy task. Using a monorepo proved to be a great decision because it enabled code and feature reuse and limited the amount of testing required during a release cycle: basically, write once, test once, and deploy multiple times. But this doesn’t exclude the risk of fixing a bug for a site serving one country and breaking something in a site serving another. SEO bugs are also typically hard to notice because they are neither functional nor visual. As such, a specific tool was needed: a tool that checks each site against regressions.

Initially, Lighthouse was added to the CI and used to detect major degradations, mostly in performance. Then the first version of seo-slip was introduced to prevent embarrassing 404s or unexpected status codes on high-traffic pages. These kinds of errors were easy for users to find as well, not only for Googlebot. Eventually seo-slip evolved and allowed us to catch other site-specific regressions like incorrect URLs, canonical inconsistencies, redundant redirects, broken internal links or even broken CDN configurations.

Details

Seo-slip was built with flexibility in mind. It can be used in any CI environment and with any preferred JS unit test framework. The built-in checker list is also flexible: it’s up to the test writer to select which checkers are needed for the site under test, and they can even write new checkers by implementing a fairly simple interface.

But the real power of this library comes from the separation between checkers and rules. Each checker is a piece of JS code focused on a very specific task: for example, there is a checker that verifies the status code, another for the canonical URL, another for hreflang URLs, etc. Each of these checkers is blind without the rules telling it what status code, canonical URL or hreflang URLs to expect.

# Expect 200 everywhere except on the listed paths.
statusCodeRules:
  code: 200
  exceptions:
    "/": 301
    "/blog": 301
    "/register": 302
# Patterns are single-quoted so that YAML leaves the regex backslashes (\? and \d) untouched.
# The search term is matched with [^?]+ so that it cannot swallow the query string.
canonicalRules:
  - url: '(.*.site.com)/(..)/search/([^?]+)(\?.+)?'
    expected: "($1)/($2)/search/($3)"
  - url: '(.*.site.com)/(..)/product-(\d+)(\?.+)?'
    expected: "($1)/($2)/product-($3)"
hreflangRules:
  - url: '(.*.site.com)/(..)/search/([^?]+)(\?.+)?'
    expected:
      en: "($1)/en/search/($3)($4)"
      ro: "($1)/ro/search/($3)($4)"
  - url: '(.*.site.com)/(..)/product-(\d+)(\?.+)?'
    expected:
      en: "($1)/en/product-($3)($4)"
      ro: "($1)/ro/product-($3)($4)"

As illustrated, the rules can be written in a human-friendly way using JSON or YAML. The latter is better because it has anchors and aliases that can be leveraged to reuse some of the rules across multiple sites and environments, or even to run the tests against multiple user agents covering both desktop and mobile implementations.
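
To make the checker/rule split more concrete, here is a minimal sketch of how a checker might consume such rules. The helper names (expectedStatus, expectedUrl) are hypothetical and not part of seo-slip's API, and the sketch assumes that "($n)" in an expected template stands for capture group n of the rule's regex.

```javascript
// Illustrative sketch only: these helpers are hypothetical, not seo-slip's API.

// Returns the status code a path is expected to answer with:
// the exception for that path if one is listed, the default code otherwise.
function expectedStatus(rules, path) {
  const exceptions = rules.exceptions || {};
  return Object.prototype.hasOwnProperty.call(exceptions, path)
    ? exceptions[path]
    : rules.code;
}

// Builds the expected URL for a page, or returns null when the rule does not apply.
// Assumption: "($n)" in the template is replaced by capture group n.
function expectedUrl(rule, pageUrl) {
  const match = new RegExp(rule.url).exec(pageUrl);
  if (match === null) return null;
  return rule.expected.replace(/\(\$(\d+)\)/g, (_, n) => match[Number(n)] || '');
}

const statusCodeRules = { code: 200, exceptions: { '/': 301, '/register': 302 } };
const canonicalRule = {
  url: '(.*.site.com)/(..)/product-(\\d+)(\\?.+)?',
  expected: '($1)/($2)/product-($3)',
};

console.log(expectedStatus(statusCodeRules, '/register'));
console.log(expectedUrl(canonicalRule, 'https://www.site.com/en/product-42?ref=home'));
```

A checker would then compare these expectations with what the crawler actually observed (the response status, or the canonical link found in the page) and report any mismatch.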

Seo-slip uses a simple crawler to find and download content, and each resource it finds (HTML, CSS, images, etc.) is validated against the checkers. The best approach is to start the crawl from a high-traffic page and go 2 or 3 levels deep, downloading only enough content to sample the site. Checking a minimal but relevant area of the site makes the tests run fast, which means they can be included in a CI or used as a monitoring tool. An exhaustive analysis is also possible, but the tool was not envisioned to be used in this manner. Ultimately, it's up to the test writer and/or SEO specialist to decide how to use it.
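
The sampling idea can be sketched as a depth-limited, breadth-first crawl. This is not seo-slip's crawler: sampleSite and getLinks are hypothetical names, and getLinks stands in for "download the page and extract its links" so that the example runs without network access.

```javascript
// Hypothetical sketch of a depth-limited crawl; not seo-slip's implementation.
// `getLinks(url)` is an assumed callback returning the links found on a page.
function sampleSite(startUrl, maxDepth, getLinks) {
  const visited = new Set([startUrl]);
  let frontier = [startUrl];
  for (let depth = 0; depth < maxDepth; depth += 1) {
    const next = [];
    for (const url of frontier) {
      for (const link of getLinks(url)) {
        if (!visited.has(link)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next; // pages discovered at this level are expanded on the next pass
  }
  return [...visited];
}

// A tiny in-memory "site" used instead of real HTTP requests.
const fakeSite = {
  '/': ['/search', '/blog'],
  '/search': ['/product-1', '/'],
  '/product-1': ['/product-2'],
  '/product-2': [],
  '/blog': [],
};
const getLinks = (url) => fakeSite[url] || [];

console.log(sampleSite('/', 2, getLinks));
```

With a depth of 2 the crawl stops before reaching /product-2, which is the point: the sample stays small while still covering the page types reachable from the start page.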

Some sites have a URL structure so complex that it is almost impossible to describe the expectations using rules like the ones illustrated above. In this case, the checkers can be used only to pull the SEO data without asserting anything and store it in a CSV file as an "SEO snapshot", which can then be used as a reference for snapshot testing.
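
A rough sketch of the snapshot approach follows. The column names (url, status, canonical) and both helpers are assumptions made for illustration; the actual snapshot layout produced by seo-slip may differ.

```javascript
// Hypothetical sketch: the columns and helper names are assumptions.

// Serializes collected SEO data to CSV, quoting every value.
function toCsv(rows) {
  const header = ['url', 'status', 'canonical'];
  const quote = (value) => `"${String(value).replace(/"/g, '""')}"`;
  const lines = rows.map((row) => header.map((key) => quote(row[key])).join(','));
  return [header.join(','), ...lines].join('\n');
}

// Compares a stored reference snapshot with a fresh one, line by line.
// Assumes values contain no line breaks, which holds for URLs and status codes.
function diffSnapshots(referenceCsv, currentCsv) {
  const reference = referenceCsv.split('\n');
  const current = currentCsv.split('\n');
  return {
    missing: reference.filter((line) => !current.includes(line)),
    unexpected: current.filter((line) => !reference.includes(line)),
  };
}

const referenceSnapshot = toCsv([{ url: '/a', status: 200, canonical: '/a' }]);
const currentSnapshot = toCsv([{ url: '/a', status: 200, canonical: '/a?x=1' }]);

console.log(diffSnapshots(referenceSnapshot, currentSnapshot));
```

A test would fail whenever either list is non-empty, and the reference snapshot would be regenerated only when a change is intentional.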

Future

Seo-slip is not meant to become popular. Most of the time, Lighthouse or other general-purpose equivalents, or even occasional in-depth crawls, are enough. However, for catching specific regressions, or for catching them early in a development or staging environment, seo-slip might be a good candidate.


Test driving Cloudflare Workers with a GitHub public repository monitor

March 12, 2021 / Ovidiu Sabou

In our previous article, Stupid simple serverless with usage-only pricing, we described how we ended up trying out Cloudflare Workers for a GitHub automation that notifies us whenever a GitHub repo is made public. In this piece, we detail the experience we had with the platform.

Code of the experiment and setup

The tool, called github-hawk, is published on GitHub and it’s surprisingly simple (and useful!). Setting up the project was trivial with the Wrangler CLI that Cloudflare provides.

Running the tool involves setting up an account with Cloudflare, setting the variables (called “secrets”), and publishing the worker. It went smoothly, without surprises.

What exactly are Cloudflare Workers?

Cloudflare Workers are a variation of Web Workers. This variation is meant to be used server-side, distributed across the fleet of edge nodes belonging to Cloudflare, which is one of the major CDNs.

This means code gets executed as part of a V8 JavaScript engine instance, which in turn means workers run JavaScript or WebAssembly code. Besides the usually expected features like a JS runtime and network access, workers also support a global key/value store called … Cloudflare Workers KV.

Unlike conventional containerization methods, isolates have much lower overhead per user process. By no means is this specific to Cloudflare. For a more in-depth overview of workload isolation methods, see this article from fly.io. In the context of that article, Cloudflare Workers isolate workloads through a language runtime.

Analysis

The simplicity of the implementation proves that the approach is workable. Since the tool is a simple forwarding proxy, there’s no concern regarding costs or performance. Cloudflare’s documentation is quite good for getting started and their CLI tool, wrangler, is also easy to use.

However, there are some drawbacks that make it less than ideal as a general computing platform. The debugging experience leaves a lot to be desired while developing locally. This stems from the fact that all development code actually runs in the cloud, which means logs have to be remotely captured. It’s not a big issue with published workers, but with development workers it is, because by default, exceptions are swallowed and the only way to get them is to catch and log them yourself or, alternatively, hook up a centralized service for this purpose.
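
The "catch and log them yourself" option can be sketched as a small wrapper around a request handler. The wrapper below is a hypothetical helper, not code from github-hawk: log and onError are supplied by the caller, and in a real worker log would typically forward the entry to an external logging service.

```javascript
// Hypothetical wrapper: catches anything the handler throws, logs it, and
// returns a fallback result instead of letting the exception disappear.
function withErrorLogging(handler, log, onError) {
  return async (request) => {
    try {
      return await handler(request);
    } catch (err) {
      const isError = err instanceof Error;
      await log({
        message: isError ? err.message : String(err),
        stack: isError ? err.stack : undefined,
      });
      return onError(err);
    }
  };
}

// Demo with an in-memory log and a plain object standing in for a response.
const logged = [];
const safeHandler = withErrorLogging(
  async () => { throw new Error('boom'); },
  async (entry) => { logged.push(entry); },
  () => ({ status: 500 }),
);

safeHandler({}).then((result) => console.log(result.status, logged[0].message));
```

In an actual worker, onError would more likely return something like new Response('Internal error', { status: 500 }); a plain object is used here only so the sketch runs outside the Workers runtime.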

Being derived from a client language runtime (V8, which is Chrome’s JavaScript engine), the platform comes with design baggage biased towards the client side. For example, there is a crypto API, but it’s based on the Web Crypto API, which lacks features that are useful on the back-end. The crypto package from Node.js, for instance, has a timing-safe comparison function that is missing from Cloudflare Workers. It would’ve been useful in the isSignatureValid() function.
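
For illustration, a comparison that does not exit early at the first differing byte can be written by hand, as in the sketch below. The function name mirrors the Node.js one but this is a hypothetical stand-in, not what isSignatureValid() actually does, and a JavaScript engine gives no hard guarantee of constant-time execution.

```javascript
// Sketch of a comparison whose loop always visits every byte, so its duration
// does not reveal the position of the first mismatch.
// Caveat: JavaScript engines offer no formal constant-time guarantee.
function timingSafeEqual(a, b) {
  // a and b are Uint8Array values; their length is not treated as a secret.
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i += 1) {
    diff |= a[i] ^ b[i]; // accumulates any differing bits
  }
  return diff === 0;
}

const encoder = new TextEncoder();
const expectedSignature = encoder.encode('sha256=abc123');
const receivedSignature = encoder.encode('sha256=abc124');

console.log(timingSafeEqual(expectedSignature, expectedSignature));
console.log(timingSafeEqual(expectedSignature, receivedSignature));
```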

There are some use cases that still make the platform very attractive. The main idea is that with these custom workers, the level of flexibility in the behaviour of the CDN is increased dramatically. Unlike a conventional configuration-based approach to CDN rules, workers enable websites to granularly and intelligently define CDN behaviour, including caching policies and filtering logic. With some extra investment, it can become a promising approach not only for deploying “intelligent” CDN behaviour but also for general computing “on the edge”.


Stupid simple serverless with usage-only pricing

March 09, 2021 / Ovidiu Sabou

After discovering a small repo that we had made public by mistake, we wondered how we could prevent this or at least react sooner. We immediately devised a solution based on a polling agent that would repeatedly query the GitHub API for our organization’s public repos and track the changes over time, hooked up to a notification system (a Slack channel in our case). Later on we chose to handle the GitHub hooks specially designed for this, but for the sake of the exercise, it’s still interesting to think about the approach of a polling agent with history persistence.
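
The core of such a polling agent is a comparison between the persisted list of public repos and the freshly fetched one. The sketch below shows only that step; newlyPublic is a hypothetical helper, repo names are plain strings, and fetching from the GitHub API, persisting history and notifying Slack are left out.

```javascript
// Hypothetical sketch of the "track the changes" step of a polling agent.
// Returns the repos that are public now but were not in the previous poll.
function newlyPublic(previousNames, currentNames) {
  const known = new Set(previousNames);
  return currentNames.filter((name) => !known.has(name));
}

// Made-up repo names, for illustration only.
const previousPoll = ['docs', 'website'];
const currentPoll = ['docs', 'website', 'internal-tool'];

console.log(newlyPublic(previousPoll, currentPoll));
```

Each poll would notify about the returned names and then store the current list as the new history.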

Because this is the kind of program that runs from time to time and because a tech company gets to build numerous such automations, it makes sense to deal with it systematically. The distilled list of requirements is below:

Minimal setup costs - don’t pay per application, don’t pay per developer, don’t waste time configuring much, if anything at all; if the transactional cost of starting up such an app is low, we’ll get the chance to automate more
Minimal running costs - no dedicated computational resources, so that means no recurrent costs per unit of time, which basically means it should be usage-based
General purpose - it should be in a conventional tech stack and it should be capable of supporting a diverse set of needs (computation, persistence, networking)

Looking at our existing tech stack, there are a few contenders:

AWS Lambda - excluded because we don’t consider AWS to be a developer-friendly platform which means setup costs aren’t minimal; we use it extensively, but shield our application developers from it
Heroku - excluded because of running costs; for every automation we develop in isolation, we’d need one web dyno running, regardless of throughput
A Linux VM, e.g. DigitalOcean droplet or EC2 machine - excluded because the setup is very involved - secret isolation, multi tenancy of apps and the low-level nature of the approach make it unattractive

Although there are ways to work around these issues, as a great Python programmer once said, there must be a better way!

Serverless platforms to the rescue

Some of the more popular serverless computing platforms we took into account

One interesting approach that piqued our interest was Serverless.com. However, because they rely on a per-developer pricing model and we have hundreds of developers across the organization, it's an option that’s hard to digest. Cloudflare, one of our service providers, already has a solution in that space, with the added benefit of running the code close to the users and with very reasonable usage-based pricing. So we gave it a try. They are by no means the only ones doing it; the landscape is filled with solutions like Firebase Cloud Functions. However, Firebase’s pricing model is not as simple and, apparently, not as low.

In part 2, we describe test driving Cloudflare Workers with our implementation of the GitHub automation.