On Cookies

Over the last few weeks I’ve been exploring various web fundamentals as part of some research for The App. I’ve been building web apps that use cookies for over 10 years, but I’d never appreciated just how much they do.

Ah, the humble cookie. A fundamental technology that has helped power the web since the early days. Most of the time they just work. There are only two times when most people ever need to interact with cookies:

  1. to “consent” to the use of cookies, thanks to ham-fisted implementations of GDPR conformance that have now become the accepted norm
  2. to clear all cookies, when something’s gone wrong and you don’t know what else to try

Even as a developer, you can go many years without really understanding what cookies are or how they work. Occasionally you’ll have to run through some security checks as part of deploying a new website or app, but by and large you can ignore cookies.

Cookies first landed in browsers in 1994. That’s pre-Internet Explorer! As with a lot of web tech in the ’90s, there was no standardisation and implementations varied widely. Eventually things settled down.

HTTP, which is the basic way of sending information for the World Wide Web, is a fundamentally stateless protocol. That makes it tough for a server to know which requests came from which users. The server could take a look at where the request came from, but that’s not reliable because multiple users might be using the same IP address (e.g. in the same house). The client (your web browser) could send your username and password along with every single request, but that’s not great for security or privacy. For starters, what do you do if you don’t even have an account yet?

Cookies solve this problem by letting a server give an instruction to the client. HTTP allows a client to send a request to a server, and for that server to send a response back to that client. The request and the response have different structures, but both can contain headers and a body. As part of the response, a server can include a Set-Cookie header. This is one of a handful of well-defined standard headers that most clients can interpret. Other standard headers include Content-Length (how long is the body?) and Content-Type (how should I interpret this response body? Is it HTML? JPEG? JSON?).

Your browser reads that Set-Cookie header, and remembers it for later. Whenever it sends another request to the server it may decide to include that cookie in a Cookie header. Because the cookie was generated by the server, potentially using some encryption algorithm to make it hard to guess or to fake, the server can trust that two requests carrying the same cookie came from the same user session. There are of course more subtleties, and various improvements, but the basic mechanism has remained the same for over 30 years. When you think about how much the Web has changed since 1994, that is seriously impressive.
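To make that round trip concrete, here’s a minimal Python sketch. It’s a toy, not a real server: the names (`sessions`, `handle_first_request`, `browser_store`, `handle_later_request`) are made up for illustration, headers are plain dicts, and the “browser” only ever remembers a single cookie.

```python
import secrets
from typing import Optional

# Hypothetical in-memory session store: token -> session data.
sessions: dict[str, dict] = {}


def handle_first_request() -> dict[str, str]:
    """Server side: start a session and tell the client to remember it."""
    token = secrets.token_urlsafe(32)  # random, so hard to guess or fake
    sessions[token] = {"visits": 1}
    return {"Set-Cookie": f"sessionId={token}"}


def browser_store(response_headers: dict[str, str], jar: dict[str, str]) -> None:
    """Client side: remember the name=value pair from the Set-Cookie header."""
    name, _, value = response_headers["Set-Cookie"].partition("=")
    jar[name] = value


def handle_later_request(request_headers: dict[str, str]) -> Optional[dict]:
    """Server side: recognise the session from the Cookie header, if any."""
    name, _, value = request_headers.get("Cookie", "").partition("=")
    if name != "sessionId" or value not in sessions:
        return None  # missing or unknown cookie: treat as a brand-new visitor
    sessions[value]["visits"] += 1
    return sessions[value]


jar: dict[str, str] = {}
browser_store(handle_first_request(), jar)
cookie_header = "; ".join(f"{name}={value}" for name, value in jar.items())
print(handle_later_request({"Cookie": cookie_header}))  # {'visits': 2}
```

The server never has to remember anything about the client’s IP address or credentials: the random token does all the work.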

Here’s a grab bag of various cookie knowledge, a lot of which I only figured out over the last week or two.

HTTP requests and HTTP responses are both considered by the spec to be “HTTP messages”. As such, they have a very similar format. In particular, HTTP headers in requests and responses are separated by new lines and sent in the form name: value. So you would expect Cookie headers in a request to look similar to Set-Cookie headers in a response, right? Well, that is not the case. An HTTP request typically contains at most one header called Cookie, which packs every cookie into a single semicolon-separated list such as Cookie: sessionId=abc123; theme=dark. That fits neatly with the way both client-side JavaScript and many web servers treat HTTP headers: as a map from name to value, where all the names are unique. An HTTP response, on the other hand, will often contain multiple Set-Cookie headers, one per cookie, which breaks that tidy map model.
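Here’s a small Python sketch of that asymmetry, assuming headers are modelled as a list of (name, value) pairs, since a plain dict can’t hold two Set-Cookie entries. The `set_cookie_lines` and `parse_cookie_header` helpers are hypothetical; for real code, Python’s standard library has `http.cookies.SimpleCookie`, which handles quoting and other edge cases.

```python
# A response modelled as a list of pairs: the same header name can repeat.
response_headers = [
    ("Content-Type", "text/html"),
    ("Set-Cookie", "sessionId=abc123; Secure; HttpOnly"),
    ("Set-Cookie", "theme=dark; Max-Age=31536000"),
]


def set_cookie_lines(headers: list[tuple[str, str]]) -> list[str]:
    """Collect every Set-Cookie header (header names are case-insensitive)."""
    return [value for name, value in headers if name.lower() == "set-cookie"]


def parse_cookie_header(header: str) -> dict[str, str]:
    """Split a request's single Cookie header ("a=1; b=2") into a dict."""
    cookies: dict[str, str] = {}
    for pair in header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip empty or malformed fragments
            cookies.setdefault(name, value)  # if a name repeats, keep the first
    return cookies


print(len(set_cookie_lines(response_headers)))  # 2
print(parse_cookie_header("sessionId=abc123; theme=dark"))
# {'sessionId': 'abc123', 'theme': 'dark'}
```

Keeping the first value when a name repeats is an arbitrary choice for this sketch; servers differ in how they handle duplicates.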

Cookies consist of a name, a value and optional attributes. The name and the value represent the “state” held by the cookie, while the attributes give additional details on how the browser should use that cookie. New attributes have been introduced over time, usually to help improve security. Any browser that doesn’t understand the new attribute can ignore it, while a modern browser can adopt those attributes and offer improved functionality. The end user doesn’t have to do anything. Website maintainers barely have to change anything. This is the web platform working at its best.

The name and value of the cookie are arbitrary (fairly short, ASCII-subset) strings. That gives a lot of flexibility. If a server is storing encrypted values, for example, it can adopt a new encryption algorithm without the end user or the browser needing to change in any way. Because the attributes only serve to explain how the client should use the cookie, there’s no point sending them along in the request. This means a request’s Cookie header contains only the name-value pairs, cutting down on bytes being sent.
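As a rough Python sketch of that split, assuming a hypothetical `build_set_cookie` helper on the server and a deliberately naive client that ignores Domain, Path and expiry: the attributes travel in the response, and only the name=value pairs come back.

```python
def build_set_cookie(name: str, value: str, attributes: dict[str, object]) -> str:
    """Server side: name=value followed by any attributes."""
    parts = [f"{name}={value}"]
    for attr, attr_value in attributes.items():
        if attr_value is True:  # flag attributes such as Secure or HttpOnly
            parts.append(attr)
        elif attr_value is not None and attr_value is not False:
            parts.append(f"{attr}={attr_value}")
    return "; ".join(parts)


def build_cookie_header(set_cookie_lines: list[str]) -> str:
    """Client side: keep only name=value; the attributes are never sent back."""
    pairs = [line.split(";", 1)[0].strip() for line in set_cookie_lines]
    return "; ".join(pairs)


lines = [
    build_set_cookie("sessionId", "abc123", {"Path": "/", "Secure": True, "HttpOnly": True}),
    build_set_cookie("theme", "dark", {"Max-Age": 31536000}),
]
print(lines[0])                    # sessionId=abc123; Path=/; Secure; HttpOnly
print(build_cookie_header(lines))  # sessionId=abc123; theme=dark
```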

The Domain attribute allows a cookie to be shared with other subdomains of a parent domain. It can be left out, and by default a browser will only include the cookie on requests to the exact host that set it. This is a sensible default: secure and efficient. The most common use case for Domain is when your authentication/authorization server lives on a different subdomain to your site. Perhaps you have a whole suite of apps that work together with Single Sign-On. When a user authenticates successfully using auth.megacorp.com, the response can include Set-Cookie: userId=fdguhsrdhtge; domain=.megacorp.com. This means that requests to docs.megacorp.com and to mail.megacorp.com will both include Cookie: userId=fdguhsrdhtge. (A server can only widen a cookie to a domain it belongs to: auth.megacorp.com cannot set a cookie for othercorp.com.)
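The matching rule is roughly: a host-only cookie needs an exact match, while a Domain cookie matches that domain and anything ending in “.” plus that domain. Here’s a simplified Python sketch, loosely following the domain-match rule in RFC 6265. The function names are made up, and it skips details real browsers handle, such as IP addresses and rejecting public suffixes like .com.

```python
from typing import Optional


def domain_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def cookie_domain(request_host: str, domain_attr: Optional[str]) -> Optional[tuple[str, bool]]:
    """Work out (domain, host_only) for a new cookie, or None to reject it."""
    host = request_host.lower()
    if not domain_attr:
        return host, True  # no Domain attribute: exact-host ("host-only") cookie
    domain = domain_attr.lower()
    if domain.startswith("."):
        domain = domain[1:]  # a leading dot is ignored
    if not domain_matches(host, domain):
        return None  # a server can't set cookies for a domain it isn't part of
    return domain, False


def should_send(request_host: str, domain: str, host_only: bool) -> bool:
    """Decide whether a stored cookie goes along with a request to request_host."""
    host = request_host.lower()
    return host == domain if host_only else domain_matches(host, domain)


domain, host_only = cookie_domain("auth.megacorp.com", ".megacorp.com")
print(should_send("docs.megacorp.com", domain, host_only))  # True
print(should_send("mail.megacorp.com", domain, host_only))  # True
print(should_send("evilmegacorp.com", domain, host_only))   # False
```

Note the check for a leading “.”: a naive `endswith("megacorp.com")` would happily send the cookie to evilmegacorp.com.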

The SameSite attribute controls whether a cookie is sent with requests that come from a different site to the one the user is currently on. For example, when a user is clicking around on http://suspicious.net and they click a link to https://bank.com/send-money, should the browser include the user’s bank.com cookies when making the request to GET https://bank.com/send-money? Probably not! On the other hand, as the developer of https://social.cool I might decide it is OK for people to be able to click the embedded “heart” button (POST https://social.cool/api/heart) from anywhere on the net, even https://uncool.biz, without having to log into social.cool on every single site. There are three options:

  • SameSite=Strict is best for bank.com: the cookie is only included if the user is already on bank.com
  • SameSite=None is best for social.cool: the cookie is included in any request, including cross-site navigation (following a link) and cross-site requests (embedding a widget)
  • SameSite=Lax is a happy medium: the cookie is included if the user is already on the site, or on cross-site navigation (following a link). SameSite=Lax is common in practice, but for better security you could set SameSite=Strict and then trigger a reload (possibly after prompting the user) if you detect that the user may be logged in.
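Those three options boil down to a small decision. This Python sketch is a simplified model, not how any particular browser implements it: `sends_cookie` and its parameters are hypothetical, and it ignores details such as browser defaults for cookies with no SameSite attribute and third-party cookie blocking.

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def sends_cookie(same_site: str, cross_site: bool,
                 top_level_navigation: bool, method: str) -> bool:
    """Simplified SameSite check; Domain, Secure etc. are checked separately."""
    if not cross_site:
        return True  # the request comes from the site the user is already on
    policy = same_site.lower()
    if policy == "strict":
        return False
    if policy == "lax":
        # Cross-site only when following a link or similar with a safe method.
        return top_level_navigation and method.upper() in SAFE_METHODS
    if policy == "none":
        return True  # browsers only accept SameSite=None if the cookie is Secure
    raise ValueError(f"unknown SameSite value: {same_site!r}")


# Following a link from http://suspicious.net to https://bank.com/send-money
print(sends_cookie("Strict", cross_site=True, top_level_navigation=True, method="GET"))  # False
print(sends_cookie("Lax", cross_site=True, top_level_navigation=True, method="GET"))     # True
# The embedded "heart" button on https://uncool.biz posting to social.cool
print(sends_cookie("None", cross_site=True, top_level_navigation=False, method="POST"))  # True
print(sends_cookie("Lax", cross_site=True, top_level_navigation=False, method="POST"))   # False
```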

The Secure attribute says that a cookie should only be sent to https://mywebsite.online, not to http://mywebsite.online. Your sites should be running over HTTPS. You should set all cookies to be Secure.

The HttpOnly attribute lets you specify that client-side JavaScript should not have access to this cookie. It is sometimes useful for a site to be able to get or set cookies via JavaScript. For example, you might set a cookie that records whether this user agreed to the terms of service, and run some client-side logic based on it. But many cookies are used for security purposes, especially the sessionId cookie. If an attacker were able to inject https://evil.com/steal-cookie.js onto the page somehow, then they might be able to grab that sessionId cookie and use it to pretend to be the innocent user. This kind of injection could happen through a cross-site scripting (XSS) bug, or if the user installed a dodgy browser extension, for example. Since developers have limited control over what users do with their web browsers, it makes sense to set HttpOnly on cookies unless you very specifically need that cookie to be accessible in JavaScript. If you do need a cookie to be accessible in JavaScript, make sure you are not also using it for anything security related.
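Putting Secure and HttpOnly together, here’s a sketch using Python’s standard-library `http.cookies.SimpleCookie` to build a session cookie with those attributes set. The `session_set_cookie` helper and the choice of SameSite=Lax are my assumptions; swap in Strict if that suits your site. The `samesite` key needs Python 3.8 or later.

```python
import secrets
from http.cookies import SimpleCookie


def session_set_cookie(token: str) -> str:
    """Build a Set-Cookie value for a session token with hardened attributes."""
    cookie = SimpleCookie()
    cookie["sessionId"] = token
    morsel = cookie["sessionId"]
    morsel["path"] = "/"
    morsel["secure"] = True    # only ever sent over HTTPS
    morsel["httponly"] = True  # hidden from document.cookie and injected scripts
    morsel["samesite"] = "Lax"
    return morsel.OutputString()  # the library chooses the attribute order


print("Set-Cookie:", session_set_cookie(secrets.token_urlsafe(32)))
```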

Note that HttpOnly has nothing to do with whether or not a cookie is included in a fetch request. fetch respects the same Domain, SameSite and Secure rules as above.

Servers should not blindly trust the contents of a cookie. Users can modify cookies in the browser. Indeed, not all HTTP requests come from a web browser! This means you should never send Set-Cookie: userId=1. There is nothing to stop a user from sending back a request with Cookie: userId=2 instead. This would be bad! Instead, consider sending either:

  1. a cookie encrypted using a symmetric key, which you can later verify and decrypt to check for tampering, or
  2. a random session token: generate fresh random bytes from a cryptographically secure source, hash them on the server, and store the hash in a database to look up later (similar to how you would store a password hash)

Option 2 is usually preferred, because it is low effort in many scenarios: you already have a database, and this can make it very easy to retrieve related user information via a foreign key relationship. Option 1 works without a database, but could lead to big problems if your server’s encryption key gets leaked.
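Here’s a minimal Python sketch of option 2, plus a simplified take on option 1. Assumptions: a dict stands in for the database table, and the helper names are hypothetical. For option 1 the sketch swaps in HMAC signing, which detects tampering but does not hide the value; real encryption would need an authenticated-encryption library, which Python’s standard library doesn’t include. A fast hash like SHA-256 is fine for option 2 (unlike for passwords) because the token is long and random, so brute-forcing it is hopeless anyway.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Option 2: random session tokens, stored only as hashes.
session_table: dict[str, int] = {}  # stand-in for a DB table: token hash -> user id


def hash_token(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


def create_session(user_id: int) -> str:
    """Return a Set-Cookie value; only the token's hash is kept server-side."""
    token = secrets.token_urlsafe(32)  # 32 random bytes as URL-safe text
    session_table[hash_token(token)] = user_id
    return f"sessionId={token}; Path=/; Secure; HttpOnly; SameSite=Lax"


def lookup_session(token: str) -> Optional[int]:
    return session_table.get(hash_token(token))


# Option 1 (simplified): sign the value with a server-side key (HMAC-SHA256).
SECRET_KEY = secrets.token_bytes(32)  # hypothetical; load from config in practice


def sign(value: str) -> str:
    mac = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"


def verify(signed: str) -> Optional[str]:
    """Return the original value, or None if it was tampered with."""
    value, _, mac = signed.rpartition(".")
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(mac.encode(), expected.encode()) else None


cookie = create_session(user_id=42)
token = cookie.split(";")[0].split("=", 1)[1]
print(lookup_session(token))        # 42
print(verify(sign("1")))            # 1
print(verify("2" + sign("1")[1:]))  # None
```

If the session table leaks, the attacker only gets hashes, which can’t be replayed as cookies.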

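The Max-Age and Expires attributes let a server specify when the cookie should expire. If neither is set, the cookie lasts for the length of the browser session. Again, this is a good balance of security and convenience. A server can tell a client to remove a cookie by setting an expiry that has already passed, such as Set-Cookie: userId=empty; Max-Age=0 or Set-Cookie: userId=empty; Expires=Thu, 01 Jan 1970 00:00:00 GMT. (Expires needs a full HTTP date; a bare 1970-01-01 isn’t a valid cookie date under the spec, so browsers may ignore it.) This is an easy way to implement log out!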
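Here’s a small Python sketch of both cases, assuming hypothetical `persistent_cookie` and `logout_cookie` helpers. It uses the standard library’s `email.utils.formatdate` to produce the HTTP date format that Expires expects. When both attributes are present, Max-Age wins; Expires mainly helps very old clients that don’t understand Max-Age. One gotcha: the logout cookie’s Path (and Domain, if you used one) has to match the original cookie, otherwise the browser treats it as a different cookie and the old one survives.

```python
import time
from email.utils import formatdate


def persistent_cookie(name: str, value: str, lifetime_seconds: int) -> str:
    """Max-Age is relative (seconds from now); Expires is an absolute HTTP date."""
    expires = formatdate(time.time() + lifetime_seconds, usegmt=True)
    return f"{name}={value}; Max-Age={lifetime_seconds}; Expires={expires}; Path=/"


def logout_cookie(name: str) -> str:
    """Overwrite the cookie with an empty value that has already expired."""
    epoch = formatdate(0, usegmt=True)  # the Unix epoch as an HTTP date
    return f"{name}=; Max-Age=0; Expires={epoch}; Path=/"


print(logout_cookie("userId"))
# userId=; Max-Age=0; Expires=Thu, 01 Jan 1970 00:00:00 GMT; Path=/
```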

There are other, more recent changes to the way cookies work. There is a draft proposal for a naming convention: cookies with names like __Secure-sessionId and __Host-allowPersonalisedAds will only be accepted (by a well-behaved client) if the appropriate attributes were set when the cookie was originally sent. It’s not entirely clear to me what the benefit is. There’s also an asynchronous globalThis.cookieStore API coming to browsers soon, which will be a more convenient API compared to the synchronous document.cookie that has been working in browsers since the ’90s.
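The prefix rules are checked when the browser stores the cookie, so a server that later receives a cookie named __Host-… knows it was set over HTTPS by that exact host rather than, say, a sibling subdomain. Here’s a simplified Python sketch of the rules; `accepts_prefixed_cookie` is a hypothetical name, and it matches the prefixes case-sensitively for simplicity.

```python
from typing import Optional


def accepts_prefixed_cookie(name: str, secure: bool, domain: Optional[str],
                            path: Optional[str], from_secure_origin: bool) -> bool:
    """Simplified check of the __Secure- and __Host- cookie prefix rules."""
    if name.startswith("__Secure-"):
        # Must carry the Secure attribute and be set over HTTPS.
        return secure and from_secure_origin
    if name.startswith("__Host-"):
        # Additionally locked to the exact host (no Domain) and the whole site.
        return secure and from_secure_origin and domain is None and path == "/"
    return True  # unprefixed cookies aren't affected by these rules


print(accepts_prefixed_cookie("__Host-sessionId", True, None, "/", True))            # True
print(accepts_prefixed_cookie("__Host-sessionId", True, "megacorp.com", "/", True))  # False
```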

Cookies may be small, but they are mighty. Given how often people managed to screw up authentication by putting JWTs in localStorage, cookies still seem like a pretty solid option. Like many pieces of infrastructure, they’ve unfairly acquired a reputation for being bad because people only ever hear about them when something’s gone wrong. Keep calm and cookie on.