... apart from login cookies. Many sites which don't require you to log in ask for permission to store cookies "required for the site to function" and don't let you refuse them. What exactly do these cookies do? Would appreciate answers from folks who are site devs for any such websites.
Putting opinions on cookies aside for a moment, session cookies, cookies that store the language chosen by the user, etc. are examples of what is usually considered "required cookies".
Now, the technical choices that led to those things may not have been the wisest, but the general thinking is that:
- the site is less functional without those cookies so requiring them ensures a common baseline for all users,
- they are used internally for things that are directly related to what the user is here for, which puts them in a somewhat different league from the cross-site targeting cookies everyone is concerned about and makes them less harmful.
But the truth is that saving to and reading from cookies is not an absolute requirement of building websites.
For language, use the Accept-Language header. For login, use basic/digest auth, or SASL (although, as far as I know, HTTP(S) doesn't have SASL, unfortunately). For dark mode, printing, mono/colours, use CSS media queries. For table of contents, ensure the <H1>, <H2>, etc. headings are valid. For timezones and time formats, use the <TIME> element. For data tables, use the <TABLE> element; you can also include links to data in CSV, JSON, etc. For footnotes, use the <FN> element. For some things, you can use CSS classes which are not necessarily referred to in the CSS code that you have; the user can apply their own if they want special styles for some things that you don't have styles for.
In other words, many things will not need cookies/scripts. If you do use them, ensure that they are fully documented, and that the documentation can be viewed without cookies/scripts/CSS enabled. This lets the user know exactly what is being done, and makes it possible to enable only the cookies they want, or to manually configure read-only cookies for preferences.
Accept-Language is an okay-ish default, but web pages end up needing to provide overrides because users are not always in control of those values, or the user may have preferences that don't fit the scheme. A cookie is a reasonable place to store the override.
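To make the "header as default, cookie as override" idea concrete, here is a minimal Python sketch. The function names, the `override` parameter (standing in for whatever a language cookie would hold), and the `supported` set are all assumptions for illustration, not anything a particular site or framework does.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return the language tags from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q-value counts as q=1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # malformed weight: treat the tag as unwanted
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def pick_language(accept_language, override, supported, default="en"):
    """Prefer an explicit override (hypothetically from a cookie), then the header."""
    if override and override.lower() in supported:
        return override.lower()
    for tag in parse_accept_language(accept_language or ""):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # fall back from "de-ch" to "de"
        if primary in supported:
            return primary
    return default
```

With `supported = {"en", "de", "fr"}`, a header of `de-CH, fr;q=0.8, en;q=0.9` and no override selects `de`; passing `override="fr"` selects `fr` regardless of the header.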
HTTP login is really hard to make user-friendly, and there is no user-friendly way to log out: the user has to be prompted for a password again and then decline to enter it. An HTML form and cookies just work better.
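For context on why logging out is awkward: with Basic auth the browser simply attaches the same `Authorization` header to every request until it decides to forget the credentials, so the server has nothing to invalidate. A small Python sketch of what that header contains (the helper names are made up for illustration):

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a client sends for Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def parse_basic_auth(header: str):
    """Return (username, password) from an Authorization header, or None."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers bad base64 and bad UTF-8
        return None
    username, separator, password = decoded.partition(":")
    return (username, password) if separator else None
```

The credentials are only base64-encoded, not encrypted, so this is only reasonable over HTTPS.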
For languages, yes, overrides could be stored in a cookie, although they could also be stored in the URL (especially for documents intended to be downloaded). It could also support both. The reasons you mention are valid reasons to allow an override. (Still, if cookies are used for this or any other purpose, all cookies and their values should be documented, in case the user wishes to adjust them (or copy them) manually, and/or to understand what they are for.)
Logout (and the duration that the login data is kept) is the responsibility of the user agent. Unfortunately, none that I know of provide the user any control over that.
It was a while ago now but I'm pretty certain that I read some comments here saying using Accept-Language is often a bad idea.
From what I remember, your IT department may muck it up, or you're using a shared computer at, say, a hostel where someone has set it and you don't know how to change it, etc.
Aside from that, most of your examples need a lot of foresight, work and knowledge to actually pull off.
That is a valid consideration; it can use the URL to override the Accept-Language header (and the document can also include links for other languages, if appropriate). Likewise, the URL can also be used to override the Accept header.
cookies that store the language chosen by the user
Preferences like that should be part of the URL schema. Picking language as a good example, a site can use a path variable (e.g. the part of the page URL after the domain) to store that information: https://example.com/en/page.html, https://example.com/de/page.html, https://example.com/es/page.html, etc. Alternatively, a site could use a subdomain for the same thing. There are many advantages to that approach, such as making it easier to route traffic to a local data center or to leverage better edge caching for localized assets. It also means users can deep link straight to a page in their chosen language.
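A rough Python sketch of the path-prefix approach, assuming a hypothetical `SUPPORTED` set of language codes and a made-up helper name; a real router or framework would normally do this step for you:

```python
from urllib.parse import urlsplit

# Assumed set of languages the site serves; purely illustrative.
SUPPORTED = {"en", "de", "es"}


def split_language_prefix(url: str, supported=SUPPORTED):
    """Return (language or None, remaining path) for a URL like /de/page.html."""
    path = urlsplit(url).path
    first, _, rest = path.lstrip("/").partition("/")
    if first.lower() in supported:
        return first.lower(), "/" + rest
    # No recognised prefix: the caller can fall back to Accept-Language.
    return None, path or "/"
```

For `https://example.com/de/page.html` this yields `("de", "/page.html")`; a URL without a recognised prefix yields `None` for the language, so the language never has to be stored anywhere but the link itself.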
I totally agree with you but OP asked about how things are, not about how things should be.
https://example.com/{en,fr}/page.html is indeed much better—and widely recommended, for that matter—but lots and lots of sites don't do that for one reason or another. If those sites use a cookie to store language choice, then it is IMO quite reasonable for the site operator to consider such a cookie as "required".
The answers to "What is the ‘strictly necessary’ exemption?" and "What activities are likely to meet the ‘strictly necessary’ exemption?" in this guidance FAQ give some clarification from the UK Information Commissioner's Office:
An example that springs to mind is a CSRF token [1]. One might use a session cookie so that the server can have a CSRF token on any forms. In this way we still require a session to be present even if the user doesn't have to login.
The page you linked to literally says "CSRF tokens should not be transmitted within cookies."
It correctly suggests putting CSRF tokens in either hidden fields or custom request headers. If you're putting the token into a cookie, then you're persisting it for some length of time beyond the existence of the page, in which case you've broken your CSRF token mechanism, because tokens are not supposed to persist across multiple requests. You should generate a new token for every request.
To implement the synchronizer token pattern you usually store the randomly generated CSRF token in the session to validate it on the subsequent request, even if you generate a new one for each form.
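A bare-bones Python sketch of that pattern, for illustration only. The `Session` class is a stand-in for whatever server-side session object a framework provides (normally looked up via the session cookie), and the function names are invented here:

```python
import hmac
import secrets


class Session:
    """Stand-in for a server-side session record; hypothetical, not a real API."""

    def __init__(self):
        self.csrf_token = None


def issue_csrf_token(session: Session) -> str:
    """Generate a fresh token, remember it server-side, and return it for the form."""
    session.csrf_token = secrets.token_urlsafe(32)
    return session.csrf_token


def check_csrf_token(session: Session, submitted: str) -> bool:
    """Validate the submitted token once, then discard it."""
    expected, session.csrf_token = session.csrf_token, None  # single use
    if expected is None or not submitted:
        return False
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

The token itself travels in a hidden form field; only the session identifier needs a cookie, which is the point being made about sites that have no login at all.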
You could also handle this stateless without the session using encryption or HMAC, but then you need to manage secret keys and not screw up.
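The HMAC variant might look roughly like the Python below. Everything here is an assumption made for the sketch: the token layout (`issued.nonce.signature`), the one-hour lifetime, and the function names. It also does not bind the token to a particular user or session, which a real deployment would usually want to do.

```python
import hashlib
import hmac
import secrets
import time


def make_token(secret_key: bytes, now=None) -> str:
    """Create a signed token carrying its own issue time; no server-side state."""
    issued = str(int(time.time() if now is None else now))
    nonce = secrets.token_hex(16)
    payload = f"{issued}.{nonce}"
    signature = hmac.new(secret_key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def verify_token(secret_key: bytes, token: str, max_age=3600, now=None) -> bool:
    """Check the signature first, then that the token is not too old."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    issued, nonce, signature = parts
    expected = hmac.new(
        secret_key, f"{issued}.{nonce}".encode(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return False
    # The signature matched, so `issued` is the integer string we wrote.
    age = (time.time() if now is None else now) - int(issued)
    return 0 <= age <= max_age
```

The "manage secret keys and not screw up" part is exactly what this leaves out: key storage, rotation, and keeping the same key on every server that may receive the form submission.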
I think parent was referring to the session cookie. The linked article mentions putting the generated token into the server side user session and then to validate it on the next request. You might need a session cookie for that.
Session cookies persist for the length of the session. That's still too long for a CSRF token. You should be generating a new one in every request that needs a token in the response.
Literally none. All a cookie does is crowbar stateful data into a series of requests. You don't need that at all if you build something that's stateless.
There are a lot of superfluous cookies because adding cookies is "best practice."
And frameworks make it easy.
And the added complexity makes programming mundane websites more interesting to the people condemned to doing so...inventing interesting problems makes people feel clever when cleverness is superfluous.
Besides the obvious login/session stuff, you can almost always find security-related cookies for things like CloudFlare, as well as preferences/features toggles.