As soon as I opened the article I pressed Cmd+F and searched for "dataclasses".
This article was correct and addressed a very real need in Python programming, as of 2016. By now it is obsolete: today's standard library module `dataclasses` does all of that and more.
attrs classes still have several features that dataclasses don't, and likely never will, [like validators and converters](https://www.attrs.org/en/stable/why.html#data-classes). So it's not obsolete, particularly for anyone already relying on those features.
dataclasses have methods for iterating fields and inspecting types, so any feature can be added. There is also the benefit of type checking. My grief with dataclasses is how I can't inherit from dataclasses with default fields without making all child class fields also have defaults.
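For instance, here is a minimal sketch of both points; `Base` and `Child` are hypothetical names:

    from dataclasses import dataclass, fields

    @dataclass
    class Base:
        name: str = "anonymous"   # inherited field with a default

    # Iterating fields and inspecting types:
    for f in fields(Base):
        print(f.name, f.type, f.default)

    # The inheritance pain point: a child field without a default
    # cannot follow an inherited field that has one.
    try:
        @dataclass
        class Child(Base):
            age: int              # no default -> TypeError at class creation
    except TypeError as e:
        print(e)  # non-default argument 'age' follows default argument

(Python 3.10 added kw_only=True to @dataclass, which sidesteps this particular ordering problem.)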
Echoing the other commenter, attrs is generally superior to dataclasses (dataclasses is a feature-limited std library "backport" of attrs). It's updated less often and supports fewer features. The only real reason to use dataclasses is if you want to avoid a third-party dependency, which is sometimes valid but doesn't make the more featureful version "obsolete".
Is there a reason those features weren't added to data classes?
I don't know much about attrs, having only come to Python professionally since 3.7, but I'm not going to bring in a dependency if there's something sufficient in the language.
Not really, other than that they decided to keep data classes very minimal. The standard library comes with maintenance costs (you can't get rid of a feature once you add it), so they kept the core features that everyone uses, and left out the complex validation features.
attrs was first; the dataclasses design is actually based on attrs! dataclasses is inferior in features, and it's actually a shame, because I see this ignorant attitude everywhere and I have to defend and explain attrs every time we need it.
IMO Pydantic is way more ergonomic, has great defaults, and easier to bend to your will when you want to use it a little differently.
Lots of love to Attrs, which is a great library and is a component of a lot of great software. It was my go-to library for years before Pydantic matured, but I think a lot of people have rightly started to move on to Pydantic, particularly with the popularity of FastAPI.
I'd say the opposite. Specifically, Pydantic tries to do everything, and as a result (partly b/c they favor base classes over higher order functions), it isn't as composable as attrs is.
I've done some truly amazing things with attrs because of that composability. If I'd wanted the same things with Pydantic, it would have had to be a feature request.
Apples and oranges. pydantic is a serialization and input validation library, attrs is a class-generator library. Completely different features, completely different use cases.
Yes, you can do validation in attrs, but it's not meant to be used the same way as pydantic. For serialization, you need cattrs, which is a completely different package.
Do you have concern with speed or memory footprint of pydantic compared to the rest (attrs, dataclasses etc)? Pydantic seems insistent on parsing/validating the types at runtime (which makes good sense for something like FastAPI).
We always used attrs with the runtime type validators anyway. Getting those types checked in Python was way more valuable to my teams than the minor boilerplate reduction.
If you’re worried about the performance hit of extra crap happening at runtime… dear lord use another programming language.
Dataclasses is just… meh. Pydantic and Attrs just have so many great features, I would never use dataclasses unless someone had a gun to my head to only use the standard library. I don’t know of a single Python project that uses dataclasses where Pydantic or Attrs would do (I’m sure they exist, but I’ve never run across it).
Dataclasses honestly seems very reactionary by the Python devs, since Attrs was getting so popular and used everywhere that it got a little embarrassing for Python that something so obviously needed in the language just wasn’t there.
Those that weren’t using Attrs runtime validators often did something similar to Attrs by abusing NamedTuple with type hints.
There were tons of “why isn't Attrs in the stdlib” comments, which is an annoying type of comment to make, but it happens. So they added dataclasses, but having all the many features that Attrs has isn't a very standard-library-like approach, so we got… dataclasses. Like “look, it's what you wanted, right!?”. Well, no, not really; thanks, we'll just keep using Attrs and then Pydantic.
I wouldn't say adding dataclasses was "reactionary". One of the reasons for adding it was to use it in the stdlib itself, which is obviously something we couldn't do with attrs. And because dataclasses skipped ahead to just using type hints to define fields, it has less backward-compatibility baggage than attrs has.
As I think I made clear in PEP 557, and every time I discuss this with anyone, dataclasses owes a lot to attrs. I think attrs made some great design decisions, in particular not requiring metaclasses or base classes.
To the runtime-validation point; our team used attrs with runtime validation enforced everywhere (we even wrote our own wrapper to make it always use validation, with no boilerplate) and this ended up being a massive performance hit, to the point where it was showing up close to the top of most profile stats from our application. Ripping all that out made a significant improvement to interactive performance, with zero algorithmic improvements anywhere else. It really is very expensive to do this type of validation, and we weren't even doing "deep" validation (i.e. validating that `list[int]` really did include only `int` objects) which would have been even more expensive.
Python can be used quite successfully in high-performance environments if you are judicious about how you use it: set performance budgets, measure continuously, make sure to have vectorized interfaces, and have a tool on hand, like PyO3, Cython, or mypyc (you should probably NOT be using C these days, even if "rewrite in C" is the way this advice was phrased historically), to push very hot loops into something with higher performance when necessary. But if you redundantly validate everything's type on every invocation at runtime, it eventually becomes untenable for anything but slow batch jobs if you have any significant volume of data.
The type hint helps mypy and pylint work, while the validator is a runtime check by attrs. If your attribute name is longer or you have other parameters to set, that can become a very long line (or lines) of code that you repeat for every attribute.
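A rough illustration of that with attrs' classic API (the class and attribute names here are made up):

    import attr

    @attr.s
    class User:
        name = attr.ib(validator=attr.validators.instance_of(str))
        # With a converter, a default, and a validator, one attribute
        # already spans multiple lines:
        age = attr.ib(default=0, converter=int,
                      validator=attr.validators.instance_of(int))

    User(name="Ada", age="36")   # converter coerces "36" -> 36 at runtime
    # User(name=42) would raise TypeError from the instance_of validator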
Pydantic is dataclasses, except types are validated at runtime? It's nice, and judging by https://pydantic-docs.helpmanual.io/ it looks just like a normal dataclass.
For any larger program, pervasive type annotations and "compile" time checking with mypy is a really good idea though, which somewhat lessens the need for runtime checking.
Pydantic types will be checked by mypy or any other static type analysis tool as well.
I don’t expect any type-related thing to be remotely safe in Python without applying at least mypy and pylint, potentially pyright as well, plus, as always with an interpreted language, unit tests for typing issues that would be caught by a compiler in another language
Likely the reason is that you really only need just mypy. It handles the type issues. Pylint is useful, but doesn't really overlap with mypy in terms of what it catches, and there's no need to use pyright or to write type-verifying unit tests, or to do runtime type validation if you have mypy.
Using mypy gives you the type safety equivalent of a compiled language. If you're using mypy, you don't need any additional validation that you wouldn't use in Java or C++. I didn't downvote you, but the needless defense in depth is weird.
“Using mypy gives you the type safety equivalent of a compiled language” is far from the truth. Each tool has things that they don't catch, or things that they sometimes false alert on.
Yeah mypy will get you maybe 90% of the way there, but the swiss cheese approach of stacking a few tools, even though there are redundancies, helps plug almost all of the holes.
People can argue about unit tests all day, but there’s very little cost to stacking multiple static analysis tools.
Pydantic is useful if you're dealing with parsing unstructured (or sort of weakly untrusted) data. If you just want "things that feel like structs", dataclassess or attrs are going to be just as easy and more performant (and due to using decorators and not metaclasses, more capable of playing nicely with other things).
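Roughly like this, as a sketch (pydantic coerces and validates at model construction; exact error details differ between v1 and v2):

    from pydantic import BaseModel, ValidationError

    class Event(BaseModel):
        timestamp: int
        message: str

    # Weakly-trusted input gets parsed/coerced into the declared types:
    e = Event(timestamp="1700000000", message="ok")   # str -> int coercion
    try:
        Event(timestamp="not a number", message="boom")
    except ValidationError as err:
        print(err)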
I used attrs in a large Python project, and it was more pain than it was worth. Off the top of my head: confusing `factory` vs `default` in the initializer was a bug that bit us, and inheritance was painful because we were forced to redefine the attrs from the parent class in each child class (maybe they fixed this now). Validators were broken somehow; I even made a GitHub issue for this which was never addressed. attrs is good for simple stuff. I feel plain old classes are more reliable once things get complex. And for simple things, dataclasses do just fine too.
I've never understood the appeal of these "define struct-like-object" libraries (in any language; I've never understood using the standard library's "Struct" in Ruby). My preferred solution for addressing complexity is also to decompose the codebase into small, understandable, single-purpose objects, but so few of them end up being simple value objects like Point3D. Total ordering and value equality make sense for value objects but not much else, so it really doesn't improve understandability or maintenance that much. And concerns like validation I would never want to put in a library for object construction. In web forms where there are a limited subset of rules I always want to treat the same way, sure, but objects have much more complicated relationships with their dependencies that I don't see much value in validating them with libraries.
Overall, I really don't see the appeal. It makes the already simple cases simpler (was that Point3D implementation really that bad?) and does nothing for the more complicated cases which make up the majority of object relationships.
Ignore all of the validation aspects. In python, you have tuples, (x, y, z), then you have namedtuples and then attrs/dataclasses/pydantic-style shorthand classes.
These are useful even if only due to the "I can take the three related pieces of information I have and stick them next to each other". That is, if I have some object I'm modelling and it has more than a single attribute (a user with a name and age, or an event with a timestamp and message and optional error code), I have a nice way to model them.
Then, the important thing is that these are still classes, so you can start with

    @dataclass
    class User:
        name: str
        age: int
and have that evolve over time to

    @dataclass
    class User:
        name: str
        age: int
        ...
        permissions: PermissionSet

        @property
        def location(self):
            # send off an rpc, or query the database for some complex thing.
            ...
and since it's still just a class, it'll still work. It absolutely makes modelling the more complex cases easier too.
Note that that "location" property should be a method instead of property to signal that it does something potentially complex and slow. Making it a property practically guarantees that someone will use it in a loop without much second thought, and that's how you get N+1.
Fair point! One of the various @cached_property decorators might fix this, depending on the precise use case, but yeah, this is an important consideration when defining your API.
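For instance, with the standard library's functools.cached_property (Python 3.8+), a sketch:

    from functools import cached_property

    class User:
        def __init__(self, user_id):
            self.user_id = user_id

        @cached_property
        def location(self):
            # Stand-in for the expensive RPC/database call; it runs once
            # per instance, then the result is cached on the instance.
            print("expensive lookup for", self.user_id)
            return "somewhere"

    u = User(42)
    u.location   # computes and prints
    u.location   # served from the cache, no print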
Well, one appeal is that you don't have to write constructors; that's already enough of a win for me. Then you get sane eq, and sane str, and already you remove 90% of the boilerplate.
I really, genuinely don't get the appeal. I don't follow the "less code = better" ideology so maybe that's a contributor but I really don't see how this:
    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age
is any worse than this:
    @dataclass
    class Person:
        name: str
        age: int
I'm not writing an eq method or a repr method in most cases, so it just doesn't add much for the cost.
The point is that for data-bag style classes, you end up writing a lot more boilerplate than that if you use them across a project. Validators (type or content), nullable vs not, read-only, etc.
The minimal trivial case doesn't look much different, but if you stacked up 10 data classes with read-only fields vs. bare class implementations with private members plus properties to implement read-only, you would start to see a bigger lift from attrs, as there would be a bunch of boring duplicated logic.
(Or not - if your usecases are all trivial then of course don’t use the library for more complex usecases. But hopefully you can see why this gets complex in some codebases, and why some would reach for a framework.)
The advantage of dataclasses is that they’re hard to mess up. They define all the methods you need to have an ergonomic idiomatic class that is essentially a tuple with some methods attached and have enough knobs to encompass basically all “normal” uses of classes.
It’s a pretty good abstraction that doesn’t feel half as magic as it is.
Given that code is for people, I've never found a certain amount of idiomatic boilerplate a problem. The desire to remove it all, or magicify it away (eg: Django) has always made me do a bit of an internal eye roll.
To start with, the non-`@dataclass` version here doesn't tell you what types `name` and `age` are (interesting that it's an int, I would have guessed float!). So right off the bat, not only have you had to type every name 3 times, you've also provided me with less information.
> I'm not writing an eq method or a repr method in most cases, so it just doesn't add much for the cost.
That's part of the appeal. With vanilla classes, `__repr__`, `__eq__`, `__hash__` et al. are each an independent, complex choice that you have to intentionally make every time. It's a lot of cognitive overhead. If you ignore it, the class might be fit for purpose for your immediate needs, but later when debugging, inspecting logs, etc., you will frequently have to incrementally add these features to your data structures, often in a haphazard way. Quick, what are the invariants you have to verify to ensure that your `__eq__`, `__ne__`, `__gt__`, `__le__`, `__lt__`, `__ge__` and `__hash__` methods are compatible with each other? How do you verify that an object is correctly usable as a hash key? The testing burden for all of this stuff is massive if you want to do it correctly, so most libraries that try to eventually add all these methods after the fact for easier debugging and REPL usage usually end up screwing it up in a few places and having a nasty backwards compatibility mess to clean up.
With `attrs`, not only do you get this stuff "for free" in a convenient way, you also get it implemented in a way which is very consistent, which is correct by default, and which also provides an API that allows you to do things like enumerate fields on your value types, serialize them in ways that are much more reliable and predictable than e.g. Pickle, emit schemas for interoperation with other programming languages, automatically provide documentation, provide type hints for IDEs, etc.
Fundamentally attrs is far less code for far more correct and useful behavior.
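To make the contract concrete, a small sketch (shown here with a dataclass; attrs behaves the same way) of the invariant the generated methods preserve by construction: objects that compare equal must hash equal.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Point:
        x: int
        y: int

    assert Point(1, 2) == Point(1, 2)              # structural equality
    assert hash(Point(1, 2)) == hash(Point(1, 2))  # equal implies same hash
    d = {Point(1, 2): "usable as a dict key"}      # hashable because frozen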
I understand repr for debugging (though imo it's a deficiency of the language that custom objects don't have a repr which lists their attributes), but eq is a property of the domain itself; two objects are only equal if it makes sense in the domain logic for them to be equal, and in many cases that equality is more or less complicated than attribute equality.
> though imo it's a deficiency of the language that custom objects don't have a repr which lists their attributes
It makes perfect sense that attributes be implementation details by default, and `@dataclass` is one of the ways to say they're not.
> eq is a property of the domain itself; two objects are only equal if it makes sense in the domain logic for them to be equal, and in many cases that equality is more or less complicated than attribute equality.
dataclass is intended for data holders, for which structural equality is an excellent default.
If you need more bespoke business objects, then you probably should not use a dataclass.
I was merely noting that dataclasses are mostly intended for data holder objects (hence data classes), and thus defaulting to structural equality makes perfect sense, even ignoring it being overridable or disableable.
This was in reply to this objection:
> eq is a property of the domain itself; two objects are only equal if it makes sense in the domain logic for them to be equal, and in many cases that equality is more or less complicated than attribute equality.
For anyone interested in differences between attrs and pydantic, how to use dataclasses, etc. I can't recommend the mCoding channel enough, this video : https://www.youtube.com/watch?v=vCLetdhswMg goes into the different libraries and there are other videos going more in depth on how to use them.
Came here to say this; dataclasses have been super helpful for a big part of the pain points highlighted by the author. More often than not, that is enough for me.
Yes. There needs to be a very good reason for me to pull in a third party library (in this day and age, given supply chain attacks, etc.). I don’t see what Attrs gives me that dataclasses does not.
Huh, it's the other way around for me. Just about every nontrivial Python project I write requires at least one third-party library, and one much more complicated than attrs at that, that I may as well use attrs too. All the complexity of having dependencies - both distributing them / setting up a virtualenv / etc., and whatever scrutiny I wish to do on dependencies - have a pretty big constant term from doing them at all; doing them for one more dependency (and one that doesn't change all that often) is only a bit more work.
Granted, I'm not reviewing my third-party dependencies line by line when I upgrade them. But also I'm more afraid of the security risks of large amounts of in-house code that aren't exposed to public scrutiny, and so a policy that dissuaded the use of even high-quality and well-regarded third-party dependencies seems like it would do more harm than good.
Besides that, it helps that I happen to have met the maintainer of attrs at PyCon (and attrs has only one uploader in PyPI), and therefore I'm less concerned about supply-chain attacks against it, whether of the malicious-maintainer variety or the maintainer-got-scammed-or-hacked variety, than, again, most of my other dependencies whose maintainers I've never heard of. I'm not sure this scales particularly well, but I do feel like there's still something in the open source community being a community.
Not that attrs or dataclasses has particularly significant attack surface, but when considering stdlib vs. 3rd-party you also have to consider the amount of maintenance and the release cadence. Attrs can release every few months if the rate of change demands it, whereas the stdlib has a fixed yearly release schedule that is tied to interpreter versions. Attrs has a small, focused development team whereas the stdlib is maintained by developers who are stretched very thin, and many packages within it are effectively abandoned. Upgrading dataclasses means upgrading everything in the stdlib at the same time, whereas attrs can be upgraded independently, by itself.
Supply chain attacks are a complex and nuanced topic so there are plenty of reasons to be thoughtful about adopting new dependencies, but it's definitely not as simple as "just use the stdlib for everything".
Python's stdlib is such that using it is a code smell. It's better to just use the right tool for the job: you are almost guaranteed to need at least one external library/module for any project of even moderate complexity. So bite the bullet, invest the time/tooling to do packaging correctly, and use the very excellent Python ecosystem (isn't it why you are using Python to begin with?) that you have at your disposal.
Sticking with the "rusty, leaking batteries included!" stdlib is a bad call and I don't believe it is safe, either; most of the stdlib is abandonware that is just being shipped for backward compatibility's sake. Don't make future product decisions, design decisions, etc. based on the Python team's deprecation requirements!
I've been writing Python a long time and have grown quite frustrated by some of its warts. But every time I look at seriously investing in another language attrs is one of the few things I wouldn't want to give up. It's not perfect but I'll take very, very good when I can get it, yeah?
To clarify, both named tuples and dataclasses can be immutable (the former are always immutable and the latter can be made immutable with `frozen=True`).
There is no way, however, to make a named tuple mutable.
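A quick sketch of both:

    from collections import namedtuple
    from dataclasses import dataclass, FrozenInstanceError

    PointNT = namedtuple("PointNT", ["x", "y"])   # always immutable

    @dataclass(frozen=True)
    class PointDC:                                # immutable by opt-in
        x: int
        y: int

    p = PointDC(1, 2)
    try:
        p.x = 5
    except FrozenInstanceError:
        print("frozen dataclasses reject assignment")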
Namedtuples also behave like tuples, which is great when you want to incrementally turn tuples into classes but if you want an easy way of creating classes, it's probably not a good idea to have them behave like tuples. Plus dataclasses have more features.
From a user's perspective, data classes look kind of like a C struct and in particular include type annotations, so they fit well with type checkers. They also allow for default values and give more control over generating equality, hash, string, and initialisation methods.
Comparatively named tuples are an older language feature which essentially allow you to define named accessors for tuple elements. IIRC, these days you can also define type annotations for them.
Their use cases essentially overlap. Personally I much prefer data classes.
Costs: Depending on a relatively unknown library. Using arcane class decorators and unusual syntactic constructs: @attr.s and x = attr.ib() (a pun?).
Benefits: Saving at best 10-15 lines of boilerplate per data class. Much less if namedtuple works for you.
If you want to save lines in __init__ you can write "for k, v in locals().items(): setattr(self, k, v)" (skipping the "self" entry). But you shouldn't.
Edit: Forgot to add the most important cost: magic. You don't need to know a lot of Python to understand how the standard self.x = x initialization works. However, you do need to understand a lot of Python internals to grok x = attr.ib().
There are newer (since 2020) syntactic constructs that might be more to your liking; take a look at the docs again.
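For reference, a sketch of the modern annotation-based API (@define appeared around attrs 20.1; the `attrs` import namespace slightly later, and `import attr` with @attr.define is equivalent if it's unavailable):

    from attrs import define, field

    @define
    class Point:
        x: int
        y: int = field(default=0)   # or just: y: int = 0

    Point(1)   # Point(x=1, y=0)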
Incidentally, I'd recommend against Named Tuples for non-trivial software. Because they can be indexed by integer and unpacked like tuples, additions of new fields are backwards-incompatible with existing code.
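A sketch of the breakage:

    from collections import namedtuple

    Point = namedtuple("Point", ["x", "y"])
    x, y = Point(1, 2)              # callers unpack by position...

    Point = namedtuple("Point", ["x", "y", "z"])
    x, y = Point(1, 2, 0)           # ...so adding a field raises
                                    # ValueError: too many values to unpack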
I don't like newfangled syntactic constructs since they hide what is going on. :) The example in the article was a Point3D class and for that namedtuple is a good choice (points should be immutable). It's unlikely that you'd want to add fields without also making other backwards-incompatible changes.
This is only remotely relevant but I recently learned that the related `dataclasses` is implemented by constructing _string representations_ of functions and calling `exec` on them.
You should look at the older implementation. The whole class itself was made by interpolating a huge string and then exec-ing it, but that was changed with the ability to dynamically generate classes with type(). Nothing other than motivation and a good proposal is standing in the way of a def() that operates similarly.
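A simplified sketch of the string-plus-exec technique (not the actual CPython code; `make_init` is a made-up helper):

    # Build a function's source as a string, exec it into a scratch
    # namespace, and pull the resulting function object back out.
    def make_init(field_names):
        args = ", ".join(field_names)
        body = "".join(f"\n    self.{name} = {name}" for name in field_names)
        src = f"def __init__(self, {args}):{body}"
        namespace = {}
        exec(src, {}, namespace)
        return namespace["__init__"]

    class Point:
        pass

    Point.__init__ = make_init(["x", "y"])
    p = Point(1, 2)
    assert (p.x, p.y) == (1, 2)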
I'm writing Python code that is not OOP at all, mostly functional (list and dict comprehensions, pure functions), and I use only list, dict, set, and combinations of them. Am I alone?
I know what attrs is, thanks. “Everyone should use X library” is always bad advice, and it’s especially bad when there is language native functionality that covers common use cases for the library. Python has a dependency problem, maybe we shouldn’t make it worse with libraries most people don’t need.
I think the author is overselling this library. A lot of the problems they mention can be avoided with a consistent application of discipline.
You can decompose classes that become too big for their own good. You can design your software, layer abstractions intelligently etc. so that having to do such refactoring isn't a big issue.
Python is a language that demands an above average level of discipline compared to many other programming languages I have used, but only because it IMO leans strongly towards empowering the developer instead of restricting them.
Agreed. There was some drama with Twisted years ago involving Python 2-3 interoperability, IIRC. The title reads like clickbait, and having to be hooked on an article this way only serves to repel me.
...or one could just use Python without classes, just functions. TBH I never quite understood why Python has the class keyword; it's a much better language without it.
I thought like you for several years. Then one day, I needed a custom type. You can use dicts (or lists, or namedtuples, I guess?), but it just ends up being cleaner and more idiomatic to define a class for the type, because you can define common methods for them.
The article mentions quaternions. If you make a quaternion type (class), you can define addition, multiplication, comparison, etc. for it (methods). If you represent a quaternion any other way, you can't say a * b. Or maybe you can, but I don't know how.
You can represent it as whatever and monkey-patch its __mul__. It's a bit complicated with standard library structures that may be implemented in C, though.
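For contrast, a hypothetical minimal example of the parent's point, using a 2-component stand-in for a quaternion: you can't attach __mul__ to a plain tuple, but on a class, a * b comes naturally.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Complex2:               # complex-number stand-in for the idea
        re: float
        im: float

        def __mul__(self, other):
            return Complex2(
                self.re * other.re - self.im * other.im,
                self.re * other.im + self.im * other.re,
            )

    a = Complex2(1.0, 2.0)
    b = Complex2(3.0, 4.0)
    print(a * b)                  # Complex2(re=-5.0, im=10.0)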
attrs really isn't about OO-style classes - it's specifically meant to provide struct-like declarative data containers, and these can help bridge the gap between the toolset that Python provides and functional (data-first) programming styles.
This is an obvious troll post, but Python is a pure OO language; everything is an object with an associated class. Functions are just instances of FunctionType.
Using @dataclass the example from OP would look like:
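Presumably something like this, assuming the article's Point3D example:

    from dataclasses import dataclass

    @dataclass
    class Point3D:
        x: float
        y: float
        z: float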
[1]: https://docs.python.org/3/library/dataclasses.html
[2]: https://www.python.org/dev/peps/pep-0557/