I've never understood the appeal of these "define struct-like-object" libraries (in any language; I've never understood using the standard library's "Struct" in Ruby). My preferred solution for addressing complexity is also to decompose the codebase into small, understandable, single-purpose objects, but so few of them end up being simple value objects like Point3D. Total ordering and value equality make sense for value objects but not much else, so it really doesn't improve understandability or maintenance that much. And concerns like validation I would never want to put in a library for object construction. In web forms where there are a limited subset of rules I always want to treat the same way, sure, but objects have much more complicated relationships with their dependencies that I don't see much value in validating them with libraries.
Overall, I really don't see the appeal. It makes the already simple cases simpler (was that Point3D implementation really that bad?) and does nothing for the more complicated cases which make up the majority of object relationships.
Ignore all of the validation aspects. In Python, you have tuples, `(x, y, z)`, then you have namedtuples, and then attrs/dataclasses/pydantic-style shorthand classes.
These are useful even if only due to the "I can take the three related pieces of information I have and stick them next to each other". That is, if I have some object I'm modelling and it has more than a single attribute (a user with a name and age, or an event with a timestamp and message and optional error code), I have a nice way to model them.
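To make that progression concrete, here is a quick sketch (the `Event` fields mirror the example above; the names are purely illustrative):

```python
from collections import namedtuple
from dataclasses import dataclass
from typing import Optional

# Bare tuple: positions only, no names attached.
p1 = (1.0, 2.0, 3.0)

# namedtuple: named fields, but still a tuple underneath.
Point3 = namedtuple("Point3", ["x", "y", "z"])
p2 = Point3(1.0, 2.0, 3.0)
assert p2.x == 1.0 and p2 == (1.0, 2.0, 3.0)

# dataclass: named, typed fields on a real class.
@dataclass
class Event:
    timestamp: float
    message: str
    error_code: Optional[int] = None  # the "optional error code" case

e = Event(12.5, "boot")
assert e.error_code is None
```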
Then, the important thing is that these are still classes, so you can start with
@dataclass
class User:
    name: str
    age: int
and have that evolve over time to
@dataclass
class User:
    name: str
    age: int
    ...
    permissions: PermissionSet

    @property
    def location(self):
        # send off an rpc, or query the database for some complex thing.
        ...
and since it's still just a class, it'll still work. It absolutely makes modelling the more complex cases easier too.
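A minimal sketch of that "it's still just a class" point (the names here are illustrative, not from any real codebase): instantiation, ordinary methods, and the generated equality all coexist.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

    def greeting(self) -> str:
        # Hand-written methods sit alongside the generated
        # __init__ / __repr__ / __eq__ with no conflict.
        return f"Hello, {self.name}"

u = User(name="Ada", age=36)
assert isinstance(u, User)           # an ordinary class and instance
assert u.greeting() == "Hello, Ada"  # methods work as usual
assert u == User("Ada", 36)          # generated structural equality
```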
Note that that "location" property should be a method instead of a property, to signal that it does something potentially complex and slow. Making it a property practically guarantees that someone will use it in a loop without much second thought, and that's how you get N+1 queries.
Fair point! One of the various `@cached_property` decorators might fix this, depending on the precise use case, but yeah, this is an important consideration when defining your API.
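For the stdlib flavor, a sketch with `functools.cached_property` (the body of `location` is a hypothetical stand-in for the RPC/database call; note this relies on the instance having a `__dict__`, so it won't work with `slots=True`):

```python
from dataclasses import dataclass
from functools import cached_property

@dataclass
class User:
    name: str

    @cached_property
    def location(self) -> str:
        # Imagine the RPC or database query here. cached_property runs
        # this body once per instance and stores the result in __dict__,
        # so repeated access (e.g. in a loop) doesn't re-trigger it.
        return f"somewhere-for-{self.name}"

u = User("Ada")
first = u.location   # triggers the lookup
second = u.location  # served from the per-instance cache
assert first == second == "somewhere-for-Ada"
```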
well, one appeal is that you don't have to write constructors; that's already enough of a win for me. Then you get sane eq, and sane str, and already you remove 90% of the boilerplate
I really, genuinely don't get the appeal. I don't follow the "less code = better" ideology so maybe that's a contributor but I really don't see how this:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
is any worse than this:
@dataclass
class Person:
    name: str
    age: int
I'm not writing an eq method or a repr method in most cases, so it just doesn't add much for the cost.
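For what it's worth, here is a side-by-side sketch of what the decorator version actually adds over the hand-written one (both classes as in the comment above):

```python
from dataclasses import dataclass

class PlainPerson:
    def __init__(self, name, age):
        self.name = name
        self.age = age

@dataclass
class Person:
    name: str
    age: int

# The vanilla class falls back to identity equality and an opaque repr.
assert PlainPerson("Ada", 36) != PlainPerson("Ada", 36)

# The dataclass gets structural equality and a readable repr for free.
assert Person("Ada", 36) == Person("Ada", 36)
assert repr(Person("Ada", 36)) == "Person(name='Ada', age=36)"
```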
The point is that for data-bag style classes, you end up writing a lot more boilerplate than that if you use them across a project. Validators (type or content), nullable vs not, read-only, etc.
The minimal trivial case doesn't look much different, but if you stacked up 10 data classes with read-only fields vs. bare class implementations with private members plus properties to implement read-only, you would start to see a bigger lift from attrs, as there would be a bunch of boring duplicated logic.
(Or not - if your usecases are all trivial then of course don’t use the library for more complex usecases. But hopefully you can see why this gets complex in some codebases, and why some would reach for a framework.)
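To make the read-only comparison concrete, a sketch with the stdlib (attrs has the equivalent via its frozen option); the hand-rolled version repeats the pattern per field, the declarative one is a single flag:

```python
from dataclasses import dataclass

# Hand-rolled read-only: private member + property, repeated per field.
class ManualPoint:
    def __init__(self, x: float, y: float):
        self._x = x
        self._y = y

    @property
    def x(self) -> float:
        return self._x

    @property
    def y(self) -> float:
        return self._y

# Declarative read-only: one flag covers every field.
@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
try:
    p.x = 99.0  # attempted mutation
except AttributeError:
    pass  # frozen dataclasses raise FrozenInstanceError (an AttributeError)
assert p.x == 1.0
```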
The advantage of dataclasses is that they’re hard to mess up. They define all the methods you need to have an ergonomic idiomatic class that is essentially a tuple with some methods attached and have enough knobs to encompass basically all “normal” uses of classes.
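A few of those knobs, sketched with stdlib options (the `Version` class is just an illustration):

```python
from dataclasses import dataclass, field, astuple

@dataclass(order=True)
class Version:
    major: int
    minor: int
    tags: list = field(default_factory=list)  # safe mutable default

# order=True generates <, <=, >, >= comparing fields like a tuple.
assert Version(1, 2) < Version(1, 10)

# "essentially a tuple with some methods attached":
assert astuple(Version(1, 2)) == (1, 2, [])
```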
It’s a pretty good abstraction that doesn’t feel half as magic as it is.
Given that code is for people, I've never found a certain amount of idiomatic boilerplate a problem. The desire to remove it all, or magicify it away (e.g. Django), has always made me do a bit of an internal eye roll.
To start with, the non-`@dataclass` version here doesn't tell you what types `name` and `age` are (interesting that it's an int, I would have guessed float!). So right off the bat, not only have you had to type every name 3 times, you've also provided me with less information.
> I'm not writing an eq method or a repr method in most cases, so it just doesn't add much for the cost.
That's part of the appeal. With vanilla classes, `__repr__`, `__eq__`, `__hash__` et al. are each an independent, complex choice that you have to intentionally make every time. It's a lot of cognitive overhead. If you ignore it, the class might be fit for purpose for your immediate needs, but later, when debugging, inspecting logs, etc., you will frequently have to incrementally add these features to your data structures, often in a haphazard way. Quick, what are the invariants you have to verify to ensure that your `__eq__`, `__ne__`, `__gt__`, `__le__`, `__lt__`, `__ge__` and `__hash__` methods are compatible with each other? How do you verify that an object is correctly usable as a hash key? The testing burden for all of this stuff is massive if you want to do it correctly, so most libraries that try to eventually add all these methods after the fact for easier debugging and REPL usage usually end up screwing it up in a few places and having a nasty backwards-compatibility mess to clean up.
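To that point, `@dataclass` gets the eq/hash contract right by default, including refusing to make instances hashable when that would be unsafe (a sketch; the field names are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Key:
    host: str
    port: int

a = Key("localhost", 8080)
b = Key("localhost", 8080)

# The contract: objects that compare equal must have equal hashes.
assert a == b
assert hash(a) == hash(b)

# So they behave correctly as dict/set keys.
seen = {a: "first"}
assert seen[b] == "first"

# Correct by default: a mutable dataclass with generated __eq__ is
# deliberately unhashable, so it can't be silently misused as a key.
@dataclass
class Mutable:
    x: int

try:
    hash(Mutable(1))
except TypeError:
    pass  # unhashable, by design
```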
With `attrs`, not only do you get this stuff "for free" in a convenient way, you also get it implemented in a way which is very consistent, which is correct by default, and which also provides an API that allows you to do things like enumerate fields on your value types, serialize them in ways that are much more reliable and predictable than e.g. Pickle, emit schemas for interoperation with other programming languages, automatically provide documentation, provide type hints for IDEs, etc.
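The field-enumeration and serialization points hold for stdlib dataclasses too (`attrs` offers the same via `attrs.fields` and `attrs.asdict`); a sketch with the stdlib, using illustrative field names:

```python
from dataclasses import dataclass, fields, asdict
import json

@dataclass
class Event:
    timestamp: float
    message: str

# Enumerate fields programmatically: names and declared types.
assert [f.name for f in fields(Event)] == ["timestamp", "message"]

# Predictable, pickle-free serialization via a plain dict.
e = Event(12.5, "boot")
payload = json.dumps(asdict(e))
assert json.loads(payload) == {"timestamp": 12.5, "message": "boot"}
```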
Fundamentally attrs is far less code for far more correct and useful behavior.
I understand repr for debugging (though imo it's a deficiency of the language that custom objects don't have a repr which lists their attributes), but eq is a property of the domain itself; two objects are only equal if it makes sense in the domain logic for them to be equal, and in many cases that equality is more or less complicated than attribute equality.
> though imo it's a deficiency of the language that custom objects don't have a repr which lists their attributes
It makes perfect sense that attributes be implementation details by default, and `@dataclass` is one of the ways to say they're not.
> eq is a property of the domain itself; two objects are only equal if it makes sense in the domain logic for them to be equal, and in many cases that equality is more or less complicated than attribute equality.
dataclass is intended for data holders, for which structural equality is an excellent default. If you need a more bespoke business object, then you probably should not use a dataclass.
I was merely noting that dataclasses are mostly intended for data holder objects (hence data classes), and thus defaulting to structural equality makes perfect sense, even ignoring it being overridable or disableable.
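Both the default and the opt-out are one flag away, for what it's worth (a sketch):

```python
from dataclasses import dataclass

@dataclass
class Record:           # structural equality by default
    id: int

@dataclass(eq=False)
class Entity:           # opted out: falls back to identity comparison
    id: int

assert Record(1) == Record(1)   # equal field values => equal
assert Entity(1) != Entity(1)   # distinct instances => unequal
```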
This was in reply to this objection:
> eq is a property of the domain itself; two objects are only equal if it makes sense in the domain logic for them to be equal, and in many cases that equality is more or less complicated than attribute equality.