Hacker News new | past | comments | ask | show | jobs | submit login
Building a higher-level query API: the right way to use Django's ORM (dabapps.com)
103 points by j4mie on May 10, 2012 | hide | past | favorite | 29 comments



The main point raised by the article is spot-on, and I'm ashamed to say that I had never recognised it as an issue before reading it. It applies even more strongly for more complex lookups (possibly involving Q objects), which I've always felt would find a better home in models.py than in views.py. And I too cringe every time I come across the django.db.models.manager source code.

Some thoughts:

The approach goes slightly against the commandment of "there should just be one way of doing it". It's probably best to apply it sparingly, only where there are very clear readability and/or DRYness improvements to be had.

The `PassThroughManager.for_queryset_class(TodoQuerySet)()` bit is a bit intimidating, especially to someone not familiar with django-model-utils. I'd probably take the time to write a `manager_with_queryset(queryset, manager=models.Manager)` function to make things more readable.

When you're trying to convince someone that the way they currently code isn't optimal, it really helps if the code you use to illustrate the current way looks like something your reader is likely to write. The strangely formatted filter chain at the start is thus really off-putting. Instead of adding three disclaimers asking the reader not to focus on the implementation details, why not just write it the way most people would:

    todos = Todo.objects.filter(
        owner=request.user,
        is_done=False,
        priority=1,
    )
No need to make the current way of doing things seem unnecessarily convoluted, the point you're making still stands. Though I'll admit that when you're forced to throw an .exclude() into the chain, it starts looking a lot like your example.


I've been using django-model-utils for a while and really love it.

I contributed the patch for PassThroughManager that adds the for_queryset_class.

I agree that it's not pretty, but there's a bit of history that made it that way.

There are lot of custom QuerySet snippets floating around. One was `manager_from` by George Sakkis which Carl included in django-model-utils in July 2010. It was great except that the QuerySets it returned couldn't be pickled. It is currently pending deprecation.

It was replaced by Paul McLanahan's PassThroughManager.

You used it like

    objects = PassThroughManager(MyQuerySet)
That looks great except when related managers are instantiated, they aren't passed MyQuerySet (I haven't looked at the code in while and you have to dig around, but check out https://github.com/django/django/blob/master/django/db/model...).

You can still use it the old way but if you had an `alive` method on your QuerySet, you couldn't do:

    home.occupant_set.alive()


Agreed. I found the contrived chained filters at the beginning of the article off putting but the rest of the article was quite interesting.


I completely agree, but I think this was author's way of demonstrating how ugly a call to the ORM can get. With his very simple to-do list app I think this was the easiest way for him to do that.

In real life you would obviously have all your filter in one call (most of the time), then perhaps .values call with an .annotation call in there too.


I can't agree more. Always wrap your ORM with your own logic that makes sense to you and your application. You'll find code reuse will go way up, readability will go way up, testing is easier. You can actually test the model without bootstrapping the whole app. I've been preaching this in my SqlAlchemy talks and tutorials for years.


For our project I've been thinking about this a lot lately. Could you maybe elaborate on how you did this with SqlAlchemy and how you write your tests against the new model? Do you now of any more in-depth articles on the subject?

This really seems to hit a sweet spot of moving business logic into the model and I would love to use this in some form.


Wait, what's wrong with:

  class Todo(models.Model):
    content = models.CharField(max_length=100)
    # other fields go here..

    @classmethod
    def incomplete(cls):
        return cls.objects.filter(is_done=False)

    @classmethod
    def high_priority(cls):
        return cls.objects.filter(priority=1)


There's a couple reasons that's not ideal.

The big one is that you'd lose filter chaining. With your example you couldn't do

    Todo.objects.high_priority().incomplete()
The other issue is a semantic one. In your example you could do:

    Todo.objects.all()[0].incomplete()
which will return a QuerySet of all incomplete Todo items. This, at least to me, doesn't make sense.

The last reason is that by using a Manager you are encapsulating this filter data. If you later decide that you want to create a new model with similar types of filters, then you'd have to rewrite these methods. With a Manager, both models can simply use the same manager.


No, I believe you've read the code above incorrectly. This would be, for example:

  Todo.high_priority().all()
and would allow, for example:

  Todo.high_priority().filter(id__gte=1)
I haven't tested chaining these, but this might work:

  Todo.high_priority().incomplete().filter(id__gte=1)


Sorry, my example was explaining how it would be done if you were using Managers. Using your example:

    Todo.high_priority().incomplete()
would fail because the QuerySet doesn't have an incomplete() method.


Your last example wouldn't work. Todo.high_priority() returns a plain QuerySet, which won't have your "incomplete" method (as that's defined on the Model class in your example).


For simple approaches you can also bitwise through Q objects:

  >>> from todo.models import Todo
  >>> 
  >>> Todo.objects.incomplete()
  [<Todo: 2>, <Todo: 3>, <Todo: 4>]
  >>> 
  >>> Todo.objects.high_priority()
  [<Todo: 1>, <Todo: 2>]
  >>> 
  >>> Todo.objects.incomplete() & Todo.objects.high_priority()
  [<Todo: 2>]


> Personally, I'm not completely convinced by the decorator-based idea. It obscures the details slightly, and feels a little "hacky".

The purpose of the decorator is, indeed, to obscure the implementation details in favour of more semantic code. But then again we're using an ORM which makes heavy use of metaprogramming to obscure the details of the database layer from us; I don't see how this is a bad thing.


Yep, and my criticism of your suggested approach wasn't intended to be particularly strong by any means. I could definitely be sold on the idea. It just felt like a workaround to a problem that could probably be solved in a nicer way.

I think my main objection is that these query methods should conceptually be on the QuerySet, and so defining them on the Manager (the "wrong place") and magically copying them to the QuerySet (the "right place") feels somehow worse than the opposite.

I appreciate that you raised the discussion on the mailing list, as it highlights the fact that this is a common problem in big Django codebases. Even pulling something like PassThroughManager into core might work (perhaps with a nicer "manager_with_queryset" API, as suggested by Aramgutang).


Why not just pass a QuerySet implementation into models.Manager as an optional argument? This would be much cleaner, I think:

   class QuerySetSubclass(QuerySet):

      def new_method(self):
         pass

   class Model(models.Model):

      objects = models.Manager(QuerySetSubclass)


This is the approach I'd like to see; I'll be proposing it for addition in Django 1.5.


But my problem is that most people wouldn't even think of subclassing QuerySet.

When we write methods that operate on collections of things, we typically use @classmethod. Without @classmethod, we'd have to write a custom metaclass (and instruct our class to use that) if we wanted even a single class method on a class. Multiple inheritance would break (or at least be difficult to reason about) when classes defined class methods, because there would have to be both an instance method resolution order and a class MRO. Fortunately Python's built-in `type` provides the descriptor protocol, which allows us to have class methods and instance methods and properties and all these other nice things without having to metaprogram or hack the interpreter.

All I'm asking for is a similar (if less ornate) interface for Django models, wherein the methods that operate on collections of things can be defined alongside the methods that operate on individual things, without requiring a knowledge of Manager/QuerySet internals.


A few years ago I was experimenting with exactly this problem, and I came up with an API that worked using an inner class on the model. I can't find the code now, but from memory it looked something like this:

    class BlogEntry(models.Model, MagicManagerMixin):
        title = models.CharField(max_length = 128)
        is_published = models.BooleanField(default = False)

        class QuerySet(models.QuerySet):
            def published(self):
                return self.filter(is_published = True)

    entries = BlogEntry.objects.published()
Where MagicManagerMixin was some scary code that made sure the objects Manager would use the queryset subclass.


> But my problem is that most people wouldn't even think of subclassing QuerySet.

I'd argue that's a documentation issue. QuerySet is Django's abstraction of a set of filterable database results. Keeping this conceptually separate from other parts of the ORM, such as the model class, is valuable IMO, and I'm not sure we gain much by trying to hide the details. One of the things I love about Django's philosophy is that it generally provides shortcuts that encapsulate common patterns without obscuring them.

There's a fine line between discussing the minutiae of API design and bikeshedding, and I think that anything Django can do to help here is valuable, so I won't push the issue any further. Let's pick it up again on the mailing list or Trac at an appropriate time.


I see what he's trying to do but this just seems like a ton of extra "boilerplate" code when you're trying to make an app.

I'd rather spend extra time tracking down where I've used the is_done field if I later change it to a status, than spend all this time writing a manager for every query I do. Unit tests help with catching it if you've missed somewhere.

On the other hand, if you have an extremely complicated query, then this might make sense. Maybe he chose that example for illustrative purposes but it's not really the kind of thing he's recommending you use this for?


It's not much overhead and considering the savings in views the LOC will probably be equal or less. It doesn't make sense for every model or attribute, but if you find yourself doing the same types of queries multiple times, this can be a big savings.


Momentarily arguments sounds right(perhaps because it's intellectually appetizing) but I think we are forgetting basic philosophy of every layered architecture that "lower layer provides generic api to its higher layer" allowing higher layer to customize its every possible needs using this api. Django ORM does exactly same thing.

author seemed to be concerned about these issues:

1. embedding businnes logic in views:

making query isn't business logic, business logic is in your database constraints or sometimes if db constraints are not enough then by overriding save() and validate(). queries belong in views because all we are suppossed to do in views is mearly fetch data from a data structure(already modeled according to biz. logic) and representing it as we see fit(thus the name views). theoretically this representations(views) could be of infinite types and changes over time so queries would change for every representation and over time but we can't go about implementing all possibilities in models. And this is what's antipattern because we are talking abaout putting views in models(partially though).

2. code reusability:

agreed, some queries could be repeted many a times and if complicated enough may clutter the code. I recommend putting querries into functions and put the functions in views or any similar aproach(I will think of one or you figure out one and share) but they just dont belong in models. although I believe full reusabilty can be achieved but in some cases if we can't- well we are choosing 'division of functionality' over 'reusability'.

and most important of all if django's documentaion does not suggest inherting for eg. queryset class then we shouldn't (even if you yourself coded the django framework) because these implementation details are supposed to be concealed and hence they are free to change it in future versions making our code 'upgrade-ugly'(if that's the right term)


Correct. Another benefit: easier to implement cache between the view and the DB down the road.


I'm a freelancer and I've been using this approach since ever. I wish more developers did too because it's really frustrating when they don't and you need to maintain their code.


Lately I've been re-discovering the value of the ORM and putting as much business logic on your models as possible. This, after spending way too long writing out lots of query logic in views instead. It's amazing how you can many times reduce complexity from 10-20 lines to 2-3 lines, and gain reusability, just by putting business logic where it belonged in the first place.


    > putting business logic where it belonged in the first place
I've been thinking a lot about where to put business logic, and I think the models are the closest, but not the best place for them.

A significant portion of this logic for me affects more than one model, and while you could solve this by using public interfaces, you still need to put in any of the models, which seems like a suboptimal approach.

The ORM is already responsible for several layers, and custom business logic is not relevant there imho.

Still puzzled how to organize my code. MVC has started to fall apart for me. Just my two cents.


A bit OT. In Yii framework (php) these are called "scopes" and there is a nice abstraction to define non-parametric scopes via arrays. http://www.yiiframework.com/doc/guide/1.1/en/database.ar#nam...


This article is great and provides some awesome insight from someone who clearly has been down this road before. It couldn't have come at a better time for me. I was just about to implement my own manager today, but I'll take this much cleaner approach.

Thanks!


Spot on. Never (okay very rarely) do business logic in your views. Just don't. Model methods for row-level logic, managers for table-level logic. It's easier to test, it's easier to abstract and it's easier to change your views later.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: