Hacker News
Tips for Building High-Quality Django Apps at Scale (doordash.com)
191 points by stanleytang on May 17, 2017 | 120 comments


I'm so glad to see the mention of "app" directories here. I've only dabbled in Django development, but I've always thought the desire to divide things into apps didn't really make any sense. It felt like the developers of Django had said "well, intuitively, there must be some unit of reuse at this level" and then stuck the notion of apps in there in an attempt to provide that reuse.

This seems to me to be somewhat unpythonic, unnecessary complexity introduced right from the start - why not make the reuse optional? The file system layout becomes confusing as a result. More generally, I would expect that the most minimal Django site would be a single file, that would seem most Pythonic, but it isn't designed like that. Django doesn't "feel" like most Python projects do.

I've always assumed this must be because I'm a total noob/idiot when it comes to real-world development, and that projects like Django must be doing things right and I just don't get it (although it remains entirely possible that this is the case).

It's heartening, however, to see I'm at least not _entirely_ wrong for having a gut unease about this, and it makes me wonder how on earth I can judge whether it's me or a given framework/language in the general case.

Do I have reverse Dunning–Kruger? Or did I just guess right in this case? I have no idea.


Part of it is that Django has been around for quite a while now. It's maybe _the_ most successful Python framework out there, so there are some paradigms that aren't common anymore but are hold-overs from years past. In particular, it pre-dates the current "microservice" trend and assumes a fairly monolithic environment.

And I think "apps" _do_ make sense in certain contexts. Consider this situation:

- I start a "polls" app like the Django docs example.

- I get a lot of users and they like my functionality, so I want to engage the community.

- I decide to start publishing blog posts, but I'm too lazy to spin up a whole new Wordpress install

- I find a "blog" app and drop that into my project with the '/blog' URL prefix.

Now I can start writing blog posts right away and anyone can go to my site '/blog' and see it. There are no cross-app database dependencies to worry about.

The same would be true for adding a basic "social" app that shows a user-profile page for django Users at '/users/<id>' or something like that.
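
As a rough sketch of what "dropping in" looks like, assuming a hypothetical third-party package that ships a `blog` app with its own `blog.urls` module (both names are placeholders, and `path()`/`include()` assume Django 2.0 or later; older versions use `url()` instead):

```python
# project/urls.py (sketch; "blog" is a hypothetical drop-in app)
from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path("admin/", admin.site.urls),
    # The tutorial app this project started with.
    path("polls/", include("polls.urls")),
    # Everything the drop-in app serves lives under the /blog/ prefix.
    path("blog/", include("blog.urls")),
]
```

The app would also need to be listed in INSTALLED_APPS, and its tables get created the next time you run `migrate`.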


Is it actually possible to find and drop in apps like you suggest? And have it just work? Is there a repository of various blog apps out there? What if one of them does something nasty to your database?

It seems like it'd be easier to spin up a Wordpress instance...


Yes, there's a whole universe of them.

Here's a good place to start looking for things to help make development easier:

https://djangopackages.org/


Thanks!


Wordpress is a CMS more comparable to something like django-cms (haven't used it). Django is more generic than Wordpress.

An example is worth 1000 words? Here's my app setup for the website I've been developing for 5 years.

    # these are the "drop-in" apps we've been discussing    
    THIRD_PARTY_APPS = (
        # django-allauth
        'allauth',
        'allauth.account',
        'allauth.socialaccount',
        #'allauth.socialaccount.providers.facebook', # to add later on
        'django_hosts',
        
        'bootstrap3',
        'imagekit', # for thumbnails
        'paypal.standard.ipn',
        'template_debug',
        'import_export', # CSV import and export
        'jfu',
        'extra_views',
        
        'post_office',
        'django_xworkflows',
        'markdown_deux',
        
        'django_mobile_app_distribution',
        'hijack',
        'compat',
    )
    
    # Ones I've written...
    LOCAL_APPS = (
        'membership',
        'awards',
        'reports',
        'entryhandling',
        'folio',
        'judging_console',
        'email_manager',
        'printing',
        'judges',
        'logs',
        'honours',
        'api',
        'questions',
    )
    
    INSTALLED_APPS = DJANGO_APPS + THIRD_PARTY_APPS + LOCAL_APPS


> What if one of them does something nasty to your database?

Same question is valid for Wordpress plugins.


Wordpress has already done nasty stuff to your database...


The polls app example is given in the introductory Django tutorial...


Yes, that's what I said...?


Sorry, misread that line.

Point being, how many examples do you have where someone writing a Django site is going to reuse their own apps?


I reuse my own apps constantly, whenever I do new sites.


Excellent - could you give some examples of what you reuse?

I'm interested in whether this is a worthwhile abstraction in the general case, or whether for most people (like me and the post) it's an unhelpful complication.


It's a worthwhile abstraction for any generic feature that isn't specific to the one project you're working on:

- User profiles

- Content management/publishing

- Messaging and chat

- Searching your database

- Admin

- Analytics on your database

Segmenting into apps makes the most sense when there are no cross-app dependencies on migrations: additional read-only capabilities on your existing models in another app, or new capability using entirely new, stand-alone models.


Where app directories are incredibly useful is in Django library development. They're a really convenient way to hook in models, static files, template helpers, etc. to an existing Django site, without forcing users to do a bunch of setup work when integrating your library. It's not a coincidence that most Django libraries have a very similar setup:

1. Install the package

2. Add the package to settings.INSTALLED_APPS (and middleware, etc.)

3. Profit!

For an instance of a Django site, apps are less useful, and the article's recommendation to be more careful when splitting your site into multiple apps is great advice.


Yeah, I agree. Makes sense for libraries. So keep that separate for library writers, don't expose me to the complexity of that.


The benchmark I use is "Is it probable that this app might become shared between projects?"

Using a Django 'app' for this makes sense. If it's generic enough that I might consider moving it into its own repo then it's probably not going to be heavily intertwined with the rest of the project.


Have you checked out O'Reilly's Lightweight Django book? It basically goes through all of this, starting with a single file Django app.


+1 for Lightweight Django. I had the pleasure of working with one of the authors. The book is well laid out and covers some really great topics that aren't always covered well elsewhere in the context of Django (e.g. WebSocket services with Django). It's a great companion for Two Scoops of Django.


In my opinion that book should be the starting point. The official tutorials have you way deep in the weeds with database migrations and stuff before you even have a basic understanding of where you're at and why. Worst "hello world" ever.


No! I read a lot of books, but missed that. Thanks!


>I'm so glad to see the mention of "app" directories here. I've only dabbled in Django development, but I've always thought the desire to divide things into apps didn't really make any sense. It felt like the developers of Django had said "well, intuitively, there must be some unit of reuse at this level" and then stuck the notion of apps in there in an attempt to provide that reuse.

Yes, let's say you install a module that has a specific Django functionality from pypi, it will typically be installed as an app.

It confuses newbies sometimes who think that they need to worry about how to divide up their application into different applications. 99% of the time they don't and shouldn't (putting everything in a single app called 'core' in your project is a pretty common pattern).

Unfortunately the docs were never very clear on this.

>This seems to me to be somewhat unpythonic, unnecessary complexity introduced right from the start - why not make the reuse optional? The file system layout becomes confusing as a result. More generally, I would expect that the most minimal Django site would be a single file, that would seem most Pythonic, but it isn't designed like that.

It can be made to behave like that if you really want:

http://olifante.blogs.com/covil/2010/04/minimal-django.html

Most people aren't using Django for that though, so it's not widely advertised nor known.

>It's heartening, however, to see I'm at least not _entirely_ wrong for having a gut unease about this,

It's not like the framework is perfect. There's plenty wrong with it, but it largely managed not to fuck everything up, which is pretty rare.


You could write a Django site in a single file if you wanted to.

See this for an example: https://www.safaribooksonline.com/library/view/lightweight-d...
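
For the curious, here is a minimal sketch of the single-file approach. It assumes Django 2.0 or later (for `path()`); the file name `hello.py`, the `index` view, and the placeholder SECRET_KEY are made up for illustration and are not taken from the book:

```python
# hello.py: a whole Django "project" in one file (sketch, not production-ready)
import sys

from django.conf import settings
from django.http import HttpResponse
from django.urls import path

# Configure settings in code instead of a settings.py module.
settings.configure(
    DEBUG=True,
    SECRET_KEY="placeholder-not-a-real-secret",
    ROOT_URLCONF=__name__,  # this module doubles as the URLconf
    ALLOWED_HOSTS=["*"],
)


def index(request):
    return HttpResponse("Hello, world")


urlpatterns = [path("", index)]

if __name__ == "__main__":
    from django.core.management import execute_from_command_line

    execute_from_command_line(sys.argv)
```

Run it with `python hello.py runserver`. There is no app directory, no models, and no database here, which is exactly why it stays this small.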


A good rule of thumb is that a Django "app" should be (1) standalone and (2) reusable.

If you were to write a Django project to automate some complicated internal process at your particular company, it's likely that you'd break up the code into separate Python modules to help give it an easy to understand structure. But it's unlikely that any other company would necessarily benefit from any of those separate modules by themselves.

However, let's say you come up with a clever idea about how to log errors in your project. Now, that's the sort of thing that other people might want to use in their own projects. If you were to restructure your code a bit and make it a bit more abstract then you could create a separate "app" that could be dropped into other Django projects.


I'm not affiliated with them, but if you're serious about Django, you should take a look at the book https://www.twoscoopspress.com/products/two-scoops-of-django...

I still have the 1.5 edition, but it helped me a lot. I'm thinking about upgrading. Would you recommend it?


It depends what you're doing. It shines on low level features like authentication and image cropping, where I need ways to do these things, and I can use them across the board.

It breaks down when you have a lot of views and urls that are trying to stay close to their related models, but are too interconnected to have any real kind of separation of concerns.

I find myself using all of the above. There's nothing wrong with building a small site in a single app, or nesting apps to provide a little more structure for how they're connected or separated.


> why not make the reuse optional?

Well, it is optional. You can just not divide your stuff and place everything inside the main app. But notice the article recreates the same layer of reuse inside the main app, and got some extra problems because of it (renaming the tables is probably caused by this).

Django may need an extra paragraph in the tutorial explaining that it is OK to put everything in the main app when it's all application-specific code, and that generic functionality should be broken out into other apps.

By the way, what would be reverse Dunning-Kruger?


> Well, it is optional

What I mean is, you still have to have a "main" app. Why? I don't want to reuse it. So I don't want an "app". I just want to build my website. Stop making me put things in an unnecessarily complicated directory hierarchy. If it were Java, I wouldn't think twice about it, but it doesn't seem Pythonic.

> reverse Dunning-Kruger

Good question! Probably... illusory inferiority AKA imposter syndrome. I assumed Django knows best, but maybe it doesn't.


Sure. One of the Djangoesque opinions is that if you're building a site, you'd like it to be maintainable, and that you'll probably add on to it.

If you've been working in the Python ecosystem long enough to know that you're building a site that doesn't have those assumptions, you'll probably pick something different to start out with, like Flask or Bottle.


> you still have to have a "main" app

Well, it has to be organized some way. It happens to have the organization that gives you more flexibility, instead of forcing something that only fits very small sites onto you. That means you'll have an extra directory on your tree.

Anyway, nobody uses only the main app. The article creates a large number of problems by trying to write everything in it, but I bet even they use Django's built-in apps.

> AKA imposter syndrome

Oh. The Dunning-Kruger study detected that one too. It's part of the effect: people tend to rate themselves within a narrow band near (but often under) the top, whatever their real performance is.


I've got more than 5 years of experience with Django on a number of teams and at a couple of companies and in my experience almost everything in this article is completely incorrect.

The only things I would agree with are the point about project layout and avoiding Django's squashmigrations in favor of truncating the migrations table, deleting the migration files, and creating a new initial migration.

Practically everything else in this article is wrong, in my opinion.


I don't know as much about Django as you or the authors of the article.

So I can't really tell which of you has a better point, or is better in context and so forth.

I do, however, see that the authors wrote a long article, and backed up each point with an example of what could go wrong and how to avoid it.

You, on the other hand, just asserted that you largely disagree.

So if you disagree, perhaps you can take some place where you feel they were particularly wrong and explain why. Otherwise, we are left with the impression that they are correct.


I've got 10 years' experience on projects small and large and I have to agree. The title talks about building at scale but the article doesn't stress that which makes some of the advice downright weird.

>If you don't really understand the point of apps, ignore them and stick with a single app for your backend. You can still organize a growing codebase without using separate apps.

This is where the article lost me. If this is for building at scale, maybe, I don't know. I never hit a point where designing the project in apps became a problem. Regardless, if you don't know why Django wants to use apps, that suggests you are new to Django and probably not building at scale, so this feels like poor advice. Much of the article is telling readers to do things exactly contrary to Django's philosophy; the problem with that is there are lots of articles and StackOverflow answers out there based on Django's philosophy. There isn't a similar body of reference based on the authors' approach.

I don't know why explicitly naming your database tables is imperative for running at scale. Now we're breaking from Django's convention because some day we might want to stop using Django and we will be annoyed by its table-naming convention? Avoiding "fat models" is another place where it feels more like opinion than anything to do with performance or good design.

It would be good to know what database engine the authors are running into such serious migration issues with-- MySQL?


> Avoiding "fat models" is another place where it feels more like opinion than anything to do with performance or good design

So in the Java world, the general pattern is that:

Views:

  - Accept and sanitize query parameters

  - Call one or more service methods.

  - Catch errors and return an appropriate error response

  - Render a JSON response based on the results of the service methods if nothing goes wrong.

Service methods:

  - Perform business logic

  - Manage persistence

  - Bubble up errors

The nice thing about this architecture is that each piece of the codebase tells a complete story about what it's doing. That is, from looking at the view you can see what parameters it accepts, how they are sanitized, what service method it calls, each of the errors that can be returned, and what the 200 response looks like.

And looking at the service method we can see what business logic it performs, and what the database queries look like.

In each case there isn't any reason to look at other methods to understand the 'story' of what's happening in your app. This makes it very easy to read the codebase and audit it for correctness.

The problem with fat models is that they're not telling a story about what's actually happening in the app, e.g. looking at them doesn't tell you anything about the business logic the endpoints are performing. And what's worse, you also can't look at the views or services and know what they're doing either.

As someone who strongly prefers Python and Django over the Java ecosystem, I'll say hands down that in terms of how web apps are architected, the Java people got it right and the Django people got it wrong. As far as I can tell the whole Domain Model Architecture thing seems like a bunch of bullshit that was invented to sell consulting. If the advocates of this approach can't even write a coherent Wikipedia article, it should give you a clue as to what the code ends up looking like. [1]

[1] https://en.wikipedia.org/wiki/Domain-driven_design


Yeah, I don't disagree with that at all. I came to Django from C# after playing with Ruby on Rails a little bit and the lack of an explicit Controller in Django confused me and I think it is part of the driver behind the "fat model" approach. I like the idea of the logic for the business object being inside it and all testable on its own but I think it has its limits-- thinking about my own Django codebases, the number of class/ static methods I have on models is a code smell from me learning OOP on C# where I had to stick those methods some place.


Is there a good place or pattern for service methods in Django? I've got some very fat models right now, and it's a DRY improvement over having the fat in the views, but like you say it takes a lot of effort to trace what's going on.


Let's say you're following the approach of breaking down your project into separate apps, so you have an app called user_accounts. This app would be a folder containing files like:

  views.py, services.py, models.py, test_views.py

So in views.py you'd have a User class, with:

  A POST method that calls services.create_user(username, email_address, password)

  A GET method that calls services.get_user_profile(request.user)

  A PUT method that calls services.update_user_profile(x, y, z)

  A DELETE method that calls services.inactivate_user(request.user)

The return value of each of these views can just be whatever services.get_user_profile(request.user) returns, rendered into JSON.

Then each of the services performs whatever business logic it needs to, preferably directly in the method. But if it would be more readable split into multiple methods, then you can create some private helper methods in the services file prefixed with an underscore. You can also have a separate folder somewhere for utility functions meant to be reused across the app, e.g. get_user_emails(request.user, is_active=True, is_verified=True)

Basically though each view sanitizes the data, e.g. strips XSS out of strings, makes sure booleans are actually booleans, etc.

Then each service first does field-level validation with serializers, e.g. ensuring that usernames meet the appropriate requirements for usernames. Next if there is other business logic validation that needs to happen, it happens, e.g. making sure that only users with verified email addresses can perform certain actions.

After that you perform the actual business logic, e.g. transforming any data. Then you perform your CRUD operation, e.g. creating a user model. And lastly you return something, e.g. returning the user model.

Each endpoint and service method can be written pretty much following this pattern, which makes the codebase super readable because once you understand one endpoint you understand all of them. And the service methods are the reusable component of the architecture, so e.g. if you want the ability for admins to create users, then they are created with the exact same service method. (But called from your admin endpoints/services.)


That's almost exactly what I do, except for using a big monolithic app for the entire backend (called "core"), and making "services" a package with several modules inside.

It also resembles very much what I see in Java projects which use DDD (Domain Driven Design).

What's your take on the article's point that you should have fewer rather than more Django apps (citing the problem of inter-app-FKs)?


So my startup is built the way you describe, in terms of just having one main app, and I personally prefer this style.

The basic argument in favor of breaking down the Django project into multiple apps is that it makes the components more decoupled and reusable. But personally I think this is bullshit. If you want your apps to be reusable and decoupled then you need to put a ton of time into architecting them this way; the idea that you're going to get these benefits just from putting stuff into different folders is magical thinking. It seems like pretty much the textbook example of cargo cult programming.

That said for the client I'm currently working for, the decision was made to do it the 'standard Django way' in terms of breaking it into multiple apps. So far I haven't run into any issues here. I like it slightly less because I think having all the views in the views folder, and all the services in the services folder makes folks more likely to reuse code just by making it easier to find. But yeah, so far no real problems, but I'm also not expecting to see any magical benefits either.


I'm not aware of architectural patterns for service methods in Django (would like to find some as well), but what I did was to somewhat mimic a Java structure.

The whole project is in one single app, which I unimaginatively called "core", and inside this app there's a "services" Python package (i.e. a folder with a __init__.py file inside). It has roughly one Python module (.py file) for each "category" of services. These range from thin layers like "user_service.py" (basically passes through to the relevant models) to more complex services like "dependency_x_integration_service.py", which connects to external service "X", pulls some relevant data (say, user interaction datapoints), and bridges it to the models in the system.


We do roughly this where I work, so I broadly agree.

That said, it's fairly common for the unit of reuse to be below service methods. Also, depending upon how exactly you manage transactions, another thing to look out for is making multiple non-idempotent service calls from the view - this will be an area ripe for race conditions you likely aren't testing.
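
One way to read the transaction point, sketched with the standard library's `sqlite3` rather than the Django ORM (the `accounts` table and all function names are invented for illustration): instead of the view calling `debit` and `credit` separately, a single service method wraps both writes in one transaction, and the conditional UPDATE avoids a separate read-then-write step.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 0)]
    )
    conn.commit()
    return conn


def debit(conn, account_id, amount):
    # Check and write in one statement, so there is no gap between
    # reading the balance and updating it.
    cur = conn.execute(
        "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
        (amount, account_id, amount),
    )
    if cur.rowcount != 1:
        raise ValueError("insufficient funds or unknown account")


def credit(conn, account_id, amount):
    cur = conn.execute(
        "UPDATE accounts SET balance = balance + ? WHERE id = ?",
        (amount, account_id),
    )
    if cur.rowcount != 1:
        raise ValueError("unknown account")


def transfer(conn, src, dst, amount):
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so a failed credit also undoes the debit.
    with conn:
        debit(conn, src, amount)
        credit(conn, dst, amount)


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

In a Django codebase the same grouping would typically be done with `django.db.transaction.atomic` around the service body.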


> it's fairly common for the unit of reuse to be below service methods.

In terms of utility functions or serializers? What does that look like exactly?

E.g. in our codebase service methods can call helper methods (non-reusable), utility methods (reusable), and serializers (non-reusable).


I agree the doordash article gets some stuff right and most stuff wrong, almost to the point where it's difficult to read. But (somewhat tangentially) I admit I have struggled in the past with separating out Django apps for reasons not mentioned in the article.

Specifically, say I have two apps, with a second more specific app heavily dependent on a first more general app. What I find in this scenario is that I sometimes need hooks into the general app from the specific app, which means that I wind up importing modules from the specific app into the general app. This hasn't generally been a showstopper in my experience, but it creates some friction because:

a) I would prefer for the general app to have no dependencies on the specific app

b) This results in circular imports (which can themselves be addressed, but this is an implementation detail I would prefer not to have to worry about)

I realize these issues can be mitigated with signals, but I try to use signals sparingly for various reasons (https://code.djangoproject.com/ticket/16547#comment:2). It also helps that foreign keys can be expressed using a string literal rather than the actual model, but in the end, I still occasionally run into situations I don't feel great about.
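
For reference, the string-literal form looks roughly like this. The `catalog` and `promotions` apps and their models are hypothetical, and `on_delete` is shown because Django 2.0 and later require it:

```python
# promotions/models.py (the more specific app; all names are hypothetical)
from django.db import models


class Promotion(models.Model):
    # "app_label.ModelName" is resolved lazily by Django, so this module
    # does not need to import catalog.models at all.
    product = models.ForeignKey("catalog.Product", on_delete=models.CASCADE)
    percent_off = models.PositiveSmallIntegerField()
```

This only removes the import, not the dependency itself: the `promotions` migrations still depend on `catalog`.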

Please note that I'm not advising against separating out functionality into apps. Instead, I'm merely citing an issue about having multiple apps that bothers me.


Agreed. I've built and maintained a moderate size Django app for 5 years now, and had similar issues.

GOOD app division: my "members" app which has classes for UserProfile, MemberType, and communicates with an upstream 3rd party membership API. It doesn't have any dependencies, but a bunch of other apps depend on it. Another one I've just started work on is a generic Questionnaire/Survey app. This one is a definite candidate for spinning out as an open-source third-party app later; it lets you attach Questions and Answers to any of your own model objects via generic relations.

BAD app division: I have separate "Entry" and "EntryHandling" models across two apps, the latter is a OneToOne with an Entry. Originally this was a separation of concerns, but it's become a mess. Like the parent, the generic app ends up depending on the specific app, and migrations have to be handled gently and sometimes manually edited.

If you treat your Django apps as points that would be logical for splitting as micro-services, you'd probably be just fine.


> I never hit a point where designing the project in apps became a problem.

The article literally describes why this is a problem in the first place. Cross-app model relations are a PITA, and splitting "sections" of your site into separate apps often has you end up with cross-app relations.

The more general point here is that the functional separation between Django apps and the logical separation between "parts" of your site often do not match up, and thus you should be careful about splitting up your site into multiple apps.

In my experience, this is absolutely true and happens often. The article recommends separating the parts of your site into modules and packages within a single app, which is a great idea and something the Django docs don't make obvious as a choice.


>Cross-app model relations are a PITA

A PITA how? They say they ran into migration issues due to the apps approach but it sounds like they ran into issues due to the sheer number of migrations happening across a bunch of developers. That sounds like a likely problem on big teams, but I don't think it's one best solved by not using the app approach and I don't think it's one whose underlying problem is ForeignKeys to models in other apps. Again, it would be nice to know if this was on MySQL or a different database as what finally caused me to move to Postgres 8 years or so ago was the heartburn of migration on MySQL. I think I've run into one or two migration knots since then and they were both due to me moving a little too fast.

And at scale, I would assume you aren't actually running the migrations but generating SQL from them and running that. Still could run into the same problems, but you could sort that out by hand when you do. Not the best answer, but from the article it feels like a more formalized/ strict approach to who gets to modify the database and when would be good.

>splitting "sections" of your site into separate apps often has you end up with cross-app relations.

It also encourages you to do some up-front data modeling which is a skill that gets rarer as ORMs get more common.

/old man yells at cloud


Don't see how up-front modelling would have predicted the high planes of Unicode & MySQL's utf8 encoding being supplanted by utf8mb4 so users of your product could message taco emoji to each other.

I have an agenda against Getting It Right First Time, Every Time, since it encourages brittle code that's a struggle to adapt to new, seemingly-similar use cases.


I don't understand how that first sentence is relevant. How would using apps organization vs the approach described in the article protect you from making the decision to change your database's underlying encoding?


One of the projects I worked on in the past had a legacy app which contained most of the core logic... and as the site grew, new apps were created because it got impossible to manage that one massive app. Splitting it into multiple pieces was incredibly painful because the codebase wasn't designed for that... so while I think this may help you get up and running faster, it will probably cause problems down the line.


Could you please elaborate?

I wouldn't say it is "wrong", but I skimmed it, and the advice seems either generic (organize your apps inside a package... like, duh?) or awfully specific to their own services.

In particular, the parts about dealing with migrations and the database. In my experience, the database structure doesn't change THAT much to warrant 3 or so sections of ramblings about dealing with migrations. And migrations don't tend to be dramatic either. My experience is that they are rather anticlimactic (I always sort of expected them to choke and kill my DB, since I started using South ages ago, but I've been pleasantly surprised so far).

Of course, this requires more than 5 minutes of planning on the developer's part.

And the article never touches things like how to actually run a django app at scale. I've seen an alarming number of places that just run their apps via the builtin django server (via the "runserver" command). This is, as I understand it, a very bad idea (for performance and security reasons).

Running your app under uwsgi (for example) optimally isn't exactly trivial, and I'd like to see them touch upon that.


Yep, this article has absolutely nothing to do with application performance and more to do with managing complexity, but avoiding complexity was not mentioned enough in the article, and they went as far as to suggest that users NOT use the ORM and to build a middle-layer for CRUD, which, well, is just flat out insane. Sorry, just is...


I replied to someone else who asked. https://news.ycombinator.com/item?id=14361539


Can you expand on what you think is incorrect in the post?


- FK/M2M across reusable, packaged apps is only bad if you don't match the interfaces correctly. See: almost every third-party Django app that is built to integrate with another application's models.

- Sometimes you want the app concept just for organization. There's nothing bad about that and it makes sharing things inside of a project easier.

- Explicitly naming your database tables doesn't make any sense. You're using an ORM. Accept it or don't use it.

- Explicitly declaring through on a m2m field if you're not adding metadata to the relationship is pointless, but, if you're not using the table naming from Django I could see why they'd be invisible because there's no pattern to follow.

- GenericForeignKeys are dangerous but not for any of the reasons listed. It's because they implicitly force a two way join which seems magic until it becomes debilitatingly slow.

- The entire section on migrations leads me to believe that the first time the migrations are being run is on a production deploy. If you don't know SQL and you don't test your migrations prior to deployment, yes, it will be fairly difficult to determine what kind of performance/locking they're going to have.

- No to Fat Models? This breaks down to "The framework we chose to use suggests a pattern, we also chose not to follow that pattern." That's fine if you want to do that but I wouldn't suggest it to others.

- It's not hard to get signals to not fire in certain circumstances, you put the conditional in the signal callback like almost every other event-driven pattern. Also, bulk updating models doesn't fire signals in Django because save isn't called. Read the documentation.

- Avoid using the ORM? Why choose a framework as complete as Django where 80% of the features are built around or on top of the ORM and then don't use it?

- Caching complex objects makes cache invalidation hard. Well, yes, yes it does.
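On the signals bullet above, a minimal sketch of putting the conditional inside the receiver. To keep it runnable on its own it uses a tiny stand-in dispatcher, not django.dispatch; the `Signal` class, `welcome_emails` list and `send_welcome_email` receiver are all made up for illustration. In Django itself the receiver would be connected to post_save, which passes a `created` flag.

```python
class Signal:
    """Stand-in dispatcher for illustration only; not Django's implementation."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, sender, **kwargs):
        return [receiver(sender=sender, **kwargs) for receiver in self._receivers]


post_save = Signal()   # hypothetical stand-in, named after Django's signal
welcome_emails = []    # records side effects so they can be inspected


def send_welcome_email(sender, instance, created, **kwargs):
    # The guard lives in the receiver: only act on newly created, active users.
    if not created or not instance.get("is_active"):
        return
    welcome_emails.append(instance["email"])


post_save.connect(send_welcome_email)

active = {"email": "a@example.com", "is_active": True}
inactive = {"email": "b@example.com", "is_active": False}

post_save.send(sender="User", instance=active, created=True)    # handled
post_save.send(sender="User", instance=active, created=False)   # skipped: plain update
post_save.send(sender="User", instance=inactive, created=True)  # skipped: inactive
```

Only the first send appends an address; the other two return early from the guard.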


Re: naming tables, older versions of Django (not sure about the latest) require you to name your app in each model (via Meta) if you've broken your models into several modules within an app; maybe they are trying to speak to that?


I do appreciate some of the sentiment here. "Organize your apps inside a package", "Keep migrations safe", "Don't cache models" and "Avoid GenericForeignKey" are the ones I agree the most with, so I'll go over some of the others. Some of the other migration-related ones I don't have a strong opinion on...

> If you don’t really understand the point of apps, ignore them and stick with a single app for your backend. You can still organize a growing codebase without using separate apps.

This is still possible, but gets very painful down the line when you need to split it out into separate apps because detangling the mess will be next to impossible. I generally try to think about apps as distinct features which sometimes helps in splitting out the functionality.

> Explicitly name your database tables

If you're using apps this provides a nice separation between apps and makes it easier to see where data is coming from, rather than having custom table names that could be coming from anywhere.

> Avoid fat models

Models are really the only core location you have to add functionality to an object without worrying about copying it down the line. Fat models can be a pain to deal with, but it's better than the suggested alternative of building an additional access layer on top of the models themselves. Models are Objects and it makes sense to use them as such.

> Avoid using the ORM as the main interface to your data

Why? Building an additional layer on top of an already useful layer to do things it already supports seems a bit crazy. The part of this that I think is the strangest is this line: "Apart from signals, your only workaround is to overload a lot of logic into Model.save(), which can get very unwieldy and awkward." Those are the main two workarounds... and it's interesting to see them say "Be careful with signals" the section before then admit they're useful but recommend not using them.

Those were the main things I noticed. There are obviously some useful tips in here, particularly around migrations, but there are also portions where they go against direct recommendations from Django.

If you're looking for a good book on recommendations from people who have been writing Django applications at scale, I strongly recommend Two Scoops of Django (https://www.twoscoopspress.com). There's a new version (https://www.twoscoopspress.com/products/two-scoops-of-django...) coming out soon which may be worth checking out (I've read previous versions and am recommending based on those).


I found signals (especially post_save hooks) incredibly useful for updating related models and caches. Their rationale for avoiding them was weak.


Django signals are too magical, and very difficult to debug.

We've had too many cases of phantom bugs which turned out to be caused by an errant signal in some distant unrelated model.


It's not really helpful when people come along and just say, "That's wrong", without explaining why...


I agree, but can you elaborate?

edit: 100% earnest request here


Same thing here, no clue why people are upvoting this.


I find that the safest way to migrate DB schema is gradually, spreading out intermediate steps across a few deploys.

Migrate the schema creating the (initially unused) fields in advance. Step by step change your logic where it talks to the storage—start querying new field values with a fallback, use new fields for incoming data. Migrate the existing data. Remove the fallbacks. Ensure the old fields are not used at this point. Migrate the schema, finally deleting the old fields.

Don’t rush, let every step reach production. In deployment sequence, apply migrations after the new code is already running. Goes without saying, monitor error rates for spikes.
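A rough sketch of the first few steps described above, using an in-memory SQLite database so it runs anywhere. The `customer` table, the `display_name` column and the helper names are assumptions made up for the example, and each "deploy" comment marks a step that would ship separately in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (name) VALUES ('Ada Lovelace')")

# Deploy 1: add the new, initially unused column. It is nullable, so code
# that does not know about it keeps working.
conn.execute("ALTER TABLE customer ADD COLUMN display_name TEXT")


# Deploy 2: read the new field with a fallback, write it for incoming data.
def get_display_name(customer_id):
    row = conn.execute(
        "SELECT COALESCE(display_name, name) FROM customer WHERE id = ?",
        (customer_id,),
    ).fetchone()
    return row[0]


def create_customer(name):
    cur = conn.execute(
        "INSERT INTO customer (name, display_name) VALUES (?, ?)", (name, name)
    )
    return cur.lastrowid


# Deploy 3: migrate the existing data.
def backfill():
    cur = conn.execute(
        "UPDATE customer SET display_name = name WHERE display_name IS NULL"
    )
    return cur.rowcount


before_backfill = get_display_name(1)   # served by the fallback
new_id = create_customer("Grace Hopper")
backfilled_rows = backfill()
remaining = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE display_name IS NULL"
).fetchone()[0]

# Later deploys (not shown): remove the COALESCE fallback, confirm nothing
# reads the old column, then drop it.
```

At every point between deploys both the old and the new code can read and write the table, which is the property the staged approach is buying.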

Yes, at some points in time your production DB schema won’t be fully normalized. In return for prolonged messiness you get smoother flow, no single fateful and stressful deployment that attempts to get it right all in one go.

Migration handling is the only part of the article I find a bit foggy and debatable. The rest roughly resonates with what I’ve arrived at through my years of experience with Django web apps.


I appreciate this article for talking about scale in terms of project size and cognitive overhead, instead of in terms of traffic/computation/storage. Writing maintainable services isn't easy, and tips like these are painful to learn on your own.


I was feeling okay about this article until seeing this colossal punt:

> That said, the real intention behind this pattern is to keep the API/view/controller lightweight and free of excessive logic, which is something we would strongly advocate. Having logic inside model methods is a lesser evil, but you may want to consider keeping models lightweight and focused on the data layer. To make this work, you will need to figure out a new pattern and put your business logic in some layer that is in between the data layer and the API/presentational layer.

Having fat models is definitely a problem I'm having, and it's nice to see it's a problem for the author too, but the advice "figure it out" is presented without any explicit suggestions.


Procedural/functional code. Splitting your models up more by features than high level things (e.g. having a separate CustomerBilling rather than putting it in Customer). Component based architectures. Service layer to coordinate models. Etc. Also, this is/was a common problem (at least, the god class aspect) in games that the industry has slowly solved over the past 20 years, so that's another place you can look for inspiration.

Further, unless you know you're going to scale in the beginning, I would recommend refactoring/evolving over time. It doesn't do anyone any favors by having 10 models to represent your Customer if your Customer is already relatively thin.


My team has run into the same problem in a large Rails app. We decided on a combination of service objects and data access objects (DAO) where appropriate. In a lot of cases this has really helped with the grep-ability of our codebase.

Service objects allow us to encapsulate complex operations that touch several domains and tables through a simple API. They're just plain objects.

DAOs fulfill a specific case of providing a simple CRUD API to a domain. We use a lot of DynamoDB so the DAO allows us to hide the complexity of reaching out to several tables in order to return a result. We also make sure that the DAO returns an immutable object that can't reach back into the datastore like an ActiveRecord object can (#save! methods come to mind).

The difference between a service object and a DAO is a little blurry but is enforced through quick design discussion and code reviews.
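For readers who want something concrete, here is a rough Python sketch of the split described above (the original is a Rails app, so this is a translation of the idea, not that team's code). `Customer`, `CustomerDAO` and `DeactivateCustomer` are hypothetical names, and a dict stands in for the datastore.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Customer:
    """Immutable record: it has no way to reach back into the datastore."""
    id: int
    email: str
    active: bool


class CustomerDAO:
    """Simple CRUD-style API that hides how and where the data is stored."""

    def __init__(self):
        self._rows = {}  # stand-in for one or more tables

    def create(self, customer_id, email):
        self._rows[customer_id] = {"email": email, "active": True}
        return self.get(customer_id)

    def get(self, customer_id):
        row = self._rows[customer_id]
        return Customer(id=customer_id, email=row["email"], active=row["active"])

    def set_active(self, customer_id, active):
        self._rows[customer_id]["active"] = active
        return self.get(customer_id)


class DeactivateCustomer:
    """Service object: one operation that touches several domains."""

    def __init__(self, customers, audit_log):
        self.customers = customers
        self.audit_log = audit_log

    def call(self, customer_id, reason):
        customer = self.customers.set_active(customer_id, False)
        self.audit_log.append((customer_id, "deactivated", reason))
        return customer


dao = CustomerDAO()
audit_log = []
dao.create(1, "a@example.com")
result = DeactivateCustomer(dao, audit_log).call(1, "requested by user")

try:
    result.active = True  # frozen dataclass: mutation is refused
    mutation_allowed = True
except FrozenInstanceError:
    mutation_allowed = False
```

Callers only ever see frozen `Customer` values, so any write has to go through the DAO or a service object, which is what makes the write paths easy to grep for.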


"Don’t cache Django models"

This is too broad to be good advice. A typical solution to stale models is to increment a cache version key, effectively invalidating the old cache.

Version keys can be coarse- or fine-grained. For example, you may have one version key for the entire application (supported by Django by default) and one version key per model or application (you have to roll your own solution). If your app's models change, you can increment the app version key and the next time you try to fetch a model instance from the cache, you will miss and instead fetch the instance from the DB.
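A minimal sketch of the per-namespace version key idea, assuming a plain dict as the cache backend. `VersionedCache`, `bump` and `get_customer` are invented for the example and are not Django's cache API. In a real deployment the version counter would itself live in the shared cache, and the orphaned old entries would be left to expire or be evicted.

```python
class VersionedCache:
    """Toy cache whose keys embed a per-namespace version number."""

    def __init__(self):
        self._store = {}
        self._versions = {}

    def _key(self, namespace, key):
        version = self._versions.get(namespace, 1)
        return f"{namespace}:v{version}:{key}"

    def get(self, namespace, key, default=None):
        return self._store.get(self._key(namespace, key), default)

    def set(self, namespace, key, value):
        self._store[self._key(namespace, key)] = value

    def bump(self, namespace):
        # Old entries are not deleted; they simply become unreachable.
        self._versions[namespace] = self._versions.get(namespace, 1) + 1


def get_customer(cache, db, customer_id):
    cached = cache.get("customers", customer_id)
    if cached is not None:
        return cached
    value = db[customer_id]  # cache miss: fall through to the "database"
    cache.set("customers", customer_id, value)
    return value


cache = VersionedCache()
db = {1: "Ada"}

first = get_customer(cache, db, 1)        # miss, loads from db
db[1] = "Ada L."
stale = get_customer(cache, db, 1)        # hit, still the old value
cache.bump("customers")                   # invalidate the whole namespace
fresh = get_customer(cache, db, 1)        # miss again, reloads from db
```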



It seems like the author has problems with correctly architecting his apps and blames Django for his inability to do that. The only thing that makes sense is the tip for migrations.


About ContentType : "If you ever move models between apps or rename your apps, you’re going to have to do some manual patching on this table"

Thanks for this, just had a kind of struggle recently


It's interesting that they're still using the default ORM. Stranger still they're letting the ORM scaffold tables (?!?) now they're at this size?


This reads way more like a list of Django caveats and anti-patterns, than a guide to running any kind of Python application at "scale". Maybe a sprinkling of good hygiene, but that has nothing to do with scalability (unless you're talking about developer scalability and cognitive overhead).

Further, to suggest NOT using the ORM and to build out a middle layer on top of the ORM just for CRUD, is, well, insanity. On one hand the author is saying to cut down on bloat, and then on the other hand they are saying to add needless abstraction on an already bloated ORM.

I don't use Django, but I do use Python for application development nearly every day, and I have to say... this article reads more like a list of (predictable and basic) problems their team had than a guide to scaling an application. Maybe they mean, how to scale their team, workflow, and their efficiency, NOT the application itself.

The only thing I agree with here is the avoidance of some Django features... However, they should take it a step further, and just avoid Django in the first place.

Also, am I the only one that has never needed to use "migrations" or any similar sorts of features? You're doing something very wrong and overcomplicating things if you seriously need to lean on migrations that heavily.


> This reads way more like a list of Django caveats and anti-patterns, than a guide to running any kind of Python application at "scale".

This isn't about scaling in the sense of handling more page views, but rather about scaling in the sense of having a codebase where the cost of adding each new feature and developer is O(n).

The issue is that a lot of the 'recommended' Django patterns are terrible, and using all the features that are built-in would lead to a completely unmaintainable app.

I don't understand the author's point about not using the ORM, in terms of why creating multiple tables inside a transaction doesn't solve the issue he's talking about. But aside from that all this advice is dead on, and these are exactly the same practices that I recommend to all consulting clients.


I agree with all of your points about the article, but...

> However, they should take it a step further, and just avoid Django in the first place.

Django is a tool, and like most tools it has a use. I find Django indispensable for writing specific kinds of applications, and its admin interface is by far the best thing since sliced bread for internal/backoffice applications. It's not perfect by any means but it's amazing for quickly getting a CRUD app with an awesome administration interface up and running with minimal effort. Django rest framework, migrations and the amazing ecosystem (reversion, django-mptt, django-polymorphic, debug toolbar, django-currencies to name very few) are also what makes Django appealing and rather awesome.

Making a blanket statement like 'Avoid Django' is rather silly, given all that.

> Also, am I the only one that has never needed to use "migrations" or any similar sorts of features?

I'm not sure why you put quotes around migrations as if it's some alien, obscure or weird feature. If you've never written a web application that needs migrations then you're not writing the kinds of applications (or indeed any 'serious' application) that would benefit from Django IMO.


> its admin interface is by far the best thing since sliced bread for internal/backoffice applications.

By the way, some best practices here that I've discovered:

- Use the Django admin for performing actions closely tied to specific models that don't involve any business logic. E.g. changing which one of a user's email addresses is their primary address.

- For admin business logic that spans multiple tables, (e.g. inactivating a user's email address and logging that action in a different table), just create an app called admin_api or something similar and then create DRF endpoints for performing this sort of admin logic.

The benefit of having your admin business logic wrapped in REST endpoints is that you are writing and testing all your admin logic the same way as you write and test all your other endpoints. And since all your admin business logic is just another Django app, you can create models for your admin business logic, e.g. for logging the results of your database integrity checks. And then you can use the standard Django admin on top of those tables, so you're basically putting an admin on top of your admin.

And because all your admin logic is encapsulated in rest endpoints, you have the option of either hitting those endpoints from Postman or some custom admin front end, or else hitting the service methods that perform the business logic for those endpoints directly from the Django admin actions dropdown list[1].

[1] https://docs.djangoproject.com/en/1.11/ref/contrib/admin/act...


> The benefit of having your admin business logic wrapped in REST endpoints is that you are writing and testing all your admin logic the same way as you write and test all your other endpoints.

You write all your business logic in REST endpoints? That's insanity... Are you even doing RESTful things with all those endpoints?

Have you even considered the impact of HTTP overhead? This is a thread about scalability, after all.

Don't over complicate shit. This article isn't even about performance, it's about complexity it seems, but you're here promoting a HORRIBLE idea as a "best practice".

Microservices are one thing, but replacing your data access layer in INTERNAL code with a RESTful endpoint is kind of insane, and will only lead to problems later.

For example, very recently, I had to audit an app that was very very slow. They had recursive data calls that took 1000x longer because the idiot that slung them together used an inline CURL call via their main production API endpoint.

That one request then led to 1000s of other requests, which overwhelmed the load balancer because every one of those API calls triggered more in-kind API calls to fetch other data. Because each request went back out through the load balancer, it was made to another server which did not have the needed data in memory, so that server made a similar API call to fetch it, which then went to yet another server behind the load balancer, and then it just devolved into a clusterfuck cacophony of bullshit and massive overhead and slowness where a simple Foo.get(id=blah) would have sufficed in the first place.

Their developers' proposed solution? "hit localhost instead of the load balancer". Guess what, it was still very very slow, because of HTTP overhead. They finally listened, and killed that CURL request, and replaced it with a recursive call back to self, and suddenly IO dropped, requests were responsive, and they were able to remove half of their servers from the load balancer pool.


> replacing your data access layer in INTERNAL code with a RESTful endpoint is kind of insane

Nothing is being replaced. Each view has a service method that performs the business logic, so you can either call the endpoint or else call the service method. There is zero performance implication, and basically zero extra complexity.
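To make that concrete, a framework-free sketch of the layout: one service function holding the logic and a thin view wrapped around it. `deactivate_email` and `deactivate_email_view` are hypothetical names, the dict and list stand in for database tables, and a real DRF view would do the parsing and serializing that the hand-written wrapper does here.

```python
import json


def deactivate_email(emails, audit_log, address):
    """Service method: the business logic, with no HTTP involved."""
    if address not in emails:
        raise KeyError(address)
    emails[address] = False
    audit_log.append(("email_deactivated", address))
    return {"address": address, "active": False}


def deactivate_email_view(request_body, emails, audit_log):
    """Thin endpoint: parse input, call the service, serialize the result."""
    payload = json.loads(request_body)
    try:
        result = deactivate_email(emails, audit_log, payload["address"])
    except KeyError:
        return 404, json.dumps({"error": "unknown address"})
    return 200, json.dumps(result)


emails = {"a@example.com": True, "b@example.com": True}
audit_log = []

# External callers (Postman, a custom admin front end) go through the view.
status, body = deactivate_email_view(
    json.dumps({"address": "a@example.com"}), emails, audit_log
)

# In-process callers (e.g. an admin action) call the service directly.
direct = deactivate_email(emails, audit_log, "b@example.com")

missing_status, _ = deactivate_email_view(
    json.dumps({"address": "nobody@example.com"}), emails, audit_log
)
```

Internal code never makes an HTTP request to reach the logic, so the overhead described in the parent comment does not come into play.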


I don't see exactly what the mentioned problem has to do with writing REST endpoints though.

Agree with OP. REST endpoints are in many cases the way to go. Particularly today when the front end might undergo who knows what transition and not really have much to do with Django at all.


> I'm not sure why you put quotes around migrations as if it's some alien, obscure or weird feature. If you've never written a web application that needs migrations then you're not writing the kinds of applications (or indeed any 'serious' application) that would benefit from Django IMO.

I have been programming for almost 15 years, and have never once needed to use migrations. I have worked on projects for multiple years with multiple developers and both small and large teams, and grew and scaled them, massively overhauled schemas, etc... and still, have never needed migrations.

Have I ever had to add a column to a database? Absolutely, have I ever needed a massively overcomplicated "migration" tool? Hell no, because I put more than 5 minutes of thought into my application logic and data structures before I even started writing code or designing the database schema.

I've single-handedly built, and maintained with a team, applications that did millions of dollars of revenue per day, with 10s of thousands of users per minute normally, with 100s of thousands per minute during peak load and used various styles of SQL data stores with between 20 and 50 tables on some projects (sounds pretty serious to me)... and still never once had a need for migrations.

My point is, if you need to lean on migrations even remotely often, you're doing something very very wrong.

I think, though, the type of developer using Django for its large pool of third-party extensions isn't really the type of developer who puts a lot of thought into what they are doing. Maybe I'll catch flak for this, but it's pretty true in my experience. It's the same as some front-end JS devs who sling various frameworks and libraries together, and end up with a pile of unmaintainable mess in the end...

Django is used to sling backend web apps together, fast, and needing migrations is evidence of that.

However, if one is doing proper clean development and following some simple best practices, an automated and complex migration should never be needed in the first place.


You get your DB model 100% correct the first time, always? Even years later your DB model is able to accommodate all those always changing business requirements?


If you get your DB model 100% correct the first time I want to work for your clients!


Yeah, pretty much.

Haven't had a problem yet where I needed automated migrations, and I have been doing this for years upon years upon years.

Like I said, if you're leaning heavily on migrations, you're doing something wrong.


> Haven't had a problem yet where I needed automated migrations, and I have been doing this for years upon years upon years.

Just out of curiosity, have you had to maintain/modify any of these apps over these years and years? If so, without any schema changes?


> Just out of curiosity, have you had to maintain/modify any of these apps over these years and years?

Absolutely.

> If so, without any schema changes?

Usually, to be honest, yeah... Core, basal units of any given business process usually don't change. Only how they are abstracted, and that abstraction lives in code, not the database.

With a proper storage architecture it is very very very rare to need to change old data structures. It most definitely doesn't happen often enough to necessitate the use, overhead, and complexity of a baked-in migrations library.

If you think "adding a table" or "adding a field" is a "schema change" then that is where the problem lies... I think. Extending a schema should not require a migration library. CHANGING a schema, as in, destroying old data, and making new data, could surely use the help of a migration library, but, generally, if you're doing that often, you're doing something very wrong, and if you're not doing that often, then you definitely don't need a baked in migration library.

Ultimately, if things are so bad that you need a migrations library, you're better off fixing that shit at a low level, and starting over from scratch if necessary. Sometimes, instead of creating years and years of tech debt and burning resources on working with a clusterfuck, you just need to take what you can from that clusterfuck, and do it right. Then you only have a single migration, to go from old clusterfuck data to new extensible data structure.

Chances are, if you have old fucked up horrific data, it stays that way, and then some person or team comes along and writes a nice clean API on top of it. And then people have to maintain that horrific thing, and it's only horrific because it's interfacing with shit data layers in the first place.

Don't just abstract bad data architecture away into a service layer, is my point... Do it right.

Extensible is the key word here, but with an emphasis on decoupling.

If you have some dynamical data structure that is truly changing, structurally, often, and isn't simply being appended to or extended, then it probably should not be stored in a traditional SQL database in any way that necessitates schema changes. If you can't figure out how to represent data in an effective, decoupled, and extensible manner, then you're screwed from the get-go. Migrations library won't help you.


I'd like to know what I am doing wrong, as I use migrations all the time. The option is change the database later (and use migrations) or write a whole ton of crappy code to avoid using them. I have tried both ways and usually migrations work better.


I dunno what to tell you without specific details about your application and environment in general.

Clearly, you need a better system architecture if you're having to do migrations constantly.


I always thought a data model should be as agnostic to the system architecture as possible. Insofar I don't understand how a system architecture can help to avoid data model migrations.


Data structure, both storage, and in-memory, are very much a part of architecture.


So requirements are never added / changed in your systems? That's usually why I need migrations.


Have you had a project where you needed automated testing?


You've caught a lot of flak for your statement.

With the knowledge I have now it would be easy for me to say "and rightly so", but I think I can understand where you're coming from.

Django didn't always have schema migrations built in. I think they appeared in version 1.7 about 2 and a bit years ago. Before then there were separate add-on apps that could do migrations for you.

When Django added built-in support for migrations I still didn't use them for a while because at the time I was primarily a solo developer and wanted to have full control over what happened in the DB. And because I was a solo developer it didn't matter. So in that sense your comment is in some ways correct.

However, in recent times I've started to work as part of a team. And that's where migrations really start to shine, because there might be several different people making app changes that result in DB schema changes. Django stores migration files with information about which other migrations each one is dependent upon.

Without that kind of migrations system it would simply be impossible for developers working on combinations of different branches of a development tree to build a working version of the code.
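As a rough illustration of why recorded dependencies matter, here is a sketch that orders migrations from two branches by their declared dependencies. It uses the standard library's graphlib (Python 3.9+) and is not Django's actual loader; the app labels and migration names are made up.

```python
from graphlib import TopologicalSorter

# migration -> set of migrations it depends on (hypothetical names),
# keyed by (app_label, migration_name).
dependencies = {
    ("customers", "0001_initial"): set(),
    ("orders", "0001_initial"): {("customers", "0001_initial")},
    # Added on one developer's branch:
    ("customers", "0002_add_phone"): {("customers", "0001_initial")},
    # Added on another developer's branch:
    ("orders", "0002_add_tip"): {("orders", "0001_initial")},
    # Needs the changes from both branches:
    ("billing", "0001_initial"): {
        ("orders", "0002_add_tip"),
        ("customers", "0002_add_phone"),
    },
}

# static_order() yields every migration after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())

for app_label, name in order:
    print(f"apply {app_label}.{name}")
```

The two branch migrations may come out in either order relative to each other, but both are guaranteed to run before the billing migration that needs them.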


How do you add columns to your db?

Do your requirements never change?

I am fully capable of doing everything done with migrations with SQL - but I always use migrations (even for custom sql outside of the ORM's capability). Why? because then we have a record of the schema state at any given moment (that is in sync with the application code at that point) and how it got to where it is now.


If you're leaning on migrations often, with SQL, then I think you should not be using SQL, obviously (or you're just doing something wrong). Or, one should use a SQL database that allows more abstract data types (like JSON blobs in Postgres).


I prefer to make use of the strong type system and inbuilt integrity checks that a database gives. JSON blobs mean reimplementing this at app level. The people paying me are paying me to build their products, not reinvent the wheel.

Migrations are an excellent tool; if you aren't using them, you are doing something wrong.+

Edit: + assuming that you are working with a relational database and have an agile-ish development process or ever have requirements changes.
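A small sketch of the integrity-check point, using SQLite from the standard library. The `orders` and `orders_blob` tables are invented for the example. The same bad value is rejected by the typed table's CHECK constraint and silently accepted when it is tucked inside a JSON text column.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_email TEXT NOT NULL,
        total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
    )
    """
)
# Blob-style alternative: the database only knows there is some text.
conn.execute("CREATE TABLE orders_blob (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")


def try_insert(sql, params):
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


typed_result = try_insert(
    "INSERT INTO orders (customer_email, total_cents) VALUES (?, ?)",
    ("a@example.com", -500),   # negative total violates the CHECK constraint
)
null_result = try_insert(
    "INSERT INTO orders (customer_email, total_cents) VALUES (?, ?)",
    (None, 1200),              # missing email violates NOT NULL
)
blob_result = try_insert(
    "INSERT INTO orders_blob (body) VALUES (?)",
    (json.dumps({"customer_email": None, "total_cents": -500}),),
)
```

With the blob table, catching the negative total or the missing email becomes the application's job.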


So where is your data model enforced if not in your tables?


> So where is your data model enforced if not in your tables?

It's enforced...

...in the models.

:P


And how do you keep your database schema under version control?

Edit: Do all you downvoters not keep track of your schema changes as you develop your application (via migrations, or even sql files), so you can easily roll back changes if needed?


The models live in version control. It's that simple.

If your data structure is changing so often in such a way that you need to keep a history, then I'm afraid you have some fundamental problems that no amount of tooling or process can solve. You will not be able to scale your development efforts with this type of constantly changing data model, even with a migration library.

If these are truly problems that you or your team has, then you should hire someone to help you organize your application, data, and workflow in a more effective manner.


> I have been programming for almost 15 years, and have never once needed to use migrations.

> Have I ever had to add a column to a database? Absolutely,

... then you've used a migration. A statement like "I've never once had to use a migration" is so obviously incorrect that it's comical. Your contradiction two lines down makes it more so.

The rest of your comment is fairly misinformed about a few things, full of FUD, misses the point of Django migrations entirely and seems desperate to paint any use of migrations as the result of being a "bad programmer" rather than changing requirements or a natural part of development. Seeing as you've used migrations, you're a bad programmer as well.

As such I'm not going to bother writing a reply to the rest of it like I usually would.


Please keep this kind of programming flamewar off HN. If someone is wrong and you want to explain how, do so civilly.

We've unfortunately had to warn you a bunch of times already about being uncivil on HN. This process isn't infinite and ends in an account getting banned, so would you please fix this?


The gp was being extremely provocative with their generic and misinformed insults...

The person you are warning was robust in their response but sometimes robustness is needed in a civilised debate.

I think that this warning is misplaced and makes hn a worse site.


Oh please, tell me how I'm misinformed?

If you need a migration library because you added a column to your database, then you have a serious problem.

If you need a migration library because you don't know how to write SQL, then you have a serious problem.

If your data structure is truly changing that often, then you shouldn't be using a SQL database in the first place, and you have a serious problem.


Please don't get involved in technical flamewars on HN (or other flamewars). As agitation goes up, information goes down, and these discussions turn into back-and-forth spats that benefit no one, including the combatants.


Okay, so you changed the DB schema in your 'dev' environment to add a column.

1. In what manner/format do you define or commit the change to VCS so it is in sync with the corresponding code change (that uses the new column)?

2. What happens when a teammate comes to deploy that schema change to production?

3. What happens when we need to roll back production to some previous commit? The need to rollback may/may-not relate to the schema change (not suggesting you 'got it wrong').

If you can answer 1, 2, 3 by some convention ... you've created a migration system that should probably be codified.


1. Change model code to add the new field, test, and commit it to version control. Then it's merged to staging, where it's tested in a production-like environment with a recent production data snapshot. After that's approved, it goes to master/production servers.

2. git pull then restart some wsgi/worker processes

3. git checkout to desired revision and restart some wsgi/worker processes

It's perfectly fine to have models comprised of only a subset of the data they interact with.

The models do not have to reflect all fields that exist in the database.

The models only reflect information that is necessary for the application in which said models are used.

Basically, if you clobber your database, no amount of migration tooling or process will help you. Ever. It will just get in the way, to be honest, and will prevent you from doing things properly in the first place.


You missed the point of all those questions (e.g. in 2 you don't mention how the schema change gets applied to the prod database). Unfortunately I'm frankly out of patience explaining any further, so we'll have to leave it there.


There is no schema change, that's the point. You just don't get it... but that's okay.


Okay you didn't read the question then.

> [...] you changed the DB schema in your 'dev' environment to add a column [...]

The 'point' is we are adding a column, and we want a sane workflow that works in a team from dev through to production.


Again, FUD. If you really need me to spell out why then I will, but I doubt it will change your viewpoint.

Let me spell it out for you:

You build your awesome [app] for [client]. You get your schema bang on first time and it's all working absolutely fine. Everyone is happy.

Then [client] comes to you and says "[app] is awesome, but we need [x]". You think, and [x] needs some database modifications. Maybe it's a new column to store something, or perhaps it's a column that now shouldn't be nullable. Maybe it's a new table entirely. Perhaps what you need is already implemented in library [y], which needs database modifications of some kind like its own tables. The possibilities are endless.

Great. So how do we do this? Obviously we need to run some SQL on the database. So you change all your code, press deploy, and really quickly log onto the production server and smash out some artisanal SQL to modify the database before any requests hit the server. Everyone is happy! You quickly write an email to all your co-workers to do these migrations locally so they can run the app.

Oh no, but wait, you forgot to do operation Z! Now your app and everyone else's is broken. You resolve to ensure this doesn't happen again by writing a .sql file for each migration and running them in sequence. Great, you can even check them into your VCS and share them with your co-workers. Awesome! You can even make them part of your CI build (you do that, right?).

You do another deploy, but nuts, something went wrong! You need to roll a migration back! But you only added forward migrations in your .sql files! Ok, so let's add [migration]-backwards.sql and [migration]-forwards.sql from now on. Awesome! So you've got forwards and backwards migrations. But... wait, we need to run some Python code to do something as part of the migration. Ok... let's add a [migration]-{forwards}.py as well.

Fantastic. But then you start to install package [z] because it does exactly what the client wants and you realize the benefits of using well tested, widely used libraries in your systems and not writing everything by hand. This requires some database modifications. The author of this package has the migrations it needs but they're in its own home-grown format and only for MySQL and Oracle! Nuts, ok, so you translate them by hand and add them to your migrations. But you made a mistake somewhere in the translation and everything breaks. You fix it by hand.

Congratulations. You've just written a shit version of Django migrations that everyone hates and that doesn't really work.

tl;dr: schema changes are a natural part of development. You, me, and everyone else working on any kind of development project has done them. They happen, this is a fact. If you need to:

1. Share the migrations in your team

2. Run them as part of your CI build

3. Use third party migrations for packages (including built in contenttypes or other core Django tables)

4. Have them in a format that works for any supported database

5. Have rollback and the ability to execute arbitrary code as part of the migration

Then you can either write your own shitty version or use a well supported, built in system that handles the complexity for you. This is not a bad thing.

I can't comprehend your viewpoint, which mostly boils down to "these damn kids are on my lawn", so I can only imagine your setup is some ad-hoc .sql files that you run by hand.

This is a bad thing.


You are setting up a comical hypothetical situation in which the developer is blundering through their job in a mindless inept manner.

I'll just stop you right there.

No amount of best practices, process, or tooling will fix an incompetent developer who is part of a disorganized team that is working on a pile of bad architecture.


Django is not meant to be run at scale. At least not horizontal scale. If you're not going to use the ORM, which prevents you from scaling horizontally, why are you using Django? Django can be scaled by getting large instances with lots of RAM, which is quite affordable these days. It can also be scaled with caching. But if you want horizontal scaling, don't use Django. That's not what it's for.


There's nothing really inherent to Django that prevents it from scaling horizontally, not even the ORM. For the typical read-heavy SQL workload it will scale as well as anything else, depending on the scalability of the backing RDBMS. You should always use proper HTTP response caching with something like Varnish or a 3rd-party CDN where possible.


> depending on the scalability of the backing RDBMS

As far as I know, none of the backing RDBMSs that the ORM supports are horizontally distributable. At least not officially, right? CockroachDB with its Postgres interface sounds good, or Citus? But as of today, are any of the supported DBs distributed?


MySQL Cluster [1] has been around a long time for applications that need linear scalability with ACID guarantees.

[1] https://www.mysql.com/products/cluster/


How does the ORM prevent horizontal scaling?


The ORM works against a relational database. The relational databases that Django supports are not distributed, right?


May depend on your workload and what you mean by 'distributed'... there are some scalability options for Postgres. There are backend drivers for Oracle and MS SQL as well, but I have never used either and I'm not sure just how good they are.

But there is nothing that I am aware of inherent to the ORM itself that prevents horizontal scaling.


You scale your app server out and your database server up. You can have 100 django servers hitting one database with a read-replica, for example.

Depending on your workload, this is often a great way to scale and pretty typical for Django. In most cases your app servers will saturate way before the database does.
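For what it's worth, splitting reads and writes like this is usually done with a database router. A minimal sketch, assuming your settings define a `default` primary and two replicas under the made-up aliases `replica1` and `replica2`:

```python
import random


class PrimaryReplicaRouter:
    """Send reads to a replica and writes to the primary."""

    # Assumed aliases; they must match keys in settings.DATABASES.
    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Spread read queries across the replicas at random.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # All writes go to the single primary.
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds a copy of the same data, so relations are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Run schema changes on the primary only; replication carries them over.
        return db == "default"
```

You enable it by listing its dotted path in the `DATABASE_ROUTERS` setting. Keep in mind that a read issued right after a write can land on a replica that hasn't caught up yet, so this only suits workloads that tolerate a little replication lag.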


Thanks for this. Yes, this actually makes sense. I considered that a given, though, since scaling the Django app is trivial because it is stateless. Scaling the state is always the difficult thing.


You can get horizontally scaled, eventually consistent reads just fine on Postgres/MySQL via replication. There are also write scaling options.



