Open-sourcing MonkeyType – Let your Python code type-hint itself

alexchamberlain · on Dec 21, 2017

Fantastic contribution back to the community; I look forward to trying it out.

I must say this is the first time I've been disappointed with the quality of discussion on HN. For a community that promotes using the right tool for the job at the time, I would have thought people would be more open to the choices the early engineers made. I'm sure that Instagram are using a variety of tech across their stack.

blub · on Dec 22, 2017

Most people would expect software to crash, hang, be slow or somehow leak their personal information. That's normal behaviour for the products of the software industry.

For a long time there have been efforts to ensure at least a degree of quality and robustness through processes, practices and verification tools. One such tool is a type system which allows encoding requirements and expectations that will be automatically verified with the help of a checker. This is exactly what Instagram is attempting in an effort to increase quality and ease maintenance.

It seems many are wondering why they haven't done this from the beginning. A typical (and probably correct) answer is development speed and flexibility: early stage companies need to be nimble and compromise on quality if they want to survive. Fair enough, I understand that. We can applaud Instagram's success on the markets, but that doesn't have to mean that they're a role model of technical excellence for building a million line Python code base.

This tool is the proof that Python has significant problems at scale, which is something the Python community has denied for a long time. They're still doing it in this thread, but the lesson looks pretty clear to me: if you plan on building large scale, don't use Python. Or PHP while we're at it (see Facebook). Definitely not JavaScript (see FirefoxOS).

The founders of a start-up can continue to do whatever they want in the name of success. Instagram at their beginnings was basically a different company from today's Instagram, no one would have used them as a technical role-model. Now they're at best a role model for large Python code bases, but many here seem to be drawing the wrong conclusion, namely that it's a good idea to do large-scale Python in the first place.

techdragon · on Dec 22, 2017

The reality borne out by the evidence [0][1] is that Python is at the very least perfectly suitable for the development of web stacks powering companies worth in excess of hundreds of millions of dollars.

Putting aside concrete technical issues regarding the python runtime’s performance envelope (eg: startup time, FFI inter-op call time, etc) and memory footprint, there is no reason not to use Python. Again, concrete technical issues aside, Python will not be an issue until your business is big enough that having to push the performance of your product is a nice problem to have and you will probably have the money to spend solving it by either optimising your python or rewriting parts of your application, as all these large companies have done.

0 - https://www.linkedin.com/pulse/top-10-sites-built-django-fra...

1 - https://worldwebtech.weebly.com/blog/top-ten-most-popular-we...

nothrabannosir · on Dec 22, 2017

You can substitute Python for assembly, point at the Apollo mission and your argument still holds.

Rarely is a programming language chosen because it actually is the best tool for the job. Often, it’s whatever the people at the ground floor were most comfortable getting a prototype out the door with. Prototype working well? Fix this bug, add yonder feature. Before you know it you’ve built Facebook in PHP. “PHP is at the very least suitable for the development etc etc”, yes, that’s why Facebook invested all that effort in Hiphop VM.

There’s an irony in saying “there’s no reason not to use X” in reply to an article that is about a company spending a ton of effort working around a problem in X.

My point is: as a community, it’s our duty to learn from these mistakes. Let’s admit there is a problem, investigate, adapt, overcome. We can improve whatever will be the next Python so the next Instagram doesn’t have to go through this. But that won’t work if we keep saying “this is a nice problem to have, there’s nothing wrong.” It isn’t. There is. Look at the article.

blub · on Dec 22, 2017

I'm not claiming that one can't be successful by using Python (or pretty much any programming language). Market success or user count are unfortunately not tightly correlated with technical excellence, as many of us have bitterly found out.

Development speed, maintainability, error count are also important development issues. Unfortunately we don't have much data to judge, but the little that we have such as this article indicate that dynamic typing has a non-negligible negative impact on the above.

""having to push the performance of your product is a nice problem to have and you will probably have the money to spend solving it by either optimising your python or rewriting parts of your application, as all these large companies have done.""

Not considering the topic at all is negligent. I don't understand why you're so sure that the only possible outcomes are either not reaching scale or having the money to optimise or rewrite.

First of all, not even Facebook had the money to rewrite their PHP code base, so that's probably out of the question. And it's very well possible that one will reach scale and not have the money (or worse the time) to optimise, if optimising means inventing type checkers.

At least one should spend some time thinking about this topic and picking a language that is flexible and can scale at least somewhat. Having to stop writing production code in order to invent a Ruby static type checker or native compiler is decidedly not a good problem to have.

bonesss · on Dec 22, 2017

> ...we don't have much data to judge, but the little that we have such as this article indicate that dynamic typing has a non-negligible negative impact on the above.

In my own experience I see it on a spectrum: there are design decisions you can avoid by using dynamic typing that give you speed in the small that start to show their absence in the large. Savings on inputs (quicker coding), tend to evaporate as complexity grows because you're using so much time analysing outputs (the system), to determine behaviour. Every bit of up-front work saved gets amortised across issue after issue, and runtime testing during development, and increasing stagnation in high-level architecture.

For me the ideal is the type system of Haskell with the linguistic power of Haskell and the type inference of Haskell... only on a mainstream platform I can convince management to use.

outsideoflife · on Dec 22, 2017

> Market success or user count are unfortunately not tightly correlated with technical excellence, as many of us have bitterly found out

It depends on how you define technical excellence. I would suggest a low memory usage might get you a thumbs up from HN, but isn't real technical excellence the ability to solve users' problems?

krotton · on Dec 25, 2017

Nope. Publishing an offer on Craigslist to help with homework via Skype is solving some users' problems, yet has nothing to do with "technical excellence".

moreless · on Dec 22, 2017

> This tool is the proof that Python has significant problems at scale, which is something the Python community has denied for a long time. They're still doing it in this thread, but the lesson looks pretty clear to me: if you plan on building large scale, don't use Python. Or PHP while we're at it (see Facebook).

Yeah, absolutely, don't do this! These are two examples of successful companies that did it and look at them now! </s>

Being able to move fast and produce a winning product on time is much more important for startups. What does it matter if you used <your_cool_scalable_thingie> for a project, when it never went past 10 users because you were concentrating on the wrong aspect of your business? PHP is fine. Python is great. Use the tools that fit your problem and you know how to use, not the latest toy.

blub · on Dec 22, 2017

There are two things that are critically incorrect about your argument:

1) That market success implies having quality software. Average seems to be enough in my experience.

2) That start-ups are a good example to follow if one wants to achieve good quality. In fact they should be ignored, because they will absolutely murder quality in order to stay alive. Sometimes the product doesn't even work and is held together with duct tape in order to get past that important demo... It's quite pointless to discuss quality and start-ups.

The lesson I mentioned should be heeded by mature companies that are able to do some project planning, complexity estimation, etc.

zerkten · on Dec 22, 2017

How do you identify the inflection point and then execute when it hits? There are conservative choices that would scale all the way through, but many startups would avoid them due to the hit on "velocity" (which is itself a very fuzzy topic.)

It seems that this same pattern plays out with many tools, and not just languages. When you've built something and you now have a team, processes, etc. built up it becomes difficult to see the forest for the trees, or to make the hard decisions because it might involve replacing people.

1_2__4 · on Dec 22, 2017

HN has a favorite pastime: dismissing tech that has demonstrated incredible utility because it lacks some kind of ideological purity they demand.

kbd · on Dec 22, 2017

> This tool is the proof that Python has significant problems at scale...

This does not prove your point. Annotating a large dynamically-typed codebase with type information is a large amount of work, regardless of the language. This tool makes that easier.

blub · on Dec 22, 2017

I'm bemused by your reply and curious to know why you think Dropbox and Instagram are working on static type analysers for their large Python code bases. Instagram at least gave us a hint: "we’re keen to make our code easier for new developers to read and understand, as well as more amenable to static analysis that shrinks the domain of possible bugs".

It seems to me it's so difficult to manage such a code base, that they decided to do that "large amount of work" in addition to the large amount of work required to develop the necessary tools!

In which case it seems prudent to avoid the said amount of work by picking another programming language for one's large-scale code base.

bonesss · on Dec 22, 2017

> ... it seems prudent to avoid the said amount of work by picking another programming language for one's large-scale code base.

While I'm a type-adherent in my day-to-day (F# represent, wut wuuut), I think this isn't reflective of the chicken and egg dilemma for startups... There are an immense number of things they "should be" doing at scale that they can't do early because they're relying on their early product to scale. Time to market, and windows of opportunity, are critical to startups.

Any immature technical decision at that part of the lifecycle needs to be made not with an attempt to make perfect forever from the start, but rather with an eye to transitioning to smarter solutions aggressively as you scale up. The company may be 6 pivots away from success, so better to validate solutions in the market than prognosticate.

I don't accept the premise that a lack of typing significantly improves development velocity, per se, but language decisions are about ecosystems, key components, and local talent. Where these companies get into coo-coo land is not integrating those immature components into better systems as they're getting bigger. Next thing you know someone is writing a whole compiler chain for PHP in an attempt to reinvent a sane programming language, or trying to get Python to be Java.

Smart, modern, functional languages provide development velocity and pleasure equal to Python with fundamental type safety and guarantees. But in a world where just putting a button on a webpage means multiple dialects in multiple languages I think we should be ok mixing and matching on the backend to scale smartly.

outsideoflife · on Dec 22, 2017

Isn't dropbox doing it partially to aid a future migration to Python 3?

> In which case it seems prudent to avoid the said amount of work by picking another programming language for one's large-scale code base

When I first saw Instagram I thought it was nonsense; I was wrong. When the founders started Instagram, I wonder if they had the amazing foresight to see what it would become. I would suggest getting to market was more valuable to them than worrying about what maintenance they would have to do once they had a billion dollars.

baq · on Dec 22, 2017

Yes it does and it doesn't matter, because an argument can be made that there wouldn't be a Dropbox or an Instagram if they started out in a statically typed language back then.

nine_k · on Dec 22, 2017

Use the right tool for the job.

"Scale" can be seen under different angles. You can run thousands of boxes with relatively simple code if it's designed to scale horizontally. Twitter used to run on Ruby this way, until they accumulated money and expertise enough to rewrite the whole thing and save on operations and further development costs.

Your code can be millions of LOC and run on a few boxes; it's a very different sort of "scale".

In one company where I worked they had a crazy mix of codebases, from modern Scala down to ancient PHP code. But since it was architected reasonably, it was possible to replace the PHP code piecemeal, without stopping the system. Do the devs that started it 15 years ago with PHP deserve blame for choosing a poor language, or praise for coming up with a serviceable architecture?

You can see the original post as a step in the right direction: in a complex codebase, static typing has a large number of advantages. Barring a wholesale rewrite, how would you gradually transform your code to using it? Yes, by documenting the current state in a formal way, introducing a typecheck build step, then maybe splitting out certain components and rewriting them in other languages, etc. Look at TypeScript.

Unfortunately, there's no way around using quick-and-dirty prototypes at the very early stages, if nothing else, for lack of technical expertise among founders and their first employees. They have business ideas first and foremost. So a tool like that would help exactly the step you want done, switching to a nicer and more manageable stack as growth allows / demands it.

JosephRedfern · on Dec 21, 2017

This sounds similar to Dropbox's PyAnnotate -- Guido van Rossum writes about it here: http://mypy-lang.blogspot.co.uk/2017/11/dropbox-releases-pya...

Would be interesting to see how MonkeyType and PyAnnotate compare.

ambivalence · on Dec 21, 2017

MonkeyType is Python 3-centric, PyAnnotate is Python 2-centric.

PyAnnotate couples re-applying the types with the tool, and uses type comments for this.

MonkeyType generates .pyi files that you can either use directly or re-apply them to your code as proper type annotations.

Other than that, MonkeyType is used on a daily basis internally and solves a bunch of common annoyances of systems like these, like duplicates in unions, applying better types to stuff that was already hand-annotated with Any, etc.
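For anyone wondering what "collect types at runtime, then collapse duplicates in unions" means in practice, here is a toy sketch. To be clear, this is not MonkeyType's implementation or API: `record_types`, `observed` and `suggest` are names made up for illustration, and a decorator stands in for whatever call tracing a real tool uses.

```python
import functools
import inspect
from collections import defaultdict

# Hypothetical toy recorder; NOT MonkeyType's implementation or API.
# Maps (function name, parameter name or "return") -> set of types seen.
observed = defaultdict(set)


def record_types(func):
    """Record the concrete argument and return types seen on each call."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            observed[(func.__name__, name)].add(type(value))
        result = func(*args, **kwargs)
        observed[(func.__name__, "return")].add(type(result))
        return result

    return wrapper


def suggest(func_name, slot):
    """Render the recorded types as an annotation string, duplicates collapsed."""
    names = sorted(t.__name__ for t in observed[(func_name, slot)])
    return names[0] if len(names) == 1 else "Union[" + ", ".join(names) + "]"


@record_types
def add(a, b):
    return a + b


add(1, 2)
add(3, 4)    # same types again; the set keeps only one copy of each
add(1.5, 2)  # a second type for `a`, so its suggestion becomes a union
```

After those three calls, `suggest("add", "b")` gives a single type while `suggest("add", "a")` gives a union, and neither lists the same type twice because the types are stored in a set.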

sbuccini · on Dec 21, 2017

I'd also like to hear more about this -- both the feature set and the development process. It's interesting that two large engineering organizations responsible for some of the most popular applications on the planet spent hundreds of engineering hours building almost the same exact tool at around the exact same time.

I'd also like to know how much quicker or better it could have been completed if it had been done out in the open.

ambivalence · on Dec 21, 2017

The idea to gather types at runtime is as old as PEP 484. The Dropbox and Facebook teams working on Python type checking know each other. We both worked on our implementations independently since we wanted to first test internally whether the idea holds water. For example, I personally thought it wouldn't be as useful in practice as it turned out to be!

We knew we were each going to open-source our implementations, especially since Instagram's focuses solely on Python 3, which isn't useful for Dropbox at the moment. It just took a while to get through the process of open-sourcing what we had (cleaning up the early implementation with limited documentation, decoupling from internal data stores, etc.).

Would it be cheaper if this started out in the open? Probably, but I don't think by quite the margin you expect.

Traveler42 · on Dec 25, 2017

Even though MonkeyType is focused on Python 3, can the .pyi files be used with Python 2?

maxxxxx · on Dec 21, 2017

"I'd also like to know how much quicker or better it could have been completed if it had been done out in the open. "

I assume they scratched their own itch so it was probably faster to do it on their own addressing their specific needs.

troels · on Dec 21, 2017

Interestingly, I made a similar tool for php some time ago. It was recently revived and is now in active development to bring it up to speed with recent developments in the language.

https://github.com/troelskn/phpweaver

muglug · on Dec 22, 2017

That is fascinating. I’m the author of a static analysis tool for PHP that can generate types, but clearly not at the same level as runtime analysis. I’ll use it on my company’s codebase and report back.

troels · on Dec 22, 2017

It would be great if you can give any feedback. There are quite a few rough edges currently.

didibus · on Dec 21, 2017

It's interesting, this is similar to what Clojure learned: that types aren't really useful unless everything is typed. Typed Racket's solution, though, was to promote types to runtime validation at the borders between typed and untyped code.
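A rough Python approximation of that border-validation idea, as a sketch only: the `checked` decorator and the `repeat` function are hypothetical, and only plain-class annotations are checked (generic types are skipped entirely).

```python
import functools
import inspect
import typing


def checked(func):
    """Validate arguments against plain-class annotations at call time."""
    hints = typing.get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; anything else is let through.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} should be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@checked
def repeat(text: str, times: int) -> str:
    return text * times
```

Untyped callers that pass the wrong thing across the border fail immediately with a message naming the parameter, rather than somewhere deeper in the typed code.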

I do find it intriguing though, that adding back types manually is so hard and slow. Is it slower when done retroactively? Or is it just as slow when done at the same time, but we don't realize its overhead?

lmm · on Dec 22, 2017

It's a lot slower to do retroactively. You basically have to tell the computer why you believe something is correct - e.g. if you're moving to having a distinct type for non-empty lists because some functions are only valid for non-empty lists, you have to explain why you believe a list you're passing to such a function is non-empty. That's a lot easier to do at the same time you're doing it (even in Python you'd probably still ask yourself whether you knew the list was non-empty as you were writing it) than to come back months or years later and remember why.

mratzloff · on Dec 22, 2017

It's slower when done retroactively. A developer writing typed code has the problem domain and software design unmarshaled into his brain while writing his small section. Type testing occurs concurrently with feature testing. Types added retroactively need to be applied to a large codebase and undergo separate testing.

Cieplak · on Dec 21, 2017

Given that people have asked why not use a statically typed language, seems appropriate to mention that it's possible to write pythonic-looking C++:

http://preshing.com/20141202/cpp-has-become-more-pythonic/

I have been using C++ a lot lately but really wish there were more tools for reflection at compile time, e.g., ability to iterate over all the members of a class. Other than that, I'm really loving C++17's auto template parameters and type deduction capabilities, plus code that's 200x faster at runtime than most interpreted languages. I've found autocompletion in CLion to be slightly better than autocompletion in PyCharm, but not quite as good as IPython or IntelliJ with Java.

Waterluvian · on Dec 22, 2017

That's kind of missing the point though. Languages are tools for different purposes. You may love chisels and lathes and saws but you wouldn't build a suspension bridge with wood. Just like I wouldn't build a cabinet with cement, steel, and rebar.

Picking C++ over Python is like picking woodworking over metalworking.

Python being slow is never an issue unless someone is insisting on using the wrong tool for the job.

dataflow · on Dec 22, 2017

> Python being slow is never an issue unless someone is insisting on using the wrong tool for the job.

So you think, for example, Numba (and everything that uses it) is misguided?

semi-extrinsic · on Dec 22, 2017

Numba is like laminating wood to build structural beams - it will get you close to the performance of metal, for some applications, if you can accept the weight increase etc.

Numba could be seen as misguided from some points of view. E.g. when using Python for high performance scientific computing, you will typically be writing your computational kernels, I/O etc. in some compiled, superfast language (C/Fortran/CUDA/whatnot) and all the input handling/case setup/etc. in Python. If 1% of your compute time is spent in Python and 99% is carefully optimized C, Numba is obviously pointless.

But that's for one application. Python is used for so many different things that you can't make blanket statements like this.

lmm · on Dec 22, 2017

Why C++ though? In e.g. Scala you have much more pythonlike code than even modern C++, full type safety, much better performance than Python, strong IDE support, and the ability to do the things you would want compile-time reflection for (e.g. typeclass derivation and similar use cases for "iterate over all the members of a class") in a completely type-safe way without needing dangerously flexible macros or code generation. Other ML-family languages will be similar.

aub3bhat · on Dec 22, 2017

Please show me C++ equivalent of

    sorted([(k.weight, k.name) for k in somelist], reverse=True)

Wehrdo · on Dec 22, 2017

I'll bite.

  std::vector<std::pair<float, std::string>> in_order;
  std::transform(data.begin(), data.end(), std::back_inserter(in_order),
                 [](auto& item) {return std::make_pair(item.weight, item.name);});
  std::sort(in_order.begin(), in_order.end(), std::greater<>());

While I won't claim it to be as elegant as Python, it doesn't seem too ugly. Does anybody have an idea about how `auto` could be utilized to avoid the long vector<pair...> type declaration?

For the adventurous: https://repl.it/repls/RepentantGentleKudu

jdashg · on Dec 22, 2017

Templates, auto, and type toys can give you some serious build-your-own syntactic sugar: https://repl.it/repls/PleasingLovingQueenslandheeler

Pulling things out of tuples isn't great, though.

deathanatos · on Dec 22, 2017

Note: my C++ is extremely rusty.

Right now, I think that's approximately,

    vector<tuple<int, string>> output;
    transform(
        somelist.begin(), somelist.end(),
        back_inserter(output),
        [](const auto &f) { return make_tuple(f.weight, f.name); }
    );
    sort(output.rbegin(), output.rend());

Ranges, I believe, would reduce this a lot, possibly even to a single line. If I am reading the docs on it correctly, something like,

    vector<auto>(somelist | view::transform([](const auto &f) { return make_tuple(f.weight, f.name); })) | action::sort;

I think.

Two notes, however:

1. I feel like most of the desire for a static language is to know what type something is. Is C++ exactly as brief as Python? No, as I think you've demonstrated. But I think you're a lot more likely to know the type of something. Rarely do I think I find that Python has annotations, and annotations can be wrong.

2. C++ is, in general, I feel, much more explicit about where copies occur. I elided one of the copies in your example, opting instead for an in-place sort (but this is trivial to fix in the Python).

lmm · on Dec 22, 2017

> 1. I feel like most of the desire for a static language is to know what type something is. Is C++ exactly as brief as Python? No, as I think you've demonstrated. But I think you're a lot more likely to know the type of something.

Sure, but you don't have to choose between them, there are plenty of languages where you can have both Pythonic terseness and full type safety. E.g. Scala:

    (for {k <- somelist} yield (k.weight, k.name)).sorted.reverse

Many other strongly typed (ML-like) functional languages are similar.

dataflow · on Dec 22, 2017

> vector<auto>

Really? O.o How could this possibly work?

deathanatos · on Dec 24, 2017

Oh, that was probably a typo (it was late); replace that with a concrete type.

dataflow · on Dec 22, 2017

Here (http://rextester.com/QUIK26485):

  somelist | transformed([](auto &&k) { return make_pair(k.weight, k.name); }) | sorted | reversed;

Cieplak · on Dec 22, 2017

Easy:

    std::vector<P> people = { P{"jane", 47}, P{"mary", 71}, P{"john", 65} };

    ranges::sort(people, [](auto& x, auto& y){ return x.weight > y.weight; });

Compile this gist with `c++ -std=c++1z -I range-v3/include`:

https://gist.github.com/cieplak/dcd587c67d989768900e4110e776...

joshuamorton · on Dec 24, 2017

What if two weights are equal? The Python code will sort by name as well ;)
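A small illustration of the difference, using made-up data (the `K` record and its values are hypothetical):

```python
from collections import namedtuple

K = namedtuple("K", ["name", "weight"])  # hypothetical record type
somelist = [K("john", 71), K("jane", 47), K("mary", 71)]

# Tuples compare element by element, so equal weights fall through to the name.
by_tuple = sorted([(k.weight, k.name) for k in somelist], reverse=True)

# Sorting on weight alone is stable: ties keep their input order instead.
by_weight = sorted(
    [(k.weight, k.name) for k in somelist],
    key=lambda t: t[0],
    reverse=True,
)
```

With two entries at weight 71, `by_tuple` orders them by name (descending) while `by_weight` leaves them in input order, so a weight-only comparator is not quite the same sort.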

tigershark · on Dec 22, 2017

No idea about C++ but in C# seems quite more readable (and flexible)

    someList.OrderByDescending(x => x.weight).ThenByDescending(x => x.Name);

It's quite similar to expressing the concept in English, certainly more than using list comprehension in Python.

And how would you order it in Python by ascending on the first field and descending on the second using list comprehension?

joshuamorton · on Dec 24, 2017

That's not the same thing: you've sorted a list of objects, while we're looking for a list of tuples of `weight, name`.

In answer to your question though,

    sorted(((k.weight, k.name) for k in some_list), key=lambda x: (-x[0], x[1]), reverse=True)

appears to work. This does use a non-obvious trick, but being more explicit is a smidge difficult, since the key function is called only n times, as opposed to O(nlogn) in the C# example.

Alternatively, you can use

    sorted(sorted(((k.weight, k.name) for k in some_list), key=lambda x: x[1], reverse=True), key=lambda x: x[0])

Which is more like the original example, and if you're doing it in place, you get

    outs = [(k.weight, k.name) for k in some_list]
    outs.sort(key=lambda x: x[1], reverse=True)
    outs.sort(key=lambda x: x[0])

Python's builtin sort is timsort, so despite sorting the list twice, this will still run in approximately NlogN comparisons, not 2NlogN.

You could also manually define a custom comparator, ie

    lambda s, o: ((s[0] > o[0]) - (s[0] < o[0])) * 10 + ((s[1] < o[1]) - (s[1] > o[1]))

and pass it to `functools.cmp_to_key`.
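To check that these approaches agree, here is a self-contained comparison on made-up `(weight, name)` pairs. The `rows` data and the `compare` helper are illustrative only, and the negated-key version assumes the first field is numeric.

```python
from functools import cmp_to_key

rows = [(2, "b"), (1, "a"), (2, "a"), (1, "b")]  # made-up (weight, name) pairs

# 1. Two stable passes: secondary key first, primary key last.
two_pass = sorted(
    sorted(rows, key=lambda x: x[1], reverse=True),
    key=lambda x: x[0],
)


# 2. One pass with a comparator: ascending on x[0], descending on x[1].
def compare(s, o):
    first = (s[0] > o[0]) - (s[0] < o[0])
    second = (o[1] > s[1]) - (o[1] < s[1])
    return first or second  # fall back to the second field only on a tie


one_pass = sorted(rows, key=cmp_to_key(compare))

# 3. The negated-key trick (works only because the first field is numeric).
negated = sorted(rows, key=lambda x: (-x[0], x[1]), reverse=True)
```

All three produce the same ordering here; the two-pass version relies on `sorted` being stable, which Python guarantees.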

nickelbox · on Dec 22, 2017

You could implement the following with a decent bit of work (declaring k and reversed to be variables of special, hand-written/macro-generated types with overloaded operator, and operator=).

    sorted((k.weight, k.name).for_(k).in(somelist), reverse=true);

You would never be able to get that past code review, however.

joshuamorton · on Dec 22, 2017

Sure, but at that point you've just reimplemented python in macros ;)

The other thing to note is that

    sorted([(k.weight, k.name) for k in somelist], reverse=True)

is essentially already typechecked:

    from typing import List

    def biggest_ks(ks: List[K]):
        return sorted([(k.weight, k.name) for k in ks], reverse=True)

The above code now has all the same type guarantees as your c++, actually maybe more since the macros you use are going to be...uhhh, mysterious.

nickelbox · on Dec 22, 2017

The macro would only be used to generate k for convenience. It could be all implemented in straight, macro-less C++98 in O(minutes).

My point was just that you can implement almost whatever you want (even without macros, they'll just expand the design space).

joshuamorton · on Dec 22, 2017

I don't see how you could implement the (k.attr).for_(k) part. That's essentially an assignment, a macro could maybe convert it to a lambda, but I don't see how it would be done macro-free.

BucketSort · on Dec 22, 2017

I come from a statically typed background (C++), but have been doing a lot of analytics in python in the past two years. It is frustrating not to have compile time guarantees when dealing with mathematical programs, because some things have to be a particular type (i.e. matrices of compatible dimensions). The result is a copious use of asserts, but it feels bad when you know that if you did this in a functional language, let's say, you could prove implementations are correct by the nature of the type system. In short, I'd love to see more strong type support in python.
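A sketch of the assert-heavy style described above, assuming matrices are plain lists of rows rather than NumPy arrays; `matmul` is a made-up helper, not a library function. The point is that every one of these checks only fires at run time.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows, checking shapes at run time."""
    assert a and b, "matrices must be non-empty"
    inner = len(a[0])
    assert all(len(row) == inner for row in a), "ragged left operand"
    assert len(b) == inner, (
        f"shape mismatch: {len(a)}x{inner} @ {len(b)}x{len(b[0])}"
    )
    cols = len(b[0])
    assert all(len(row) == cols for row in b), "ragged right operand"
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
```

Multiplying a 1x2 matrix by another 1x2 matrix fails the shape assert, but only once that line actually executes, which is exactly the complaint.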

joshuamorton · on Dec 22, 2017

>(i.e. matrices of compatible dimensions)

What language do you use where you can get these kinds of guarantees? As far as I know very few languages provide those kinds of dependent types statically.

saagarjha · on Dec 22, 2017

C++ makes this possible via templates. Generally the size is moved to a template argument, which allows the compiler to check this at compile time (of course, this restricts you to statically sized matrices).

joshuamorton · on Dec 22, 2017

Good to know. I was apparently unaware how powerful templates were.

zbentley · on Dec 22, 2017

Don't feel too bad, almost everyone is. And almost all of the people who are aware of how "powerful" they are tend to equate "hellish complexity" with "expressive power" and are not people you want to work with.

There are a few shining exceptions, but not many.

laverick · on Dec 22, 2017

https://stackoverflow.com/a/22645853

joshuamorton · on Dec 22, 2017

I think I knew that templates were Turing complete, but so are java generics, it's just that to get dependent types in generics you have to reinvent the integers within the generic system. Not so for templates, which I didn't realize. That's pretty nifty!

OskarS · on Dec 22, 2017

In graphics programming, you have specific types for lots of small vectors and matrices (for vectors, there are separate types for every size 4 and below, and matrices usually come in a variety of sizes as well, at the very least 3x3, 3x4 and 4x4).

Typechecking is very useful here: if you try to transform a point in space represented by a Vector3 by a general 4x4 matrix, it fails compilation because you have to convert the point to homogeneous coordinates first. Very useful information from the type system.

joshuamorton · on Dec 22, 2017

I'm aware of that, I was more thinking in the general case, as comes up in machine learning for example, where you have sizes like 128, 192, and odd shapes like 12x3x100x100 4-tensors, etc.

That is, generalized matrix types, not simply rotation matrix types or whatnot for special cases.

antoinealb · on Dec 22, 2017

Eigen (a popular matrix library) does it in c++.

auxym · on Dec 22, 2017

I believe in the functional world, that sort of thing is implemented using a feature called dependent types. It's not a very common feature, but Liquid Haskell implements it.

ValleyOfTheMtns · on Dec 22, 2017

FYI, it requires Python 3.6+. It mentions it in the article towards the end, but if you're like me and prefer to jump straight into trying something out you may not have seen it. I wasted a bit of time trying to figure out what the ContextManager is in the Python typing module and why it couldn't find it.

mkolodny · on Dec 22, 2017

This looks wonderful :) I'd love to try it out at some point.

One thing that I think could really improve the documentation is a few examples! One of my favorite things about the Python docs and the community is the wealth of examples. From looking at the docs, I couldn't find the main thing I wanted to see - what would MonkeyType's annotations look like if I used it?

jimnotgym · on Dec 21, 2017

Reading all of the comments from engineers who seem to either possess a time machine to send current tech back in time, or are criticising the technical choices that made the founders $squillions, is making me a bit mad. As a diversion perhaps some of them could list a few billion dollar startups that made perfect choices at the start and never had any cause to refactor or reimplement code as they grew?

mbid · on Dec 21, 2017

It's not obvious (and IMO somewhat doubtful) that it was the technical choices of the founders that made them successful. Their choices could well have been bad, just not bad enough to make their business fail.

swsieber · on Dec 22, 2017

WhatsApp? Just kidding, they probably had to refactor code too. But their technical story seems a lot better than most startups

dilap · on Dec 22, 2017

> At Instagram we have hundreds of engineers working on well over a million lines of Python 3.

Man, that's crazy. At the time they were acquired by Facebook, they had 13 employees.

kilpikaarna · on Dec 23, 2017

This was the part that stood out to me also. They must have a ton of new stuff in the pipeline, or their "display photos, insert some ads in between" loop is way more complex than it seems.

Or that's total number of engineers and way fewer actually twiddle the Python...

Lxr · on Dec 22, 2017

Is their goal to annotate everything or just the non-obvious things? Also how would a tool like this handle cases where the “correct” type is a generic base class but at runtime it only sees a certain subclass? To be pythonic, a function that accepts a tuple should usually also accept a list for example, but at runtime that may never happen.

ambivalence · on Dec 22, 2017

Since this is how gradual typing works, the goal is to annotate every last function.

Good question about abstract base classes! Paraphrasing a well known cliché: types in functions should be forgiving in arguments (what the function accepts) and strict in return values (what the function emits). In our case, the human reviewer needs to decide if the argument types collected by MonkeyType should be generalized. In fact, the collected types might not even work in all cases and the type checker might complain. It's because annotations describe "what should be" whereas MonkeyType finds "what is". This is why a system like MonkeyType shouldn't even attempt to use abstract base classes in place of concrete types that it collected.
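A small sketch of the generalization step described above, assuming a tracer only ever saw lists at runtime. Both function names are hypothetical.

```python
from typing import List, Sequence


def total_collected(values: List[int]) -> int:
    # What a trace-based tool might emit if it only ever observed lists.
    return sum(values)


def total_generalized(values: Sequence[int]) -> int:
    # What a human reviewer might widen it to: any sequence of ints.
    return sum(values)


# Both calls run, because hints are not enforced at runtime. A static checker
# such as mypy would be expected to reject the first call (a tuple is not a
# List) and accept the second.
print(total_collected((1, 2, 3)))
print(total_generalized((1, 2, 3)))
```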

muizelaar · on Dec 21, 2017

I wonder how this compares to PyAnnotate: http://mypy-lang.blogspot.ca/2017/11/dropbox-releases-pyanno...

maltalex · on Dec 21, 2017

Reading this as someone who writes mostly in statically typed languages, the whole exercise seems odd.

Having so much dynamically typed code to maintain that you need to run production code using a separate tool just to figure out the types sounds just wrong. Why not use a statically typed language for such a large code-base? Is this done on purpose, or did they end up with a million lines of Python code and are looking for ways to make the maintenance easier?

And before I get down-voted to hell - I completely understand using Python for many things. It's a good technical choice for many different problems, but navigating a million lines of Python seems just daunting to me (although maybe I'm just not experienced enough with Python).

nawgszy · on Dec 21, 2017

>Is this done on purpose, or did they end up with a million lines of Python code and are looking for ways to make the maintenance easier?

Definitely the latter. I've seen this discussion a few times before, and it's always the same. Your initial developers are not looking down the road to the million lines of code milestone, they're just trying to make a product that might actually make some money here and now.

I'm sure Instagram was exactly that. They needed to handle images and some guy knew how to do it in Python. They wrote Python code, and then people liked Instagram. They eventually became a billion dollar company with millions of lines of code and nowhere along the road was there time to say "hey we need to refactor this whole thing". Or if that was said, management laughed and said "we need this feature".

So here is where you end up. The developers need to clean things up, but they don't have time to do it by switching to a language that, realistically, they probably don't know as well as the Python they wrote the millions of lines of code in.

Re: your last comment, navigating a million lines of any codebase is daunting, and especially more so if you aren't a developer in that language. I'm not sure what exactly "Python" has to do with that, besides that you're not a Python dev.

imiric · on Dec 21, 2017

To add to this, note that type hinting is quite a new feature in Python (introduced in v3.5, released in 2015), and this functionality simply wasn't available before. So any company heavily invested in Python today obviously wants to improve their runtime reliability, without having to rewrite parts of their stack.

Stricter typing goes a long way to achieve this, and gradual typing allows you to upgrade the code base at your own pace, which is great.

Consider this study[0] about TypeScript and Flow, which use the same approach for JavaScript, which found both able to detect ~15% of runtime bugs. So no wonder companies with large Python code bases would be the first to invest in this space.

Personally I feel this is a great addition to the language, and hope type checking becomes a first class citizen too, instead of being delegated to external tools like mypy[1] or pytype[2].

[0]: http://ttendency.cs.ucl.ac.uk/projects/type_study/

[1]: http://mypy-lang.org/

[2]: https://github.com/google/pytype

miohtama · on Dec 21, 2017

Dropbox is very heavily invested in Python. I am under the impression they hired Guido van Rossum to do exactly this, among other things. First 100% statically type the old codebase, then port it to Python 3.

You can statically type Python 2 codebases, but the language does not offer native support for it. Thus, all the type information needs to go into docstrings or comments.
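For reference, PEP 484 defines a comment-based signature syntax that checkers such as mypy read, and that is valid source in both Python 2 and Python 3. A minimal sketch with made-up functions:

```python
def add(a, b):
    # type: (int, int) -> int
    return a + b


def greet(name, excited=False):
    # type: (str, bool) -> str
    return "Hello, " + name + ("!" if excited else ".")


print(add(1, 2))
print(greet("Guido", excited=True))
```

The interpreter sees only ordinary comments, so nothing changes at runtime; the information exists purely for external tools.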

jw- · on Dec 22, 2017

I'd consider those languages optionally typed, rather than gradually typed, as they don't insert run-time type checks. Regardless, I still think it's all interesting and valuable work.

forgotpassagan · on Dec 22, 2017

I can picture these poor souls vividly. Millions of lines of python, flowing like the mightiest of rivers. Nobody really knowing whence it cometh and goeth.

A hero arises, offering a sacred herb to calm the torrent and light the golden path. The hero is elevated, yet they continue to pray

btown · on Dec 22, 2017

They say there is a holy land called Haskell, but it is only revealed to the truest of believers without the weight of the chaotic-neutral entity “Shareholder” weighing ever so heavily on their shoulders. For those in Shareholder’s clutches, one must forgive their prayer. ‘Tis the best they can do.

forgotpassagan · on Dec 22, 2017

There is something else, in the darkest reaches. It has many incantations, but the non-believers have a singular name. Lisp.

arbie · on Dec 22, 2017

With its comforting, reassuring warmth, Perl shines on as a luminous sun, lighting the way for youngling languages to learn from. Hushed whispers foretell the sunset, but none truly believe them...

vgy7ujm · on Dec 22, 2017

Back to Perl folks.. The freedom, the happiness !!

jsjohnst · on Dec 22, 2017

> forgotpassagan

They have password managers you know and also spell checkrs too. :P

tim333 · on Dec 22, 2017

Looking it up, it seems pretty much that:

Kevin Systrom " thought of combining location check-ins and popular social games. He made the prototype of what later became Burbn and pitched it to Baseline Ventures and Andreessen Horowitz at a party. He came up with the idea while on a vacation in Mexico when his girlfriend was unwilling to post her photos because they did not look good enough when taken by the iPhone 4 camera." (Wikipedia)

He used Django because I guess that was an easy way for one guy to do it fairly quickly. The app was Burbn which then pivoted into Instagram.

By the way I kind of surveyed the "what framework should I use" stuff on HN over the last year and Django still seems the most popular, probably followed by Rails and Phoenix.

IshKebab · on Dec 21, 2017

Navigating a large code base that is dynamically typed like Python is far more tedious than something like C++ or C#.

First you can't read what the types passed into and out of functions are. You have to find their usages to work it out. Second, you can't reliably do things like "find usages" or "go to definition" because of the dynamic typing.

andybak · on Dec 22, 2017

> Second, you can't reliably do things like "find usages" or "go to definition" because of the dynamic typing.

In my experience PyCharm can do both correctly for the vast majority of cases.

walrus1066 · on Dec 22, 2017

Still very limited, for example, if I have:

    def some_func(foo):
        foo.run()
        ...

Find usages on the run() method will return dozens of results, and the IDE can't help you any further in finding out what 'foo' is at runtime.
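For comparison, a sketch of the same function with a hint. `Job` is a hypothetical class invented for the illustration; with the parameter annotated, a tool has a single declared type from which to resolve `run()`.

```python
class Job:
    """Hypothetical class standing in for whatever `foo` really is."""

    def run(self) -> str:
        return "job ran"


def some_func(foo: Job) -> str:
    # With the hint, "go to definition" on run() has one declared target.
    return foo.run()


print(some_func(Job()))
```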

andybak · on Dec 25, 2017

Isn't that basically saying "it fails in the kind of cases that wouldn't even be possible in a statically typed language"?

lmm · on Dec 22, 2017

Indeed; it only fails in the really fiddly cases where you need it the most.

toast0 · on Dec 22, 2017

Most of the c++ and c# code I see lately has so many things declared as auto, it's hard for me to figure things out too.

tigershark · on Dec 22, 2017

Are you using notepad to write c#? If you hover on the var you will see the tooltip with the type.

toast0 · on Dec 22, 2017

When I was writing c#, I did use visual studio, but since I'm used to developing in a terminal editor, all these tooltips and things are a little tricky -- sometimes they disappear, and then I can't get them back, etc.

But more often, when I'm looking at c# or c++, it's not code I wrote, it's not code I intend to change, it's code that's interacting with my code (written in another language) that I'm trying to see why it's misbehaving, so I can get the owner to fix it. I could be reading the code on GitHub or some other web view, I might have checked it out, but I have no interest in setting up a (probably new) IDE to look at it as the author would; I dig into too many projects to learn that many tools -- and deal with the upgrade cycle for them.

Sure, it would be useful to hover and get more information, but I'm used to loosely typed languages, so it's not awful. It's just jarring to see that the type information is apparently not important enough to write down the name in c++ or c# anymore.

klipt · on Dec 22, 2017

Would be nice if IDEs had a key combo to fill in the actual type for "auto"s!

maxxxxx · on Dec 22, 2017

That would be nice!

rusk · on Dec 21, 2017

> I'm not sure what exactly "Python" has to do with that

Well as he said, a statically typed language is better in that kind of situation because it enables a better class of tooling and the typing system enforces certain style constraints, which enables better quality of code analysis en masse.

Python specifically is very lightweight in this regard with little in the way of naming constraints (vs for instance Ruby having different formatting rules for different types)

So yeah, it’s not “just like any other language” - horses for courses

scrollaway · on Dec 21, 2017

I really don't get people. Instagram and Dropbox, through typing annotations in Python, are gradually improving a language that has codebases running globally, from YouTube to NASA.

Clearly something is right with the situation when the incentives are aligned for a tech company to contribute back to the open source community in such fundamental ways. So why look for the mole and think "They should have done it differently", when doing it differently has a high likelihood to mean not being as successful as they are today, and not having the occasion to contribute back?

It's like telling a successful charity "You should just take everyone's money and spend it on lamborghinis instead of wasting time building wells in africa".

kod · on Dec 21, 2017

No, it's like asking someone who spent a lot of time building an octagonal wheel and is now trying to shave down the corners... why didn't you use a circle to begin with.

pfranz · on Dec 21, 2017

To me, this is a metaphor that might better explain it, "You need a different approach for getting your first million than your second." (I've heard it attributed to customers, revenue, personal income, etc). It sounds like Python (and features like dynamic typing) works very well at bootstrapping and developing. They're leveraging different features (more like static typing) once they get larger and more time is spent on maintenance (and rewriting everything isn't appealing [1]).

Honestly, your octagonal wheel metaphor works, too. Building the first car you spend a lot of time on octagonal (crude) wheels, but later spend a lot more money on round (precision) wheels. You could have gone bankrupt spending money originally on round wheels that were the wrong size.

[1] https://www.joelonsoftware.com/2000/04/06/things-you-should-...

ageofwant · on Dec 21, 2017

They did not spend a lot of time building an octagonal wheel, they built a billion dollar company using Python. Now, when the code base has been proven, and the business rules solidified, they retrofit what they believe will make the code base easier to maintain.

Python is an excellent enabler of this kind of dynamic system evolution.

angelsl · on Dec 21, 2017

You've totally missed the analogy. See pfranz's comment.

depressedpanda · on Dec 22, 2017

No, the analogy just wasn't very apt.

Python allowed them to build a successful company. Now, when their stack is mature and maintenance is more important than rapid prototyping, Python allows them to add type hinting.

Because they are engineers, they built a tool (in Python) that allows them to do it in an automated manner.

And all of this is great!

They are evolving their code to fit their needs; it's nothing like making an octagonal wheel and wishing you'd have gone for a round one in the beginning.

tjoff · on Dec 22, 2017

The point is that there is no property of Python that allowed them to build a successful company that any statically typed language doesn't have. It is a completely unnecessary detour.

The cost of doing it right from the start is negligible.

c22 · on Dec 22, 2017

The first language I learned (after Applesoft BASIC) was C. I wrote C for a long time. About 6 years ago I picked up Python. Today I find it much easier and much more pleasant to spin up a new idea in Python, to the point that it is my default choice for new projects with fuzzily defined goals. None of my ideas have become companies, but I could totally see just sticking with Python even past the point that it became unwieldy.

tigershark · on Dec 22, 2017

Between c and Python there is an ocean of languages...

walrus1066 · on Dec 22, 2017

I'm with the OP on this, I've experience with Java, C#, Python & C++.

I'm a big fan of rapid prototyping with Python to map out problem domains, and once the domain has been mapped properly, rewriting in a statically typed language if necessary.

Python is much better for prototyping than the other langs I've used. Because the syntax is almost pseudocode, and the duck-typing makes a lot of design patterns and boilerplate obsolete, so I can dedicate my headspace to the problem at hand.

Right tool for the right job, as they say.

c22 · on Dec 22, 2017

True enough. I've dabbled in several. These are just the two I happen to have the most experience in and the ones that seemed relevant to my point.

oblio · on Dec 22, 2017

You're assuming that the original developers would have been just as productive in a statically typed language as they were in Python.

Big assumption.

Both because they might have known Python already and also because Python is quite a bit more newbie-friendly, concise and expressive than the mainstream statically typed languages.

tjoff · on Dec 22, 2017

Obviously what the programmers knew to start with is of importance. But that's no property of the language (well, sure, being easy to pick up increases the risk).

But no, ignoring that, it is not a big assumption really. The benefits of static typing come pretty quickly, especially if there is more than one programmer.

ageofwant · on Dec 22, 2017

I think the general consensus is that that is not true. Python's dynamic nature is a clear advantage it has over statically typed languages. Add the fact that you can elect to tune down the dynamism when it makes sense, with very little impact on your existing stack, and Python is the technologically superior choice for the majority of applications.

tigershark · on Dec 22, 2017

Absolutely not. Python is not the technologically superior choice for the majority of applications. If you think so then your experience in different domains and application types must be very limited.

The preconception that dynamic languages are more productive is just an illusion, because you can easily take shortcuts that will hamper your progress in the future. A properly typed language with HM type inference lets you mostly avoid writing the types, with the guarantee that the compiler will catch most of the pitfalls. And if you don't make any logic errors, pretty much every time your code just works. Saying that in a million-line application Python is a better choice than F# or Haskell is frankly ridiculous in my opinion.

joshuamorton · on Dec 22, 2017

Empirically, there are a lot more million line python codebases than F# or haskell codebases, in fact I can name multiple million line python codebases, and 0 F# or haskell codebases. Given that, logic would indicate some sort of failure on the part of haskell and F#, or they would see wider adoption among the large codebases where they are so useful.

Do you disagree?

tigershark · on Dec 22, 2017

Given how much smaller the F# community is and how much more you can crank out in fewer lines of code in F#, I can believe it. Between C# and F# there is about an order of magnitude of difference in LOC for big projects, and C# and Python are comparable by this metric.

joshuamorton · on Dec 22, 2017

It appears you missed my point. I can't think of a 100kloc f# or Haskell codebase, so even if they were 10x as terse as python, which they aren't, python comes out ahead. If they're so much better, why don't people use them?

kod · on Dec 23, 2017

I can think of 100kloc Scala codebases, e.g. Kafka.

People do use ML family languages, and they are better. There are plenty of non-technical reasons they aren't as widespread as dynamic languages or shitty static languages.

joshuamorton · on Dec 23, 2017

Right, but Scala is different from Haskell or f#. It doesn't use the same kind of type inference (hm) as classical ml derivatives.

wumpus · on Dec 22, 2017

Does this mean I have to return all the money I made?

sidlls · on Dec 22, 2017

There's nothing here that tunes down the dynamism. Hints aren't statically checked or enforced. It's still possible to pass in an empty list to an int-hinted var and, e.g. have `if not var` evaluate to True (rather than raise an Exception).
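The behaviour described here is easy to reproduce. A minimal sketch (the function name is made up): the interpreter ignores the hint at call time, although an external checker such as mypy would be expected to flag the second call.

```python
def is_unset(var: int) -> bool:
    # The hint says int, but nothing checks it when the function is called.
    return not var


print(is_unset(0))   # True: a legitimate int argument
print(is_unset([]))  # True: an empty list slips through without any exception
```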

Type hints allow external tools to check some things, but at this point you're basically imposing static types so why not use a language with the tooling and optimizations to take advantage of that?

Python is a good choice to prototype, write small (less than a few thousand lines of code) projects with non-trivial complexity, and somewhat larger projects with more boilerplate (e.g. Django webapps). Beyond that its utility diminishes until it starts to become a hindrance.

ubernostrum · on Dec 22, 2017

Ah yes, this old chestnut: "(language I don't like) is only suitable for teeny-tiny puny baby child's toy programs, and once you're not writing those anymore you must use a big strong grown-up language like all the other Real Programmers™ do!"

The empirical evidence of reality is against you: there are successful large (in terms both of codebase and contributors/development team) projects in these awful terrible children's languages, and there are unmaintainable failed piles of crap in even the most grown-up of languages you'd care to name. The choice of language, and choice of type system, seem not to correlate with the success or failure in a meaningful way.

blub · on Dec 22, 2017

It doesn't correlate with success, but the choice of language does correlate with development speed, number of faults, maintainability, etc.

The interesting thing to note is that a language that's perfectly acceptable at the above at small or medium scale might turn into a hindrance at large-scale. An otherwise fast to develop in language like Python won't be so fast if every change has to be painstakingly reviewed and tested due to the complexity of interactions in the code base.

Using a type system to verify assumptions/requirements is not a recipe for success, but it can improve reliability.

ubernostrum · on Dec 22, 2017

> won't be so fast if every change has to be painstakingly reviewed and tested due to the complexity of interactions in the code base

You can write spaghetti code in any language, it turns out. Blaming the language for that is not really an indicator of understanding the problem.

blub · on Jan 2, 2018

Yes one can do a poor job in pretty much any situation, I'm afraid that's not an argument for anything though.

Here we're talking about average or best-effort: large code bases are complex in spite of the best intentions of their maintainers, so using tools that can manage that complexity in an easier way through e.g. their type systems could lead to better results.

joshuamorton · on Dec 22, 2017

>Type hints allow external tools to check some things, but at this point you're basically imposing static types so why not use a language with the tooling and optimizations to take advantage of that?

Python's type system is, imo, currently better than Java's, and the syntax is cleaner than java's or C++s. You get all the benefits of static typing without having to put `auto` and `List<>` everywhere. And at the same time, you get all of the advantages that python has over statically typed languages that aren't haskell (like comprehensions). And, when you need to, if you're doing something that's especially tricky or dynamic or whatnot, you can fall back to untypedness.

I think the closest parallel I can draw is to something like Rust. You get a huge set of guarantees for free, but can opt to do unsafe things when it's absolutely necessary, and better yet, you can start in unsafe land and then go back later and make sure your code is safe.

I'm curious what tooling you feel that say, Java, has over type-annotated python.

sidlls · on Dec 22, 2017

Hints don't provide any guarantees. It's still possible to silently and unknowingly pass the wrong type of value into any given argument, with or without the checker. The "tooling" the other languages have includes a compiler that performs these checks in a way that Hints + Checker-of-choice is unlikely (or unable) to. "What do you think these checkers do?" you might ask. The answer is: not nearly what a compiler does.

joshuamorton · on Dec 22, 2017

> It's still possible to silently and unknowingly pass the wrong type of value into any given argument

This is also possible in traditionally statically typed languages. Nothing stops you from doing unsafe casts or using reflection. Much like it's exceedingly unlikely that you'll run across this in "normal" java or C++, it's exceedingly unlikely for you to run into any issues with this in python. And, in fact, the typechecker has ways to handle unusual things like dynamically created attributes, for when that comes up.

And yes I mean this quite honestly. I've seen a lot of typechecked code, some of it quite ridiculously dynamic. Typecheckers perform absolutely fine.

>The "tooling" the other languages have includes a compiler that performs these checks in a way that Hints + Checker-of-choice is unlikely (or unable) to.

What way is that? Typechecking is static analysis. There's really no difference between how java or cpp does typechecking and how mypy does, other than that the python typechecker isn't installed by default.

>The answer is: not nearly what a compiler does.

This is not an answer.

sidlls · on Dec 22, 2017

> This is also possible in traditionally statically typed languages. Nothing stops you from doing unsafe casts or using reflection.

Neither of those is "silent" or "unknowing".

> What way is that? Typechecking is static analysis. There's really no difference between how java or cpp does typechecking and how mypy does, other than that the python typechecker isn't installed by default.

The typechecker can't handle un-hinted code (or, rather, it chooses something very permissive, like 'Any' for all hints). It's incomplete at best.
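A sketch of what "something very permissive, like 'Any'" looks like in practice, using made-up function names. Under PEP 484, unannotated parameters and return values are treated as `Any`, so typed code can call into un-hinted code without complaints from the checker.

```python
def legacy_double(x):
    # No annotations: a checker treats x and the return value as Any.
    return x * 2


def typed_double(n: int) -> int:
    # Accepted by a checker: an Any result is compatible with the declared int.
    return legacy_double(n)


print(typed_double(3))
print(legacy_double("ab"))  # the untyped function happily accepts a str too
```

The trade-off is visible in the last line: anything that flows through the untyped function is outside what the checker can vouch for.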

> This is not an answer.

It is. That you don't like or agree with it doesn't make it not an answer.

joshuamorton · on Dec 22, 2017

>The typechecker can't handle un-hinted code

And in Java or c++, un-hinted code couldn't compile. The python type checker can do more than a java or c++ checker in this regard.

>Neither of those is "silent" or "unknowing".

They're exactly as silent or unknowing as you would get in typed python code. You appear to be comparing untyped python. That's an incorrect comparison. Offhand, I actually can't think of anything I could do in typed python that would get around the type checker, that wouldn't be considered reflection or a dynamic cast, and be very obviously so in python too. If you have an example of a silent or unknowing failure of well typed python code that passes on mypy, you should probably file a bug report ;)

>That you don't like or agree with it doesn't make it not an answer.

You're right. It's not an answer not because I disagree with it (I don't), but because it doesn't actually answer anything, which is why I don't disagree with it.

To summarize this:

Python typecheckers are capable of more type inference than Java, and require less syntax than c++ or Java to get well typed code. A typed python codebase can interact cleanly with an untyped python codebase, and within the typed parts of the code, you get equivalent safety guarantees to what the type systems of Java or C++ provide.

You appear to be ascribing magical powers to compilers in other languages, when those compilers have exactly the same type information as mypy does.

In other words, going back to your first statement:

>Hints don't provide any guarantees.

Hints provide exactly the same guarantees as any other type system: "Assuming you write reasonable code that doesn't attempt to subvert the type system, the type system will catch any dumb mistakes you make."

That's the exact same guarantee you get in any statically typed language.

tigershark · on Dec 22, 2017

No it's absolutely not. In Haskell and F# you don't need to write types annotations to get that guarantee.

joshuamorton · on Dec 22, 2017

Nor do you in python in many cases, its type inference is quite good, certainly better than c++ or java, which is what I was speaking of.

But you're right, I should have specified "traditional" static language.

tigershark · on Dec 22, 2017

"That's the exact same guarantee you get in any statically typed language." You wrote that and f# and Haskell are statically typed

tjoff · on Dec 22, 2017

You don't put auto everywhere. You write out the type in 99% of cases and save auto for 100+ character templated types.

There is no reason to save the literally 0.2s (you can still spend that time reasoning about your code) it takes to write the type. It is better for yourself writing it and for readability to be explicit.

joshuamorton · on Dec 22, 2017

By everywhere, I mean where it's otherwise obvious:

    auto s = std::string("Hello world");  // a bare literal would deduce const char*, which range-for can't iterate
    for (auto c : s) {
        std::cout << c;
    }

Those autos don't need to exist, they're completely inferable, otherwise you wouldn't use auto. It's not like you can use auto in function declarations, nor should you, I agree.

tjoff · on Dec 22, 2017

I always write std::string etc. in those cases. It is consistent and quicker to read and there is no tangible benefit to using auto.

joshuamorton · on Dec 22, 2017

Right, but my point is that there's really no tangible benefit to writing the type at all. In a language with good type inference (Haskell, ml, modern python), the string literal is known to be a string, and you don't need to do any extra work.
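A rough Python counterpart of the point being made, with a hypothetical `shout` function. Only the function boundary carries annotations; the locals are left for a checker to infer.

```python
def shout(text: str) -> str:
    return text.upper() + "!"


greeting = "Hello world"           # no annotation; a checker infers str
letters = [c for c in greeting]    # inferred as a list of str
message = shout(greeting)          # fine: the inferred str matches the parameter

print(message)
```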

jsjohnst · on Dec 22, 2017

> Beyond that its utility diminishes until it starts to become a hindrance.

Tell that to any serious Numpy/Scipy/Pandas user.

sidlls · on Dec 25, 2017

Being the lead on a data science team I am one of those. They're great for exploratory research and prototyping, and for use in the very tiny fraction of code in a production system that deals with machine learning if I must. For everything else, from the data pipeline to delivering results, I'd prefer and recommend something else, like Haskell, C++, go or Rust.

zachthewf · on Dec 21, 2017

I agree with you but I think it's pretty straightforward how this sort of thing happens:

1. Startup builds thing fast in dynamic language because they need to optimize for development speed and iteration, not maintainability or scalability.

2. Startup grows and continues to hire for expertise in the tech stack they are mostly already using.

3. Repeat for some years and some hundreds of engineers and you arrive at this exact scenario.

gxs · on Dec 21, 2017

This is exactly it.

People are running a business, not writing a treatise on code maintenance and hygiene.

blub · on Dec 22, 2017

Ok, then the business men among us should learn their lesson, that agility matters.

The software developers among us should also learn their lesson: don't build large-scale software in dynamic programming languages unless you can afford to spend time later adding a static type system on top.

walrus1066 · on Dec 22, 2017

I'd think the business men at Instagram have been very happy with the agility of development, that got them to a $2.8 billion revenue p/a company. More than enough to cover the engineering effort to help improve the maintainability of the code.

yen223 · on Dec 22, 2017

Considering the company that bought them runs on PHP, perhaps we should all ditch Python altogether.

gxs · on Dec 22, 2017

I agree - there's just always a condescending tone in the way people question the technical practices in hindsight of companies that are successful.

LrnByTeach · on Dec 21, 2017

> 1. Startup builds thing fast in dynamic language because they need to optimize for development speed and iteration, not maintainability or scalability.

> 3. Repeat for some years and some hundreds of engineers and you arrive at this exact scenario.

fiatjaf · on Dec 21, 2017

Isn't it faster to write everything in Go, for example, which has a compiler guiding you all the way so that you rarely get runtime errors? I feel my developer time is much more "optimized" when writing backends in Go than when I wrote them in Python (I also tried Node, which was a disaster).

meowface · on Dec 22, 2017

It's probably faster to verify that your Go code is working as intended, but many would argue that dynamically typed/interpreted languages are faster to write code in than statically typed/compiled languages. Others would argue that the pure "writing code" part isn't the majority of what takes up a developer's time. There's no one right answer.

jimnotgym · on Dec 21, 2017

Go wasn't a serious option for Instagram though. According to Wikipedia Instagram launched in 2010, Go launched in 2009. That would have made them very early adopters. Was the library support there back then like it was for Python?

Just because there may be better tools now, should they scrap their working code that earned their fortune?

lmm · on Dec 22, 2017

Well, OCaml, Haskell or maybe Scala would've been mature options in 2010, combining safety with language-level productivity that's comparable to Python, though the extent to which web frameworks available in those languages at that time were Django-equivalent is arguable. (Personally I would - and did - happily choose Scala with Wicket over Django in 2010)

In terms of what Instagram should do now, I'd say they should do what Facebook did: introduce thrift or similar, gradually move business logic into backend services written in more suitable languages, leaving the Python to eventually become just a thin web frontend. Retrofitting types onto code involves a lot of the same effort as rewriting it into a better language, and the rewards for the latter are higher, IME.

fiatjaf · on Dec 21, 2017

I'm not talking about Instagram, I'm answering the parent comment, which talked about developer productivity in general.

xapata · on Dec 21, 2017

Empirically? Doesn't seem like it. But it's hard to run a good controlled experiment. The confounding effects of the team members, the project goal, and the vagaries of business are too noisy.

shados · on Dec 21, 2017

We're getting into a world where languages finally have type systems that don't suck for fast development. This wasn't the case until very recently. And many of the current options only became realistically viable in the last few years (or months!).

We still have to work with the world as it exists. Not as it should be or will be. And even with the crop of modern languages, it's often still faster to start with less optimal languages and fix shit in the 1/10 chance you're actually successful.

mbid · on Dec 21, 2017

What are you talking about? Haskell is 30 years old.

rspeer · on Dec 22, 2017

You must have missed the part about "the world as it exists" and "fast development" and pretty much the entire point of the post you're replying to.

Haskell remains impractical for many use cases, it is not used much outside of academia, it's not documented to be used outside of academia, and it didn't even have a working package manager until a few years ago.

dmitriid · on Dec 22, 2017

Yeah. And you need 20 language extensions to sit at the top of your every file to do anything useful with it.

switchbak · on Dec 21, 2017

Many in the Haskell community talk about being guided by their types, perhaps you're seeing a similar benefit?

Not sure why the negative vibes toward this comment, I think it's fairly sensible (of course, this is a very subjective subject).

AlexCoventry · on Dec 22, 2017

I think a lot depends on how solid your requirements are. If there's no need for further design as you develop, types are great. If you're writing one to throw away, they're a waste. If you're chopping and changing as you write, it can go either way.

Kalium · on Dec 21, 2017

It's rarely easy to re-implement large or complex codebases in a new language. That kind of undertaking requires significant sign-off from leadership and a large effort.

This tool? This tool takes an existing set of codebases and makes them safer. No major boiling of oceans required.

Yxven · on Dec 22, 2017

Personally, I would love to ditch python for a language with strong types and type inference, but what is the replacement for django? Where do I find a well-designed, well-documented, battle-tested framework that I can easily hire developers for?

Currently, I think the nearest competitor is node with typescript, and I'd rather stick with python. Please tell me if I'm wrong.

weavie · on Dec 22, 2017

I dream of a time when OCaml becomes the obvious answer to this question.

bennofs · on Dec 22, 2017

Java? But that's another kind of hell

conceptme · on Dec 22, 2017

PHP 7

lambda · on Dec 21, 2017

> Why not use a statically typed language for such a large code-base? Is this done by purpose, or did they end up with a million lines of Python code and are looking for ways to make the maintenance easier?

Being in a similar (though much smaller scale) situation ourselves, I suspect your latter suggestion is the answer; they wound up with a large amount of Python code due to expediency (Python is good at quickly getting things done), and are now finding the code base quite hard to maintain.

Something to remember is that as of 10 years ago, statically typed languages consisted of C (for some definition of "statically typed"), C++ (complicated and cumbersome, and didn't yet have widespread availability of "modern" C++ features), Java (cumbersome, heavyweight, lots of missing opportunities for abstraction back then), C# (heavily tied into the Microsoft ecosystem, not yet open source), and a bunch of weird academic languages that required tutorials about burritos to learn (ML, Haskell, etc).

While statically typed languages like C, C++, and Java were the mainstream languages of the 90s, C was too limited for a lot of people, C++ and Java too cumbersome to use, and so lightweight dynamically typed languages like Perl, Python, PHP, Ruby, and JavaScript picked up a lot of steam due to how much easier to pick up and more productive many programmers found themselves in those languages. But now we have large, fairly un-maintainable code bases in these languages, and people are realizing the value of static typing, in part due to the maintenance hassle and in part due to newer, more expressive and accessible statically typed languages being available (Elm, Rust, TypeScript, Scala, Go, Kotlin, as well as improvements to C++, Java, and C#).

But that leaves all of these old codebases, that are hard to maintain. Rather than doing a complete rewrite, adding static typing capabilities that can be applied to existing codebases is a way to make them more maintainable without spending all of the time of a complete rewrite and having everyone have to spend all of the time learning the new language while still maintaining the old codebase.

ageofwant · on Dec 21, 2017

Also, these old code bases can now be fairly easily and optionally retrofitted using modern Python's support for type hints and tools like mypy and now MonkeyType. So people can continue using their existing code base and the vast Python ecosystem, with the added benefit of type checking where appropriate.
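As a rough illustration of what such retrofitting looks like (a minimal sketch; `find_user` and the sample data are made up for the example, not taken from any real code base), an untyped function can be annotated in place without changing its behavior, after which a checker such as mypy has something to verify at call sites:

```python
from typing import Dict, List, Optional


# Before: no annotations, so a checker has little to verify.
def find_user_untyped(users, name):
    for user in users:
        if user["name"] == name:
            return user
    return None


# After: same logic with type hints added. Runtime behavior is unchanged;
# the annotations only matter to tools such as mypy.
def find_user(users: List[Dict[str, str]], name: str) -> Optional[Dict[str, str]]:
    for user in users:
        if user["name"] == name:
            return user
    return None


users = [{"name": "ada"}, {"name": "grace"}]
print(find_user(users, "ada"))
# A call such as find_user(users, 42) still runs at runtime, but a static
# checker would be expected to report the int passed where str is declared.
```

Because the hints are optional, the untyped and typed versions can coexist while a code base is migrated module by module.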

SatvikBeri · on Dec 21, 2017

My job is mostly machine learning and statistics. I would love to use something like Haskell or even Java, but the NumPy/SciPy/Scikit-learn/Pandas etc. ecosystem is just so far ahead of everything else that it's not worth it.

jpmoyn · on Dec 21, 2017

This tool seems to be built for teams that have already created large codebases in non-typed languages. This is a pretty common thing today. They are addressing a very real problem that a lot of people face which is "oh we fucked up, what's the most painless way we can fix it".

We just recently moved our entire codebase from JS to TypeScript which was pretty hard but super worth it.

walrus1066 · on Dec 21, 2017

Think you trade maintainability for initial productivity with a dynamic language.

You can churn out a greenfield project faster as you don't have to spend upfront time mapping out interfaces, DTOs etc. It's also easier to 'hack'.

Of course the above makes the code harder to maintain and reason with, unless written by very disciplined engineers.

So these languages are a natural fit for startups (and things like prototyping and scripting).

Instagram would have been in this category, and it's worked out well for them.

user5994461 · on Dec 21, 2017

>>> but navigating a million lines of Python seems just daunting to me (although maybe I'm just not experienced enough with Python).

It's not that bad.

The first thing you learn when you're in a million lines codebase is that you will only work within your project of maybe a hundred files.

Once in a while, there is a guy who is asking for help on his project or there is an old weird bug to fix and you dive in other stuff. Otherwise, it's like it's not there.

AlexCoventry · on Dec 22, 2017

Sounds like the projects you worked on had excellent encapsulation.

user5994461 · on Dec 23, 2017

I wouldn't say excellent encapsulation. It's just normal to use subdirectories to split things. The project was over ten million lines of Python.

AlexCoventry · on Dec 24, 2017

Subdirectory structure is sort of orthogonal to the issue of encapsulation. It's more about providing APIs and clean abstractions which prevent incidental interactions between implementation details across module boundaries.

azhenley · on Dec 21, 2017

Didn't Facebook do this for PHP (Hack) and Microsoft with JavaScript (TypeScript)?

From a technical aspect, I do find these projects cool. I wonder if it's more efficient for large companies to initially develop using dynamic languages then transition them with these optionally typed languages.

vorotato · on Dec 21, 2017

Small projects scream dynamic types, but not every baby never grows up. Eventually the cognitive load becomes crippling and you're crying for static types.

jermaustin1 · on Dec 24, 2017

Dynamic languages definitely help with the rapid development of a small application or prototype, but when you have a large team trying to maintain the application, static typing allows for better compile-time error catching. Weird hybrid approaches make me wary.

xzel · on Dec 21, 2017

Facebook did even more with PHP with HHVM and other tooling. I'm not sure if JS was exactly the same for Microsoft as I saw it more as a cool internal project that got big vs something that eventually became necessary.

WorldMaker · on Dec 21, 2017

From what I read, several big teams at Microsoft were dogfooding Typescript immediately during development, even before it was ever externally released.

hacker_9 · on Dec 21, 2017

The evolution is fairly simple: Python is very easy to set up, especially with Django for working on the web, and for writing scripts quickly it works really well when everything is just an 'import' away.

Then as the product gets bigger, you'll hire python developers to keep up with the workload - and the best ones will be the ones who have committed their lives to Python. So you'll now end up with more and more python code.

Before you know it, you aren't writing small scripts anymore, but now you are writing quite large features, that take weeks and that require intimate knowledge of the code base so you don't keep backtracking and repeating yourself. But Python doesn't help you at this point, you traded static types for flexibility and now you have to pay the price.

At this point you're screwed, too many man hours spent on the codebase to redo it, so what to do? Well if you have the man power, build your own static type checker of course! I mean after all, if you have 100s of engineers who cares? You just throw more people at the problem until it goes away. Then wrap it up in a nice little package, and slap yourself on the back while you ride the instagram bubble.

oblio · on Dec 22, 2017

Rewriting a whole app would be at least as big of an investment, in my opinion...

sidlls · on Dec 22, 2017

But not as useful for the resumes of the devs involved.

scrollaway · on Dec 22, 2017

Did it occur to you that rewriting an internal app has no benefit to the community at large, whereas publishing a tool is a clear improvement on the community's tools?

Did it occur to you that instagram published a valuable and useful tool that now just exists, and this is now a non-issue for anyone else in their situation?

Like, why are you complaining about resume boasting?! It's like you want people to do useless work that has no positive effect on the open source community. Are you just upset people are using Python or what?

sidlls · on Dec 22, 2017

Rewriting over a million lines of code in a statically typed language, coming from a codebase likely as riddled with type errors as this one, is unlikely to be productive. They're making the best of a terrible situation.

I also think that if they want to use types the correct approach is to apply a tool like this as a stop-gap but write new components going forward and bug fixes/feature re-writes in a language that supports types "properly" (i.e. in the way they seem to want, that is static types checked at compile time).

I think tools like this are great for companies in situations like this. I don't think they're good to use from the outset: the team should just use an actually statically typed language instead.

nickm12 · on Dec 22, 2017

It's important to realize that these tools just didn't exist 7-10 years ago when companies like Instagram and Dropbox were getting started.

Though type inference has been around in so-called "academic" languages for decades, it hit a tipping point in the last 10 years, to the point that every major dynamic language has static type checkers or dialects (like Typescript) that support static checking. Meanwhile, even traditional statically typed languages are growing stronger type checkers.

bobwaycott · on Dec 21, 2017

You appear to be ignoring who this is coming from. Instagram is huge, and they built their service with Python. It isn’t coming from someone who is arguing this is how it should be done from the beginning. Hell, given the typical lifecycle of a startup, this situation occurs exactly because a dynamic language is chosen to speed getting a product to market and try to build a business around it.

vasilipupkin · on Dec 21, 2017

I literally just had this exact thought independently of you. I mean if you need to do something like this, doesn't that mean you should be writing code in a statically typed language?

Barrin92 · on Dec 21, 2017

There's still a good deal of people who think of static typing as 'limiting', and dynamic typing as 'human'. Matsumoto said as much during (iirc) last year's Ruby conf.

I think the reverse is true. Static typing is liberating for humans because it tames complexity. Because I'm not a machine I cannot possibly keep track of fuzzy programs that arise from dynamic typing.

dmitriid · on Dec 22, 2017

> Static typing is liberating for humans because it tames complexity.

It doesn't, though. Not with the currently existing type systems and implementations.

- Without type inference you end up writing multi-tier type declarations everywhere.

- With overly powerful type systems you need something close to a PhD in math to create proper types and then figure them out half a year later when you've already forgotten most of what you did

- Union and intersection types which are extremely valuable are missing from a lot of statically typed languages

And because I'm not a machine I often cannot figure out what yet another two-hundred-line error message wants of me. Often I'm happy to just throw in an `if (x && x.field){}` and be done with it.
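For comparison, here is roughly how that guard looks with Python's optional type hints, where `Optional[T]` is a small union type. This is only a sketch: `Profile` and `describe` are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    # Optional[str] is shorthand for Union[str, None].
    field: Optional[str] = None


def describe(x: Optional[Profile]) -> str:
    # Python's counterpart to `if (x && x.field) {}`. Inside the branch both
    # x and x.field are known to be non-None, which a checker such as mypy
    # can use to accept the .upper() call.
    if x and x.field:
        return x.field.upper()
    return "missing"


print(describe(Profile("abc")))
print(describe(None))
```

Note that the truthiness check also treats an empty string as missing, which may or may not be what is wanted.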

Barrin92 · on Dec 22, 2017

you'll have to expand on this a little for me. I just recently looked at an older Haskell codebase I was working on two years ago, and simply because of a very few straight forward types, nothing special, I could really wrap my head around the code-base a lot easier. Not just because the code tells me in plain text what types occur where, but also because enforcing a strong type system encouraged compositionality and well-formed behaviour in the first place.

If I look at some of the python code I've written, I'll be perfectly honest I often cannot tell you what to make of it any more.

dmitriid · on Dec 22, 2017

Types being code depend on people writing them (as is the rest of the code :) ). I guess you’re lucky/smart that your code base is just simple types.

Quite often people construct complex type hierarchies just because they can (or don’t know better). And it’s a pain to wade through and coerce to what you want it to do.

I’m very much on the fence between static and dynamic typing, having used (and probably abused) both. I prefer a “pragmatically” typed language, but I haven’t come up with a proper definition for it yet :)

jw- · on Dec 22, 2017

I also get the impression that some people view using dynamically typed languages as a badge of honor, taking more expertise to harness the greater 'expressive' power, all whilst juggling the types in your head. Bad programmers can't use them properly, but if you're one of the good ones then you're not held back by rigid types.

I'm one of the dumb ones and just let the computer do the checking for me.

vorotato · on Dec 21, 2017

Disclaimer, I'm a regular horn tooter for F# but imho this is such a perfect case for the language. It has similar line density to python, but with static type inference. Admittedly you will have to be explicit about using mutables, but I'm going to bet that instagram already cares about that.

nerfhammer · on Dec 21, 2017

there's a difference between should be and should have been

maybe it's easier to do this half-measure than rewrite your entire code base

vasilipupkin · on Dec 21, 2017

I get that part, but for new projects, this perhaps is a hint to do things in Java?

newfoundglory · on Dec 21, 2017

80% of startups are going to die before their codebase reaches this size. They should probably not be making tech stack choices based on "what if we succeed beyond all likelihood".

vasilipupkin · on Dec 21, 2017

Two things come to mind: maybe those startups would die at a slower rate if their code was comprehensible to begin with? But leaving that aside, for companies where survival is not an issue, doesn't this indicate that using a dynamically typed language is not great?

panopticon · on Dec 21, 2017

I would be interested in seeing the number of startups that fail due to technical debt. My instinct is that most startups fail for business reasons (no clear need, not enough/right sales, poor management, bad pitch, solving the wrong problem, etc).

depressedpanda · on Dec 22, 2017

Well, in the case of Instagram, this just shows that Python was a great choice.

They went with a dynamically typed language, were successful, the language they chose added an optional type hinting system, and they wrote a tool that would automatically type hint their code in order to reap many of the benefits of a static type system.
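To make the "automatically type hint" idea concrete, here is a toy sketch of collecting types at runtime. It is not MonkeyType's actual implementation or API; `record_types` and `observed` are hypothetical names, and a real tool would have to deal with keyword arguments, generics, and much more.

```python
import functools
from collections import defaultdict

# Maps a function name to the set of (argument type names, return type name)
# combinations seen while the program ran.
observed = defaultdict(set)


def record_types(func):
    """Wrap func so that each call records the types it was used with."""
    @functools.wraps(func)
    def wrapper(*args):
        result = func(*args)
        arg_types = tuple(type(a).__name__ for a in args)
        observed[func.__name__].add((arg_types, type(result).__name__))
        return result
    return wrapper


@record_types
def add(a, b):
    return a + b


add(1, 2)
add("x", "y")

# Print one candidate signature per observed combination.
for arg_types, ret in sorted(observed["add"]):
    print(f"add({', '.join(arg_types)}) -> {ret}")
```

The recorded combinations are only as complete as the calls that were actually exercised, so annotations derived this way are a starting point to review rather than a proof of correctness.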

I think that the amount of man hours that went into writing the tool is negligible, so it's a net win for Instagram.

lmm · on Dec 22, 2017

New projects at large companies often are written in Java. New projects at startups aren't and shouldn't be - doubling future maintenance costs for the sake of a 20% reduction in development time now is a good trade, because 90% of startups fail, the important thing is to validate product/market fit as soon as possible.

(Though really you should just use Scala and get both Java-like safety and Python-like productivity)

nikhilalmeida · on Dec 21, 2017

The reasons (advantages) for using Python (or any other dynamic language) when in startup mode are about more than just typing. Similarly, the advantages of continuing to use Python outweigh the costs of doing this little "type-dance".

emmelaich · on Dec 21, 2017

There's lots of good and bad things about Python.

The worst thing -- to me -- is dynamic typing.

So why not fix it?

domenukk · on Dec 21, 2017

Similarly, I'm amazed by PyCharm (IDE), which supports pretty good inferred type hints from debugging and static code analysis, btw. Makes writing code a lot easier. Looking forward to trying MonkeyType, it seems awesome for larger projects.

pvg · on Dec 21, 2017

Pycharm relies on pre-generated annotations as well, they're just built-in - it's not as magically trace-y and infer-y as it might seem at a casual glance.

tandav · on Dec 22, 2017

Why not use Cython? It has types.

true_religion · on Dec 22, 2017

How is this different from Dropbox's PyAnnotate[1] which was released a few weeks ago?

[1] https://github.com/dropbox/pyannotate