Fantastic contribution back to the community; I look forward to trying it out.
I must say this is the first time I've been disappointed with the quality of discussion on HN. For a community that promotes using the right tool for the job at the time, I would have thought people would be more open to the choices the early engineers made. I'm sure that Instagram are using a variety of tech across their stack.
Most people would expect software to crash, hang, be slow or somehow leak their personal information. That's normal behaviour for the products of the software industry.
For a long time there have been efforts to ensure at least a degree of quality and robustness through processes, practices and verification tools.
One such tool is a type system which allows encoding requirements and expectations that will be automatically verified with the help of a checker. This is exactly what Instagram is attempting in an effort to increase quality and ease maintenance.
It seems many are wondering why they haven't done this from the beginning. A typical (and probably correct) answer is development speed and flexibility: early stage companies need to be nimble and compromise on quality if they want to survive.
Fair enough, I understand that. We can applaud Instagram's success on the markets, but that doesn't have to mean that they're a role model of technical excellence for building a million line Python code base.
This tool is the proof that Python has significant problems at scale, which is something the Python community has denied for a long time.
They're still doing it in this thread, but the lesson looks pretty clear to me: if you plan on building large scale, don't use Python. Or PHP while we're at it (see Facebook). Definitely not JavaScript (see FirefoxOS).
The founders of a start-up can continue to do whatever they want in the name of success. Instagram at their beginnings was basically a different company from today's Instagram; no one would have used them as a technical role model. Now they're at best a role model for large Python code bases, but many here seem to be drawing the wrong conclusion, namely that it's a good idea to do large-scale Python in the first place.
The reality borne out by the evidence [0][1] is that Python is at the very least perfectly suitable for the development of web stacks powering companies worth in excess of hundreds of millions of dollars.
Putting aside concrete technical issues regarding the python runtime’s performance envelope (eg: startup time, FFI inter-op call time, etc) and memory footprint, there is no reason not to use Python. Again, concrete technical issues aside, Python will not be an issue until your business is big enough that having to push the performance of your product is a nice problem to have and you will probably have the money to spend solving it by either optimising your python or rewriting parts of your application, as all these large companies have done.
You can substitute Python for assembly, point at the Apollo mission and your argument still holds.
Rarely is a programming language chosen because it actually is the best tool for the job. Often, it’s whatever the people at the ground floor were most comfortable getting a prototype out the door with. Prototype working well? Fix this bug, add yonder feature. Before you know it you’ve built Facebook in PHP. “PHP is at the very least suitable for the development etc etc”, yes, that’s why Facebook invested all that effort in Hiphop VM.
There’s an irony in saying “there’s no reason not to use X” in reply to an article that is about a company spending a ton of effort working around a problem in X.
My point is: as a community, it’s our duty to learn from these mistakes. Let’s admit there is a problem, investigate, adapt, overcome. We can improve whatever will be the next Python so the next Instagram doesn’t have to go through this. But that won’t work if we keep saying “this is a nice problem to have, there’s nothing wrong.” It isn’t. There is. Look at the article.
I'm not claiming that one can't be successful by using Python (or pretty much any programming language). Market success or user count are unfortunately not tightly correlated with technical excellence, as many of us have bitterly found out.
Development speed, maintainability, error count are also important development issues. Unfortunately we don't have much data to judge, but the little that we have, such as this article, indicates that dynamic typing has a non-negligible negative impact on the above.
""having to push the performance of your product is a nice problem to have and you will probably have the money to spend solving it by either optimising your python or rewriting parts of your application, as all these large companies have done.""
Not considering the topic at all is negligent. I don't understand why you're so sure that the only possible outcomes are either not reaching scale or having the money to optimise or rewrite.
First of all, not even Facebook had the money to rewrite their PHP code base, so that's probably out of the question.
And it's very well possible that one will reach scale and not have the money (or worse the time) to optimise, if optimising means inventing type checkers.
At least one should spend some time thinking about this topic and picking a language that is flexible and can scale at least somewhat. Having to stop writing production code in order to invent a Ruby static type checker or native compiler is decidedly not a good problem to have.
> ...we don't have much data to judge, but the little that we have, such as this article, indicates that dynamic typing has a non-negligible negative impact on the above.
In my own experience I see it on a spectrum: there are design decisions you can avoid by using dynamic typing that give you speed in the small that start to show their absence in the large. Savings on inputs (quicker coding), tend to evaporate as complexity grows because you're using so much time analysing outputs (the system), to determine behaviour. Every bit of up-front work saved gets amortised across issue after issue, and runtime testing during development, and increasing stagnation in high-level architecture.
For me the ideal is the type system of Haskell with the linguistic power of Haskell and the type inference of Haskell... only on a mainstream platform I can convince management to use.
> Market success or user count are unfortunately not tightly correlated with technical excellence, as many of us have bitterly found out
It depends on how you define technical excellence. I would suggest a low memory usage might get you a thumbs up from HN, but isn't real technical excellence the ability to solve users' problems?
Nope. Publishing an offer on Craigslist to help with homework via Skype is solving some users' problems, yet has nothing to do with "technical excellence".
> This tool is the proof that Python has significant problems at scale, which is something the Python community has denied for a long time. They're still doing it in this thread, but the lesson looks pretty clear to me: if you plan on building large scale, don't use Python. Or PHP while we're at it (see Facebook).
Yeah, absolutely, don't do this! These are two examples of successful companies that did it and look at them now! </s>
Being able to move fast and produce a winning product on time is much more important for startups. What does it matter if you used <your_cool_scalable_thingie> for a project, when it never went past 10 users because you were concentrating on the wrong aspect of your business? PHP is fine. Python is great. Use the tools that fit your problem and you know how to use, not the latest toy.
There are two things that are critically incorrect about your argument:
1) That market success implies having quality software. Average seems to be enough in my experience.
2) That start-ups are a good example to follow if one wants to achieve good quality. In fact they should be ignored, because they will absolutely murder quality in order to stay alive. Sometimes the product doesn't even work and is held together with duct tape in order to get past that important demo... It's quite pointless to discuss quality and start-ups.
The lesson I mentioned should be heeded by mature companies that are able to do some project planning, complexity estimation, etc.
How do you identify the inflection point and then execute when it hits? There are conservative choices that would scale all the way through, but many startups would avoid them due to the hit on "velocity" (which is itself a very fuzzy topic.)
It seems that this same pattern plays out with many tools, and not just languages. When you've built something and you now have a team, processes, etc. built up it becomes difficult to see the forest for the trees, or to make the hard decisions because it might involve replacing people.
> This tool is the proof that Python has significant problems at scale...
This does not prove your point. Annotating a large dynamically-typed codebase with type information is a large amount of work, regardless of the language. This tool makes that easier.
I'm bemused by your reply and curious to know why you think Dropbox and Instagram are working on static type analysers for their large Python code bases.
Instagram at least gave us a hint: "we’re keen to make our code easier for new developers to read and understand, as well as more amenable to static analysis that shrinks the domain of possible bugs".
It seems to me it's so difficult to manage such a code base, that they decided to do that "large amount of work" in addition to the large amount of work required to develop the necessary tools!
In which case it seems prudent to avoid the said amount of work by picking another programming language for one's large-scale code base.
> ... it seems prudent to avoid the said amount of work by picking another programming language for one's large-scale code base.
While I'm a type-adherent in my day-to-day (F# represent, wut wuuut), I think this isn't reflective of the chicken and egg dilemma for startups... There are an immense number of things they "should be" doing at scale that they can't do early because they're relying on their early product to scale. Time to market, and windows of opportunity, are critical to startups.
Any immature technical decision at that part of the lifecycle needs to be made not with an attempt to make perfect forever from the start, but rather with an eye to transitioning to smarter solutions aggressively as you scale up. The company may be 6 pivots away from success, so better to validate solutions in the market than prognosticate.
I don't accept the premise that a lack of typing significantly improves development velocity, per se, but language decisions are about ecosystems, key components, and local talent. Where these companies get into coo-coo land is not integrating those immature components into better systems as they're getting bigger. Next thing you know someone is writing a whole compiler chain for PHP in an attempt to reinvent a sane programming language, or trying to get Python to be Java.
Smart, modern, functional languages provide development velocity and pleasure equal to Python with fundamental type safety and guarantees. But in a world where just putting a button on a webpage means multiple dialects in multiple languages I think we should be ok mixing and matching on the backend to scale smartly.
Isn't dropbox doing it partially to aid a future migration to Python 3?
> In which case it seems prudent to avoid the said amount of work by picking another programming language for one's large-scale code base
When I first saw Instagram I thought it was nonsense; I was wrong. When the founders started Instagram, I wonder if they had the amazing foresight to see what it would become. I would suggest getting to market was more valuable to them than worrying about what maintenance they would have to do once they had a billion dollars.
Yes it does and it doesn't matter, because an argument can be made that there wouldn't be a Dropbox or an Instagram if they started out in a statically typed language back then.
"Scale" can be seen under different angles. You can run thousands of boxes with relatively simple code if it's designed to scale horizontally. Twitter used to run on Ruby this way, until they accumulated money and expertise enough to rewrite the whole thing and save on operations and further development costs.
Your code can be millions LOCs and run on a few boxes; it's a very different sort of "scale".
In one company where I worked they had a crazy mix of codebases, from modern Scala down to ancient PHP code. But since it was architected reasonably, it was possible to replace the PHP code piecemeal, without stopping the system. Do the devs that started it 15 years ago with PHP deserve blame for choosing a poor language, or praise for coming up with a serviceable architecture?
You can see the original post as a step in the right direction: in a complex codebase, static typing has a large number of advantages. Barring a wholesale rewrite, how would you gradually transform your code to using it? Yes, by documenting the current state in a formal way, introducing a typecheck build step, then maybe splitting out certain components and rewriting them in other languages, etc. Look at TypeScript.
Unfortunately, there's no way around using quick-and-dirty prototypes at the very early stages, if nothing else, for lack of technical expertise among founders and their first employees. They have business ideas first and foremost. So a tool like that would help exactly the step you want done, switching to a nicer and more manageable stack as growth allows / demands it.
MonkeyType is Python 3-centric, PyAnnotate is Python 2-centric.
PyAnnotate couples re-applying the types with the tool, and uses type comments for this.
MonkeyType generates .pyi files that you can either use directly or re-apply them to your code as proper type annotations.
Other than that, MonkeyType is used on a daily basis internally and solves a bunch of common annoyances of systems like these, like duplicates in unions, applying better types to stuff that was already hand-annotated with Any, etc.
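To make the idea concrete, here is a minimal sketch of runtime type collection in general. It is not MonkeyType's implementation or API: the `record_types` decorator, the `OBSERVED` store and `render_stub` are all hypothetical names made up for this illustration. It records the concrete types seen on each call and renders a stub-like line, collapsing duplicate traces so a union lists each type only once.

```python
import functools
from collections import defaultdict

# Hypothetical in-memory store: function name -> set of (arg type names, return type name).
OBSERVED = defaultdict(set)


def record_types(func):
    """Record the concrete types seen on every call of ``func`` (positional args only)."""
    @functools.wraps(func)
    def wrapper(*args):
        result = func(*args)
        arg_types = tuple(type(a).__name__ for a in args)
        # A set is used so that repeated identical traces are stored once.
        OBSERVED[func.__name__].add((arg_types, type(result).__name__))
        return result
    return wrapper


def _hint(type_names):
    """Turn a set of type names into a single name or a de-duplicated Union."""
    names = sorted(type_names)
    return names[0] if len(names) == 1 else "Union[" + ", ".join(names) + "]"


def render_stub(func_name, param_names):
    """Render one stub-like signature line from the recorded traces."""
    traces = OBSERVED[func_name]
    params = [
        f"{name}: {_hint({args[i] for args, _ in traces})}"
        for i, name in enumerate(param_names)
    ]
    returns = _hint({ret for _, ret in traces})
    return f"def {func_name}({', '.join(params)}) -> {returns}: ..."


@record_types
def double(value):
    return value * 2


double(2)
double(3)      # same types as the previous call, so no new trace is stored
double("ab")
stub_line = render_stub("double", ["value"])
print(stub_line)
```

The output describes what was observed, not what the function is meant to accept, so a human still has to review it before committing the annotation.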
I'd also like to hear more about this -- both the feature set and the development process. It's interesting that two large engineering organizations responsible for some of the most popular applications on the planet spent hundreds of engineering hours building almost the same exact tool at around the exact same time.
I'd also like to know how much quicker or better it could have been completed if it had been done out in the open.
The idea to gather types at runtime is as old as PEP 484. The Dropbox and Facebook teams working on Python type checking know each other. We both worked on our implementations independently since we wanted to first test internally whether the idea holds water. For example, I personally thought it wouldn't be as useful in practice as it turned out to be!
We knew we were each going to open-source our implementations, especially since Instagram's focuses solely on Python 3, which isn't useful for Dropbox at the moment. It just took a while to get through the process of open sourcing what we had (cleaning up the early implementation with limited documentation, decoupling from internal data stores, etc.).
Would it be cheaper if this started out in the open? Probably, but I don't think by quite the margin you expect.
Interestingly, I made a similar tool for php some time ago. It was recently revived and is now in active development to bring it up to speed with recent developments in the language.
That is fascinating. I’m the author of a static analysis tool for PHP that can generate types, but clearly not at the same level as runtime analysis. I’ll use it on my company’s codebase and report back.
It's interesting, this is similar to a lesson Clojure came around to: that the types weren't really useful unless everything is typed. Typed Racket's solution, though, was to promote types to runtime validation at those borders between things with types and things without.
I do find it intriguing though, that adding back types manually is so hard and slow. Is it slower when done retroactively? Or is it just as slow when done at the same time, but we don't realize its overhead?
It's a lot slower to do retroactively. You basically have to tell the computer why you believe something is correct - e.g. if you're moving to having a distinct type for non-empty lists because some functions are only valid for non-empty lists, you have to explain why you believe a list you're passing to such a function is non-empty. That's a lot easier to do at the same time you're doing it (even in Python you'd probably still ask yourself whether you knew the list was non-empty as you were writing it) than to come back months or years later and remember why.
It's slower when done retroactively. A developer writing typed code has the problem domain and software design unmarshaled into his brain while writing his small section. Type testing occurs concurrently with feature testing. Types added retroactively need to be applied to a large codebase and undergo separate testing.
I have been using C++ a lot lately but really wish there were more tools for reflection at compile time, e.g., ability to iterate over all the members of a class. Other than that, I'm really loving C++17's auto template parameters and type deduction capabilities, plus code that's 200x faster at runtime than most interpreted languages. I've found autocompletion in CLion to be slightly better than autocompletion in PyCharm, but not quite as good as IPython or IntelliJ with Java.
That's kind of missing the point though. Languages are tools for different purposes. You may love chisels and lathes and saws but you wouldn't build a suspension bridge with wood. Just like I wouldn't build a cabinet with cement, steel, and rebar.
Picking C++ over Python is like picking woodworking over metalworking.
Python being slow is never an issue unless someone is insisting on using the wrong tool for the job.
Numba is like laminating wood to build structural beams - it will get you close to the performance of metal, for some applications, if you can accept the weight increase etc.
Numba could be seen as misguided from some points of view. E.g. when using Python for high performance scientific computing, you will typically be writing your computational kernels, I/O etc. in some compiled, superfast language (C/Fortran/CUDA/whatnot) and all the input handling/case setup/etc. in Python. If 1% of your compute time is spent in Python and 99% is carefully optimized C, Numba is obviously pointless.
But that's for one application. Python is used for so many different things that you can't make blanket statements like this.
Why C++ though? In e.g. Scala you have much more pythonlike code than even modern C++, full type safety, much better performance than Python, strong IDE support, and the ability to do the things you would want compile-time reflection for (e.g. typeclass derivation and similar use cases for "iterate over all the members of a class") in a completely type-safe way without needing dangerously flexible macros or code generation. Other ML-family languages will be similar.
While I won't claim it to be as elegant as Python, it doesn't seem too ugly. Does anybody have an idea about how `auto` could be utilized to avoid the long vector<pair...> type declaration?
1. I feel like most of the desire for a static language is to know what type something is. Is C++ exactly as brief as Python? No, as I think you've demonstrated. But I think you're a lot more likely to know the type of something. I rarely find that Python code has annotations, and annotations can be wrong.
2. C++ is, in general, I feel, much more explicit about where copies occur. I elided one of the copies in your example, opting instead for an in-place sort (but this is trivial to fix in the Python).
> 1. I feel like most of the desire for a static language is to know what type something is. Is C++ exactly as brief as Python? No, as I think you've demonstrated. But I think you're a lot more likely to know the type of something.
Sure, but you don't have to choose between them, there are plenty of languages where you can have both Pythonic terseness and full type safety. E.g. Scala:
That's not the same thing, you've sorted a list of objects, we're looking for a list of tuples of `(weight, name)`.
In answer to your question though,
sorted(((k.weight, k.name) for k in some_list), key=lambda x: (-x[0], x[1]), reverse=True)
appears to work. This does use a non-obvious trick, but being more explicit is a smidge difficult, since the key function is called only n times, as opposed to O(nlogn) in the C# example.
Alternatively, you can use
sorted(sorted(((k.weight, k.name) for k in some_list), key=lambda x: x[1], reverse=True), key=lambda x: x[0])
Which is more like the original example, and if you're doing it in place, you get
outs = [(k.weight, k.name) for k in some_list]
outs.sort(key=lambda x: x[1], reverse=True)
outs.sort(key=lambda x: x[0])
Python's builtin sort is timsort, so despite sorting the list twice, this will still run in approximately NlogN comparisons, not 2NlogN.
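As a sanity check, here is a small self-contained sketch that runs the three variants side by side. The `Item` type and the sample data are made up for illustration; the assumed target ordering is weight ascending, then name descending.

```python
from collections import namedtuple

# Hypothetical record type and sample data, invented for this illustration.
Item = namedtuple("Item", ["weight", "name"])
some_list = [Item(2, "pear"), Item(1, "fig"), Item(2, "apple"), Item(1, "kiwi")]

pairs = [(k.weight, k.name) for k in some_list]

# Single pass: negate the weight and reverse the whole sort, which gives
# weight ascending and name descending.
single_pass = sorted(pairs, key=lambda x: (-x[0], x[1]), reverse=True)

# Two stable passes: secondary key first (name descending), then primary key.
two_pass = sorted(
    sorted(pairs, key=lambda x: x[1], reverse=True),
    key=lambda x: x[0],
)

# In-place variant of the two-pass approach.
outs = list(pairs)
outs.sort(key=lambda x: x[1], reverse=True)
outs.sort(key=lambda x: x[0])

print(single_pass)
```

The two-pass versions rely on Python's sort being stable: the second pass keeps the name ordering from the first pass among items of equal weight.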
You could also manually define a custom comparator, ie
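A minimal sketch of such a comparator, assuming the same ordering as above (weight ascending, then name descending) and using `functools.cmp_to_key` from the standard library. The function name `compare` and the sample data are made up for illustration.

```python
from functools import cmp_to_key


def compare(a, b):
    """Order (weight, name) tuples by weight ascending, then name descending."""
    if a[0] != b[0]:
        return -1 if a[0] < b[0] else 1
    if a[1] != b[1]:
        return -1 if a[1] > b[1] else 1
    return 0


pairs = [(2, "pear"), (1, "fig"), (2, "apple"), (1, "kiwi")]
by_comparator = sorted(pairs, key=cmp_to_key(compare))
print(by_comparator)
```

This is the most explicit version, at the cost of calling a Python-level function for every comparison rather than once per element.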
You could implement the following with a decent bit of work (declaring k and reversed to be variables of special, hand-written/macro-generated types with overloaded operator, and operator=).
I don't see how you could implement the (k.attr).for_(k) part. That's essentially an assignment, a macro could maybe convert it to a lambda, but I don't see how it would be done macro-free.
I come from a statically typed background (C++), but have been doing a lot of analytics in python in the past two years. It is frustrating not to have compile time guarantees when dealing with mathematical programs, because some things have to be a particular type (i.e. matrices of compatible dimensions). The result is a copious use of asserts, but it feels bad when you know that if you did this in a functional language, let's say, you could prove implementations are correct by the nature of the type system. In short, I'd love to see more strong type support in python.
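For readers who haven't hit this, a minimal sketch of what the assert-heavy style looks like, using plain lists of lists rather than any particular numerics library. The helper names `shape` and `matmul` are hypothetical. The point is that the dimension mismatch is only found when the offending call actually runs.

```python
def shape(matrix):
    """Return (rows, cols) of a list-of-lists matrix, asserting it is rectangular."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    assert all(len(row) == cols for row in matrix), "ragged matrix"
    return rows, cols


def matmul(a, b):
    """Multiply two list-of-lists matrices, checking dimensions at runtime."""
    a_rows, a_cols = shape(a)
    b_rows, b_cols = shape(b)
    assert a_cols == b_rows, (
        f"incompatible shapes: {a_rows}x{a_cols} @ {b_rows}x{b_cols}"
    )
    return [
        [sum(a[i][k] * b[k][j] for k in range(a_cols)) for j in range(b_cols)]
        for i in range(a_rows)
    ]


product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])

try:
    matmul([[1, 2, 3]], [[1, 2], [3, 4]])  # 1x3 @ 2x2: fails only when executed
    mismatch_caught = False
except AssertionError:
    mismatch_caught = True
```

Note that `assert` statements are skipped when Python runs with `-O`, which is one more reason this feels weaker than a compile-time guarantee.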
What language do you use where you can get these kinds of guarantees? As far as I know very few languages provide those kinds of dependent types statically.
C++ makes this possible via templates. Generally the size is moved to a template argument, which allows the compiler to check this at compile time (of course, this restricts you to statically sized matrices).
Don't feel too bad, almost everyone is. And almost all of the people who are aware of how "powerful" they are tend to equate "hellish complexity" with "expressive power" and are not people you want to work with.
I think I knew that templates were Turing complete, but so are java generics, it's just that to get dependent types in generics you have to reinvent the integers within the generic system. Not so for templates, which I didn't realize. That's pretty nifty!
In graphics programming, you have specific types for lots of small vectors and matrices (for vectors, there are separate types for every size 4 and below, and matrices usually come in a variety of sizes as well, at the very least 3x3, 3x4 and 4x4).
Typechecking is very useful here: if you try to transform a point in space represented by a Vector3 by a general 4x4 matrix, it fails compilation because you have to convert the point to homogeneous coordinates first. Very useful information from the type system.
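A rough Python rendition of that idea, as a sketch only: the `Vector3`, `Vector4` and `Matrix4` classes below are hypothetical and not taken from any graphics library. In Python the check comes from an external static checker reading the annotations, not from compilation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Vector4:
    x: float
    y: float
    z: float
    w: float


@dataclass(frozen=True)
class Vector3:
    x: float
    y: float
    z: float

    def to_homogeneous(self) -> Vector4:
        """Treat the vector as a point in space: w = 1."""
        return Vector4(self.x, self.y, self.z, 1.0)


Row = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Matrix4:
    rows: Tuple[Row, Row, Row, Row]

    def transform(self, v: Vector4) -> Vector4:
        """Multiply this 4x4 matrix by a homogeneous column vector."""
        comps = (v.x, v.y, v.z, v.w)
        out = [sum(m * c for m, c in zip(row, comps)) for row in self.rows]
        return Vector4(out[0], out[1], out[2], out[3])


# Translation by 5 along x.
translate = Matrix4((
    (1.0, 0.0, 0.0, 5.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
    (0.0, 0.0, 0.0, 1.0),
))
point = Vector3(1.0, 2.0, 3.0)
moved = translate.transform(point.to_homogeneous())

# translate.transform(point) is the mistake described above: a static checker
# such as mypy reports that a Vector3 is not a Vector4.
```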
I'm aware of that, I was more thinking in the general case, as comes up in machine learning for example, where you have sizes like 128, 192, and odd shapes like 12x3x100x100 4-tensors, etc.
That is, generalized matrix types, not simply rotation matrix types or whatnot for special cases.
I believe in the functional world, that sort of thing is implemented using a feature called dependent types. It's not a very common feature, but Liquid Haskell implements it.
FYI, it requires Python 3.6+. It mentions it in the article towards the end, but if you're like me and prefer to jump straight into trying something out you may not have seen it. I wasted a bit of time trying to figure out what the ContextManager is in the Python typing module and why it couldn't find it.
This looks wonderful :) I'd love to try it out at some point.
One thing that I think could really improve the documentation is a few examples! One of my favorite things about the Python docs and the community is the wealth of examples. From looking at the docs, I couldn't find the main thing I wanted to see - what would MonkeyType's annotations look like if I used it?
Reading all of the comments from engineers who seem to either possess a time machine to send current tech back in time, or are criticising the technical choices that made the founders $squillions, is making me a bit mad. As a diversion perhaps some of them could list a few billion dollar startups that made perfect choices at the start and never had any cause to refactor or reimplement code as they grew?
It's not obvious (and IMO somewhat doubtful) that it was the technical choices of the founders that made them successful. Their choices could well have been bad, just not bad enough to make their business fail.
This was the part that stood out to me also. They must have a ton of new stuff in the pipeline, or their "display photos, insert some ads in between" loop is way more complex than it seems.
Or that's total number of engineers and way fewer actually twiddle the Python...
Is their goal to annotate everything or just the non-obvious things? Also how would a tool like this handle cases where the “correct” type is a generic base class but at runtime it only sees a certain subclass? To be pythonic, a function that accepts a tuple should usually also accept a list for example, but at runtime that may never happen.
Since this is how gradual typing works, the goal is to annotate every last function.
Good question about abstract base classes! Paraphrasing a well known cliché: types in functions should be forgiving in arguments (what the function accepts) and strict in return values (what the function emits). In our case, the human reviewer needs to decide if the argument types collected by MonkeyType should be generalized. In fact, the collected types might not even work in all cases and the type checker might complain. It's because annotations describe "what should be" whereas MonkeyType finds "what is". This is why a system like MonkeyType shouldn't even attempt to use abstract base classes in place of concrete types that it collected.
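A small sketch of that rule of thumb. The function `top_two` is made up for illustration. If runtime collection only ever saw lists being passed in, the generated annotation would say `List[int]`; a reviewer can then decide to widen the argument to `Sequence[int]` while keeping the return type concrete.

```python
from typing import List, Sequence


def top_two(scores: Sequence[int]) -> List[int]:
    """Accept any sequence (list, tuple, ...) but promise a concrete list back."""
    return sorted(scores, reverse=True)[:2]


from_list = top_two([3, 9, 4])
from_tuple = top_two((3, 9, 4))  # fine for a type checker thanks to Sequence
print(from_list, from_tuple)
```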
Reading this as someone who writes mostly in statically typed languages, the whole exercise seems odd.
Having so much dynamically typed code to maintain that you need to run production code using a separate tool just to figure out the types sounds just wrong. Why not use a statically typed language for such a large code-base? Is this done on purpose, or did they end up with a million lines of Python code and are looking for ways to make the maintenance easier?
And before I get down-voted to hell - I completely understand using Python for many things. It's a good technical choice for many different problems, but navigating a million lines of Python seems just daunting to me (although maybe I'm just not experienced enough with Python).
>Is this done by purpose, or did they end up with a million lines of Python code and are looking for ways to make the maintenance easier?
Definitely the latter. I've seen this discussion a few times before, and it's always the same. Your initial developers are not looking down the road to the million lines of code milestone, they're just trying to make a product that might actually make some money here and now.
I'm sure Instagram was exactly that. They needed to handle images and some guy knew how to do it in Python. They wrote Python code, and then people liked Instagram. They eventually became a billion dollar company with millions of lines of code and nowhere along the road was there time to say "hey we need to refactor this whole thing". Or if that was said, management laughed and said "we need this feature".
So here is where you end up. The developers need to clean things up, but they don't have time to do it by switching to a language that, realistically, they probably don't know as well as the Python they wrote the millions of lines of code in.
Re: your last comment, navigating a million lines of any codebase is daunting, and especially more so if you aren't a developer in that language. I'm not sure what exactly "Python" has to do with that, besides that you're not a Python dev.
To add to this, note that type hinting is quite a new feature in Python (introduced in v3.5, released in 2015), and this functionality simply wasn't available before. So any company heavily invested in Python today obviously wants to improve their runtime reliability, without having to rewrite parts of their stack.
Stricter typing goes a long way to achieve this, and gradual typing allows you to upgrade the code base at your own pace, which is great.
Consider this study[0] about TypeScript and Flow, which use the same approach for JavaScript, which found both able to detect ~15% of runtime bugs. So no wonder companies with large Python code bases would be the first to invest in this space.
Personally I feel this is a great addition to the language, and hope type checking becomes a first class citizen too, instead of being delegated to external tools like mypy[1] or pytype[2].
Dropbox is very heavily invested in Python. I am under the impression they hired Guido van Rossum to do exactly this, among other things. First 100% statically type the old codebase, then port it to Python 3.
You can statically type Python 2 codebases, but the language does not offer native support for it. Thus, all needs to go to docstrings or comments.
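For reference, a minimal sketch of the comment-based syntax from PEP 484. The function `mean` is made up for illustration. The signature goes in a `# type:` comment on the first line of the function body, so the file stays valid for interpreters that have no annotation syntax; on Python 2 the `typing` module is assumed to be installed as a backport.

```python
from typing import List


def mean(values, default=0.0):
    # type: (List[float], float) -> float
    """Average of ``values``; the signature lives in the comment above."""
    if not values:
        return default
    return sum(values) / len(values)


average = mean([1.0, 2.0, 3.0])
empty = mean([])
print(average, empty)
```

The interpreter ignores the comment entirely; only a checker such as mypy reads it.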
I'd consider those languages optionally typed, rather than gradually typed, as they don't insert run-time type checks. Regardless, I still think it's all interesting and valuable work.
I can picture these poor souls vividly. Millions of lines of python, flowing like the mightiest of rivers. Nobody really knowing whence it cometh and goeth.
A hero arises, offering a sacred herb to calm the torrent and light the golden path. The hero is elevated, yet they continue to pray
They say there is a holy land called Haskell, but it is only revealed to the truest of believers without the weight of the chaotic-neutral entity “Shareholder” weighing ever so heavily on their shoulders. For those in Shareholder’s clutches, one must forgive their prayer. ‘Tis the best they can do.
With its comforting, reassuring warmth, Perl shines on as a luminous sun, lighting the way for youngling languages to learn from. Hushed whispers foretell the sunset, but none truly believe them...
Kevin Systrom "thought of combining location check-ins and popular social games. He made the prototype of what later became Burbn and pitched it to Baseline Ventures and Andreessen Horowitz at a party. He came up with the idea while on a vacation in Mexico when his girlfriend was unwilling to post her photos because they did not look good enough when taken by the iPhone 4 camera." (Wikipedia)
He used Django because I guess that was an easy way for one guy to do it fairly quickly. The app was Burbn which then pivoted into Instagram.
By the way I kind of surveyed the "what framework should I use" stuff on HN over the last year and Django still seems the most popular, probably followed by Rails and Phoenix.
Navigating a large code base that is dynamically typed like Python is far more tedious than something like C++ or C#.
First you can't read what the types passed into and out of functions are. You have to find their usages to work it out. Second, you can't reliably do things like "find usages" or "go to definition" because of the dynamic typing.
When I was writing c#, I did use visual studio, but since I'm used to developing in a terminal editor, all these tooltips and things are a little tricky -- sometimes they disappear, and then I can't get them back, etc.
But more often, when I'm looking at c# or c++, it's not code I wrote, it's not code I intend to change, it's code that's interacting with my code (written in another language) that I'm trying to see why it's misbehaving, so I can get the owner to fix it. I could be reading the code on GitHub or some other web view, I might have checked it out, but I have no interest in setting up a (probably new) IDE to look at it as the author would; I dig into too many projects to learn that many tools -- and deal with the upgrade cycle for them.
Sure, it would be useful to hover and get more information, but I'm used to loosely typed languages, so it's not awful. It's just jarring to see that the type information is apparently not important enough to write down the name in c++ or c# anymore.
I'm not sure what exactly "Python" has to do with that
Well, as he said, a statically typed language is better in that kind of situation because it enables a better class of tooling, and the type system enforces certain style constraints that enable better code analysis en masse.
Python specifically is very lightweight in this regard, with little in the way of naming constraints (vs., for instance, Ruby having different formatting rules for different types).
So yeah, it’s not “just like any other language” - horses for courses
I really don't get people. Instagram and Dropbox, through typing annotations in Python, are gradually improving a language that has codebases running globally, from YouTube to NASA.
Clearly something is right with the situation when the incentives are aligned for a tech company to contribute back to the open source community in such fundamental ways. So why look for the mole and think "They should have done it differently", when doing it differently has a high likelihood to mean not being as successful as they are today, and not having the occasion to contribute back?
It's like telling a successful charity "You should just take everyone's money and spend it on Lamborghinis instead of wasting time building wells in Africa".
No, it's like asking someone who spent a lot of time building an octagonal wheel and is now trying to shave down the corners... why didn't you use a circle to begin with.
To me, this is a metaphor that might better explain it, "You need a different approach for getting your first million than your second." (I've heard it attributed to customers, revenue, personal income, etc). It sounds like Python (and features like dynamic typing) works very well at bootstrapping and developing. They're leveraging different features (more like static typing) once they get larger and more time is spent on maintenance (and rewriting everything isn't appealing [1]).
Honestly, your octagonal wheel metaphor works, too. Building the first car you spend a lot of time on octagonal (crude) wheels, but later spend a lot more money on round (precision) wheels. You could have gone bankrupt spending money originally on round wheels that were the wrong size.
They did not spend a lot of time building an octagonal wheel, they built a billion dollar company using Python. Now, when the code base has been proven, and the business rules solidified, they retrofit what they believe will make the code base easier to maintain.
Python is an excellent enabler of this kind of dynamic system evolution.
Python allowed them to build a successful company. Now, when their stack is mature and maintenance is more important than rapid prototyping, Python allows them to add type hinting.
Because they are engineers, they built a tool (in Python) that allows them to do it in an automated manner.
And all of this is great!
They are evolving their code to fit their needs; it's nothing like making an octagonal wheel and wishing you'd have gone for a round one in the beginning.
The point is that there is no property of Python that allowed them to build a successful company that any statically typed language doesn't have. It is a completely unnecessary detour.
The cost of doing it right from the start is negligible.
The first language I learned (after Applesoft BASIC) was C. I wrote C for a long time. About 6 years ago I picked up Python. Today I find it much easier and much more pleasant to spin up a new idea in Python, to the point that it is my default choice for new projects with fuzzily defined goals. None of my ideas have become companies, but I could totally see just sticking with Python even past the point that it became unwieldy.
I'm with the OP on this, I've experience with Java, C#, Python & C++.
I'm a big fan of rapid prototyping with Python to map out problem domains, and once the domain has been mapped properly, rewriting in a statically typed language if necessary.
Python is much better for prototyping than the other langs I've used. Because the syntax is almost pseudocode, and the duck-typing makes a lot of design patterns and boilerplate obsolete, so I can dedicate my headspace to the problem at hand.
You're assuming that the original developers would have been just as productive in a statically typed language as they were in Python.
Big assumption.
Both because they might have known Python already and also because Python is quite a bit more newbie-friendly, concise and expressive than the mainstream statically typed languages.
Obviously what the programmers knew to start with is of importance. But that's no property of the language (well, sure, being easy to pick up increases the risk).
But no, ignoring that, it is not a big assumption really. The benefits of static typing come pretty quickly, especially if there is more than one programmer.
I think the general consensus is that that is not true. Python's dynamic nature is a clear advantage it has over statically typed languages. Add the fact that you can elect to tune down the dynamism when it makes sense, with very little impact on your existing stack, and Python is the technologically superior choice for the majority of applications.
Absolutely not. Python is not the technologically superior choice for the majority of applications.
If you think so then your experience in different domains and application types must be very limited.
The preconception that dynamic languages are more productive is just an illusion, because you can easily take shortcuts that will hamper your progress in the future.
A properly typed language with HM type inference lets you mostly avoid writing the types, with the guarantee that the compiler will catch most of the pitfalls.
And if you don't make any logic errors, pretty much every time your code just works.
Saying that in a million-line application Python is a better choice than F# or Haskell is frankly ridiculous, in my opinion.
Empirically, there are a lot more million line python codebases than F# or haskell codebases, in fact I can name multiple million line python codebases, and 0 F# or haskell codebases. Given that, logic would indicate some sort of failure on the part of haskell and F#, or they would see wider adoption among the large codebases where they are so useful.
Given how much smaller the F# community is, and how much more you can crank out in fewer lines of code in F#, I can believe it.
Between C# and F# there is about an order of magnitude of difference in the LOCs for big projects and C# and Python are comparable from this metric.
It appears you missed my point. I can't think of a 100kloc f# or Haskell codebase, so even if they were 10x as terse as python, which they aren't, python comes out ahead. If they're so much better, why don't people use them?
I can think of 100kloc Scala codebases, e.g. Kafka.
People do use ML family languages, and they are better. There are plenty of non-technical reasons they aren't as widespread as dynamic languages or shitty static languages.
There's nothing here that tunes down the dynamism. Hints aren't statically checked or enforced. It's still possible to pass in an empty list to an int-hinted var and, e.g. have `if not var` evaluate to True (rather than raise an Exception).
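A minimal sketch of that exact scenario, assuming plain CPython with no checker in the loop (the function name `is_unset` is hypothetical). The interpreter ignores the annotation at call time; only a separate tool such as mypy would flag the call.

```python
def is_unset(count: int) -> bool:
    # The `int` hint is not checked by the interpreter at call time.
    return not count


# An empty list is falsy, so this returns True instead of raising.
result = is_unset([])  # a static checker such as mypy would flag this call
print(result)
```

Whether this counts as "no enforcement" or "enforcement by a separate tool" is the disagreement the rest of the thread goes on to argue about.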
Type hints allow external tools to check some things, but at this point you're basically imposing static types so why not use a language with the tooling and optimizations to take advantage of that?
Python is a good choice to prototype, write small (less than a few thousand lines of code) projects with non-trivial complexity, and somewhat larger projects with more boilerplate (e.g. Django webapps). Beyond that its utility diminishes until it starts to become a hindrance.
Ah yes, this old chestnut: "(language I don't like) is only suitable for teeny-tiny puny baby child's toy programs, and once you're not writing those anymore you must use a big strong grown-up language like all the other Real Programmers™ do!"
The empirical evidence of reality is against you: there are successful large (in terms both of codebase and contributors/development team) projects in these awful terrible children's languages, and there are unmaintainable failed piles of crap in even the most grown-up of languages you'd care to name. The choice of language, and choice of type system, seem not to correlate with the success or failure in a meaningful way.
It doesn't correlate with success, but the choice of language does correlate with development speed, number of faults, maintainability, etc.
The interesting thing to note is that a language that's perfectly acceptable at the above at small or medium scale might turn into a hindrance at large-scale. An otherwise fast to develop in language like Python won't be so fast if every change has to be painstakingly reviewed and tested due to the complexity of interactions in the code base.
Using a type system to verify assumptions/requirements is not a recipe for success, but it can improve reliability.
Yes one can do a poor job in pretty much any situation, I'm afraid that's not an argument for anything though.
Here we're talking about the average or best-effort case: large code bases are complex in spite of the best intentions of their maintainers, so using tools that can manage that complexity more easily, e.g. through their type systems, could lead to better results.
>Type hints allow external tools to check some things, but at this point you're basically imposing static types so why not use a language with the tooling and optimizations to take advantage of that?
Python's type system is, imo, currently better than Java's, and the syntax is cleaner than Java's or C++'s. You get all the benefits of static typing without having to put `auto` and `List<>` everywhere. And at the same time, you get all of the advantages that Python has over statically typed languages that aren't Haskell (like comprehensions). And, when you need to, if you're doing something that's especially tricky or dynamic or whatnot, you can fall back to untypedness.
I think the closest parallel I can draw is to something like Rust. You get a huge set of guarantees for free, but can opt to do unsafe things when it's absolutely necessary, and better yet, you can start in unsafe land and then go back later and make sure your code is safe.
I'm curious what tooling you feel that say, Java, has over type-annotated python.
Hints don't provide any guarantees. It's still possible to silently and unknowingly pass the wrong type of value into any given argument, with or without the checker. The "tooling" the other languages have includes a compiler that performs these checks in a way that Hints + Checker-of-choice is unlikely (or unable) to. "What do you think these checkers do?" you might ask. The answer is: not nearly what a compiler does.
> It's still possible to silently and unknowingly pass the wrong type of value into any given argument
This is also possible in traditionally statically typed languages. Nothing stops you from doing unsafe casts or using reflection. Much like it's exceedingly unlikely that you'll run across this in "normal" Java or C++, it's exceedingly unlikely for you to run into any issues with this in Python. And, in fact, the typechecker has ways to handle unusual things like dynamically created attributes, for when that comes up.
And yes I mean this quite honestly. I've seen a lot of typechecked code, some of it quite ridiculously dynamic. Typecheckers perform absolutely fine.
>The "tooling" the other languages have includes a compiler that performs these checks in a way that Hints + Checker-of-choice is unlikely (or unable) to.
What way is that? Typechecking is static analysis. There's really no difference between how java or cpp does typechecking and how mypy does, other than that the python typechecker isn't installed by default.
> This is also possible in traditionally statically typed languages. Nothing stops you from doing unsafe casts or using reflection.
Neither of those is "silent" or "unknowing".
> What way is that? Typechecking is static analysis. There's really no difference between how java or cpp does typechecking and how mypy does, other than that the python typechecker isn't installed by default.
The typechecker can't handle un-hinted code (or, rather, it chooses something very permissive, like 'Any' for all hints). It's incomplete at best.
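A small sketch of what "un-hinted" means in practice, using only the standard `typing.get_type_hints` helper (the two function names are hypothetical). An unannotated function simply carries no declared types for a checker to work from.

```python
from typing import get_type_hints


def untyped(x):
    # No annotations: a checker has nothing declared to verify here,
    # and mypy treats `x` as `Any` by default.
    return x + 1


def typed(x: int) -> int:
    return x + 1


print(get_type_hints(untyped))  # no declared types to work from
print(get_type_hints(typed))
```

How strict to be about this is configurable: mypy, for example, has `--disallow-untyped-defs` and `--check-untyped-defs` flags, so the permissive behaviour is a default rather than a hard limit.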
> This is not an answer.
It is. That you don't like or agree with it doesn't make it not an answer.
And in Java or c++, un-hinted code couldn't compile. The python type checker can do more than a java or c++ checker in this regard.
>Neither of those is "silent" or "unknowing".
They're exactly as silent or unknowing as you would get in typed python code. You appear to be comparing untyped python. That's an incorrect comparison. Offhand, I actually can't think of anything I could do in typed python that would get around the type checker, that wouldn't be considered reflection or a dynamic cast, and be very obviously so in python too. If you have an example of a silent or unknowing failure of well typed python code that passes on mypy, you should probably file a bug report ;)
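As an illustration of what "getting around the checker" tends to look like in annotated Python, here is a hedged sketch using the standard `typing.Any` and `typing.cast` (the `double` function is made up). Both escape hatches are visible in the source, which is the point being argued.

```python
from typing import Any, cast


def double(n: int) -> int:
    return n * 2


raw: Any = "ab"            # `Any` opts this value out of checking
forced = cast(int, "ab")   # `cast` is a no-op at run time; it only informs the checker

print(double(raw))     # the string is repeated, not doubled as a number
print(double(forced))
```

Neither call is rejected by a checker, and neither raises at run time, but in both cases the author had to write `Any` or `cast` explicitly.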
>That you don't like or agree with it doesn't make it not an answer.
You're right. It's not an answer not because I disagree with it (I don't), but because it doesn't actually answer anything, which is why I don't disagree with it.
To summarize this:
Python typecheckers are capable of more type inference than Java, and require less syntax than c++ or Java to get well typed code. A typed python codebase can interact cleanly with an untyped python codebase, and within the typed parts of the code, you get equivalent safety guarantees to what the type systems of Java or C++ provide.
You appear to be ascribing magical powers to compilers in other languages, when those compilers have exactly the same type information as mypy does.
In other words, going back to your first statement:
>Hints don't provide any guarantees.
Hints provide exactly the same guarantees as any other type system: "Assuming you write reasonable code that doesn't attempt to subvert the type system, the type system will catch any dumb mistakes you make."
That's the exact same guarantee you get in any statically typed language.
You don't put auto everywhere. You write out the type in 99% of cases and save auto for 100+ character templated types.
There is no reason to save the literally 0.2s it takes to write the type (you can still spend that time reasoning about your code). It is better for you when writing, and for readability, to be explicit.
By everywhere, I mean where it's otherwise obvious:
    auto s = std::string("Hello world");
    for (auto c : s) {
        std::cout << c;
    }
Those autos don't need to exist, they're completely inferable, otherwise you wouldn't use auto. It's not like you can use auto in function declarations, nor should you, I agree.
Right, but my point is that there's really no tangible benefit to writing the type at all. In a language with good type inference (Haskell, ml, modern python), the string literal is known to be a string, and you don't need to do any extra work.
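The Python counterpart of the snippet above, as a minimal sketch: no annotations are written, yet a checker such as mypy infers `str` for both names from the literal.

```python
greeting = "Hello world"      # inferred as `str`; no annotation needed

for ch in greeting:           # `ch` is inferred as `str` as well
    print(ch, end="")
print()

# A checker would reject `greeting + 1` here because it inferred `str`.
```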
Being the lead on a data science team I am one of those. They're great for exploratory research and prototyping, and for use in the very tiny fraction of code in a production system that deals with machine learning if I must. For everything else, from the data pipeline to delivering results, I'd prefer and recommend something else, like Haskell, C++, go or Rust.
I agree with you but I think it's pretty straightforward how this sort of thing happens:
1. Startup builds thing fast in dynamic language because they need to optimize for development speed and iteration, not maintainability or scalability.
2. Startup grows and continues to hire for expertise in the tech stack they are mostly already using.
3. Repeat for some years and some hundreds of engineers and you arrive at this exact scenario.
Ok, then the business men among us should learn their lesson, that agility matters.
The software developers among us should also learn their lesson: don't build large-scale software in dynamic programming languages unless you can afford to spend time later adding a static type system on top.
I'd think the business men at Instagram have been very happy with the agility of development, that got them to a $2.8 billion revenue p/a company. More than enough to cover the engineering effort to help improve the maintainability of the code.
> 1. Startup builds thing fast in dynamic language because they need to optimize for development speed and iteration, not maintainability or scalability.
> 3. Repeat for some years and some hundreds of engineers and you arrive at this exact scenario.
Isn't it faster to write everything in Go, for example, which has a compiler guiding you all the way, so you rarely get runtime errors? I feel my developer time is much more "optimized" when writing backends in Go than when I wrote them in Python (I also tried Node, which was a disaster).
It's probably faster to verify that your Go code is working as intended, but many would argue that dynamically typed/interpreted languages are faster to write code in than statically typed/compiled languages. Others would argue that the pure "writing code" part isn't the majority of what takes up a developer's time. There's no one right answer.
Go wasn't a serious option for Instagram though. According to Wikipedia Instagram launched in 2010, Go launched in 2009. That would have made them very early adopters, was the library support there back then like it was for Python?
Just because there may be better tools now, should they scrap their working code that earned their fortune?
Well, OCaml, Haskell or maybe Scala would've been mature options in 2010, combining safety with language-level productivity that's comparable to Python, though the extent to which web frameworks available in those languages at that time were Django-equivalent is arguable. (Personally I would - and did - happily choose Scala with Wicket over Django in 2010)
In terms of what Instagram should do now, I'd say they should do what Facebook did: introduce thrift or similar, gradually move business logic into backend services written in more suitable languages, leaving the Python to eventually become just a thin web frontend. Retrofitting types onto code involves a lot of the same effort as rewriting it into a better language, and the rewards for the latter are higher, IME.
Empirically? Doesn't seem like it. But it's hard to run a good controlled experiment. The confounding effects of the team members, the project goal, and the vagaries of business are too noisy.
We're getting into a world where languages finally have type systems that don't suck for fast development. This wasn't the case until very recently. And many of the current options only became realistically viable in the last few years (or months!).
We still have to work with the world as it exists. Not as it should be or will be. And even with the crop of modern languages, it's often still faster to start with less optimal languages and fix shit in the 1/10 chance you're actually successful.
You must have missed the part about "the world as it exists" and "fast development" and pretty much the entire point of the post you're replying to.
Haskell remains impractical for many use cases, it is not used much outside of academia, it's not documented to be used outside of academia, and it didn't even have a working package manager until a few years ago.
I think a lot depends on how solid your requirements are. If there's no need for further design as you develop, types are great. If you're writing one to throw away, they're a waste. If you're chopping and changing as you write, it can go either way.
It's rarely easy to re-implement large or complex codebases in a new language. That kind of major effort requires significant signoffs from leadership and a large effort.
This tool? This tool takes an existing set of codebases and makes them safer. No major boiling of oceans required.
Personally, I would love to ditch python for a language with strong types and type inference, but what is the replacement for django? Where do I find a well-designed, well-documented, battle-tested framework that I can easily hire developers for?
Currently, I think the nearest competitor is node with typescript, and I'd rather stick with python. Please tell me if I'm wrong.
> Why not use a statically typed language for such a large code-base? Is this done by purpose, or did they end up with a million lines of Python code and are looking for ways to make the maintenance easier?
Being in a similar (though much smaller scale) situation ourselves, I suspect your latter suggestion is the answer; they wound up with a large amount of Python code due to expediency (Python is good at quickly getting things done), and are now finding the code base quite hard to maintain.
Something to remember is that as of 10 years ago, statically typed languages consisted of C (for some definition of "statically typed"), C++ (complicated and cumbersome, and didn't yet have widespread availability of "modern" C++ features), Java (cumbersome, heavyweight, lots of missing opportunities for abstraction back then), C# (heavily tied into the Microsoft ecosystem, not yet open source), and a bunch of weird academic languages that required tutorials about burritos to learn (ML, Haskell, etc).
While statically typed languages like C, C++, and Java were the mainstream languages of the 90s, C was too limited for a lot of people, C++ and Java too cumbersome to use, and so lightweight dynamically typed languages like Perl, Python, PHP, Ruby, and JavaScript picked up a lot of steam due to how much easier to pick up and more productive many programmers found themselves in those languages. But now we have large, fairly un-maintainable code bases in these languages, and people are realizing the value of static typing, in part due to the maintenance hassle and in part due to newer, more expressive and accessible statically typed languages being available (Elm, Rust, TypeScript, Scala, Go, Kotlin, as well as improvements to C++, Java, and C#).
But that leaves all of these old codebases, that are hard to maintain. Rather than doing a complete rewrite, adding static typing capabilities that can be applied to existing codebases is a way to make them more maintainable without spending all of the time of a complete rewrite and having everyone have to spend all of the time learning the new language while still maintaining the old codebase.
Also, these old code bases can now be fairly easily and optionally retrofitted using modern Python's support for type hints and tools like mypy and now MonkeyType. So people can continue using their existing code base and the vast Python ecosystem, with the added benefit of type checking where appropriate.
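To give a feel for how types can be collected from running code, here is a toy sketch built on the standard `sys.setprofile` hook. It is only an illustration of the general idea, not MonkeyType's actual implementation or API; the names `observed`, `_record_call` and `add` are all hypothetical.

```python
import sys
from collections import defaultdict

# Maps function name -> set of observed argument-type tuples.
observed = defaultdict(set)


def _record_call(frame, event, arg):
    # Profile hook: on each Python-level call, note the argument types.
    if event != "call":
        return
    code = frame.f_code
    names = code.co_varnames[:code.co_argcount]
    types = tuple(type(frame.f_locals[n]).__name__ for n in names)
    observed[code.co_name].add(types)


def add(a, b):
    # Hypothetical un-annotated function we want hints for.
    return a + b


sys.setprofile(_record_call)
try:
    add(1, 2)
    add("x", "y")
finally:
    sys.setprofile(None)

print(sorted(observed["add"]))
```

A real tool would still have to turn such observations into annotations and let a human review them, since the recorded types only reflect the calls that happened to run.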
My job is mostly machine learning and statistics. I would love to use something like Haskell or even Java, but the NumPy/SciPy/Scikit-learn/Pandas etc. ecosystem is just so far ahead of everything else that it's not worth it.
This tool seems to be built for teams that have already created large codebases in non-typed languages. This is a pretty common thing today. They are addressing a very real problem that a lot of people face which is "oh we fucked up, what's the most painless way we can fix it".
We just recently moved our entire codebase from JS to TypeScript which was pretty hard but super worth it.
>>> but navigating a million lines of Python seems just daunting to me (although maybe I'm just not experienced enough with Python).
It's not that bad.
The first thing you learn when you're in a million lines codebase is that you will only work within your project of maybe a hundred files.
Once in a while, there is a guy who is asking for help on his project or there is an old weird bug to fix and you dive in other stuff. Otherwise, it's like it's not there.
Subdirectory structure is sort of orthogonal to the issue of encapsulation. It's more about providing APIs and clean abstractions which prevent incidental interactions between implementation details across module boundaries.
Didn't Facebook do this for PHP (Hack) and Microsoft with JavaScript (TypeScript)?
From a technical aspect, I do find these projects cool. I wonder if it's more efficient for large companies to initially develop using dynamic languages and then transition to these optionally typed languages.
Small projects scream dynamic types, but not every baby stays small. Eventually the cognitive load becomes crippling and you're crying for static types.
Dynamic languages definitely help with the rapid development of a small application or prototype, but when you have a large team trying to maintain the application, static typing allows for better compile-time error catching. Weird hybrid approaches make me wary.
Facebook did even more with PHP with HHVM and other tooling. I'm not sure if JS was exactly the same for Microsoft as I saw it more as a cool internal project that got big vs something that eventually became necessary.
From what I read, several big teams at Microsoft were dogfooding Typescript immediately during development, even before it was ever externally released.
The evolution is fairly simple: Python is very easy to set up, especially with Django for working on the web, and for writing scripts quickly it works really well when everything is just an 'import' away.
Then as the product gets bigger, you'll hire python developers to keep up with the workload - and the best ones will be the ones who have committed their lives to Python. So you'll now end up with more and more python code.
Before you know it, you aren't writing small scripts anymore, but now you are writing quite large features, that take weeks and that require intimate knowledge of the code base so you don't keep backtracking and repeating yourself. But Python doesn't help you at this point, you traded static types for flexibility and now you have to pay the price.
At this point you're screwed, too many man hours spent on the codebase to redo it, so what to do? Well if you have the man power, build your own static type checker of course! I mean after all, if you have 100s of engineers who cares? You just throw more people at the problem until it goes away. Then wrap it up in a nice little package, and slap yourself on the back while you ride the instagram bubble.
Did it occur to you that rewriting an internal app has no benefit to the community at large, whereas publishing a tool is a clear improvement on the community's tools?
Did it occur to you that instagram published a valuable and useful tool that now just exists, and this is now a non-issue for anyone else in their situation?
Like, why are you complaining about resume boasting?! It's like you want people to do useless work that has no positive effect on the open source community. Are you just upset people are using Python or what?
Rewriting over a million lines of code in a statically typed language, coming from a codebase likely as riddled with type errors as this one, is unlikely to be productive. They're making the best of a terrible situation.
I also think that if they want to use types the correct approach is to apply a tool like this as a stop-gap but write new components going forward and bug fixes/feature re-writes in a language that supports types "properly" (i.e. in the way they seem to want, that is static types checked at compile time).
I think tools like this are great for companies in situations like this. I don't think they're good to use from the outset: the team should just use an actually statically typed language instead.
It's important to realize that these tools just didn't exist 7-10 years ago when companies like Instagram and Dropbox were getting started.
Though type inference has been around in so-called "academic" languages for decades, it hit a tipping point in the last 10 years, to the point that every major dynamic language has static type checkers or dialects (like Typescript) that support static checking. Meanwhile, even traditional statically typed languages are growing stronger type checkers.
You appear to be ignoring who this is coming from. Instagram is huge, and they built their service with Python. It isn’t coming from someone who is arguing this is how it should be done from the beginning. Hell, given the typical lifecycle of a startup, this situation occurs exactly because a dynamic language is chosen to speed getting a product to market and try to build a business around it.
I literally just had this exact thought independently of you. I mean if you need to do something like this, doesn't that mean you should be writing code in a statically typed language?
There's still a good deal of people who think of static typing as 'limiting' and dynamic typing as 'human'. Matsumoto said as much during (iirc) last year's Ruby conf.
I think the reverse is true. Static typing is liberating for humans because it tames complexity. Because I'm not a machine I cannot possibly keep track of fuzzy programs that arise from dynamic typing.
> Static typing is liberating for humans because it tames complexity.
It doesn't, though. Not with the currently existing type systems and implementations.
- Without type inference you end up writing multi-tier type declarations everywhere.
- With overly powerful type systems you need something close to a PhD in math to create proper types and then figure them out half a year later when you've already forgotten most of what you did
- Union and intersection types which are extremely valuable are missing from a lot of statically typed languages
And because I'm not a machine, I often cannot figure out what yet another two-hundred-line error message wants of me. Often I'm happy to just throw in an `if (x && x.field){}` and be done with it.
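For comparison, a rough Python rendering of that `if (x && x.field)` guard, assuming a checker such as mypy that narrows `Optional` values on truthiness and `is not None` tests. The `Profile` class and `display_name` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    # Hypothetical type for illustration.
    nickname: Optional[str] = None


def display_name(profile: Optional[Profile]) -> str:
    # Equivalent of `if (x && x.field)`: both checks narrow away `None`,
    # so a checker accepts the `.upper()` call below.
    if profile is not None and profile.nickname:
        return profile.nickname.upper()
    return "ANONYMOUS"


print(display_name(None), display_name(Profile("kev")))
```

Here `Optional[...]` is a small union type, so the guard and the type declaration express the same thing: the value might be missing, and the code has to say what happens then.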
you'll have to expand on this a little for me. I just recently looked at an older Haskell codebase I was working on two years ago, and simply because of a very few straight forward types, nothing special, I could really wrap my head around the code-base a lot easier. Not just because the code tells me in plain text what types occur where, but also because enforcing a strong type system encouraged compositionality and well-formed behaviour in the first place.
If I look at some of the python code I've written, I'll be perfectly honest I often cannot tell you what to make of it any more.
Types being code depend on people writing them (as is the rest of the code :) ). I guess you’re lucky/smart that your code base is just simple types.
Quite often people construct complex type hierarchies just because they can (or don’t know better). And it’s a pain to wade through and coerce to what you want it to do.
I’m very much on the fence between static and dynamic typing, having used (and probably abused) both. I prefer a “pragmatically” typed language, but I haven’t come up with a proper definition for it yet :)
I also get the impression that some people view using dynamically typed languages as a badge of honor, taking more expertise to harness the greater 'expressive' power, all whilst juggling the types in your head. Bad programmers can't use them properly, but if you're one of the good ones then you're not held back by rigid types.
I'm one of the dumb ones and just let the computer do the checking for me.
Disclaimer, I'm a regular horn tooter for F# but imho this is such a perfect case for the language. It has similar line density to python, but with static type inference. Admittedly you will have to be explicit about using mutables, but I'm going to bet that instagram already cares about that.
80% of startups are going to die before their codebase reaches this size. They should probably not be making tech stack choices based on "what if we succeed beyond all likelihood".
Two things come to mind: maybe those startups would die at a slower rate if their code was comprehensible to begin with? But leaving that aside, for companies where survival is not an issue, doesn't this indicate that using a dynamically typed language is not great?
I would be interested in seeing the number of startups that fail due to technical debt. My instinct is that most startups fail for business reasons (no clear need, not enough/right sales, poor management, bad pitch, solving the wrong problem, etc).
Well, in the case of Instagram, this just shows that Python was a great choice.
They went with a dynamically typed language, were successful, the language they chose added an optional type hinting system, and they wrote a tool that would automatically type hint their code in order to reap many of the benefits of a static type system.
I think that the amount of man hours that went into writing the tool is negligible, so it's a net win for Instagram.
New projects at large companies often are written in Java. New projects at startups aren't and shouldn't be - doubling future maintenance costs for the sake of a 20% reduction in development time now is a good trade, because 90% of startups fail, the important thing is to validate product/market fit as soon as possible.
(Though really you should just use Scala and get both Java-like safety and Python-like productivity)
The reasons (advantages) for using Python (or any other dynamic language) when in startup mode go beyond just typing. Similarly, the advantages of continuing to use Python outweigh the costs of doing this little "type-dance".
Similarly, I'm amazed by PyCharm (IDE), which supports pretty good inferred type hints from debugging and static code analysis, btw. Makes writing code a lot easier. Looking forward to trying MonkeyType, it seems awesome for larger projects.
Pycharm relies on pre-generated annotations as well, they're just built-in - it's not as magically trace-y and infer-y as it might seem at a casual glance.
Why didn't you make the Python 3 type checking advisory instead? Like what Facebook did with Flow and Hack, why not write a product that lets you statically analyze the types and will never itself cause runtime errors, and transition to types that way? What advantage does building this tool have? I understand that the thing I'm describing involves modifying the way Python 3 handles type annotations, but it doesn't seem like more work than building out all of the instrumentation you've done here.
I'm not quite sure what you mean. Python has an external static checker for types, it's called mypy.
Python's type annotations are in fact very similar to Flow and Hack in the sense that they provide gradual typing. The specification (see: PEP 484) describes that only annotated functions are type checked. Calls to non-annotated code are treated as accepting any type in arguments and returning the Any type (a special type which effectively silences the type checker).
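A small sketch of what that means in practice (the function names are made up for illustration). With default settings, a checker such as mypy should accept `caller` below even though it returns a `str` at runtime, because the result of the unannotated `legacy` is treated as `Any`:

```python
from typing import get_type_hints


def annotated(count: int) -> str:
    # Annotated: a checker verifies the body and every call site.
    return "x" * count


def legacy(count):
    # Unannotated: per PEP 484 its argument and return value are treated as Any.
    return "x" * count


def caller() -> int:
    # Declared to return int, but the Any coming out of legacy() silences
    # the checker, so the mismatch goes unnoticed until runtime.
    return legacy(3)


# The annotations are plain metadata; nothing is enforced when the code runs.
print(get_type_hints(annotated))
print(get_type_hints(legacy))
```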
This generates a chicken and egg problem: if you don't have enough functions annotated, the type checker won't be able to provide meaningful output to you. So convincing people to annotate their code is harder: they don't see the benefit right away. Worse yet, you already have tens of thousands of functions in your code that you know work in production but were written before type annotations were introduced. It's not really feasible to come back and fill this information manually.
MonkeyType is a tool that gathers types at runtime and enables putting them back in your code as annotations. The goal is for mypy to have more information to work with, making it way more useful.
Sorry, I guess I jumped to the conclusion that you profiled in production because you had a mandatory type checker which would cause runtime errors if invalid types were passed. Let me try another question. Do you think that a project like MonkeyType would substantially aid adding flow annotations to a JavaScript project? If so, why is the value to the Python ecosystem higher than the value to the Flow/JavaScript ecosystem (i.e. why doesn't it exist for JS)?
It's already been flirted with in JS I believe [1], though not specifically for Flow. The problem is that it falls down in the higher order case, which happens quite a lot in JS. Also, I don't think JS has a mechanism like sys.setprofile that deals with a lot of the pain points.
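To make the sys.setprofile point concrete, here is a minimal sketch of recording argument types at call time. It is not MonkeyType's actual implementation; `observed`, `record_call_types`, and `add` are names invented for this example.

```python
import sys
from collections import defaultdict

# Maps function name -> set of observed argument-type tuples.
observed = defaultdict(set)


def record_call_types(frame, event, arg):
    # The profile hook sees a "call" event for every Python-level call.
    if event != "call":
        return
    code = frame.f_code
    # Positional parameters come first in co_varnames.
    arg_names = code.co_varnames[:code.co_argcount]
    arg_types = tuple(type(frame.f_locals[name]).__name__ for name in arg_names)
    observed[code.co_name].add(arg_types)


def add(a, b):
    return a + b


sys.setprofile(record_call_types)
try:
    add(1, 2)
    add("x", "y")
finally:
    # Always remove the hook again; it slows down every call while installed.
    sys.setprofile(None)

print(dict(observed))
```

Note that a callback passed as an argument would be recorded only as `function`, with no information about its own parameters or return type, which is one way to see the higher order problem.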
I was going to address the "why" by explaining actually why (Refactoring time required, impact difference, other negative tradeoffs, active devs on the projects / hiring requirements, benefit to the global python community, ...).
But it looks like this is a troll account, so flag & move on.
As both a dynamic and static type enthusiast, back typing dynamic code is extremely problematic. Fluent use of a dynamic language will use and create constructs that are nearly un-typeable. If you want to make typed code, start typed. If you code with implicit types, use a good type inferred language (ML, F#, etc). If you want to use type checking in Python, use the annotations and MyPy from the beginning.
That said, I am not saying this tool is bad. It could very well help a lot of codebases, but I would warn against using it as part of the operational workflow.
>Fluent use of a dynamic language will use and create constructs that are nearly un-typeable.
Can you give examples? In practice, I haven't noticed this being an issue. A lot of python code already had/has docstrings explaining the types, these annotations just formalize that a bit more.
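To illustrate the docstring point with a made-up function (both names below are hypothetical): the first version documents its types in prose only, the second states the same thing in a form a checker can read.

```python
from typing import List


def total_length_documented(words):
    """Return the combined length of all words.

    :param words: list of str
    :rtype: int
    """
    return sum(len(word) for word in words)


def total_length_annotated(words: List[str]) -> int:
    """Return the combined length of all words."""
    return sum(len(word) for word in words)
```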
> With MonkeyType’s help, we’ve already annotated over a third of the functions in our codebase, and we’re already seeing type-checking catch many bugs that would have otherwise likely shipped to production
Is most of their code not yet in production? Or why do they produce more bugs now with static types than before with dynamic types!? This sounds a lot like homeopathy, which can both detect and cure diseases with placebo.
"At Instagram we have hundreds of engineers working on well over a million lines of Python 3."
It always amazes me that some of the most popular products around are built with the worst technology choices. And now they had to build their own static type checker, which slows down random samples of real users, just to shore up the language's weaknesses? Outstanding.
The downside to using a certain tech stack is technical debt, not a health risk. Technical debt seems like a wise acquisition in moderation, especially for a startup that wants pictures and buttons on screens ASAP, and even more especially when everyone is realizing that it can rather easily be undone when/where it matters with type checker addons and FFI.
That doesn't have anything to do with the question of whether "raw number of successful projects using it" is a good metric for how good something is. Lots of successful companies have made mistakes, and there are even mistakes that many companies in a field made simultaneously (e.g. the XML mania of the early 2000s is not fondly looked back on).
I'm not commenting on the specific issue of whether Python creates good trade-offs, just that I think tallying successful companies is a poor measure of a programming language.
Ah, I gotcha, that’s a good point. I still do believe that these decisions for dynamic languages that can later be contorted aren’t just a symptom, but I don’t have any way to prove it.
No, the person has a point. Often the weaknesses of the tech stack are not known ahead of time. Or what is considered a weakness is not a weakness when the choices are initially made. Most developers, if they had to do stuff over again, would make different choices in one way or another. Also, later in the game, if you ever get there and have capital, certain problems that were caused by the choices made can be resolved a different way... as is the case here.
I was taught assembly at the job by an old assembly guru. His first statement to me was something along the lines of: "we'll use an assembler to start with, but it doesn't generate very good code; so, once you're comfortable, we'll hand assemble our machine code".
Let's make this thread useful instead: Please enlighten the HN community at large: Tell us how you'd build instagram, what your tech choices would be and why Python is the "worst technology choice".
(If this goes well maybe you'll get an email from Mike Krieger with the subject line "Here, you do this".)
Another comment in this thread makes a very compelling point: perhaps there is some wisdom to using a language that can be initially dynamically typed, and moved over to static typing later on.
I don't know. I suspect being half-pregnant is a far bigger issue than the technical debt of dynamic typing, whether it stems from being slower to implement new features, spending time choosing the "correct" tech stacks, or spending more time/money hiring qualified engineers.
There's something to be said for the fact that many of the most successful startups in the world are saturated partly or fully with Python and PHP: getting features and new hires off the ground in as little time and money as possible is worth its weight in gold.
I've always been very much of the opinion that the smartest way to go is making good use of FFI where it counts, and doing everything else in whatever gets code out the door most effectively. This TypeScript/Hack/MonkeyType trend just seems like a very welcome extension of that.