
Reading this as someone who writes mostly in statically typed languages, the whole exercise seems odd.

Having so much dynamically typed code to maintain that you need to run production code using a separate tool just to figure out the types sounds just wrong. Why not use a statically typed language for such a large code-base? Is this done on purpose, or did they end up with a million lines of Python code and are looking for ways to make the maintenance easier?

And before I get down-voted to hell - I completely understand using Python for many things. It's a good technical choice for many different problems, but navigating a million lines of Python seems just daunting to me (although maybe I'm just not experienced enough with Python).



>Is this done on purpose, or did they end up with a million lines of Python code and are looking for ways to make the maintenance easier?

Definitely the latter. I've seen this discussion a few times before, and it's always the same. Your initial developers are not looking down the road to the million lines of code milestone, they're just trying to make a product that might actually make some money here and now.

I'm sure Instagram was exactly that. They needed to handle images and some guy knew how to do it in Python. They wrote Python code, and then people liked Instagram. They eventually became a billion dollar company with millions of lines of code, and nowhere along the road was there time to say "hey we need to refactor this whole thing". Or if that was said, management laughed and said "we need this feature".

So here is where you end up. The developers need to clean things up, but they don't have time to do it by switching to a language that, realistically, they probably don't know as well as the Python they wrote the millions of lines of code in.

Re: your last comment, navigating a million lines of any codebase is daunting, and especially more so if you aren't a developer in that language. I'm not sure what exactly "Python" has to do with that, besides that you're not a Python dev.


To add to this, note that type hinting is quite a new feature in Python (introduced in v3.5, released in 2015), and this functionality simply wasn't available before. So any company heavily invested in Python today obviously wants to improve their runtime reliability, without having to rewrite parts of their stack.

Stricter typing goes a long way to achieve this, and gradual typing allows you to upgrade the code base at your own pace, which is great.
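
To make "at your own pace" concrete, here is a minimal sketch (the function names are made up for illustration). Annotated and unannotated functions can sit side by side in the same module; a checker such as mypy can verify the first, and the second keeps working as plain dynamic Python until someone gets around to it.

```python
from typing import List


def total_length(names: List[str]) -> int:
    # Annotated: an external checker can verify both the body and the callers.
    return sum(len(name) for name in names)


def legacy_total(names):
    # Unannotated: still valid Python, and behaves exactly as before.
    return sum(len(name) for name in names)


print(total_length(["ab", "cde"]))  # 5
print(legacy_total(["ab", "cde"]))  # 5
```

The hints change nothing at runtime; they only give tools something to check.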

Consider this study[0] of TypeScript and Flow, which take the same approach for JavaScript: it found each of them able to detect ~15% of runtime bugs. So no wonder companies with large Python code bases would be the first to invest in this space.

Personally I feel this is a great addition to the language, and hope type checking becomes a first class citizen too, instead of being delegated to external tools like mypy[1] or pytype[2].

[0]: http://ttendency.cs.ucl.ac.uk/projects/type_study/

[1]: http://mypy-lang.org/

[2]: https://github.com/google/pytype


Dropbox is very heavily invested in Python. I am under the impression they hired Guido van Rossum to do exactly this, among other things. First 100% statically type the old codebase, then port it to Python 3.

You can statically type Python 2 codebases, but the language does not offer native support for it. Thus, all the type information needs to go into docstrings or comments.
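
For reference, the comment form looks roughly like this (PEP 484 type-comment syntax; the `greet` function is just an illustration). The interpreter sees an ordinary comment, so the same file runs unchanged on Python 2 and 3, while a checker that understands type comments can still read the signature.

```python
def greet(name, times=1):
    # type: (str, int) -> str
    """Return a greeting repeated `times` times."""
    return " ".join(["hello " + name] * times)


print(greet("bob", 2))  # hello bob hello bob
```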


I'd consider those languages optionally typed, rather than gradually typed, as they don't insert run-time type checks. Regardless, I still think it's all interesting and valuable work.


I can picture these poor souls vividly. Millions of lines of python, flowing like the mightiest of rivers. Nobody really knowing whence it cometh and goeth.

A hero arises, offering a sacred herb to calm the torrent and light the golden path. The hero is elevated, yet they continue to pray


They say there is a holy land called Haskell, but it is only revealed to the truest of believers without the weight of the chaotic-neutral entity “Shareholder” weighing ever so heavily on their shoulders. For those in Shareholder’s clutches, one must forgive their prayer. ‘Tis the best they can do.


There is something else, in the darkest reaches. It has many incantations, but the non-believers have a singular name. Lisp.


With its comforting, reassuring warmth, Perl shines on as a luminous sun, lighting the way for youngling languages to learn from. Hushed whispers foretell the sunset, but none truly believe them...


Back to Perl folks.. The freedom, the happiness !!


> forgotpassagan

They have password managers you know and also spell checkrs too. :P


Looking it up, it seems pretty much that:

Kevin Systrom "thought of combining location check-ins and popular social games. He made the prototype of what later became Burbn and pitched it to Baseline Ventures and Andreessen Horowitz at a party. He came up with the idea while on a vacation in Mexico when his girlfriend was unwilling to post her photos because they did not look good enough when taken by the iPhone 4 camera." (Wikipedia)

He used Django because I guess that was an easy way for one guy to do it fairly quickly. The app was Burbn which then pivoted into Instagram.

By the way I kind of surveyed the "what framework should I use" stuff on HN over the last year and Django still seems the most popular, probably followed by Rails and Phoenix.


Navigating a large code base that is dynamically typed like Python is far more tedious than something like C++ or C#.

First, you can't read what the types passed into and out of functions are. You have to find their usages to work it out. Second, you can't reliably do things like "find usages" or "go to definition" because of the dynamic typing.


> Second, you can't reliably do things like "find usages" or "go to definition" because of the dynamic typing.

In my experience PyCharm can do both correctly for the vast majority of cases.


Still very limited, for example, if I have:

    def some_func(foo):
        foo.run()
        ...

Find usages on the run() method will return dozens of results; the IDE can't help you any further in finding what 'foo' is at runtime.
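
For contrast, a sketch of the same function with a hint (the `Runnable` and `Job` names are hypothetical, and `typing.Protocol` needs Python 3.8+). With the parameter annotated, tooling at least has a declared interface to resolve `run()` against instead of every `run` in the project. How far a given IDE narrows "find usages" from this is something to check rather than assume.

```python
from typing import Protocol


class Runnable(Protocol):
    # Structural interface: anything with a matching run() qualifies.
    def run(self) -> str: ...


class Job:
    def run(self) -> str:
        return "job done"


def some_func(foo: Runnable) -> str:
    # The checker now knows foo has run() returning str.
    return foo.run()


print(some_func(Job()))  # job done
```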


Isn't that basically saying "it fails in the kind of cases that wouldn't even be possible in a statically typed language"?


Indeed; it only fails in the really fiddly cases where you need it the most.


Most of the c++ and c# code I see lately has so many things declared as auto, it's hard for me to figure things out too.


Are you using notepad to write c#? If you hover on the var you will see the tooltip with the type.


When I was writing c#, I did use visual studio, but since I'm used to developing in a terminal editor, all these tooltips and things are a little tricky -- sometimes they disappear, and then I can't get them back, etc.

But more often, when I'm looking at c# or c++, it's not code I wrote, it's not code I intend to change, it's code that's interacting with my code (written in another language) that I'm trying to see why it's misbehaving, so I can get the owner to fix it. I could be reading the code on GitHub or some other web view, I might have checked it out, but I have no interest in setting up a (probably new) IDE to look at it as the author would; I dig into too many projects to learn that many tools -- and deal with the upgrade cycle for them.

Sure, it would be useful to hover and get more information, but I'm used to loosely typed languages, so it's not awful. It's just jarring to see that the type information is apparently not important enough to write down the name in c++ or c# anymore.


Would be nice if IDEs had a key combo to fill in the actual type for "auto"s!


That would be nice!


> I'm not sure what exactly "Python" has to do with that

Well, as he said, a statically typed language is better in that kind of situation because it enables a better class of tooling, and the type system enforces certain style constraints that enable better code analysis en masse.

Python specifically is very lightweight in this regard, with little in the way of naming constraints (vs. for instance Ruby having different formatting rules for different types).

So yeah, it’s not “just like any other language” - horses for courses


I really don't get people. Instagram and Dropbox, through typing annotations in Python, are gradually improving a language that has codebases running globally, from YouTube to NASA.

Clearly something is right with the situation when the incentives are aligned for a tech company to contribute back to the open source community in such fundamental ways. So why look for the mole and think "They should have done it differently", when doing it differently has a high likelihood to mean not being as successful as they are today, and not having the occasion to contribute back?

It's like telling a successful charity "You should just take everyone's money and spend it on lamborghinis instead of wasting time building wells in africa".


No, it's like asking someone who spent a lot of time building an octagonal wheel and is now trying to shave down the corners... why didn't you use a circle to begin with.


To me, this is a metaphor that might better explain it, "You need a different approach for getting your first million than your second." (I've heard it attributed to customers, revenue, personal income, etc). It sounds like Python (and features like dynamic typing) works very well at bootstrapping and developing. They're leveraging different features (more like static typing) once they get larger and more time is spent on maintenance (and rewriting everything isn't appealing [1]).

Honestly, your octagonal wheel metaphor works, too. Building the first car you spend a lot of time on octagonal (crude) wheels, but later spend a lot more money on round (precision) wheels. You could have gone bankrupt spending money originally on round wheels that were the wrong size.

[1] https://www.joelonsoftware.com/2000/04/06/things-you-should-...


They did not spend a lot of time building an octagonal wheel, they built a billion dollar company using Python. Now, when the code base has been proven, and the business rules solidified, they retrofit what they believe will make the code base easier to maintain.

Python is an excellent enabler of this kind of dynamic system evolution.


You've totally missed the analogy. See pfranz's comment.


No, the analogy just wasn't very apt.

Python allowed them to build a successful company. Now, when their stack is mature and maintenance is more important than rapid prototyping, Python allows them to add type hinting.

Because they are engineers, they built a tool (in Python) that allows them to do it in an automated manner.

And all of this is great!

They are evolving their code to fit their needs; it's nothing like making an octagonal wheel and wishing you'd gone for a round one in the beginning.


The point is that there is no property of Python that allowed them to build a successful company that a statically typed language doesn't also have. It is a completely unnecessary detour.

The cost of doing it right from the start is negligible.


The first language I learned (after Applesoft BASIC) was C. I wrote C for a long time. About 6 years ago I picked up Python. Today I find it much easier and much more pleasant to spin up a new idea in Python, to the point that it is my default choice for new projects with fuzzily defined goals. None of my ideas have become companies, but I could totally see just sticking with Python even past the point that it became unwieldy.


Between c and Python there is an ocean of languages...


I'm with the OP on this, I've experience with Java, C#, Python & C++.

I'm a big fan of rapid prototyping with Python to map out problem domains, and once the domain has been mapped properly, rewriting in a statically typed language if necessary.

Python is much better for prototyping than the other langs I've used. Because the syntax is almost pseudocode, and the duck-typing makes a lot of design patterns and boilerplate obsolete, so I can dedicate my headspace to the problem at hand.

Right tool for the right job, as they say.


True enough. I've dabbled in several. These are just the two I happen to have the most experience in and the ones that seemed relevant to my point.


You're assuming that the original developers would have been just as productive in a statically typed language as they were Python.

Big assumption.

Both because they might have known Python already and also because Python is quite a bit more newbie-friendly, concise and expressive than the mainstream statically typed languages.


Obviously what the programmers knew to start with is of importance. But that's no property of the language (well, sure, being easy to pick up increases the risk).

But no, ignoring that, it is not a big assumption really. The benefits of static typing come pretty quickly, especially if there is more than one programmer.


I think the general consensus is that that is not true. Python's dynamic nature is a clear advantage it has over statically typed languages. Add the fact that you can elect to tune down the dynamism when it makes sense, with very little impact on your existing stack, and Python becomes the technologically superior choice for the majority of applications.


Absolutely not. Python is not the technologically superior choice for the majority of applications. If you think so then your experience in different domains and application types must be very limited.

The preconception that dynamic languages are more productive is just an illusion, because you can easily take shortcuts that will hamper your progress in the future. A properly typed language with HM type inference lets you mostly avoid writing the types, with the guarantee that the compiler will catch most of the pitfalls. And if you don't make any logic errors, pretty much every time your code just works. Saying that in a million line application Python is a better choice than F# or Haskell is frankly ridiculous in my opinion.


Empirically, there are a lot more million line python codebases than F# or haskell codebases, in fact I can name multiple million line python codebases, and 0 F# or haskell codebases. Given that, logic would indicate some sort of failure on the part of haskell and F#, or they would see wider adoption among the large codebases where they are so useful.

Do you disagree?


Given how much smaller the F# community is, and how much more you can do in fewer lines of code in F#, I can believe it. Between C# and F# there is about an order of magnitude of difference in LOC for big projects, and C# and Python are comparable by this metric.


It appears you missed my point. I can't think of a 100kloc f# or Haskell codebase, so even if they were 10x as terse as python, which they aren't, python comes out ahead. If they're so much better, why don't people use them?


I can think of 100kloc Scala codebases, e.g. Kafka.

People do use ML family languages, and they are better. There are plenty of non-technical reasons they aren't as widespread as dynamic languages or shitty static languages.


Right, but Scala is different from Haskell or f#. It doesn't use the same kind of type inference (hm) as classical ml derivatives.


Does this mean I have to return all the money I made?


There's nothing here that tunes down the dynamism. Hints aren't statically checked or enforced. It's still possible to pass in an empty list to an int-hinted var and, e.g. have `if not var` evaluate to True (rather than raise an Exception).
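
A minimal sketch of that exact case (the `is_missing` name is made up). The annotation is only metadata as far as the interpreter is concerned, so the call with a list runs without complaint. An external checker would be expected to flag it, but nothing stops the code from running.

```python
def is_missing(var: int) -> bool:
    # The int hint is never checked by the interpreter.
    return not var


print(is_missing(5))   # False
print(is_missing([]))  # True, and no exception despite the int hint
```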

Type hints allow external tools to check some things, but at this point you're basically imposing static types so why not use a language with the tooling and optimizations to take advantage of that?

Python is a good choice to prototype, write small (less than a few thousand lines of code) projects with non-trivial complexity, and somewhat larger projects with more boilerplate (e.g. Django webapps). Beyond that its utility diminishes until it starts to become a hindrance.


Ah yes, this old chestnut: "(language I don't like) is only suitable for teeny-tiny puny baby child's toy programs, and once you're not writing those anymore you must use a big strong grown-up language like all the other Real Programmers™ do!"

The empirical evidence of reality is against you: there are successful large (in terms both of codebase and contributors/development team) projects in these awful terrible children's languages, and there are unmaintainable failed piles of crap in even the most grown-up of languages you'd care to name. The choice of language, and choice of type system, seem not to correlate with the success or failure in a meaningful way.


It doesn't correlate with success, but the choice of language does correlate with development speed, number of faults, maintainability, etc.

The interesting thing to note is that a language that's perfectly acceptable at the above at small or medium scale might turn into a hindrance at large-scale. An otherwise fast to develop in language like Python won't be so fast if every change has to be painstakingly reviewed and tested due to the complexity of interactions in the code base.

Using a type system to verify assumptions/requirements is not a recipe for success, but it can improve reliability.


> won't be so fast if every change has to be painstakingly reviewed and tested due to the complexity of interactions in the code base

You can write spaghetti code in any language, it turns out. Blaming the language for that is not really an indicator of understanding the problem.


Yes one can do a poor job in pretty much any situation, I'm afraid that's not an argument for anything though.

Here we're talking about average or best-effort: large code bases are complex in spite of the best intentions of their maintainers, so using tools that can manage that complexity more easily through e.g. their type systems could lead to better results.


>Type hints allow external tools to check some things, but at this point you're basically imposing static types so why not use a language with the tooling and optimizations to take advantage of that?

Python's type system is, imo, currently better than Java's, and the syntax is cleaner than Java's or C++'s. You get all the benefits of static typing without having to put `auto` and `List<>` everywhere. And at the same time, you get all of the advantages that python has over statically typed languages that aren't haskell (like comprehensions). And, when you need to, if you're doing something that's especially tricky or dynamic or whatnot, you can fall back to untypedness.

I think the closest parallel I can draw is to something like Rust. You get a huge set of guarantees for free, but can opt to do unsafe things when it's absolutely necessary, and better yet, you can start in unsafe land and then go back later and make sure your code is safe.

I'm curious what tooling you feel that say, Java, has over type-annotated python.


Hints don't provide any guarantees. It's still possible to silently and unknowingly pass the wrong type of value into any given argument, with or without the checker. The "tooling" the other languages have includes a compiler that performs these checks in a way that Hints + Checker-of-choice is unlikely (or unable) to. "What do you think these checkers do?" you might ask. The answer is: not nearly what a compiler does.


> It's still possible to silently and unknowingly pass the wrong type of value into any given argument

This is also possible in traditionally statically typed languages. Nothing stops you from doing unsafe casts or using reflection. Much like it's exceedingly unlikely that you'll run across this in "normal" java or C++, it's exceedingly unlikely for you to run into any issues with this in python. And, in fact, the typechecker has ways to handle unusual things like dynamically created attributes, for when that comes up.

And yes I mean this quite honestly. I've seen a lot of typechecked code, some of it quite ridiculously dynamic. Typecheckers perform absolutely fine.

>The "tooling" the other languages have includes a compiler that performs these checks in a way that Hints + Checker-of-choice is unlikely (or unable) to.

What way is that? Typechecking is static analysis. There's really no difference between how java or cpp does typechecking and how mypy does, other than that the python typechecker isn't installed by default.

>The answer is: not nearly what a compiler does.

This is not an answer.


> This is also possible in traditionally statically typed languages. Nothing stops you from doing unsafe casts or using reflection.

Neither of those is "silent" or "unknowing".

> What way is that? Typechecking is static analysis. There's really no difference between how java or cpp does typechecking and how mypy does, other than that the python typechecker isn't installed by default.

The typechecker can't handle un-hinted code (or, rather, it chooses something very permissive, like 'Any' for all hints). It's incomplete at best.
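
A small sketch of what that permissiveness looks like in practice (all names here are hypothetical). `parse_count` has no hints, so its result is treated as `Any`, and `Any` is accepted wherever a `str` is expected. With mypy's default settings I'd expect this to pass the check and then fail when it runs.

```python
def parse_count(raw):
    # Unannotated: the checker treats the return value as Any.
    return int(raw)


def shout(text: str) -> str:
    return text.upper()


def call_through_untyped() -> str:
    try:
        # An Any value flows into a str parameter unchallenged.
        return shout(parse_count("3"))
    except AttributeError as exc:
        return "runtime failure: " + str(exc)


print(call_through_untyped())
```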

> This is not an answer.

It is. That you don't like or agree with it doesn't make it not an answer.


>The typechecker can't handle un-hinted code

And in Java or c++, un-hinted code couldn't compile. The python type checker can do more than a java or c++ checker in this regard.

>Neither of those is "silent" or "unknowing".

They're exactly as silent or unknowing as you would get in typed python code. You appear to be comparing untyped python. That's an incorrect comparison. Offhand, I actually can't think of anything I could do in typed python that would get around the type checker, that wouldn't be considered reflection or a dynamic cast, and be very obviously so in python too. If you have an example of a silent or unknowing failure of well typed python code that passes on mypy, you should probably file a bug report ;)

>That you don't like or agree with it doesn't make it not an answer.

You're right. It's not an answer not because I disagree with it (I don't), but because it doesn't actually answer anything, which is why I don't disagree with it.

To summarize this:

Python typecheckers are capable of more type inference than Java, and require less syntax than c++ or Java to get well typed code. A typed python codebase can interact cleanly with an untyped python codebase, and within the typed parts of the code, you get equivalent safety guarantees to what the type systems of Java or C++ provide.

You appear to be ascribing magical powers to compilers in other languages, when those compilers have exactly the same type information as mypy does.

In other words, going back to your first statement:

>Hints don't provide any guarantees.

Hints provide exactly the same guarantees as any other type system: "Assuming you write reasonable code that doesn't attempt to subvert the type system, the type system will catch any dumb mistakes you make."

That's the exact same guarantee you get in any statically typed language.


No it's absolutely not. In Haskell and F# you don't need to write types annotations to get that guarantee.


Nor do you in python in many cases; its type inference is quite good, certainly better than c++ or java, which is what I was speaking of.

But you're right, I should have specified "traditional" static language.


"That's the exact same guarantee you get in any statically typed language." You wrote that and f# and Haskell are statically typed


You don't put auto everywhere. You write out the type in 99% of cases and save auto for 100+ character templated types.

There is no reason to save the literally 0.2s (you can still spend that time reasoning about your code) it takes to write the type. It is better for yourself writing it and for readability to be explicit.


By everywhere, I mean where it's otherwise obvious:

    auto s = std::string("Hello world");  // std::string, so the range-for compiles
    for (auto c : s) {
        std::cout << c;
    }

Those autos don't need to exist, they're completely inferable, otherwise you wouldn't use auto. It's not like you can use auto in function declarations, nor should you, I agree.


I always write std::string etc. in those cases. It is consistent and quicker to read and there is no tangible benefit to using auto.


Right, but my point is that there's really no tangible benefit to writing the type at all. In a language with good type inference (Haskell, ml, modern python), the string literal is known to be a string, and you don't need to do any extra work.
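
Roughly what that looks like in Python (variable names are just for illustration). Neither assignment carries an annotation, and a checker such as mypy should still infer `str` for the first and a list of `int` for the second.

```python
greeting = "Hello world"                         # inferred as str
lengths = [len(word) for word in greeting.split()]  # inferred as a list of int

for c in greeting:  # c is inferred as str too
    print(c, end="")
print()
print(lengths)  # [5, 5]
```

Dropping a `reveal_type(lengths)` line in while running mypy is one way to see what it inferred.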


> Beyond that its utility diminishes until it starts to become a hindrance.

Tell that to any serious Numpy/Scipy/Pandas user.


Being the lead on a data science team I am one of those. They're great for exploratory research and prototyping, and for use in the very tiny fraction of code in a production system that deals with machine learning if I must. For everything else, from the data pipeline to delivering results, I'd prefer and recommend something else, like Haskell, C++, go or Rust.


I agree with you but I think it's pretty straightforward how this sort of thing happens:

1. Startup builds thing fast in dynamic language because they need to optimize for development speed and iteration, not maintainability or scalability.

2. Startup grows and continues to hire for expertise in the tech stack they are mostly already using.

3. Repeat for some years and some hundreds of engineers and you arrive at this exact scenario.


This is exactly it.

People are running a business, not writing a treatise on code maintenance and hygiene.


Ok, then the business men among us should learn their lesson, that agility matters.

The software developers among us should also learn their lesson: don't build large-scale software in dynamic programming languages unless you can afford to spend time later adding a static type system on top.


I'd think the business men at Instagram have been very happy with the agility of development, that got them to a $2.8 billion revenue p/a company. More than enough to cover the engineering effort to help improve the maintainability of the code.


Considering the company that bought them runs on PHP, perhaps we should all ditch Python altogether.


I agree - there's just always a condescending tone in the way people question the technical practices in hindsight of companies that are successful.


> 1. Startup builds thing fast in dynamic language because they need to optimize for development speed and iteration, not maintainability or scalability.

> 3. Repeat for some years and some hundreds of engineers and you arrive at this exact scenario.


Isn't it faster to write everything in Go, for example, which has a compiler guiding you all the way so that you rarely get runtime errors? I feel my developer time is much better "optimized" when writing backends in Go than when I wrote them in Python (I also tried Node, which was a disaster).


It's probably faster to verify that your Go code is working as intended, but many would argue that dynamically typed/interpreted languages are faster to write code in than statically typed/compiled languages. Others would argue that the pure "writing code" part isn't the majority of what takes up a developer's time. There's no one right answer.


Go wasn't a serious option for Instagram though. According to Wikipedia Instagram launched in 2010, Go launched in 2009. That would have made them very early adopters. Was the library support there back then like it was for Python?

Just because there may be better tools now, should they scrap their working code that earned their fortune?


Well, OCaml, Haskell or maybe Scala would've been mature options in 2010, combining safety with language-level productivity that's comparable to Python, though the extent to which web frameworks available in those languages at that time were Django-equivalent is arguable. (Personally I would - and did - happily choose Scala with Wicket over Django in 2010)

In terms of what Instagram should do now, I'd say they should do what Facebook did: introduce thrift or similar, gradually move business logic into backend services written in more suitable languages, leaving the Python to eventually become just a thin web frontend. Retrofitting types onto code involves a lot of the same effort as rewriting it into a better language, and the rewards for the latter are higher, IME.


I'm not talking about Instagram, I'm answering the parent comment, which talked about developer productivity in general.


Empirically? Doesn't seem like it. But it's hard to run a good controlled experiment. The confounding effects of the team members, the project goal, and the vagaries of business are too noisy.


We're getting into a world where languages finally have type systems that don't suck for fast development. This wasn't the case until very recently. And many of the current options only became realistically viable in the last few years (or months!).

We still have to work with the world as it exists. Not as it should be or will be. And even with the crop of modern languages, it's often still faster to start with less optimal languages and fix shit in the 1/10 chance you're actually successful.


What are you talking about? Haskell is 30 years old.


You must have missed the part about "the world as it exists" and "fast development" and pretty much the entire point of the post you're replying to.

Haskell remains impractical for many use cases, it is not used much outside of academia, it's not documented to be used outside of academia, and it didn't even have a working package manager until a few years ago.


Yeah. And you need 20 language extensions to sit at the top of your every file to do anything useful with it.


Many in the Haskell community talk about being guided by their types, perhaps you're seeing a similar benefit?

Not sure why the negative vibes toward this comment, I think it's fairly sensible (of course, this is a very subjective subject).


I think a lot depends on how solid your requirements are. If there's no need for further design as you develop, types are great. If you're writing one to throw away, they're a waste. If you're chopping and changing as you write, it can go either way.


It's rarely easy to re-implement large or complex codebases in a new language. That kind of major effort requires significant signoffs from leadership and a large effort.

This tool? This tool takes an existing set of codebases and makes them safer. No major boiling of oceans required.


Personally, I would love to ditch python for a language with strong types and type inference, but what is the replacement for django? Where do I find a well-designed, well-documented, battle-tested framework that I can easily hire developers for?

Currently, I think the nearest competitor is node with typescript, and I'd rather stick with python. Please tell me if I'm wrong.


I dream of a time when OCaml becomes the obvious answer to this question.


Java? But that's another kind of hell


PHP 7


> Why not use a statically typed language for such a large code-base? Is this done on purpose, or did they end up with a million lines of Python code and are looking for ways to make the maintenance easier?

Being in a similar (though much smaller scale) situation ourselves, I suspect your latter suggestion is the answer; they wound up with a large amount of Python code due to expediency (Python is good at quickly getting things done), and are now finding the code base quite hard to maintain.

Something to remember is that as of 10 years ago, statically typed languages consisted of C (for some definition of "statically typed"), C++ (complicated and cumbersome, and didn't yet have widespread availability of "modern" C++ features), Java (cumbersome, heavyweight, lots of missing opportunities for abstraction back then), C# (heavily tied into the Microsoft ecosystem, not yet open source), and a bunch of weird academic languages that required tutorials about burritos to learn (ML, Haskell, etc).

While statically typed languages like C, C++, and Java were the mainstream languages of the 90s, C was too limited for a lot of people, C++ and Java too cumbersome to use, and so lightweight dynamically typed languages like Perl, Python, PHP, Ruby, and JavaScript picked up a lot of steam because many programmers found them much easier to pick up and more productive. But now we have large, fairly un-maintainable code bases in these languages, and people are realizing the value of static typing, in part due to the maintenance hassle and in part due to newer, more expressive and accessible statically typed languages being available (Elm, Rust, TypeScript, Scala, Go, Kotlin, as well as improvements to C++, Java, and C#).

But that leaves all of these old codebases that are hard to maintain. Adding static typing capabilities that can be applied to existing codebases is a way to make them more maintainable without the cost of a complete rewrite, and without everyone having to learn a new language while still maintaining the old codebase.


Also, these old code bases can now be fairly easily and optionally retrofitted using modern Python's support for type hints and tools like mypy and now MonkeyType. So people can continue using their existing code base and the vast Python ecosystem, with the added benefit of type checking where appropriate.
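For anyone who hasn't seen what that retrofit looks like, here is a minimal sketch. The function and field names are made up for illustration and are not from Instagram's code. The legacy function stays untouched, while the annotated version gives a checker such as mypy something to verify at the call sites.

```python
from typing import Any, Dict, List, Optional


def find_user_legacy(users, user_id):
    # Unannotated legacy code: nothing here tells a checker what
    # `users` contains or what comes back.
    for user in users:
        if user["id"] == user_id:
            return user
    return None


def find_user(users: List[Dict[str, Any]], user_id: int) -> Optional[Dict[str, Any]]:
    # Same behavior, now with hints. The Optional return type forces
    # callers to handle the "not found" case before using the result.
    for user in users:
        if user["id"] == user_id:
            return user
    return None


# Hypothetical sample data for illustration.
USERS = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
```

The annotations are optional and can be added one function at a time, which is what makes this workable on an existing code base.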


My job is mostly machine learning and statistics. I would love to use something like Haskell or even Java, but the NumPy/SciPy/Scikit-learn/Pandas etc. ecosystem is just so far ahead of everything else that it's not worth it.


This tool seems to be built for teams that have already created large codebases in non-typed languages. This is a pretty common thing today. They are addressing a very real problem that a lot of people face which is "oh we fucked up, what's the most painless way we can fix it".

We just recently moved our entire codebase from JS to TypeScript which was pretty hard but super worth it.


Think you trade maintainability for initial productivity with a dynamic language.

You can churn out a greenfield project faster as you don't have to spend upfront time mapping out interfaces, DTOs etc. It's also easier to 'hack'.

Of course the above makes the code harder to maintain and reason about, unless written by very disciplined engineers.

So these languages are a natural fit for startups (and things like prototyping and scripting).

Instagram would have been in this category, and it's worked out well for them.


>>> but navigating a million lines of Python seems just daunting to me (although maybe I'm just not experienced enough with Python).

It's not that bad.

The first thing you learn when you're in a million-line codebase is that you will only work within your project of maybe a hundred files.

Once in a while, there is a guy who is asking for help on his project or there is an old weird bug to fix and you dive into other stuff. Otherwise, it's like it's not there.


Sounds like the projects you worked on had excellent encapsulation.


I wouldn't say excellent encapsulation. It's just normal to use sub directories to split things. The project was over ten million lines of python.


Subdirectory structure is sort of orthogonal to the issue of encapsulation. It's more about providing APIs and clean abstractions which prevent incidental interactions between implementation details across module boundaries.
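A rough sketch of the distinction, with all names hypothetical: callers are written against a small declared interface (here a `typing.Protocol`, Python 3.8+), and the storage details stay behind it no matter which directory the files live in.

```python
from typing import Dict, Protocol


class ImageStore(Protocol):
    # The only surface other packages are meant to depend on.
    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryImageStore:
    def __init__(self) -> None:
        # Implementation detail; the leading underscore marks it private
        # by convention.
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def copy_image(store: ImageStore, src: str, dst: str) -> None:
    # Written against the interface, so swapping the backing store
    # does not touch this code.
    store.put(dst, store.get(src))
```

Nothing stops another module from reaching into `_blobs` at runtime, which is why this is a matter of discipline (or tooling) rather than folder layout.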


Didn't Facebook do this for PHP (Hack) and Microsoft with JavaScript (TypeScript)?

From a technical aspect, I do find these projects cool. I wonder if it's more efficient for large companies to initially develop using dynamic languages then transition them with these optionally typed languages.


Small projects scream dynamic types, but not every baby stays small. Eventually the cognitive load becomes crippling and you're crying for static types.


Dynamic languages definitely help with the rapid development of a small application or prototype, but when you have a large team trying to maintain the application, static typing allows for better compile time error catching. Weird hybrid approaches make me wary.


Facebook did even more with PHP with HHVM and other tooling. I'm not sure if JS was exactly the same for Microsoft as I saw it more as a cool internal project that got big vs something that eventually became necessary.


From what I read, several big teams at Microsoft were dogfooding Typescript immediately during development, even before it was ever externally released.


The evolution is fairly simple: python is very easy to set up, especially with django for working on the web, and for writing scripts quickly it works really well when everything is just an 'import' away.

Then as the product gets bigger, you'll hire python developers to keep up with the workload - and the best ones will be the ones who have committed their lives to Python. So you'll now end up with more and more python code.

Before you know it, you aren't writing small scripts anymore, but now you are writing quite large features, that take weeks and that require intimate knowledge of the code base so you don't keep backtracking and repeating yourself. But Python doesn't help you at this point, you traded static types for flexibility and now you have to pay the price.

At this point you're screwed, too many man hours spent on the codebase to redo it, so what to do? Well if you have the man power, build your own static type checker of course! I mean after all, if you have 100s of engineers who cares? You just throw more people at the problem until it goes away. Then wrap it up in a nice little package, and slap yourself on the back while you ride the instagram bubble.


Rewriting a whole app would be at least as big of an investment, in my opinion...


But not as useful for the resumes of the devs involved.


Did it occur to you that rewriting an internal app has no benefit to the community at large, whereas publishing a tool is a clear improvement on the community's tools?

Did it occur to you that instagram published a valuable and useful tool that now just exists, and this is now a non-issue for anyone else in their situation?

Like, why are you complaining about resume boasting?! It's like you want people to do useless work that has no positive effect on the open source community. Are you just upset people are using Python or what?


Rewriting over a million lines of code in a statically typed language, starting from a codebase that is likely as riddled with type errors as this one, is unlikely to be productive. They're making the best of a terrible situation.

I also think that if they want to use types the correct approach is to apply a tool like this as a stop-gap but write new components going forward and bug fixes/feature re-writes in a language that supports types "properly" (i.e. in the way they seem to want, that is static types checked at compile time).

I think tools like this are great for companies in situations like this. I don't think they're good to use from the outset: the team should just use an actually statically typed language instead.


It's important to realize that these tools just didn't exist 7-10 years ago when companies like Instagram and Dropbox were getting started.

Though type inference has been around in so-called "academic" languages for decades, it hit a tipping point in the last 10 years, to the point that every major dynamic language has static type checkers or dialects (like Typescript) that support static checking. Meanwhile, even traditional statically typed languages are growing stronger type checkers.


You appear to be ignoring who this is coming from. Instagram is huge, and they built their service with Python. It isn’t coming from someone who is arguing this is how it should be done from the beginning. Hell, given the typical lifecycle of a startup, this situation occurs exactly because a dynamic language is chosen to speed getting a product to market and try to build a business around it.


I literally just had this exact thought independently of you. I mean if you need to do something like this, doesn't that mean you should be writing code in a statically typed language?


There's still a good deal of people who think of static typing as 'limiting', and dynamic typing as 'human'. Matsumoto said as much during (iirc) last year's Ruby conf.

I think the reverse is true. Static typing is liberating for humans because it tames complexity. Because I'm not a machine I cannot possibly keep track of fuzzy programs that arise from dynamic typing.


> Static typing is liberating for humans because it tames complexity.

It doesn't, though. Not with the currently existing type systems and implementations.

- Without type inference you end up writing multi-tier type declarations everywhere.

- With overly powerful type systems you need something close to a PhD in math to create proper types and then figure them out half a year later when you've already forgotten most of what you did

- Union and intersection types which are extremely valuable are missing from a lot of statically typed languages

And because I'm not a machine I often cannot figure out what yet another two-hundred-line error message wants of me. Often I'm happy to just throw an `if (x && x.field){}` and be done with it.
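For comparison, a small sketch of how that same guard looks with Python's optional hints, where union types are available through `typing`. The class and function names are invented for the example. The runtime check is the same `if`, and a checker such as mypy can use it to narrow the type inside the branch.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Profile:
    field: Optional[str] = None


def describe(x: Optional[Profile]) -> str:
    # Same shape as `if (x && x.field) {}`: after this check, x is
    # known to be a Profile and x.field a non-empty str.
    if x is not None and x.field:
        return x.field.upper()
    return "missing"


def to_text(value: Union[int, str]) -> str:
    # isinstance narrows the union to one member per branch.
    if isinstance(value, int):
        return f"int:{value}"
    return f"str:{value}"
```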


You'll have to expand on this a little for me. I just recently looked at an older Haskell codebase I was working on two years ago, and simply because of a very few straightforward types, nothing special, I could really wrap my head around the code-base a lot easier. Not just because the code tells me in plain text what types occur where, but also because enforcing a strong type system encouraged compositionality and well-formed behaviour in the first place.

If I look at some of the python code I've written, I'll be perfectly honest I often cannot tell you what to make of it any more.


Types, being code, depend on the people writing them (as does the rest of the code :) ). I guess you’re lucky/smart that your code base is just simple types.

Quite often people construct complex type hierarchies just because they can (or don’t know better). And it’s a pain to wade through and coerce to what you want it to do.

I’m very much on the fence between static and dynamic typing, having used (and probably abused) both. I prefer a “pragmatically” typed language, but I haven’t come up with a proper definition for it yet :)


I also get the impression that some people view using dynamically typed languages as a badge of honor, taking more expertise to harness the greater 'expressive' power, all whilst juggling the types in your head. Bad programmers can't use them properly, but if you're one of the good ones then you're not held back by rigid types.

I'm one of the dumb ones and just let the computer do the checking for me.


Disclaimer, I'm a regular horn tooter for F# but imho this is such a perfect case for the language. It has similar line density to python, but with static type inference. Admittedly you will have to be explicit about using mutables, but I'm going to bet that instagram already cares about that.


there's a difference between should be and should have been

maybe it's easier to do this half-measure than rewrite your entire code base


I get that part, but for new projects, this perhaps is a hint to do things in Java?


80% of startups are going to die before their codebase reaches this size. They should probably not be making tech stack choices based on "what if we succeed beyond all likelihood".


Two things come to mind: maybe those startups would die at a slower rate if their code was comprehensible to begin with? But leaving that aside, for companies where survival is not an issue, doesn't this indicate that using a dynamically typed language is not great?


I would be interested in seeing the number of startups that fail due to technical debt. My instinct is that most startups fail for business reasons (no clear need, not enough/right sales, poor management, bad pitch, solving the wrong problem, etc).


Well, in the case of Instagram, this just shows that Python was a great choice.

They went with a dynamically typed language, were successful, the language they chose added an optional type hinting system, and they wrote a tool that would automatically type hint their code in order to reap many of the benefits of a static type system.
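To make "automatically type hint their code" a bit more concrete, here is a toy sketch of the general idea of recording types at runtime. It is not MonkeyType's actual implementation, and the decorator and dictionary names are made up: it just logs the argument and return types seen on each call, which is the raw material a tool could later turn into annotations.

```python
import functools
from collections import defaultdict
from typing import Any, Callable, Dict, Set, Tuple

# Maps function name -> set of (positional arg type names, return type name).
OBSERVED: Dict[str, Set[Tuple[Tuple[str, ...], str]]] = defaultdict(set)


def record_types(func: Callable[..., Any]) -> Callable[..., Any]:
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = func(*args, **kwargs)
        # Toy simplification: only positional arguments are recorded.
        arg_types = tuple(type(a).__name__ for a in args)
        OBSERVED[func.__name__].add((arg_types, type(result).__name__))
        return result

    return wrapper


@record_types
def add(a, b):
    return a + b


add(1, 2)
add("x", "y")
# OBSERVED["add"] now holds one entry for the int call and one for the str call.
```

The str call shows the catch: observed types only reflect the calls that actually happened, so the suggested hints are only as good as the traffic they were recorded from.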

I think that the amount of man hours that went into writing the tool is negligible, so it's a net win for Instagram.


New projects at large companies often are written in Java. New projects at startups aren't and shouldn't be - doubling future maintenance costs for the sake of a 20% reduction in development time now is a good trade, because 90% of startups fail. The important thing is to validate product/market fit as soon as possible.

(Though really you should just use Scala and get both Java-like safety and Python-like productivity)


The reasons (advantages) for using python (or any other dynamic lang.) when in startup mode go beyond just typing. Similarly, the advantages of continuing to use python outweigh the costs of doing this little "type-dance".


There's lots of good and bad things about Python.

The worst thing -- to me -- is dynamic typing.

So why not fix it?



