Python is great for scripting and for small projects, but I believe strong, static typing is an essential feature for large scale projects. I wouldn't want to use Python for anything that's predicted to end up with more than a couple thousand lines of code.
Last time I saw the Bank of America python codebase, it had ~6 million lines of code, was worked on by about 4,000 developers, and ran some core, performance critical functionality.
Python programming is an aesthetic that needs learning. Many of the worst-written and least maintainable Python codebases I've seen are by programmers/teams who come from "proper" languages and don't think they have to learn how to write idiomatic Python.
"So, the sad thing is that these poor folks worked much, much harder than they needed to, in order to produce much more code than they needed to write, that then performs much more slowly than the equivalent idiomatic Python would."
I included a link to this blog-post in the "letter of intent" (don't know the exact English term) that I sent to my potential employer just before my first interview for a professional (Python) programmer job, back in 2005. I got the job. Good times.
And it appears to be a recruiter goldmine, the staff turnover is significantly high (so I am told).
A codebase of this size with no static type checking is not going to be fun.
Bzzzt - not really. Quartz in BoA, and Athena in JP Morgan (both built by the same folks) essentially take Python, connect it to a bucketload of C++ and Java that makes up the bulk of the banks' services, and add on a GUI layer, a pretty crappy object storage layer (shudders at Hydra...) and a half-baked object persistence layer that was always so slow. Sure, a lot of Python scripts get written for those platforms, but the heavy lifting -- pricing, trading, order books, risk systems, market data, connectivity -- all ends up being C++/Java, maybe in a Python overcoat.
> strong, static typing is an essential feature for large scale projects
This is empirically false.
There are (of course!) good reasons to consider static typing. But, in my experience, use of static typing has never been a first-order predictor of business or technology success. It's quite possible to build considerable value with, for example, a large Python 2.x code base.
That reminds me of Kapital [1] - a valuation and risk analysis system written in Smalltalk at JPMorgan, begun in the early '90s and as far as I know still going; 14,000 classes, 400,000 methods, hojillions of dollars of profit, twenty years in service, zero types:
In C++, classes do indeed define types. Templated classes define whole families of types. But in Smalltalk, classes do not define types.
I understand the word "type" to mean a property of a variable which restricts the range of values it can hold, and the set of methods which can be invoked on it. Smalltalk doesn't have any way to do either of those things, so it has no types.
It's true that a class with a set of methods defines a contract with its collaborators about what calls they can make (or what messages they can send, in Smalltalk terms), and that is a lot like a type. Smalltalk calls this a "protocol". But protocols aren't enforced by the compiler; you can still send a message to an object that it won't be able to understand. What happens in that case is that a method called doesNotUnderstand gets called; a class can implement that to try to do something useful (you could implement a proxy this way, for example), but the default implementation throws an exception. I think that a genuine type would prevent the message being sent in the first place - the code would be rejected by the compiler, and would never get a chance to run.
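For comparison, here is roughly how the same distinction looks in Python. This is only a sketch: `Priceable`, `Bond`, `Memo` and `total_value` are made-up names, and the "rejected before running" part assumes you run a separate checker such as mypy, since the interpreter itself behaves like Smalltalk here.

```python
from typing import Protocol


class Priceable(Protocol):
    """The 'protocol': the messages a collaborator is allowed to send."""

    def price(self) -> float: ...


class Bond:
    def price(self) -> float:
        return 101.5


class Memo:
    """Has no price() method, so it does not satisfy Priceable."""

    text = "call the desk"


def total_value(positions: list[Priceable]) -> float:
    return sum(p.price() for p in positions)


print(total_value([Bond(), Bond()]))  # 203.0

# A static checker such as mypy would be expected to reject the next call,
# because Memo lacks price(). The interpreter only notices at run time,
# much like Smalltalk's default doesNotUnderstand behaviour.
try:
    total_value([Bond(), Memo()])
except AttributeError as exc:
    print("run-time failure:", exc)
```

Note that `Bond` never inherits from `Priceable`; it satisfies the protocol just by having the right method, so polymorphism is not lost.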
Interestingly, it seems that this was planned for Smalltalk, but never implemented; a 1981 article about the design of Smalltalk [1] says:
"Also, message protocols have not been formalized. The organization provides for protocols, but it is currently only a matter of style for protocols to be consistent from one class to another. This can be remedied easily by providing proper protocol objects that can be consistently shared. This will then allow formal typing of variables by protocol without losing the advantages of polymorphism."
All of it is possible, but specific features help in making better software cost less in time and money.
For example, nullability annotations reduce nil pointer exceptions.
Static typing removes the need to make certain typing unit tests, makes refactors easier to do in large code bases, makes it easier for compilers to generate faster code and so on. Think of them as compiler level linters.
Anyway, python has a static gradual typing mechanism. You should try it out :D
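To make the nullability point concrete in Python's own gradual typing, a minimal sketch (the function names and the sample data are invented for illustration, and the checker behaviour described assumes something like mypy):

```python
from typing import Optional


def find_email(users: dict[str, str], name: str) -> Optional[str]:
    # dict.get returns None when the key is missing, and the
    # return annotation says so explicitly.
    return users.get(name)


def email_domain(users: dict[str, str], name: str) -> str:
    email = find_email(users, name)
    # Without this check, a checker such as mypy would flag email.split,
    # because email may be None at this point.
    if email is None:
        return "unknown"
    return email.split("@")[1]


users = {"ann": "ann@example.com"}
print(email_domain(users, "ann"))  # example.com
print(email_domain(users, "bob"))  # unknown
```

Delete the `if email is None` branch and the code still runs for "ann", but it blows up with an AttributeError for "bob"; the annotation lets a checker point that out before anyone hits it.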
The OP said "I believe" so obviously we can neither empirically prove nor disprove the statement. We can say with absolute certainty that static typing is not required for large codebases, because there are large codebases that are not statically typed.
Depends if you define popular by most used or most liked.
Most used: everyone pays taxes. Paying taxes is popular!
A more practical example is javascript. I write some javascript, not because I like it (I hate it), but because that's the only way to make things happen in a browser. Javascript is popular. Does it mean it is liked / a good language?
No it does not, neither in theory nor in practice. Also, "less code" depends primarily on the language structure and not on whether the language is dynamically typed or not. (Haskell, for example, is more terse than Python.)
Most people writing VBA code for excel are only informal programmers. For example, I never took a CS course in my life but learned VBA and programming on the job. Since then I've learned python and other languages but when I'm doing a home project or something I always come back to python. It's just so easy. I was working on a macro the other day in Excel, it took me a couple hours to get it all working but I'm pretty confident that if I could've coded in python I could've banged it out in 20 minutes or less.
Edit: I suppose my point got a little lost. What I mean is that I highly doubt much Excel code is a "big project." It may feel that way when you crawl through some of the hideous VBA coding I've done but much of this is due to the inexperience of the coder and the realities of VBA. Give me python + 4 or 5 libraries and I could recode anything I've ever made in VBA in 1000 lines or less.
No one is proposing that we get rid of spreadsheets.
I understand, though: you hate VBA. And by extension, Access. But Excel, with or without Python, doesn't mean it can suddenly be used as a relational database. But you knew that.
Actually, a number of people are proposing this for certain applications. It is also reasonably likely to happen, at very least the spreadsheet is being challenged which will put pressure on spreadsheet vendors and most definitely cut into their marketshare. Adding Python support to Excel seems to be an attempt to resist this.
You would actually be hard pressed to find ANY major codebase that isn't using at least one dynamically typed language in at least some significant capacity.
Instagram would disagree with you. Seriously, Python is an absolute pleasure for managing large web projects and the lack of static typing has never been an issue for my company (spend the time writing tests instead!)
Every sufficiently large statically-typed application will contain an ad-hoc, informally-specified, bug-ridden, verbose implementation of half of a decent dynamic language.
(I too enjoy a good pseudo-Greenspun)
More seriously something I've been pondering a lot recently watching the old pendulum swing back towards an enthusiasm for explicit typing is this:
* The advantages of static type systems are obvious and easy to articulate.
* The disadvantages of static type systems are subtle and difficult to argue.
I started my career as a professional programmer when the pendulum was moving in the other direction. Essays by Paul Graham on Lisp and Python. The marvellous PJ Eby piece quoted above and Peter Norvig's "Design Patterns are artifacts of language flaws".
I just feel dynamic languages fit my brain better but maybe that's my own form of Stockholm Syndrome. Maybe I need to try a decent type system rather than the brain-damaged descendants of Java...
I think I've never really got the point of a good type system until I started using Elm and then wandered into the rest of the ML world, learning the so called "Type-Driven Development" method.
After some time doing that a Java project came up, so I grabbed Lombok, Vavr and started writing Java as if it was just another ML (immutability first, paying attention to side effects and so on) and the whole thing made sense. More sense than all those years of OOP teachings. The code was easy to debug, easy to reason about, easy to change. And it was Java. And that just stunned me for life.
Then of course, I started using TypeScript for React development and giggled like a little girl every time I had to refactor something, for I KNEW that it was very unlikely I'd have to stare at the debugger for long periods of time in a wild goose chase like I often had to with plain JS.
But the trick was to learn the way of doing things in the languages that really guide you towards that path.
I can definitely recommend that you try Elm if you're into frontend development, or something like F# if you want native. As far as docs go, the Elm guide and fsharpforfunandprofit.com are both great; the latter I can recommend regardless of your language choice for making typed functional programming make sense. I can also recommend the book Type-Driven Development with Idris, which has also been an invaluable resource to really understand that way of doing things.
No, the claim is that the tests that "check it works" are really testing the types (as an ad-hoc type checker) and wouldn't need to be written if a sufficiently strict static type system were used instead.
That's one thing I don't understand. The argument I hear against static typing is that it's too much work to write all this type metadata. But if you have to make up the lack of compiler checks with lots of tests for things that wouldn't require tests in a static language, we are not saving any work.
No, you're testing types and verifying the implementation details of the language. It's rare that a test for logic "incidentally" tests the type system. Usually both the logic and type checks are tested. It's just obfuscated because the bulk of the test is for checking the types and it's easy to look past that.
Yes, it doesn't test the "type system" (whatever that even means).
However, it tests values for correctness. Values have types. So if you are testing whether something has the correct value, you are also testing implicitly that it has the correct type, because for the values to match, the types must also match.
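A tiny illustration of that, with a hypothetical `parse_quantity` as the function under test. It also shows where the implication gets a bit leaky in Python specifically:

```python
def parse_quantity(text):
    """Hypothetical function under test: turns user input into a count."""
    return int(text.strip())


result = parse_quantity(" 42 ")

# An ordinary value assertion: it fails if a str such as "42" comes back,
# so the int-vs-str mistake is caught without a dedicated type test.
assert result == 42
assert "42" != 42

# The implication is not airtight in Python: numeric types compare equal
# across types, so a float would slip through the same assertion.
assert 42.0 == 42
```

So the value check covers the common str/int mix-ups for free, but not every type distinction.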
I love Python but I also don’t mind static types. Maybe I will realize some day what this argument is really about, but I expected that day to come by now.
Not for free. You have to think and write types, maybe add some code to cast values between them or implement the same function twice for two different type signatures. Sometimes it still gains time, sometimes it doesn't.
Anyway, I doubt that a VBA replacement would need types. The use case is small scripts.
Python is strongly typed. You still have to think about the types. Except now you have to think about them every time you work with the code, not just the first time.
I work with Python and Ruby (some Elixir.) They are strongly typed in the same way. I hardly think about types. The code just works. The only scenario in which I have to stop and think is when I get some input, for example some JSON. Is that value I have to add to this counter a string or an integer? I can cast it to integer and that's it. To be fair, sometimes an integer gets where there should be a string and boom. Still, I prefer that to having to write types again as I used to when working with C and Java. I fix the code and I don't write tests to check the types of function arguments.
Maybe I could accept some type inference, but no more string, int, generics, etc.
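The JSON case looks something like this (the payload and the field name are made up):

```python
import json

# Hypothetical input: the count arrives as a string, not a number.
payload = json.loads('{"clicks": "5"}')
counter = 10

try:
    counter + payload["clicks"]  # int + str
except TypeError as exc:
    print("boom:", exc)

# Cast at the boundary and move on.
counter += int(payload["clicks"])
print(counter)  # 15
```

The strong typing shows up as the TypeError; the fix is one `int()` call where the data enters the program.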
You have to think about the types for some small pieces of your code where those types matter. That's what hints are for. Python lets you, through type hints, only care about the typing in those small cases where it actually matters, and ignore them the rest of the time.
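A rough sketch of what that mix looks like in one file. The function names are invented, and the comments describe mypy's default behaviour as I understand it (unannotated function bodies are skipped unless you opt in):

```python
def legacy_total(rows):
    # Unannotated: mypy skips this body unless asked to check untyped defs.
    return sum(row["qty"] * row["price"] for row in rows)


def apply_discount(total: float, percent: float) -> float:
    # Annotated: a call like apply_discount("100", 10) can be flagged
    # statically, before the code ever runs.
    return total * (1 - percent / 100)


rows = [{"qty": 2, "price": 10.0}, {"qty": 1, "price": 5.0}]
print(apply_discount(legacy_total(rows), 50))  # 12.5
```

The annotated and unannotated functions call each other freely, so you can add hints to the one function where a mix-up would hurt and leave the rest alone.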
Most of it is caught as incidentals of tests you have to write either way.
There's room for both paradigms - it's kinda silly to argue strict superiority of either because there's just no empirical evidence that having static typing or not drastically changes bug count.
If you take a look at some of the studies out there (there are, admittedly, few, and it's a fundamentally tough thing to measure), e.g. [0], it usually tends to be the case that both typed and untyped languages show up in the realm of "least likely to produce bugs".
I'd rather not write a bunch of tests that are really acting as a static type checker or, worse, testing the Python equivalent of compiler, linker and assembler output.
That's what the vast majority of testing is when it isn't simply testing mock code implementations.
Ah yes spend that valuable engineer time writing type validation tests a compiler could do instead of new features.
Put me with Op. I'll use Python for prototypes and small tools but get past that and I want a statically typed language. Not just for validation but also refactoring.
Sitting on 90K lines of Python here. It’s a breeze. Rarely see an error, and when I do, it’s from a third-party API failing to do its job properly (which is then successfully caught to avoid it causing problems). I’m interested in and use other stacks (mostly Elixir), but I don’t have any complaints about the language itself after 10 years.
This is to script spreadsheets, not build the next Netflix.
I would hate a typed language in there. Besides most static typing systems are ridiculously weak and introduce more headaches than they actually solve.
Have you used MyPy? I'm currently looking to adopt it, but the feedback I've been getting is that it's a lot more painful to use than Typescript (which imho sets the gold standard of "optional typing").
I haven't used typescript, but I have used MyPy (Or, more accurately, pytype), and it's absolutely a joy to work with. I've also used closure (the JS type system that isn't TypeScript) and I prefer pytype, fwiw.
My only complaint about Pytype is that there's no `Char` type at compile time (ie `for x in "a string"` -> Iterable[Char] instead of Iterable[str] during typechecking). But alas.
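To show why that bites, a small sketch (`join_names` is a made-up function). Because a `str` is itself an iterable of `str`, passing a bare string where a collection of strings is expected type-checks fine and runs without error:

```python
from typing import Iterable


def join_names(names: Iterable[str]) -> str:
    return ", ".join(names)


print(join_names(["ann", "bob"]))  # ann, bob

# Almost certainly a bug, but a str is a valid Iterable[str],
# so neither the type checker nor the interpreter complains.
print(join_names("ann"))  # a, n, n
```

With a distinct character type, the second call would be an `Iterable[Char]` and could be rejected.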
Unfortunately, MyPy doesn't work well with libraries or third party packages that ship without type information, and treats their objects as type `Any`. You can work around this with stubs but it's not fun writing type annotations for third party objects / functions. typeshed exists for the standard library and various popular third party libraries[1] to solve this very problem.
You should have a little bit more experience with Python and you would see how huge systems can be written in it with no problem. It needs a bit more discipline, but above 100k lines software written in any other language would need the same discipline also. (I'm working with 100kloc codebase right now and seen almost 300kloc. That was messy because mostly juniors wrote it :D but still bearable and profitable.)
At last! hehe I was wondering why no-one mentioned Cython. Did I miss the part where everyone learned why it's not a good idea? For me it's the best of both C and Python worlds.
Write a Python program. Compile it as a Cython program. (Already faster, with no changes.) Add C types to the speed-critical parts. (Up to many 100s of times faster than Python)
People will tell you that instagram uses python or yelp or other big name projects use python. Certainly, type checking is not essential to a large project, in the same way utensils are not essential when you eat. You can just use your hands to shove all the food into your mouth.
And what you are inferring is incorrect. I never said eating with your hands is uncivilized. You inferred it in your request for someone else to infer something.
Oh, stop backtracking. If I were you, I'd take the metaphor further and explain why eating with utensils is more hygienic. Type safety, food safety, ... You could write some flavorful prose (ha!).
That was not the underlying message. The underlying message is: I have my world view, I am offering it, you can agree, disagree or conform. The choice is yours. I would never force anyone to conform. Where in my post did I say that?
You latched onto the word "conform" when the word "should" was more important. You implied that type systems are better in the same way that utensils are better. If that wasn't your intention, your analogy was extremely confusing.
Well, there are rules to using utensils, and eating some types of food using them would seem uncivilized if not plain ridiculous. (Eating without hands, though, is surely uncivilized in the eyes of most people.)
I used to think that but I can't recall the last time the compiler saved me when augmenting or refactoring someone else's code - the IDE beats it to the punch every time - and static typing is no substitute for a good test suite.
On the flip side, I use interfaces and dependency injection constantly in Java to work around static typing. In Python, I write probably ~20% the amount of code because I never use interfaces or wiring logic, or convert types.
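For anyone who hasn't seen the Python side of this, a minimal sketch with invented class names. There is no interface declaration and no container wiring; anything with a `send` method can be passed in:

```python
class SmtpSender:
    """Stand-in for a real mail client; here it just formats a string."""

    def send(self, to, body):
        return f"smtp:{to}:{body}"


class FakeSender:
    """Test double: records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))
        return "ok"


def notify(sender, user):
    # The dependency is just an argument; no shared base class required.
    return sender.send(user, "report ready")


print(notify(SmtpSender(), "ann"))  # smtp:ann:report ready

fake = FakeSender()
notify(fake, "bob")
print(fake.sent)  # [('bob', 'report ready')]
```

The trade-off is that nothing checks `sender` has a `send` method until the call actually happens.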
I agree, and I would add that many users of these scripting tools in Office are novices, and dynamic typing makes the language non self-discoverable. Static typing lets the IDE give a lot more feedback on invalid syntax, what can be done from there, etc. So I think we are doing them a disservice.
Saying dynamic languages are for novices hints more at you being one than anything else.
Not everyone likes IDEs (Emacs and VIM are still by far superior to many) and not everyone wants to deal with all the extra code and boilerplate and ad-hoc data classes that come with static typing, to name a few.
Dynamic languages have faster iteration times and from experience that can yield higher quality software. They're easier to fit in the functional paradigm, better to model data transformations, and a bunch of other goodies.
You can't judge something without taking into account the context in which it's used. And for scripting something like Excel dynamic is clearly superior.
I am not saying dynamic languages are for novices. I am saying people who will be using office's scripting are more often than not programming novices (like they are with VBA).
You may like plain text editor but for someone who doesn't know how to program, typing a variable then dot, and having a drop down of what is available from there, with an embedded documentation and direct IDE feedback on what is correct or incorrect syntax immediately after typing every character is super useful. RTFM isn't novice friendly.
It's still possible for dynamic languages to have auto-completion. There's way more information available at runtime than at compile-time.
Besides, IDEs tend to have the entire world in most autocompletions, which is not useful either.
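As a rough idea of what runtime-driven completion can look like, using only the standard library (`complete` is a hypothetical helper, not how any particular editor does it):

```python
import inspect


def complete(obj, prefix):
    """List public attribute names of a live object that start with prefix."""
    return [
        name
        for name in dir(obj)
        if name.startswith(prefix) and not name.startswith("_")
    ]


# Completion works on the actual value, no declared type needed.
print(complete("some text", "up"))  # ['upper']

# Signatures are available at run time too.
print(inspect.signature(complete))  # (obj, prefix)
```

This is the kind of thing a REPL or notebook can do on whatever object happens to be in the variable at that moment.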
There would be no IDE here, you'll probably still write code from within Excel and advanced users will use separate source files to leverage their editor of choice.
The novices you mention will not want to leave Excel. A dynamic runtime with reflection is all you need to give a friendly experience. That doesn't prevent type hints, inference or autocompletion.