Btw, am I alone in thinking that DataFrame abstractions in OOP languages (like Pandas in Python) are oftentimes simply inferior to relational algebra? I'm not sure that many Data Scientists are aware of the expressive power of SQL.
There are loads of things that are either very cumbersome or outright impossible to write in SQL, but that pandas and many other dataframe systems allow. Dropping null values based on some threshold, one-hot encoding, covariance, and certain data cleaning operations are all possible in SQL but very cumbersome to write, while things related to metadata manipulation are outright impossible in a relational database.
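To make those concrete, here's roughly what they look like in pandas (toy data and column names are mine, purely for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "city":  ["NYC", "LA", None, "NYC"],
    "sales": [10.0, None, 3.0, 7.0],
    "temp":  [20.0, 25.0, None, 18.0],
})

# drop rows that have fewer than 2 non-null values
cleaned = df.dropna(thresh=2)

# one-hot encode a categorical column
encoded = pd.get_dummies(cleaned, columns=["city"])

# pairwise covariance of the numeric columns
cov = cleaned[["sales", "temp"]].cov()
```

Each of these is a one-liner in pandas. Doing the same in SQL means counting nulls per row by hand, spelling out one CASE expression per category, and, unless your database ships a COVAR_* aggregate, assembling the covariance formula out of SUMs and AVGs.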
SQL is super expressive, but I think pandas gets a bad rap. At its core the data model and language can be more expressive than relational databases (see [1]).
I co-authored a paper that explained these differences with a theoretical foundation[1].
Thanks for sharing this. I believe we essentially agree: chaining method calls is inexpressive compared to composing expressions in an algebraic language.
I'm not defending Pandas but just want to point out that the inability to conveniently compose expressions is one of the biggest problems with SQL, since it was designed to be written as a sort of pseudo-English natural language, in an era when people imagined that it would be used by non-programmers. To be clear, that's a problem with SQL, not with the idea of a language based on relational algebra. There are various attempts to create SQL-alternatives which behave like real programming languages in terms of e.g. composability. This blog post makes the point better than I can:
I absolutely agree - one of the biggest shortcomings of SQL is that its primary programming interface is based on text and intended for humans, instead of being based on data structures and intended for programs.
SQL does not exactly implement relational algebra in its pure form.
SQL implements a kind of set theory with relational elements and a bunch of practical features like pivots, window functions etc.
Pandas does the same. Most data frame libraries like dplyr etc. implement a common set of useful constructs. There’s not much difference in expressiveness. LINQ is another language around manipulating sets that was designed with the help of category theory, and it arrives at the same constructs.
However, SQL is declarative, which gives query optimizers a path to parse it and create optimized plans, whereas with chained methods, unless one implements lazy evaluation, one misses out on look-aheads and opportunities to do rewrites.
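To illustrate what I mean by lazy evaluation enabling rewrites, here's a toy sketch (not any real library's API): the chained methods only record a plan, so the collect step can look at the whole chain and, say, fuse several filters into a single scan - something an eagerly-evaluated chain can never do after the fact.

```python
import pandas as pd

class LazyFrame:
    """Toy lazy wrapper: records steps instead of running them immediately."""

    def __init__(self, df, ops=None):
        self._df = df
        self._ops = ops if ops is not None else []

    def filter(self, predicate):
        return LazyFrame(self._df, self._ops + [("filter", predicate)])

    def select(self, columns):
        return LazyFrame(self._df, self._ops + [("select", columns)])

    def collect(self):
        # trivial "optimizer": because the whole chain is visible up front,
        # fuse every filter into one pass instead of N separate scans
        predicates = [arg for op, arg in self._ops if op == "filter"]
        projections = [arg for op, arg in self._ops if op == "select"]
        out = self._df
        if predicates:
            mask = out.apply(lambda row: all(p(row) for p in predicates), axis=1)
            out = out[mask]
        for cols in projections:
            out = out[cols]
        return out

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
result = (LazyFrame(df)
          .filter(lambda r: r["b"] > 15)
          .filter(lambda r: r["a"] < 3)
          .select(["a"])
          .collect())   # single row: a == 2
```

This is roughly the approach lazy dataframe systems take before executing anything; eager pandas chains run each step immediately, so there is nothing left to rewrite.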
Pick one :) The way I see it, if declarativeness is not a factor in assessing expressiveness, then expressiveness reduces to the uninteresting notion of Turing-equivalence.
Expressiveness and declarativeness are different things, no?
Are you talking about aesthetics? I’ve used SQL for 20 years and it’s elegant in parts, but it also has warts. I talk about this elsewhere, but SQL gets repetitive and requires multi-layer CTEs to express certain simple aggregations.
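One made-up example of the kind of thing I mean: "what share of the grand total does each department account for" is an aggregate over an aggregate. In SQL that usually takes a CTE for the per-department totals plus a second level (or a window function) for the grand total; in pandas it's a short chain:

```python
import pandas as pd

sales = pd.DataFrame({
    "dept":   ["toys", "toys", "books", "books", "books"],
    "amount": [100, 50, 30, 20, 10],
})

# aggregate, then aggregate over that aggregate, in one chain
dept_totals = sales.groupby("dept")["amount"].sum()
share_of_total = dept_totals / dept_totals.sum()
```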
Agree. I've completed data pipelines for several projects and have found that the cleanest, and often fastest, solution is to use SQL to structure the data as needed. This is anecdotal and I'm not an expert with SQL, but I haven't come across a situation where R or Pandas dataframes worked better than a well-written query for data manipulation. This has the benefit of simplifying collaboration across teams, because within my company not everyone uses the same toolset for analysis, but we all have access to the same database. Other tools are better suited to analysis or expansion of the data with input from other sources, but within our own data SQL wins.
It's only trivial if you already know it's prime. Determining that is non-trivial enough that a tractable deterministic algorithm wasn't devised until 2002, and its time complexity is thought to be roughly the sixth power of log n, i.e. polynomial in the number of digits.
> When given the same problem, a quantum computer should be able to trounce any supercomputer in any problem in terms of speed and efficiency
LOL, no, not any problem, far from it. Some problems, rather specific ones, such as prime factoring.
> Our current system, for example, taps into electrons and cleverly-designed chips to perform their functions. Quantum computers are similar, but they rely on alternative particle physics.
Um, no, they both rely on the same physics, that is, a combination of quantum mechanics and electromagnetism. Note to the author: an electron is a quantum system, and classical electronics definitely rely on that.
So yes, quantum computers are overhyped, through no fault of their own, and this article contributes to the trend.
I made the same remark in reply to another comment which used the phrase "factoring primes" :) Wikipedia does use the term "prime factorization": that seems legit to me, as prime is used as an adjective. https://en.wikipedia.org/wiki/Integer_factorization
Another legitimate meaning might be "factoring probable primes" (or "candidate primes" as they are sometimes called in key generation/cryptanalysis), or possibly "factoring semiprimes".
Both of those phrases could be referred to as "prime factorisation" in a not-entirely-accurate-but-unambiguous-in-context shorthand.
> LOL, no, not any problem, far from it. Some problems, rather specific ones, such as prime factoring.
Yeah, as someone who works in quantum computing, this is the hardest thing for me to explain to non-technical people. For technical people, I liken it to an FP unit or some other specialized coprocessor that's often embedded in CPUs/GPUs.
> Quantum computers are similar, but they rely on alternative particle physics.
I think it's fair to say this in reference to using different physical properties of electrons than what normal computers use. The physics rules are the same, but how you manipulate them is different, presumably (I don't know much about how photonic QCs work).
I never thought of it that way for some reason. Always imagined mature quantum computers as being their own system. But it's possible a lot of them will be supplementary components to a classical computer. We have storage-over-PCIe, graphics-over-PCIe, and soon quantum-over-PCIe?
It'll be a long time before they need remotely comparable bandwidth, and more than likely the latency on higher level protocols won't even be near. PCIe would work fine, but so would old school serial.
That seems unlikely to happen in the near to medium term. For that to happen, everything would have to be rewritten using a quantum algorithm and language, and run on quantum hardware. Imagine writing a web browser in a quantum language, within a quantum computing software ecosystem. It's hard to see how that would have any benefit.
If you are talking 100 years out, though, who knows?
Yes it's overhyped, but to be fair, the whole point of classical electronics is to hide the quantum nature as much as possible. You want your transistor to act as a deterministic switch, not be in a superposition of states.
Well, to be fair, we also want quantum electronics to be deterministic in their behaviour. The difference lies not so much in randomness as in leveraging entanglement.
> Some problems, rather specific ones, such as prime factoring.
This is absolutely how we understand the technology now, but I think it's worth noting that computing luminaries also thought "640Kb of memory was more than enough for anyone" and that "eight mainframe computers will serve the computing needs of everyone across the planet" at one point in time. Quantum computers are definitely overhyped and that may be all they're good for, but it's also possible we'll figure out how to do some crazy shit with them in the future, too.
> the annual worldwide energy usage of blockchain technology is roughly equal to the annual US energy waste from machines plugged in while in standby mode. It is also significantly lower than the annual worldwide usage of Christmas lights, and wash dryers.
... and it benefits much, much fewer people.
That's analogous to the pro-air-travel disinformation argument: "air travel is only 2% of CO2 emissions". It is only so because air travel is only adopted by a small minority of people: that doesn't stop it from being insanely carbon-intensive.
April Fools' joke: we'll pretend that bugs in software are distributed as a homogeneous Poisson process, AND that Poisson distributions are bounded, while we're at it.
> We are profoundly uninterested in claims that these measurements, of a few tiny programs, somehow define the relative performance of programming languages aka Which programming language is fastest?
Now, I challenge you to find a major piece of bloated software where the main source of overhead is Python interpretation. IME it's always something else, like the surrounding UI framework.
The Office suite is written in C++ and is badly bloated, obviously not because of language execution overhead but because of technical debt, which, if that's any indication, argues against using low-level languages.
Performing a legitimate performance benchmark of even one piece of enterprise Python software — much less across a representative survey — is well beyond the reasonable scope of a comment board reply.
It was, but the author can be forgiven for mixing them up. Knuth did say it originally, but Hoare repeated it (in writing), properly attributing it to Knuth. Knuth then read Hoare's quote, missed the attribution, forgot that he was the one who said it, and repeated it again in writing, mis-attributing it to Hoare.
> Every programmer with a few years' experience or education has heard the phrase "premature optimization is the root of all evil." This famous quote by Sir Tony Hoare (popularized by Donald Knuth) has become a best practice among software engineers.
I wrote the blog post you cited (thanks!) but I disagree with both statements: that is not what is meant in the article.
1. I don't think Event Sourcing sucks - I think we are lacking accessible technology for supporting it.
2. For most difficulties encountered in Event Sourcing, I would rather blame distribution taken to the extreme.