
It should be possible to give a better, IKEA-style explanation of how to perform partitioning with swapping. In its current form it can make people fall into a quadratic-complexity trap.
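
For illustration (a rough Python sketch, not taken from the article being discussed), swap-based partitioning does the whole job in one linear pass, whereas removing matching elements from a list one at a time costs O(n) per removal and O(n^2) overall:

    # In-place partition by swapping: elements satisfying `pred` end up first.
    # Single O(n) pass; the naive remove-and-reinsert approach is O(n^2).
    def partition_in_place(items, pred):
        boundary = 0  # items[:boundary] all satisfy `pred`
        for i in range(len(items)):
            if pred(items[i]):
                items[i], items[boundary] = items[boundary], items[i]
                boundary += 1
        return boundary  # index of the first element that fails `pred`

    data = [5, 2, 8, 1, 9, 3]
    split = partition_in_place(data, lambda x: x < 4)
    print(data[:split], data[split:])  # [2, 1, 3] [5, 9, 8]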

For Python (or PyPI) this is easier, since their data is available on Google BigQuery [1], so you can just run

    SELECT * FROM `bigquery-public-data.pypi.distribution_metadata` ORDER BY length(version) DESC LIMIT 10
The winner is: https://pypi.org/project/elvisgogo/#history

The package with most versions still listed on PyPI is spanishconjugator [2], which consistently published ~240 releases per month between 2020 and 2024.

[1] https://console.cloud.google.com/bigquery?p=bigquery-public-...

[2] https://pypi.org/project/spanishconjugator/#history


Regarding spanishconjugator, commit ec4cb98 has the description "Remove automatic bumping of version".

Prior to that commit, a cron job would run the 'bumpVersion.yml' workflow four times a day, which in turn executed the bump2version Python module to bump the patch level. [0]

Edit: discussed here: https://github.com/Benedict-Carling/spanish-conjugator/issue...

[0] https://github.com/Benedict-Carling/spanish-conjugator/commi...


i love the package owner’s response in that issue xD

Tangential, but I've only heard about BigQuery from people being surprised with gargantuan bills for running one query on a public dataset. Is there a "safe" way to use it with a cost limit, for example?

Yes, you can set price caps. The cost of a query is predictable ahead of time with the default pricing model ($6 per TB of data processed by the query). People usually get caught out by running expensive queries repeatedly. BigQuery is very cost-effective and can be used safely.
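
For example, with the Python client you can refuse to run anything that would scan more than a fixed amount (a sketch; the 10 GiB cap and the query are just illustrative values):

    # Cap the bytes a single query may process; the job errors out
    # instead of billing beyond the limit.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(maximum_bytes_billed=10 * 1024**3)  # 10 GiB
    sql = """
        SELECT name, version
        FROM `bigquery-public-data.pypi.distribution_metadata`
        LIMIT 10
    """
    rows = client.query(sql, job_config=config).result()

Project-wide limits (custom quotas on query bytes per day, billing budgets with alerts) can be set in the console as well.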

You can tell someone has worked in the cloud for too long when they start to think of $6 per database query as a reasonable price.

We really need to go back to on-premise. We have surrendered our autonomy to these megacorps and now are paying for it - quite literally in many cases.

Surely most queries should process much less than 1 TB of data?

My 3TB, 41 billion row table costs pennies to query day to day. The billing is based on the data processed by the query, not the table size. I pay more for storage.

Running ripgrep on my hard drive would cost me $48 at that price point.

BigQuery data is stored (I assume) in column oriented files with indices, so a typical query reads only a tiny fraction of the stored data.
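
A dry run shows this without costing anything (a sketch with the Python client; the exact numbers depend on the table):

    # dry_run reports how many bytes the query would scan, without executing it.
    from google.cloud import bigquery

    client = bigquery.Client()
    cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(
        "SELECT version FROM `bigquery-public-data.pypi.distribution_metadata`",
        job_config=cfg,
    )
    # Only the columns the query actually touches count toward bytes processed.
    print(f"would scan {job.total_bytes_processed / 1e9:.2f} GB")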

Can you actually set "price caps"?

Most of the cloud services allow you to set alerts that are notorious for showing up after you've accidentally spent 50k USD. So even if you had a system that automatically shut down services when the alert arrived, you'd be SOL.


You can also query for free at clickpy.clickhouse.com. If you click on any of the links on the visuals you can see the query used.

The underlying dataset is hosted at sql.clickhouse.com e.g. https://sql.clickhouse.com/?query=U0VMRUNUIGNvdW50KCkgICBGUk...

disclaimer: built this a while ago but we maintain it at ClickHouse

oh and rubygems data is also there.


Here [0] is the partial query on the ClickHouse dataset, with different results due to a quota error [1].

[0] https://sql.clickhouse.com?query=U0VMRUNUIHByb2plY3QsIE1BWCh...

[1] Quota read limit exceeded. Results may be incomplete.


We have MVs (materialized views) you can use to avoid this:

https://sql.clickhouse.com/?query=U0VMRUNUIHByb2plY3QsIE1BWC...

takes 0.1s


I decided my life could not possibly go on until I knew what "elvisgogo" does, so I downloaded the tarball and poked around. It's a pretty ordinary numpy + pandas + matplotlib project that makes graphs from CSV. One line jumped out at me:

    str_0 = ['refractive_index','Na','Mg','Al','Si','K','Ca','Ba','Fe','Type']

The University of St Andrews has a laser named "elvis" that goes on a remote-controlled submarine: https://www.st-andrews.ac.uk/~bds2/elvislaser.htm

I was hoping it'd be about go-go dancing to Elvis music, but physics experiments on light in seawater are pretty cool too.

> spanishconjugator [2], which consistently published ~240 releases per month between 2020 and 2024

They also stopped updating major and minor versions after hitting 2.3 in Sept 2020. It would be interesting to hear the rationale behind the versioning strategy. Feels like you might as well use a datetime stamp as the version.


deps.dev has a similar BigQuery dataset covering a couple more languages, if someone wanted to do analysis across the other ecosystems it supports.

> - The fact that it's essentially unstructured data makes it hard to work with generically. If you have a username + password and need to use those in a script, you'll need to implement your own parser in your shell language in every script you need it in.

Fair, but you can use your own conventions.
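
For example (a sketch assuming the common convention of the password on the first line and `key: value` pairs after it; the entry name is made up):

    # Read a pass entry from a script: first line = password, rest = key: value.
    import subprocess

    out = subprocess.run(["pass", "show", "example.com/login"],
                         capture_output=True, text=True, check=True).stdout
    lines = out.splitlines()
    password = lines[0]
    fields = dict(line.split(": ", 1) for line in lines[1:] if ": " in line)
    print(password, fields.get("username"))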

> - `pass generate` to generate new passwords, maybe thanks to the above, replaces everything in the pass value by default. So if you had e.g. a password + secret question answers, if you use `generate` to get a new password it'll wipe out your secret question answers.

Just split it into `site/pass`, `site/secret-question`, etc. The fact that it's just using a directory tree is quite nice.

> It's very difficult to review history. I stopped using it a while ago, but since everything's encrypted `git diff` won't give you anything useful

`git diff` would be an odd command to run on generated passwords even without encryption. What matters is that you know when the last change to a password or site happened, via `git log <file/dir>`, and you can just `git checkout <old commit sha>` if needed.

> - The name makes it nearly impossible to search for

In the terminal, `$ pass` typically suggests the associated package.


I assume they mean "search the web for", which is definitely a problem I've faced in the passt.

`pass git diff` decrypts the passwords for me.

I can recommend this. I'm sure it's a bug in the YouTube interface that they recommend literally nothing: my home screen has been completely empty for over a year, just a message saying "Your watch history is off". I have a couple of subscriptions, which means a new video every so many days; those show up in the sidebar, still two or three clicks away, and that's perfect.


Not a bug, it was an explicit change they made about a year ago. I used to enjoy the recommendations based on my likes and they took that away.


It's not a bug, it's extremely passive aggressive. They couple it with rewriting their browser, working group recommendations, and legal lobbying to make shoving ads down your throat their basic "human" right. When I saw they did it to me, my response was, "great, game on".


In my experience it's better at lower-level stuff, like systems programming. A pass with Claude afterwards makes the code more readable.


Does it include a decent BLAS? If I remember correctly, R ships with the reference BLAS, but for decent performance you need something external. I wonder what they picked for WASM-based R.


Probably LLVM Flang to compile the Fortran parts (the reference BLAS and LAPACK), since the main dev of WebR is also the one who did this [0].

[0] https://gws.phd/posts/fortran_wasm/


I wonder what kind of edge cases you deal with when BLAS is your bottleneck in R. Stan code aside, I've seen few problems that are neither instant (i.e. sub-hour) nor impossible (i.e. years of compute).


Since you mentioned Stan, feels relevant to mention https://stan-playground.flatironinstitute.org/, which lets you run Stan in WASM and analyze the results using WebR


Isn’t the linear algebra conventional wisdom that matrix ops are ALWAYS the bottleneck?

I’m sure this is true in scientific computing.

In R maybe a bunch of resampling would be expected to dominate?


Makes me think of the movie Inception: "I say to you, don't think about elephants. What are you thinking about?"


It reminds me of that old joke:

- "Say milk ten times fast."

- Wait for them to do that.

- "What do cows drink?"


But... cows do drink cow milk, that's why it exists.


You’re likely thinking of calves. Cows (though admittedly ambiguous! But usually adult female bovines) do not drink milk.

It’s insidious isn’t it?


If calves aren’t cows then children aren’t humans.


No, you're thinking of the term "cattle". Calves are indeed cattle. But "cow" has a specific definition - it refers to fully-grown female cattle. And the male form is "bull".


Have you ever been close enough to 'cattle' to smell cow shit, let alone step in it?

Most farmers manage cows, and I'm not just talking about dairy farmers. Even the USDA website mostly refers to them as cows: https://www.nass.usda.gov/Newsroom/2025/07-25-2025.php

Because managing cows is different than managing cattle. The number of bulls kept is small, and they often have to be segregated.

All calves drink milk, at least until they're taken from their milk cow parents. Not a lot of male calves live long enough to be called a bull.

'Cattle' is mostly used as an adjective to describe the humans who manage mostly cows, from farm to plate or clothing. We don't even call it cattle shit. It's cow shit.


So this joke works only for natives who know that a calf is not a cow.


I guess a more accessible version would be toast… what do you put in a toaster?


Here's one for you:

A funny riddle is a j-o-k-e that sounds like “joke”.

You sit in the tub for an s-o-a-k that sounds like “soak”.

So how do you spell the white of an egg?

// All of these prove humans are subject to "context priming".


My brain said "y" and then I caught myself. Well done!

(I suppose my context was primed both by your brain-teaser, and also the fact that we've been talking about these sorts of things. If you'd said this to me out of the blue, I probably would have spelled out all of "yolk" and thought it was correct.)


Notably, this comment kinda broke my brain for a good 5 seconds. Good work.


Well, it works because by some common usages, a calf is a cow.

Many people use cow to mean all bovines, even if technically not correct.


Not trying to steer this but do people really use cow to mean bull?


No one who knows anything about cattle does, but that leaves out a lot of people these days. Polls have found people who think chocolate milk comes from brown cows, and I've heard people say they've successfully gone "cow tipping," so there's a lot of cluelessness out there.


> Many people use cow to mean all bovines, even if technically not correct.

Come on now :0

I just complained that non-natives would have a problem distinguishing between a cow and a calf, and you had to bring in those bovines.

To make it easier, I'd just drop that; in my native language, the correct term for a bovine is used more to describe people with a certain character, of that animal kind.


Colloquially, "cow" can mean a calf, bull, or (female adult) cow.

It may not be technically correct, but so what? Stop being unnecessarily pedantic.


In this context it is literally the necessary level of pedantry, yes?


Surprisingly, it hasn't been fixed in the meantime. Maybe they were being honest.


It sounds interesting, but I think it would be better if a linker could resolve dependencies of static libraries the way it's done for shared libraries. Then you could update individual files without having to worry about outdated symbols in these merged files.


If you mean updating some dependency without recompiling the final binary, that's not possible with static linking.

However the ELF format does support complex symbol resolution, even for static objects. You can have weak and optional symbols, ELF interposition to override a symbol, and so forth.

But I feel like for most libraries it's best to keep it simple, unless you really need the complexity.


Much of the dynamic section of shared libraries could just be translated to a metadata file as part of a static library. It's not a breaking change: the linker skips files in archives that are not object files.

binutils implemented this with `libdep`; it's just done poorly. You can put a few flags like `-L /foo -lbar` in a file `__.LIBDEP` as part of your static library, and the linker will use this to resolve dependencies of static archives when linking (i.e. extend the link line). This is much like DT_RPATH and DT_NEEDED in shared libraries.
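
Roughly like this (a sketch with made-up paths and library names, adding the member with plain ar as described above):

    # Embed link-time dependencies in a static archive via a __.LIBDEP member.
    import pathlib, subprocess

    pathlib.Path("__.LIBDEP").write_text("-L/opt/bar/lib -lbar\n")
    subprocess.run(["ar", "rcs", "libfoo.a", "foo.o", "__.LIBDEP"], check=True)
    # A libdep-aware linker that pulls in libfoo.a can then extend the link line
    # with "-L/opt/bar/lib -lbar", much like DT_NEEDED/DT_RPATH for shared objects.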

It's just that it feels a bit half-baked. With dynamic linking, symbols are resolved and dependencies recorded as you create the shared object. That's not the case when creating static libraries.

But even if tooling for static libraries with the equivalent of DT_RPATH and DT_NEEDED was improved, there are still the limitations of static archives mentioned in the article, in particular related to symbol visibility.

