ngoldbaum · 2025-08-12T15:27:35 1755012455

I wonder why the github status page has an atlassian cookie request pop-up.

mbb70 · 2025-08-12T15:40:24 1755013224

The status page is a SaaS product called StatusPage, acquired by Atlassian in 2016.

ngoldbaum · 2025-05-07T20:10:23 1746648623

I gave away the “ty” project name on PyPI to Astral a week or so ago. I wanted to use it for a joke a few years ago, but this is a much better use for a two-letter project name. They agreed to make a donation to the PSF to show their gratitude.

_carljm · 2025-05-08T04:30:59 1746678659

Yes, thank you for your graciousness and generosity, very much appreciated.

Celeo · 2025-05-07T20:47:42 1746650862

I love this outcome; kudos to you and Astral both!

swyx · 2025-05-08T00:13:53 1746663233

thanks for not charging obnoxious amounts for package names!

smitty1e · 2025-05-08T00:34:04 1746664444

ty--thank you

rrszynka · 2025-05-08T07:12:06 1746688326

nice! what was the planned joke?

ngoldbaum · 2025-05-08T14:46:57 1746715617

Either something about beanie babies or something riffing on "thank you". I could never make up my mind and then basically forgot about it.

ngoldbaum · on Jan 17, 2025

You could do two passes over the string: first get the total length in bytes, then fill it in codepoint by codepoint.

You could also pessimistically over-allocate assuming four bytes per character and then resize afterwards.

With the API in the linked blog post it's up to the user to decide how they want to use the output [u8;4] array.
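A rough sketch of both strategies, in Python for brevity rather than the Rust of the linked post. `encode_codepoint` is a hypothetical stand-in for an encoder that fills a four-byte buffer and reports how many bytes it used; it is not the API from the post.

```python
def encode_codepoint(cp: int) -> tuple[bytes, int]:
    """Hypothetical encoder: returns a 4-byte buffer and the bytes used."""
    encoded = chr(cp).encode("utf-8")
    # Pad to four bytes to mimic a fixed-size output array.
    return encoded.ljust(4, b"\x00"), len(encoded)


def encode_two_pass(text: str) -> bytes:
    # Pass 1: compute the total encoded length in bytes.
    total = sum(encode_codepoint(ord(ch))[1] for ch in text)
    out = bytearray(total)
    # Pass 2: fill the exactly-sized buffer codepoint by codepoint.
    pos = 0
    for ch in text:
        buf, used = encode_codepoint(ord(ch))
        out[pos:pos + used] = buf[:used]
        pos += used
    return bytes(out)


def encode_overallocate(text: str) -> bytes:
    # Pessimistically assume four bytes per character.
    out = bytearray(4 * len(text))
    pos = 0
    for ch in text:
        buf, used = encode_codepoint(ord(ch))
        out[pos:pos + used] = buf[:used]
        pos += used
    # Shrink to the number of bytes actually written.
    del out[pos:]
    return bytes(out)
```

The two-pass version encodes every character twice but never allocates more than it needs; the over-allocating version encodes once at the cost of a temporarily larger buffer.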

ngoldbaum · on Oct 24, 2024

The main difference is the strings are stored in a single contiguous arena buffer (with some minor caveats if you mutate the array in-place). With object strings each string has its own heap allocation.

More details in NEP 55: https://numpy.org/neps/nep-0055-string_dtype.html

This post is based on the content of a 25 minute talk and it’s hard to explain everything fully…
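To make the arena idea concrete, here is a toy pure-Python sketch. It is not NumPy's actual layout (NEP 55 describes that); `StringArena` and its fields are made-up names for illustration only.

```python
class StringArena:
    """Toy arena: all string bytes live in one contiguous buffer."""

    def __init__(self, strings):
        self.buffer = bytearray()  # single shared allocation for all strings
        self.offsets = []          # where each string starts in the buffer
        self.sizes = []            # length of each string in bytes
        for s in strings:
            data = s.encode("utf-8")
            self.offsets.append(len(self.buffer))
            self.sizes.append(len(data))
            self.buffer += data

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, i):
        start = self.offsets[i]
        return self.buffer[start:start + self.sizes[i]].decode("utf-8")


arena = StringArena(["numpy", "string", "dtype"])
second = arena[1]  # decoded from a slice of the shared buffer
```

With object strings, by contrast, each element would be a separate Python `str` with its own heap allocation, and the array would only hold pointers to them.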

ngoldbaum · on Oct 24, 2024

Ah, I can see how that’s confusing. I was trying to indicate that the size stored for the string in the example is 28, but it’s stored in a 64-bit uint.
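As a small illustration of the distinction between the value and its storage width, using Python's `struct` module (the little-endian byte order here is an assumption for the example, not something stated in the post):

```python
import struct

size = 28
# "<Q" packs an unsigned 64-bit integer, little-endian (assumed byte order).
packed = struct.pack("<Q", size)
width_in_bytes = len(packed)  # the value 28 still occupies eight bytes
```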

ngoldbaum · on Oct 24, 2024

They’re stored on the DType instance. This requires that there’s only one DType instance per “owned” array buffer, which I figured out how to do along with Sebastian Berg and others using the new DType system.

ngoldbaum · on Oct 23, 2024

This was a case of convergent evolution: both projects ended up working simultaneously on similar ideas.

One issue with using Arrow directly in NumPy is that PyArrow exposes an immutable 1D array, while NumPy exposes a mutable ND array.

See also https://numpy.org/neps/nep-0055-string_dtype.html#related-wo...

hopfenspergerj · on Oct 23, 2024

Are the pandas people considering this as the default string type? Seems like it would be a slam dunk.

ngoldbaum · on Oct 24, 2024

That is something I’d like to see, but I don’t want to wade into the already very complicated discussion around Arrow strings in pandas. If a pandas developer wanted to take this on, I think that would make things easier, since there’s so much complexity around strings in pandas.

That said there is a branch that gets most of the way there: https://github.com/pandas-dev/pandas/pull/58578. The remaining challenges are mostly around getting consensus around how to introduce this change.

If NumPy had had StringDType in 2019 instead of 2024, I think pandas might have had an easier time. Sadly the timing didn’t quite work out.

ngoldbaum · on Oct 23, 2024

Well, there was no concept of sidecar storage. Now we have the hack we came up with for StringDType to store data on the DType instance and also make it so StringDType arrays don't share StringDType instances, unless the array is a view.

EDIT: looking back at the NEP, I'm not sure it does a great job explaining exactly how the per-array descriptor works. Ultimately it's powered by a hook in the DType API: https://github.com/numpy/numpy/pull/24988. There is only one spot in NumPy where array buffers are allocated, so we hooked there and made sure any arrays with newly allocated buffers get a new DType instance.

ngoldbaum · on Oct 23, 2024

Thank you, this really means a lot.

ngoldbaum · on Oct 2, 2024

This style of mutex will also power PyMutex in Python 3.13. I have real-world benchmarks showing how much faster PyMutex is than the old PyThread_type_lock that was available before 3.13.

electroglyph · on Oct 2, 2024

Can I use PyMutex from my own Python code?

ngoldbaum · on Oct 3, 2024

No, it wouldn’t make sense to. Use threading.Lock for that. PyMutex is available in the CPython C API.
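A minimal sketch of what that looks like from Python code; the function and variable names are just for illustration.

```python
import threading


def count_with_lock(num_threads=4, increments=10_000):
    """Increment a shared counter from several threads under a lock."""
    lock = threading.Lock()
    total = 0

    def worker():
        nonlocal total
        for _ in range(increments):
            # The with-statement acquires the lock and always releases it.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Every increment happens while the lock is held, so the result is always `num_threads * increments`.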

miohtama · on Oct 2, 2024

Any rough summary?

ngoldbaum · on Oct 2, 2024

https://github.com/numpy/numpy/issues/26510#issuecomment-229...

And now that I look at that again I realize I forgot to finish that up!

kagrt · on Oct 3, 2024

I wonder how much it will help in real code. The no-gil build is still easily 50% slower and the regular build showed a slowdown of 50% for Sphinx, which is why the incremental garbage collector was removed just this week.

Python development is in total chaos on all social and technical fronts due to incompetent and malicious leadership.

plesner · on Oct 3, 2024

I'm very ready to believe your description of the state of Python is true, but I've been out of the loop on Python for a while. I'm interested in more details. Can you expand or point to any articles that give more details?