Hacker News | ngoldbaum's comments

I wonder why the github status page has an atlassian cookie request pop-up.


The status page is a SaaS product called StatusPage, acquired by Atlassian in 2016.


I gave the “ty” project name on PyPI to Astral a week or so ago. I wanted to use it for a joke a few years ago, but this is a much better use for a two-letter project name. They agreed to make a donation to the PSF to show their gratitude.


Yes, thank you for your graciousness and generosity, very much appreciated.


I love this outcome; kudos to you and Astral both!


thanks for not charging obnoxious amounts for package names!


ty--thank you


nice! what was the planned joke?


Either something about beanie babies or something riffing on "thank you". Couldn't ever make up my mind then basically forgot about it.


You could do two passes over the string, first get the total length in bytes, then fill it in codepoint by codepoint.

You could also pessimistically over-allocate assuming four bytes per character and then resize afterwards.

With the API in the linked blog post it's up to the user to decide how they want to use the output [u8;4] array.
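A rough sketch of both strategies, in Python rather than the Rust of the linked post. `encode_codepoint` is a hypothetical stand-in for an API that fills a four-byte buffer and reports how many bytes it used; it is not the blog post's actual function, and it does no validation of surrogates or out-of-range values.

```python
def utf8_len(cp: int) -> int:
    """Number of bytes needed to encode one code point as UTF-8."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def encode_codepoint(cp: int):
    """Hypothetical helper: fill a 4-byte buffer, return (buffer, bytes_used)."""
    buf = bytearray(4)
    n = utf8_len(cp)
    if n == 1:
        buf[0] = cp
    elif n == 2:
        buf[0] = 0xC0 | (cp >> 6)
        buf[1] = 0x80 | (cp & 0x3F)
    elif n == 3:
        buf[0] = 0xE0 | (cp >> 12)
        buf[1] = 0x80 | ((cp >> 6) & 0x3F)
        buf[2] = 0x80 | (cp & 0x3F)
    else:
        buf[0] = 0xF0 | (cp >> 18)
        buf[1] = 0x80 | ((cp >> 12) & 0x3F)
        buf[2] = 0x80 | ((cp >> 6) & 0x3F)
        buf[3] = 0x80 | (cp & 0x3F)
    return bytes(buf), n


def encode_two_pass(codepoints):
    """Pass 1: total length in bytes. Pass 2: fill in codepoint by codepoint."""
    total = sum(utf8_len(cp) for cp in codepoints)
    out = bytearray(total)
    pos = 0
    for cp in codepoints:
        buf, n = encode_codepoint(cp)
        out[pos:pos + n] = buf[:n]
        pos += n
    return bytes(out)


def encode_overallocate(codepoints):
    """Assume four bytes per character, then shrink to the bytes actually used."""
    out = bytearray(4 * len(codepoints))
    pos = 0
    for cp in codepoints:
        buf, n = encode_codepoint(cp)
        out[pos:pos + n] = buf[:n]
        pos += n
    del out[pos:]  # resize afterwards
    return bytes(out)
```

Both functions take a list of integer code points, e.g. `[ord(c) for c in text]`. The two-pass version never allocates more than it needs; the over-allocating version walks the input once but can temporarily hold up to four times the final size.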


The main difference is the strings are stored in a single contiguous arena buffer (with some minor caveats if you mutate the array in-place). With object strings each string has its own heap allocation.

More details in NEP 55: https://numpy.org/neps/nep-0055-string_dtype.html

This post is based on the content of a 25 minute talk and it’s hard to explain everything fully…
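To make the arena idea a bit more concrete, here is a toy model in pure Python. It is only an illustration of "one contiguous buffer plus an offset and size per entry"; the class and its fields are made up for this sketch, it is not NumPy's actual memory layout, and it ignores the in-place mutation caveats mentioned above.

```python
class ToyStringArena:
    """Toy model: all string bytes live in one contiguous buffer.

    Hypothetical illustration only; not how NumPy lays out StringDType data.
    """

    def __init__(self, strings):
        self.arena = bytearray()   # single shared allocation for all string data
        self.entries = []          # one (offset, size) pair per string
        for s in strings:
            data = s.encode("utf-8")
            self.entries.append((len(self.arena), len(data)))
            self.arena += data

    def __len__(self):
        return len(self.entries)

    def __getitem__(self, i):
        # Look up the slice of the arena that belongs to entry i.
        offset, size = self.entries[i]
        return self.arena[offset:offset + size].decode("utf-8")


arr = ToyStringArena(["numpy", "", "string dtype"])
first = arr[0]
last = arr[2]
```

With object strings, each of those three entries would instead be a separate Python `str` with its own heap allocation, and the array would only hold pointers to them.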


Ah, I can see how that’s confusing. I was trying to indicate that the size stored for the string in the example is 28, but it’s stored in a 64-bit uint.


They’re stored on the DType instance. This requires that there’s only one DType instance per “owned” array buffer, which I figured out how to do along with Sebastian Berg and others using the new DType system.


This was a case of convergent evolution, both projects ended up working simultaneously on similar ideas.

One issue with using Arrow directly in NumPy is PyArrow exposes an immutable 1D array, while NumPy exposes a mutable ND array.

See also https://numpy.org/neps/nep-0055-string_dtype.html#related-wo...


Are the pandas people considering this as the default string type? Seems like it would be a slam dunk.


That is something I’d like to see, but I don’t want to wade into the already very complicated discussion around Arrow strings in pandas. If a pandas developer wanted to take this on I think that would make things easier, since there’s so much complexity around strings in pandas.

That said there is a branch that gets most of the way there: https://github.com/pandas-dev/pandas/pull/58578. The remaining challenges are mostly around getting consensus around how to introduce this change.

If NumPy had StringDType in 2019 instead of 2024 I think pandas might have had an easier time. Sadly the timing didn’t quite work out.


Well, there was no concept of sidecar storage. Now we have the hack we came up with for StringDType to store data on the DType instance and also make it so StringDType arrays don't share StringDType instances, unless the array is a view.

EDIT: looking back at the NEP, I'm not sure it does a great job explaining exactly how the per-array descriptor works. Ultimately it's powered by a hook in the DType API: https://github.com/numpy/numpy/pull/24988. There is only one spot in NumPy where array buffers are allocated, so we hooked there and made sure any arrays with newly allocated buffers get a new DType instance.


Thank you, this really means a lot.


This style of mutex will also power PyMutex in Python 3.13. I have real-world benchmarks showing how much faster PyMutex is than the old PyThread_type_lock that was available before 3.13.


Can I use PyMutex from my own Python code?


No, it wouldn’t make sense to. Use threading.Lock for that. PyMutex is available in the CPython C API.
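For pure Python code, a minimal sketch of the `threading.Lock` route looks like this. The counter and function names are just for illustration.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    """Increment the shared counter n times, holding the lock for each update."""
    global counter
    for _ in range(n):
        with counter_lock:  # acquired on entry, released on exit
            counter += 1


def run(num_threads=4, per_thread=10_000):
    """Start several threads that all update the same counter, then return it."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=add_many, args=(per_thread,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Calling `run()` returns `num_threads * per_thread`, since the lock serializes every read-modify-write on `counter`.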


Any rough summary?


https://github.com/numpy/numpy/issues/26510#issuecomment-229...

And now that I look at that again I realize I forgot to finish that up!


I wonder how much it will help in real code. The no-gil build is still easily 50% slower and the regular build showed a slowdown of 50% for Sphinx, which is why the incremental garbage collector was removed just this week.

Python development is in total chaos on all social and technical fronts due to incompetent and malicious leadership.


I'm very ready to believe your description of the state of python is true but I've been out of the loop on python for a while. I'm interested in more details. Can you expand or point to any articles that give more details?

