It's somewhat domain specific. Pure Python libraries have an easier time supporting new releases than libraries that rely on C APIs, and slower still are those that depend on less stable implementation details like bytecode (e.g. Numba). But it's definitely getting better and faster with every release.
There's overhead in transferring data from the CPU to the GPU and back. I'm not sure how this works with integrated GPUs, though, insofar as RAM is shared.
In general, though, as I understand it (not a GPU programmer), you want to pass data to the GPU, have it do a lot of operations, and only then pass it back. Doing one tiny operation at a time isn't worth it.
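Roughly this pattern, as a sketch assuming CuPy and a discrete GPU (the sizes and ops here are made up, and integrated/shared-memory GPUs may behave differently):

```python
import numpy as np
import cupy as cp

data = np.random.rand(10_000_000).astype(np.float32)

# Anti-pattern: round-trip to the GPU for every tiny operation.
out = data
for _ in range(3):
    out = cp.asnumpy(cp.asarray(out) * 1.01)  # host->device, one multiply, device->host

# Better: move the data once, do all the work on the device, copy back once.
gpu = cp.asarray(data)    # host -> device, once
for _ in range(3):
    gpu = gpu * 1.01      # stays on the GPU
out = cp.asnumpy(gpu)     # device -> host, once
```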
NumPy's default is C (row-major) order, so the last axis is contiguous in memory and you want to iterate over the earlier dimensions first.
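A quick way to see the default layout:

```python
# NumPy defaults to C (row-major) order: the last axis is contiguous,
# so the earlier axes belong in the outer loops.
import numpy as np

a = np.zeros((3, 4), dtype=np.uint8)
print(a.flags["C_CONTIGUOUS"])  # True by default
print(a.strides)                # (4, 1): axis 0 steps 4 bytes, axis 1 steps 1 byte
```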
The slow code is likely at least partially slow due to branch misprediction (this is specific to my CPU; it's not true on CPUs with AVX-512). See https://pythonspeed.com/articles/speeding-up-numba/ where I use `perf stat` to get branch misprediction numbers on similar code.
With SIMD disabled there's also a clear difference in IPC, I believe.
The bigger picture, though, is that the goal of this article is not to demonstrate speeding up code; it's to ask what level of parallelism to use given unchanging code. Obviously, all things being equal, you'll do better if you can make your code faster, but code does get deployed, and when it's deployed you need to choose parallelism levels, regardless of how good the code is.
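For concreteness, here's a rough sketch of how you might pick that level empirically at deployment time; `work` and `items` are placeholders, not the article's actual code:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def best_thread_count(work, items, candidates=(1, 2, 4, 8, 12, 16, 20, 24)):
    """Time one batch per candidate thread count and return the fastest."""
    timings = {}
    for n in candidates:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n) as pool:
            list(pool.map(work, items))   # run the whole batch at this thread count
        timings[n] = time.perf_counter() - start
    return min(timings, key=timings.get), timings
```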
> when it's deployed you need to choose parallelism levels, regardless of how good the code is.
Yes, absolutely, exactly. That's why it can be really helpful to pinpoint the cause of the slowdown, right? It might not matter at deployment time if you have an automated shmoo that calculates the optimal thread load, but knowing the actual causes might be critical to the process, and/or really help if you don't do it automatically. (For one, it's possible the conditions for optimal thread load could change over the course of a run.)
That's not it. I updated the article with an experiment of processing 5 items at a time. The fast function doing 5 images at a time is slower than the slow function doing 1 image at a time (24*5 > 90).
If your theory was correct, we would expect the optimal number of threads for the fast function processing 5 images at a time to be similar to that of the slow function processing 1 image at a time.
In fact, the optimal number of threads in this case (5 images at a time) was 20 for the slow function and 10 for the fast function, so essentially the same as in the original setup.
Based on the graph, the fast function's runtime is really short. You might just be seeing the effects of efficiency vs. performance cores. A lower thread count makes most of the work run on performance cores, and task end times align more nicely. With a larger number of threads, the tasks running on performance cores complete first and you are left waiting for the tasks on efficiency cores to finish, or context switches have to happen and tasks get moved between cores, which causes overhead.
You could try seeing what happens if you have 10 times more images when running the fast function.
Also, you have just 8 physical performance cores and 4 physical efficiency cores. The performance cores have hyper-threading, so they act as 2 logical cores each, but that doesn't mean they can actually execute 2 threads at maximum performance. If the processing tasks use the same parts of the processor core, the processor cannot run both threads at the same time and IPC will suffer. The slow task maybe uses more varied parts of the core, which allows better IPC with hyper-threading. So that may also reduce the optimal thread count.
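If you want to check the split on a given machine, something like this works (assuming the third-party psutil package is installed):

```python
import os
import psutil

print("logical cores: ", os.cpu_count())                   # e.g. 20 = 8P x 2 (HT) + 4E
print("physical cores:", psutil.cpu_count(logical=False))  # e.g. 12 = 8P + 4E
```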
Is it a caching thing? The slow version seems less cache efficient, so if it is waiting due to cache misses, that could create an opportunity for something else to get scheduled in.
I doubt it; the slow version uses division instead of bit shifting. My guess would be that the fast version saturated something like I/O or some non-CPU portion of the processor, and the division one was bottlenecked by the division logic in the processor.
Recalibrate how you feel about division and multiplication. It turns out integer division on new processors is a 1-cycle operation (and has been for a while now). Most of the multi-cycle instructions nowadays are things like SIMD and encryption.
Which processor has 1-cycle latency for integer division? Even Apple Silicon, which has the most highly optimized implementation I am aware of, appears to have 2-cycle latency. Recent x86 cores are much worse, though greatly improved. Integer division is much faster than it used to be, but not single-cycle.
Also, most of those SIMD and encryption instructions can retire one per cycle on modern cores, but that isn't the same as latency.
2-cycle latency for division seems extremely unlikely. From the instruction tables it seems that Firestorm has 7-9 cycles of latency for SDIV (which is excellent). It also has an impressive 2-cycle reciprocal throughput.
Quite possible. I am not confident Apple Silicon has 2-cycle div latency, that seems improbably fast to me, but I had heard some reasonably well-sourced rumors to that effect. I have not measured it myself.
Even at somewhat higher latency it is still fast enough to not be worth optimizing around in most cases, which is great.
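If anyone wants to poke at the division-vs-shift question from Python, a crude check could look like this; note it measures throughput of the whole vectorized call, not per-instruction latency, so it's only a rough indicator:

```python
import timeit
import numpy as np

arr = np.random.randint(0, 2**16, size=10_000_000, dtype=np.uint32)

div_time = timeit.timeit(lambda: arr // 16, number=50)    # integer division
shift_time = timeit.timeit(lambda: arr >> 4, number=50)   # equivalent right shift
print(f"division: {div_time:.3f}s  shift: {shift_time:.3f}s")
```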
But it's iterating through the result vectors twice, so that's basically guaranteed to miss. Moving the threshold check into the loop above would at least eliminate that factor.
Maybe division vs bit shifting does play a factor, but it's hard to compare that while the cache behavior is so different.
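Roughly what I mean, as a hypothetical sketch (the halving and the threshold constant are stand-ins, not the article's actual code):

```python
import numpy as np
from numba import njit

THRESHOLD = 128

@njit
def two_pass(values):
    result = np.empty_like(values)
    for i in range(values.size):      # first pass: compute
        result[i] = values[i] // 2
    count = 0
    for i in range(result.size):      # second pass re-reads result from cache/RAM
        if result[i] > THRESHOLD:
            count += 1
    return count

@njit
def fused(values):
    count = 0
    for i in range(values.size):      # single pass: test the value while it's still hot
        if values[i] // 2 > THRESHOLD:
            count += 1
    return count
```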
Author here. The original article I was going to write was about using newer instruction sets, but then I discovered it doesn't even use the original SSE instructions by default, so I wrote this instead.
Eventually I'll write that other article; I've been wondering if it's possible to have infrastructure to support both modern and old CPUs in Python libraries without doing runtime dispatch on the C level, so this may involve some coding if I have time.
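As a hypothetical sketch of what I mean (assuming the third-party py-cpuinfo package; the two implementations are just stand-ins for builds targeting different instruction sets):

```python
import cpuinfo  # third-party "py-cpuinfo" package

_FLAGS = set(cpuinfo.get_cpu_info().get("flags", []))

def _baseline(data):
    # stand-in for an extension built for the oldest supported CPUs
    return data * 2

def _avx2_build(data):
    # stand-in for the same extension compiled with AVX2 enabled
    return data * 2

# pick the implementation once, at import time, at the Python level
process = _avx2_build if "avx2" in _FLAGS else _baseline
```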
There's presumably a reason they've spent the past 20 years adding additional instructions to CPUs, yeah :) And a large part of the Python ecosystem just ignores all of them. (NumPy has a bunch of SIMD with function-level dispatch, and they add more over time.)
Author here: Note that this hasn't yet been updated for Cython 3, which does fix or improve some of these (but not the fundamental limitation that you're stuck with C or C++).
Nor can you prove that a language is error-prone by providing a 40-line example written in an antiquated style that deliberately avoids using the safety features at one's disposal.