
^F intern -> not found.

But string interning is what they're doing, isn't it?

> Dictionary compression is a well-known and widely applied technique. The basic idea is that you store all the unique input values within a dictionary, and then you compress the input by substituting the input values with smaller fixed-size integer keys that act as offsets into the dictionary. Building a CedarDB dictionary on our input data and compressing the data would look like this:

That's string interning!!

Is interning just too old a concept now and it has to be rediscovered/reinvented and renamed?
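For what it's worth, the classic form is exposed directly in Python via `sys.intern`; a quick sketch (behavior shown is CPython's, which only guarantees shared identity for strings you intern explicitly):

```python
import sys

# Strings built at runtime are not automatically interned:
a = "".join(["dict", "compress"])
b = "".join(["dict", "compress"])
assert a == b and a is not b   # equal contents, two separate objects

# Interning maps both to one canonical copy:
a = sys.intern(a)
b = sys.intern(b)
assert a is b                  # now the very same object
```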



String interning only stores whole strings, dictionary compression stores substrings. Essentially string interning is a trivial subset of dictionary compression (because you don’t need a substringing scheme, which is the hard part).


Doesn't interning usually refer to when you only consider identical copies, as opposed to dictionary compression where you allow for concatenations? E.g.

  Interning:
  1: "foo"
  2: "bar"

  my_string = "foo" // stored as ref->1
  my_other_string = "foobarbaz" // not found & too long to get interned, stored as "foobarbaz"

  Dictionary compression:
  1: "foo"
  2: "bar"
  
  my_string = "foo" // stored as ref->1
  my_other_string = "foobarbaz" // stored as ref->1,ref->2,"baz" (or ref->1,ref->2,ref->3 and "baz" is added to the dict)
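The distinction above can be made runnable in a few lines of Python (toy code with greedy left-to-right matching, nothing like a real compressor's tokenizer):

```python
def intern_encode(s, table):
    """Whole-string interning: substitute only on an exact match."""
    return table[s] if s in table else s

def dict_encode(s, table):
    """Toy dictionary compression: greedily replace known substrings."""
    out, i = [], 0
    while i < len(s):
        for word, ref in table.items():
            if s.startswith(word, i):
                out.append(ref)
                i += len(word)
                break
        else:
            out.append(s[i])  # no dictionary word starts here
            i += 1
    return out

table = {"foo": 1, "bar": 2}
print(intern_encode("foo", table))        # 1
print(intern_encode("foobarbaz", table))  # 'foobarbaz' (no exact match)
print(dict_encode("foobarbaz", table))    # [1, 2, 'b', 'a', 'z']
```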


  my_string = "foo" // stored as ref->1
Only if you explicitly intern the string. Interning can be expensive because

- the runtime has to check whether the string already is interned.

- you’re permanently creating the string, so if it isn’t reused later, that means your memory usage needlessly goes up.

Both get more expensive the more strings you intern.

I think interning is a hack that very rarely, if ever, should be used. It likely won’t work well for large strings, as these tend to be unique, and use cases where it helps for short strings often are better handled by using enums or symbols, or by using a custom set of strings. If you do the latter, you have more control over memory usage; you can do things such as removing the least recently used strings, or ditching the entire cache when you’re done needing it (prime example: parsing a large XML file with many repeated nodes).
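A custom pool along those lines is short to write. A minimal LRU-bounded sketch in Python (the `StringCache` name and 1024 default are made up for illustration):

```python
from collections import OrderedDict

class StringCache:
    """Bounded string pool: returns a canonical copy of each string,
    evicts the least recently used entry when full, and can simply be
    dropped when you're done with it."""
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._pool = OrderedDict()

    def intern(self, s):
        if s in self._pool:
            self._pool.move_to_end(s)       # mark as recently used
            return self._pool[s]
        if len(self._pool) >= self.capacity:
            self._pool.popitem(last=False)  # evict the LRU entry
        self._pool[s] = s
        return s

# e.g. canonicalizing repeated tag names while parsing a big XML file:
cache = StringCache()
tags = [cache.intern("".join(["it", "em"])) for _ in range(1000)]
assert all(t is tags[0] for t in tags)  # one object, not 1000 copies
# when parsing finishes, `del cache` frees the whole pool at once
```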


> you’re permanently creating the string, so if it isn’t reused later, that means your memory usage needlessly goes up.

Nowadays lots of runtimes will GC interned strings to avoid this. It can mean churn in the interning table, but it avoids mis-estimations bloating the process too much.


Makes me think of an iolist in erlang/elixir.


> rediscovered/reinvented and renamed

It's not old yet; it'll be old once it's been renamed two or three times...



