aghilmort's comments | Hacker News

would poke around Vopson <> Verlinde or other primal / dual / gauge correspondences for space-time / gravity, especially vs. info theory, etc.:

https://pubs.aip.org/aip/adv/article/15/4/045035/3345217/Is-...


Really appreciate the pointers. I'll definitely look into them and use them as references.

Use cases: connect screenless devices, e.g., an Echo Dot; extend weak wireless range in a hotel; screen share or network between multiple devices, e.g., traveling with two laptops and a virtual KVM, where you only have to do the captive portal on one (many hotels limit the number of devices); extra security buffer; phones can't bridge wifi for headless devices like this; etc.

there’s decent work on computational reasoning power of transformers, SSMs, etc.

some approximate snippets that come to mind are that hard-attention decoder-only transformers sit in AC^0 while soft-attention / log-precision ones are in TC^0, that encoder-decoders are strictly more powerful than decoder-only, etc.

Someone with the last name Miller, IIRC, if you poke around on arXiv, and a few others; it's been a while since this was top of mind, so YMMV on the exact correctness of the above snippets.


You are probably thinking of Merrill (whose work is referenced towards the end of the article).

ah yes Merrill thx!

interesting. like Excel Solver? or OpenSolver, Gurobi, other optimizers? or different objective?


Never used any of those, so I don't know! I'd be curious to read a comparison from anyone who knows about them.

I think what's pretty unique about the bidicalc solver that I made is that it does not depend on the previous input values to update backwards. It's truly solving the root finding problem. The advantage is that there are never any "stuck in a local optimum" problems with the solver. So you can solve difficult problems like polynomials, etc.
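To make "root finding, not local optimization" concrete, here's a tiny generic bisection sketch in Python. It is not how bidicalc works internally, and the cubic and bracket are made up for illustration, but it shows why a bracketing root finder can't get stuck in a local optimum:

    # Hedged sketch, not the bidicalc implementation: bisection root finding.
    # As long as the bracket [lo, hi] straddles a sign change, it converges.
    def bisect(f, lo, hi, tol=1e-12):
        assert f(lo) * f(hi) < 0, "bracket must straddle a sign change"
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2

    # e.g. x^3 - 2x - 5 = 0 has a root near 2.0945514815
    print(bisect(lambda x: x**3 - 2*x - 5, 2.0, 3.0))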


Excel Solver allows you to create a target function with different variables and describe limits for them. Then you can try to find the maximum, minimum, or an exact value for the target function.
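For comparison, here's a minimal sketch of that idea (a target function, limits on the variables, and a constraint) in Python with scipy; the objective, bounds, and constraint are invented purely for illustration:

    # Illustrative only; not tied to any particular spreadsheet model.
    from scipy.optimize import minimize

    # target function: minimize (x - 3)^2 + (y + 1)^2
    objective = lambda v: (v[0] - 3) ** 2 + (v[1] + 1) ** 2

    result = minimize(
        objective,
        x0=[0.0, 0.0],                      # starting guess
        bounds=[(0, 10), (-5, 5)],          # limits for each variable
        constraints=[{"type": "ineq", "fun": lambda v: v[0] + v[1] - 1}],  # x + y >= 1
    )
    print(result.x)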

link doesn’t work



awesome, had thought about doing this / great to see, will try!


wait why subscription?


Interesting. Modular manifolds are precisely what hypertokens use for prompt compiling.

Specifically, we linearize the emergent KVQ operations of an arbitrary prompt in any arbitrary model by way of interleaving error-correcting code (ECC).

ECC tokens are out-of-band tokens, e.g., from Unicode's Private Use Area (PUA), interleaved with raw context tokens. This construction induces an in-context associative memory.

Any sort of interleaved labeling basis, e.g., A1, quick brown fox, A2, jumped lazy dog, induces a similar effect, letting you chain recall & reasoning more reliably.

This trick works because PUA tokens are generally untrained, hence their initial embedding is still random Gaussian w.h.p. Similar effects can be achieved by simply using token combos unlikely to exist, and these are often more effective in practice, since PUA characters (like emojis or Mandarin characters) are often 2, 3, or 4 tokens after tokenization vs. codeword combos like zy-qu-qwerty every k content tokens, where k can be variable.
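A minimal sketch of the interleaving, for the curious; the interval k, the PUA code points, and the helper name here are illustrative, not the actual hypertokens codeword scheme:

    # Insert one distinct out-of-band marker (Unicode PUA character) every
    # k content tokens; illustrative only, not the real ECC construction.
    def interleave_markers(content_tokens, k=8, pua_start=0xE000):
        out = []
        for i, tok in enumerate(content_tokens):
            if i % k == 0:
                out.append(chr(pua_start + i // k))  # unique marker per block
            out.append(tok)
        return out

    toks = "the quick brown fox jumped over the lazy dog".split()
    print(" ".join(interleave_markers(toks, k=3)))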

Building attention architectures using modular manifolds in white / gray-box models, as this new work shows, vs. prompt-based black-box injection is a natural next step, so we can at least anecdotally validate what they're building ahead of the next paper or two.

Which is all to say, absolutely great to see others building in this way!


Wot? Is this what AI-generated nonsense has come to? This is totally unrelated.


Nope. The construction induces ECC-driven emergent modular manifolds in latent space during the KVQ math. You can't use any ol' ECC, which is the crux of why it works. More in another reply.


The original article discusses techniques for constraining the weights of a neural network to a submanifold of weight space during training. Your comment discusses interleaving the tokens of an LLM prompt with Unicode PUA code points. These are two almost completely unrelated things, so it is very confusing to me that you are confidently asserting that they are the same thing. Can you please elaborate on why you think there is any connection at all between your comment and the original article?


Our ECC construction induces an emergent modular manifold during KVQ computation.

Suppose we use 3 codeword lanes per codeword, which is our default. Each lane of tokens is based on some prime p, so collectively they form a CRT-driven codeword (Chinese Remainder Theorem). This is discretely equivalent to labeling every k tokens with a globally unique indexing grammar.
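A quick, hedged illustration of the CRT labeling (the primes 3, 5, 7 here are stand-ins, not our actual lane parameters):

    # Three lanes, one small prime each; by CRT the residue triple is a
    # globally unique index for any block position below 3*5*7 = 105.
    from math import prod

    PRIMES = (3, 5, 7)

    def lane_labels(block_index):
        return tuple(block_index % p for p in PRIMES)

    labels = [lane_labels(i) for i in range(prod(PRIMES))]
    assert len(set(labels)) == prod(PRIMES)  # no two blocks share a label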

That interleaving also corresponds to a triple of adjacent orthogonal embeddings, since those tokens still retain random Gaussian embeddings. The net effect is that we similarly slice the latent space into a spaced chain of modular manifolds every k content tokens.
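A quick numerical check of the "random Gaussian embeddings are roughly orthogonal" intuition; dimension 4096 is an arbitrary stand-in for a real model's embedding width:

    # Cosine similarity between independent Gaussian vectors concentrates
    # around 0 at rate ~1/sqrt(d), i.e. they are nearly orthogonal w.h.p.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 4096
    u, v, w = rng.standard_normal((3, d))

    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    print(cos(u, v), cos(u, w), cos(v, w))  # all roughly O(1/sqrt(d)), i.e. ~0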

We also refer to that interleaving as Stiefel frames, for reasons similar to those in the post. We began work this spring or so to inject that net construction inside the model, with early results in a similar direction to what the post describes. That's another way of saying this sort of approach lets us make that chained atlas (wc?) of modular manifolds as tight as possible within the dimensional limits of the embedding, floating-point precision, etc.

We somewhat tongue-in-cheek refer to this as the retokenization group at the prompt level, re: the renormalization group / tensor nets / etc. A relayering group (or perhaps reconnection group) at the architecture level is the same net intuition.


I'm sorry, but even if I am maximally charitable and assume that everything you are saying is meaningful and makes sense, it still has essentially nothing to do with the original article. The original article is about imposing constraints on the weights of a neural network, during training, so that they lie on a particular manifold inside the overall weight space. The "modular" part is about being able to specify these constraints separately for individual layers or modules of a network and then compose them together into a meaningful constraint for the global network.

You are talking about latent space during inference, not weight space during training, and you are talking about interleaving tokens with random Gaussian tokens, not constraining values to lie on a manifold within a larger space. Whether or not the thing you are describing is meaningful or useful, it is basically unrelated to the original article, and you are not using the term "modular manifold" to refer to the same thing.


hmm / hear you. my point wasn't that we are applying modular manifolds in the same way; it was that we are working on model reliability from two extremal ends using the same principle. there are various ways to induce modular manifolds in a model at various levels of resolution / power. we started at the outside-working-in level, so it works with any black-box model out of the box with zero knowledge needed; you don't even need to know the token dictionary to show the effect.

We're already working on pushing the construction deeper into the model, both architecture and training. Currently that's for fine-tuning, and ultimately full architecture shrinkage / pruning and raw training vs. just fine-tuning, etc.

& it was just great to see someone else using modular manifolds, even if they're using them at the training stage vs. the inference stage. they're exploiting the modular form at training time; we're doing it at inference. cool to see.


oy, clicked thinking this was a Bell Inequality meets Schrödinger's cat post


switching models is a great best practice whether you get stuck or not

can look at it as primal (check the mean) or dual (get out of a local minimum)

in all cases, the model, tokenizer, etc. are just enough different that it will generally pay off quickly in most spaces


read something new every day before going to bed

journal before you start your day

buy some sort of electric kettle


The fact that there's an entire country mostly unaware of the utility and ubiquity of a simple electric kettle blows my mind. But then again, I'm a product of the Empire (British), not a North American.

But while the idea of using a stove-top kettle (I have done so in the past) is fine, the thought of using a microwave to heat up a cup of water for tea seems abhorrent (although it's really not).

I guess it came about because 110V isn't as efficient? Or because more Americans are coffee drinkers?


an intern suggested it years ago, and now an electric kettle is pretty much the first thing I buy anytime I stay somewhere longer than a couple of weeks, if the place doesn't already have one


I won't be able to get enough sleep then.

