Seems like in this differentiable FSM implementation you only differentiate edge...

uoaei · on June 8, 2022

But if you add a huge number of nodes and regularize the edge weights toward sparsity, you essentially have a model that can adapt to problems using subsets of its structure. Somewhat like the "lottery ticket" hypothesis in NN theory. Then you can think of each traversal choice as a conditional branch and voilà, the program runs!

You can actually see this effect in the article, where the author puts a penalty on the entropy of the transition probabilities.
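To make the entropy penalty concrete, here is a minimal sketch in plain Python. It assumes each state's outgoing transitions are parameterized by a row of logits passed through a softmax; the function names and the `strength` parameter are hypothetical, and the article's actual loss may differ.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def row_entropy(probs):
    # Shannon entropy in nats; terms with p == 0 contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_penalty(transition_logits, strength=0.1):
    # Mean entropy over states of the outgoing transition distribution,
    # scaled by a hypothetical regularization strength.
    entropies = [row_entropy(softmax(row)) for row in transition_logits]
    return strength * sum(entropies) / len(entropies)

# Two states, three possible next states each.
uniform = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]    # maximally uncertain
peaked = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]   # nearly deterministic

print(entropy_penalty(uniform), entropy_penalty(peaked))
```

Adding this term to the training loss pushes each row toward a near one-hot distribution, so most edges end up with negligible probability and the machine behaves almost deterministically.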

jiggawatts · on June 9, 2022

Something that just occurred to me: finite state machines don't have to be a single "flat" graph. There are hierarchical/nested variations. For example, imagine a factory with many identical machines, where each machine has the same FSM for its operation, and then there's an FSM for the factory as a whole, modelling the interactions of the machines.

I wonder, I wonder, I wonder... could there be a differentiable and deeply nested FSM hierarchy model that -- like deep learning for neural nets -- can solve qualitatively different categories of problems compared to a "plain" FSM? As in, just as deep learning revolutionised ML, I wonder if there is an equivalent "deep FSM learning".

It smells like there's something fundamental there: Take "any" simple, differentiable system like a neural net layer, an FSM, or whatever... and then make it "deep". Similarly, I wonder if the other concepts from NNs, such as convolution, could have equivalents for similar differentiable systems.

nshm · on June 12, 2022

The standard way to combine knowledge sources in FSMs is composition. You can compose graphs from different levels and differentiate through them too. Awni's paper talks about that: https://openreview.net/pdf?id=MpStQoD73Mj. The problem is that you need to differentiate the graph structure, not just the weights. This is a problem since the structure is inherently discrete, unlike the weights.
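As a rough illustration of composition, here is a naive sketch in plain Python for the simplest case, two weighted acceptors, where composition reduces to intersection. The arc format `(src, label, dst, weight)` and the `compose` function are assumptions made for this example, not the API of any library or the paper's implementation, and weights are assumed to combine by addition (as log-scores would).

```python
def compose(arcs_a, arcs_b):
    # Pair up arcs that carry the same label. States of the result are
    # pairs (state_in_a, state_in_b) and the weights add.
    # Naive version: no pruning of unreachable state pairs.
    out = []
    for (sa, la, da, wa) in arcs_a:
        for (sb, lb, db, wb) in arcs_b:
            if la == lb:
                out.append(((sa, sb), la, (da, db), wa + wb))
    return out

# Acceptor A: accepts the sequence "a b".
a = [(0, "a", 1, 0.5), (1, "b", 2, 1.0)]
# Acceptor B: any number of "a", then one "b" or "c".
b = [(0, "a", 0, 0.25), (0, "b", 1, 2.0), (0, "c", 1, 3.0)]

for arc in compose(a, b):
    print(arc)
```

Each output weight is a sum of input weights, so gradients flow to the weights without trouble. Which arcs exist at all is decided by the `la == lb` match, which is a discrete choice, and that is the part that is hard to differentiate.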