> This generates Answers which are sequenced according to "frequency-guided heuristic searching" (I guess a kind of "stochastic A* with inlined historical data")
This sounds like far too simplistic an understanding. Transformers aren't just heuristically pulling token cards out of a randomly shuffled deck; they sit upon a knowledge graph of embeddings that creates a consistent structure representing the underlying truths and relationships.
The unreliability comes from the fact that, within the response tokens, "the correct thing" may be replaced by "a thing like that" without completely breaking these structures and relationships. For example: in the nightmare scenario of a STRAWBERRY, the letters themselves carried very little distinguishing signal in relation to the concept of strawberries, so they got miscounted (I assume this has been fixed in every pro model). BUT I don't remember any 2023 models such as claude-3-haiku making fatal logical errors such as saying "P" and "!P" while assuming ceteris paribus, unless you jumped through hoops trying to confuse it and find weaknesses in the embeddings.
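A minimal sketch of the letter-counting point, assuming a BPE-style split that is made up for illustration rather than taken from any real tokenizer:

```python
# Toy illustration of why letter-counting is hard for a subword model.
# The split and IDs below are assumed, not the output of a real tokenizer.
subword_split = ["str", "aw", "berry"]               # hypothetical pieces for "strawberry"
vocab_ids = {"str": 4821, "aw": 902, "berry": 7760}  # made-up vocabulary IDs

# What the model actually conditions on: integer IDs, not characters.
token_ids = [vocab_ids[piece] for piece in subword_split]
print(token_ids)                  # [4821, 902, 7760] -- no letters in sight

# Counting letters needs information the IDs don't carry directly;
# a plain string has it trivially:
print("strawberry".count("r"))    # 3
```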
You've just given me the heuristic, and told me the graph -- you haven't said A* is a bad model, you've said it's exactly the correct one.
However, transformers do not sit on a "knowledge graph", since the space is not composed of discrete propositions set in discrete relationships. If it were, then P(PrevState|NextState) = 0 would obtain for many pairs of states -- and this would destroy the transformer's ability to make progress.
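A minimal sketch of that probability point, with made-up logits: a softmax over logits assigns strictly positive probability to every next token, so no transition is ever categorically ruled out the way a missing edge in a discrete graph would be.

```python
import numpy as np

def softmax(logits):
    # Shift for numerical stability, then normalize.
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Made-up logits for a tiny 5-token vocabulary.
logits = np.array([8.0, 2.5, -1.0, -6.0, -12.0])
probs = softmax(logits)

print(probs)               # every entry is > 0, however small
print((probs > 0).all())   # True: no "forbidden" transition, unlike a
                           # discrete graph where a missing edge means P = 0
```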
So rather than 'deviation from the truth' being an accidental symptom, it is essential to its operation: there is no true/false distinction between propositions available for the model to operate on in the first place.
> making fatal logical errors such as saying "P" and "!P"
Since it doesn't employ propositions directly, how you interpret its output in propositional terms will determine whether you think it's saying P&!P. This "interpreting-away" effect is common in religious readings of texts, where the text is divorced from its meaning and a new one is substituted to achieve apparent coherence.
Nevertheless, if you're asking (Question, Answer)-style prompts where there is a canonical answer to a common question, then you're not really asking it to "search very far away" from its inlined historical data (the ersatz knowledge-graph that it does not possess).
These errors become more common when the questions require posing several counterfactual scenarios derived from the prompt, or otherwise have non-canonical answers which require integrating disparate propositions given in the prompt.
The prompt's propositions each compete to drag the search in various directions, and there is no constraint on where it can be dragged.
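A minimal sketch of that "dragging" picture, using toy single-head attention with random stand-in vectors: each prompt token gets a strictly positive softmax weight, so every one of them pulls the next-token representation somewhere, and nothing hard-excludes any direction.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16          # toy embedding width
n_prompt = 6    # prompt tokens, standing in for the prompt's "propositions"

# Made-up query for the position being predicted, keys/values for the prompt.
q = rng.normal(size=d)
K = rng.normal(size=(n_prompt, d))
V = rng.normal(size=(n_prompt, d))

scores = K @ q / np.sqrt(d)               # scaled dot-product attention scores
weights = np.exp(scores - scores.max())
weights /= weights.sum()                  # softmax: all weights strictly positive

pulled = weights @ V                      # convex combination of the prompt's values
print(weights.round(3))  # every prompt token contributes; none is hard-excluded
```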
I am not going to engage with your A* proposition. I believe it to be irrelevant.
> However, transformers do not sit on a "knowledge graph", since the space is not composed of discrete propositions set in discrete relationships.
This is the main point of contention. By all means, embeddings are a graph: you can represent their structure as a graph, though not as a tree. Sure, they are essentially points in space, but a graph emerges as the architecture starts selecting tokens for use according to the learned parameters during inference. It will always be the same graph for the same set of tokens for a given data set which provides "ground truth". I know it sounds metaphoric, but bear with me.
The above process doesn't result in discrete propositions like we have in Prolog, but the point is, it is "relatively" meaningful, and you seed a traversal by bringing tokens to the attention grid. What I mean by relatively meaningful is that inverse relationships are far enough apart that they won't usually be confused, so there is less chance of meaningless gibberish emerging, which matches what we observe.
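A minimal sketch of that "emergent graph" claim, with a made-up embedding matrix standing in for a learned embedding table: a fixed set of embeddings induces the same nearest-neighbour graph every time you build it, even though the embeddings themselves are just points in space.

```python
import numpy as np

# Made-up embedding matrix: 6 tokens, 8 dimensions. For a real model this
# would be the learned embedding table; the point is only that it is fixed.
rng = np.random.default_rng(42)
E = rng.normal(size=(6, 8))
tokens = ["t0", "t1", "t2", "t3", "t4", "t5"]   # placeholder token names

def knn_graph(E, k=2):
    # Cosine similarity between all embedding pairs.
    unit = E / np.linalg.norm(E, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)              # exclude self-edges
    # Each token connects to its k most similar neighbours.
    return {tokens[i]: [tokens[j] for j in np.argsort(sim[i])[-k:]]
            for i in range(len(tokens))}

# The same E always yields the same graph -- the emergent structure the
# comment is pointing at, without any discrete propositions involved.
print(knn_graph(E))
```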