
I'd bet that the last problem (proving that N^2 is a lower bound for it) is open or at least very hard. Showing lower bounds without oracles (where you can prove that the oracle has to be called a certain number of times) is not an easy endeavor.


The solution space is n^3, since you have up to n items and each of them has n^2 possibilities. An O(n^2) solution only looks at each possible item in the output a constant number of times.

To prove that it's optimal you "only" need to construct the pathological case, which I believe should not be hard to do by induction.

I think that pathological case could simply be that the input string is abcdefghij... and the dictionary contains all one-letter strings, plus all strings like ac, abd, abce, ..., bd, bce, bcdf ... and so on.

The base induction case would be xyz as the input and {x,y,z,xz} as the dictionary.
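For concreteness, here is a minimal Python sketch (mine, not from the article or this thread) of the standard dynamic program for this splitting problem; the name word_break and the O(1)-membership assumption are my own. It does O(n^2) dictionary probes; with a plain hash set each probe also slices and hashes a substring, so walking a trie instead is what actually keeps the whole thing at O(n^2) rather than O(n^3).

    def word_break(w, words):
        """Split w into dictionary words and return the pieces, or None.

        Does O(n^2) dictionary probes.  Caveat: with a hash set each probe
        also slices/hashes a substring; walk a trie if you want the full
        O(n^2) bound rather than O(n^3)."""
        n = len(w)
        words = set(words)
        # prev[i] = j  means  w[j:i] is a word and w[:j] is itself splittable
        prev = [None] * (n + 1)
        ok = [False] * (n + 1)
        ok[0] = True
        for i in range(1, n + 1):
            for j in range(i):
                if ok[j] and w[j:i] in words:
                    ok[i] = True
                    prev[i] = j
                    break
        if not ok[n]:
            return None
        pieces, i = [], n
        while i > 0:
            pieces.append(w[prev[i]:i])
            i = prev[i]
        return pieces[::-1]

    print(word_break("xyz", {"x", "y", "z", "xz"}))   # ['x', 'y', 'z']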


That's not how you show lower bounds.

The problem is a function f that takes a list of words D and a word w of length n and returns either a list of indices I at which to split w such that each part is in D, or "false".

Clearly, I has length at most n and the indices in it are also at most n. Also, you need to look at every word in D. Thus, the trivial lower bound on the complexity of any algorithm that computes f is Ω(n + |D|).

To prove a tighter lower bound, you would need to show that every algorithm that does it in linear time has a bug - you need to find such a "pathological" case for every imaginable algorithm!
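To make that concrete, here is an exponential brute-force reference for the f defined above (a Python sketch of my own; the name f_reference is not from the thread). It is the obviously correct specification you would test any candidate fast algorithm against when hunting for such pathological inputs.

    from itertools import combinations

    def f_reference(D, w):
        """Return a list of indices at which to split w so that every piece
        is in D, or False if no split exists.
        Brute force over all 2^(n-1) sets of interior cuts."""
        D = set(D)
        n = len(w)
        if n == 0:
            return []          # zero pieces
        for k in range(n):     # k = number of interior cuts
            for cuts in combinations(range(1, n), k):
                bounds = (0, *cuts, n)
                if all(w[a:b] in D for a, b in zip(bounds, bounds[1:])):
                    return list(cuts)
        return False

    print(f_reference(["x", "y", "z", "xz"], "xyz"))   # [1, 2]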

Afaik it is still unknown whether SAT has a quadratic lower bound! https://cstheory.stackexchange.com/questions/17789/is-there-...


Nope, the article is wrong about this. The optimal algorithm is O(n): you can build a simple regular expression that is the Kleene star of an alternation of all the words in your dictionary and match it against the input. This is allowed since you can choose how to implement the dictionary; it doesn't take more space than a trie and matches in linear time, as it can be compiled to an NFA or a DFA.
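For illustration, here is that construction in Python (my sketch, not from the comment). One caveat: Python's re module is a backtracking engine, so this only shows the shape of the regex; the linear-time claim assumes a Thompson-NFA/DFA engine in the RE2 style.

    import re

    def make_matcher(words):
        # Kleene star over an alternation of all dictionary words.
        # Caveat: Python's `re` backtracks, so this demonstrates the
        # construction, not the claimed O(n) matching time.
        alternation = "|".join(re.escape(w) for w in words)
        return re.compile(f"(?:{alternation})*")

    matcher = make_matcher({"x", "y", "z", "xz"})
    print(bool(matcher.fullmatch("xyz")))   # True:  splits into dictionary words
    print(bool(matcher.fullmatch("xyw")))   # False: "w" is not in the dictionary

Note also that this only answers the yes/no question; recovering the actual split points would take submatch tracking or a separate pass.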


The DFA is O(n) with up to exponential time spent upfront (and exponential space usage too). The exact complexity for the NFA with the regex you suggest depends on how the NFA is built, but it is at least O(n^2) worst case because you can easily build a case with n active states. Likewise for the lazy DFA.
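For anyone curious what that NFA simulation looks like, here is a small Python sketch (the names and the trie representation are mine): the active set is the set of trie nodes still alive after the prefix read so far, and the quadratic behaviour shows up exactly when many nodes stay alive at once.

    class TrieNode:
        def __init__(self):
            self.children = {}
            self.is_word = False

    def splittable(w, words):
        # Build a trie of the dictionary.
        root = TrieNode()
        for word in words:
            node = root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.is_word = True

        # Simulate the NFA for (word1|word2|...)* on the fly.  `active` is the
        # set of trie nodes reachable after the prefix read so far; the root
        # stands for "between words".  Each character touches every active
        # node, so the cost is O(n * max|active|).  With words like
        # {"a", "aa", "aaa", ...} on input "aaa...a", many nodes stay active
        # at once, which is where the quadratic behaviour comes from.
        active = {root}
        for ch in w:
            nxt = set()
            for node in active:
                child = node.children.get(ch)
                if child is not None:
                    nxt.add(child)
                    if child.is_word:
                        nxt.add(root)   # a word just ended; a new one may start
            active = nxt
            if not active:
                return False            # no partial match survives
        return root in active           # accept iff the input ends at a word boundary

    print(splittable("xyz", {"x", "y", "z", "xz"}))   # True
    print(splittable("xyw", {"x", "y", "z", "xz"}))   # False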



