Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Show HN: Subsequential Transducer for Efficient Text Rewriting (github.com/deniskyashif)
45 points by deniskyashif on July 5, 2017 | hide | past | favorite | 9 comments


Could someone explain why this is better than simple string substitution? Is regular string substitution much slower than this?


I am not yet 100% certain so I requested full-text access to read the paper, but I can say that a naive approach to string substitution will certainly be less performant than this, and offer less flexibility in replacement complexity without significant effort.

The reason being that in a naive approach, a vocabulary of size M and a document of token size N is an O(m*n) operation in the best case - while this is claimed to be an operation of O(2n).


Came here to ask the same question, I'm not clear on what functionality this offers beyond a simple k-v substitution.


I’ve wanted this specifically for terms and conditions documents for a long while... any thoughts on it’s applicability to this subject; so many legal document sections are the same but you can’t just copy them.


What would it do that some basic templating library wouldn't do?

Not knowing the specifics, I would lean toward a template library, as it would be more precise, and could account for things like changing gender, plurality, a vs an, etc, in a structured way.


What specifically do you want to do with these documents?


If it meets the guidelines, this might make a good 'Show HN'. Show HN guidelines: https://news.ycombinator.com/showhn.html


Interesting project. Question on this: "For any text t of length |t| the time it takes to perform a rewrite is O(|t|+|t'|) where t' denotes the resulting output string"

Wouldn't the vocabulary size fit into the order complexity? Vocabs that would be considered useful in this context tend to be quite large. Are you achieving worst case logn of search in the vocab?


Hi, thanks! The outputs are encoded in the transitions and the states themselves, so the vocabulary size doesn't affect the search. It affects, however, the construction because for each node in the trie we have to add an outgoing transition for each distinct symbol of the input vocabulary.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: