At a large enough n-gram size there would be very few collisions. Take this text, for example, and search for it in Google with quotes; it won't find anything matching exactly.
I tested the 6-gram "it won't find anything matching exactly": no match. Almost anything we write has never been said in exactly that way before.
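If you'd rather measure this than eyeball Google results, here is a minimal sketch that counts, for each n, the fraction of word n-grams that occur more than once in a local corpus. The file path and whitespace tokenization are just placeholders:

    from collections import Counter

    def ngram_collision_rate(tokens, n):
        # Fraction of distinct n-grams that appear more than once.
        grams = Counter(tuple(tokens[i:i + n])
                        for i in range(len(tokens) - n + 1))
        repeated = sum(1 for c in grams.values() if c > 1)
        return repeated / len(grams) if grams else 0.0

    tokens = open("corpus.txt").read().split()  # placeholder corpus file
    for n in (2, 4, 6, 8):
        print(n, round(ngram_collision_rate(tokens, n), 4))

On most corpora the rate drops steeply as n grows, which is the effect being claimed.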
This approach is probably inadequate. In my line of (NLP) research, I find that many things have been said exactly the same way many, many times over.
You can try this yourself by grouping and counting substrings in the many publicly available BigQuery corpora, varying the substring length and offset, e.g. [0-16], [0-32], and [0-64] windows at different offsets, as in the sketch below.
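For instance, a minimal sketch using the google-cloud-bigquery Python client against the public Hacker News dataset; the dataset, the text column, and the window sizes are illustrative choices, not prescriptive:

    from google.cloud import bigquery

    client = bigquery.Client()  # assumes GCP credentials are configured

    def repeated_substring_count(offset: int, length: int) -> int:
        # Number of distinct substrings at [offset, offset+length)
        # that occur in more than one row, i.e. exact duplicates.
        sql = f"""
            SELECT COUNT(*) AS repeats FROM (
                SELECT SUBSTR(text, {offset + 1}, {length}) AS chunk
                FROM `bigquery-public-data.hacker_news.full`
                WHERE text IS NOT NULL
                  AND LENGTH(text) >= {offset + length}
                GROUP BY chunk
                HAVING COUNT(*) > 1
            )
        """
        return list(client.query(sql).result())[0].repeats

    for length in (16, 32, 64):  # the [0-16], [0-32], [0-64] windows above
        print(length, repeated_substring_count(0, length))

Even at 64-character windows you typically find plenty of exact repeats, which is the point: people say the same things verbatim far more often than the Google-quotes test suggests.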