I wondered initially if this was a winamp port for older macs.
It requires macOS 10.13 (High Sierra, 2017) or later, which came out after Apple stopped calling it OS X. 10.11 (El Capitan, 2015) was the last OS X.
(I personally would accept someone referring to High Sierra as “OS X” because it’s still version 10 of the Macintosh OS, even if Apple dropped that branding a few years earlier.)
Not that I doubt the value of the work, and the reasoning that its performance directly affects common operations makes intuitive sense, but I would have liked to hear more about what concrete problems were being solved. Was there any interesting data across the V8 ecosystem about `JSON.stringify` dominating runtimes?
It doesn't need to dominate runtimes when it's being called by hundreds of millions of pages every day. The power saving worldwide will be considerable.
When I read this piece, apart from the generally terrible writing, the poor spelling and grammar, I initially wondered if it was a spoof from GPT-3 or some other bot.
When I decided it wasn't, I checked the dateline, as it seemed likely to have been written a decade or so ago.
I think this is a great first stab at the problem, but for two reasons I think a robust solution needs more work:
- The first is that, as someone else pointed out, Google is almost certainly logging your translation queries.
- Secondly, even if you do it offline (as someone else suggested) the approach itself might not work. Success in linguistic forensics isn't based (as we might naively assume) on catching obscure words that a particular individual has a tendency to overuse. It's based on subtle shifts in the relative frequency of functional words. Depending on the proximity of the source and target language, round-trip machine translation might not change this.
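To make the function-word point concrete, here is a minimal sketch of how such a profile could be computed and compared. The word list, the helper names `function_word_profile` and `profile_distance`, and the choice of Manhattan distance are all assumptions for illustration; real forensic tooling uses much larger word lists and more careful statistics.

```python
import re
from collections import Counter

# A tiny illustrative list of English function words (an assumption;
# real stylometry uses lists with a few hundred entries).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in",
                  "that", "it", "is", "but", "on", "with")


def function_word_profile(text):
    """Return the relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {word: counts[word] / total for word in FUNCTION_WORDS}


def profile_distance(a, b):
    """Sum of absolute differences between two profiles (Manhattan distance)."""
    return sum(abs(a[word] - b[word]) for word in FUNCTION_WORDS)


original = function_word_profile("The cat sat on the mat")
reworded = function_word_profile("The dog sat on the rug")
print(profile_distance(original, reworded))
```

Swapping the nouns leaves the profile untouched, which is the worry: a rewrite that only changes content words may not move the signal that attribution relies on.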
In forensic linguistics you typically measure a lot of metrics, not just word frequencies: use of punctuation and whitespace, sentence lengths and structures, etc. Attribution also isn't the only use of forensic linguistics. You can also look at influences: ideas, people, publications, etc. For instance, you can use it to infer something about the reader or to analyze influence networks.
I got interested in forensic linguistics many years ago when an article in a somewhat shady publication mentioned me. I got curious and started reading anything I could find on the topic. I was eventually able to identify the author, but mostly by tricking him into admitting it after I had a ranked list of candidates. He was second on a list of about 4-5 people (out of a candidate set of perhaps 300). Not half bad for the rather crude methods I used. I was rather pleased with myself.
I've used similar techniques later to look at influence networks in companies.
Translation history will soon only be available when you are signed in and will be centrally managed within My Activity. Past history will be cleared during this upgrade, so make sure to save translations you want to remember for ease of access later.
I guess you could skirt around this by using something to tag the various parts of speech in your original text (using something like Python's NLTK) and replace them with randomly picked synonyms from a thesaurus?
Pretty sure it would obscure the original writer although possibly at the cost of obscuring the original meaning.
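A rough offline sketch of that idea, using only the standard library. Note the substitutions: a toy hand-written `THESAURUS` dict stands in for a real resource such as WordNet, and a plain dictionary lookup stands in for proper part-of-speech tagging with NLTK. The entries and the `replace_synonyms` helper are hypothetical.

```python
import random
import re

# Toy thesaurus standing in for a real lexical resource (hypothetical entries).
THESAURUS = {
    "big": ["large", "huge"],
    "quick": ["fast", "rapid"],
    "problem": ["issue", "difficulty"],
}


def replace_synonyms(text, thesaurus=THESAURUS, rng=None):
    """Replace every word found in `thesaurus` with a randomly chosen synonym."""
    rng = rng or random.Random()

    def swap(match):
        word = match.group(0)
        options = thesaurus.get(word.lower())
        if not options:
            return word  # words without an entry are left untouched
        choice = rng.choice(options)
        # Preserve a leading capital so sentence starts still look right.
        return choice.capitalize() if word[0].isupper() else choice

    return re.sub(r"[A-Za-z]+", swap, text)


print(replace_synonyms("The big problem is the quick fix."))
```

Words like "the" and "is" pass through unchanged here, so a scheme like this alters vocabulary but not the pattern of function words, and without part-of-speech information it will also pick synonyms for the wrong sense of a word.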
I think what we’re concluding here is that using Google to obscure the linguistic style is flawed, because a state actor could obtain the original linguistic style from Google records, or from their own records of snooped traffic.
In other words: the blog should find a way to obscure linguistic style offline.
The title is perhaps missing "... for spoken and/or non-English sources, preferably not at all".
If we should stop using this test, what should we start using? In the author's comment on the study, they noted "There are ways to study linguistic complexity".
You can do a timed disable (like a pause) from the Pi-hole UI: I've done 5-minute disables to, e.g., get through purchase check-out flows with tight coupling to trackers.
I don't mean to be pompous, but they clearly are not very effective. Every single article on a city is written with a subtle advertorial tone, whether to promote tourism or simply give the city a positive reputation.