Welp, this just ate up a night's worth of sleep... scripted something to automate finding the play area and pipe orientations, and clicking the solution. https://www.youtube.com/watch?v=kIpyQRBuwt0
Yup, it finds the play area and starts clicking. I haven't tried the wrapped version yet, but I think the non-wrap is more fun. Also I wouldn't mind a 50x50 .....
With 5 bits per letter, you have 6 symbols left over. We can use those to represent alternate pairs like "A or E", so you can encode BANDS and BENDS at the same time. Looks like if you pick the 6 highest frequency replacements for each starting letter, you can reduce the full word list size by ~2k words.
A naive lookup table for the replacements is 26 * 2 * 6 = 312 bytes.
Many times you don't even need to store the individual letters, just the pairings, and if you are permitted to prune out troublesome words from your dictionary, all the better.
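Here's a rough Python sketch of how I picture the packing working; the six pair symbols and the sample entry below are made up for illustration, not the frequency-tuned choices:

    # 5 bits per symbol gives 32 codes: 26 plain letters plus 6 spare
    # "either/or" pairs, so BANDS and BENDS collapse into one 25-bit entry.
    PAIRS = {26: "AE", 27: "AO", 28: "EI", 29: "OU", 30: "RL", 31: "ST"}
    PAIR_CODE = {letters: code for code, letters in PAIRS.items()}

    def encode(symbols):
        """Pack five symbols (a letter or a pair like 'AE') into 25 bits."""
        packed = 0
        for sym in symbols:
            code = PAIR_CODE[sym] if len(sym) > 1 else ord(sym) - ord("A")
            packed = (packed << 5) | code
        return packed

    def expand(symbols):
        """List every plain word a packed entry stands for."""
        words = [""]
        for sym in symbols:               # "AE" means "A or E"
            words = [w + letter for w in words for letter in sym]
        return words

    entry = ["B", "AE", "N", "D", "S"]    # one stored entry, two words
    print(encode(entry), expand(entry))   # one 25-bit int, ['BANDS', 'BENDS']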
> The tiny peak at the edge is probably because the edge between the two walls sometimes has white paint showing through, so it's not a lighting effect.
I vaguely remember an old graphics professor mentioning that this peak would exist because of reflected light from the other wall, though I can't find any literature after some light googling. Is this a real thing, or am I crazy?
Having grown accustomed to MP3 artifacts, I find it strange to hear artifacts that are natural but just aren't quite right. More specifically, in the male voice sample "sold about seventy-seven", I heard it as "sold about sethenty-seven".
Yes, and "certificates" sounds like "certiticates".
Reminds me of a story about a copying machine that had an image compression algorithm for scans which changed some numbers on the scanned page to make the compressed image smaller. (Can't remember where I read about that; it must have been a couple of years ago on HN)
And yes, I think this is a relevant comparison. As the entropy model becomes more sophisticated, errors are more likely to be plausible texts with different meaning, and less likely to be degraded in ways that human processing can intuitively detect and compensate for.
My understanding of this fault was that it was a bug in their implementation of JBIG2, not the actual compression? Linked article seems to support this.
I think it was just overly aggressive settings of compression parameters. I don't see any evidence that the jbig2 compressor was implemented incorrectly. Source: [1]
Right. JBIG2 supports lossless compression. I'm not very familiar with the bug, but it could have been a setting somewhere in the scanner/copier that switched it to lossy compression instead. Or they had lossy compression on by default, or misconfigured it in some other way (probably a bad idea for text documents).
No. The bug was in the "Scan to PDF" function. It happened on all quality settings. Copying (scanning+printing in one step, no PDF) was not affected.
I remember differently, but I don't want to pull up the source right now.
I did check some of the sources, but was not able to find the one I remember which had statistics on it.
The Xerox FAQ about it does lead me to consider that I might be confusing this with some other incident, though, as they claim that scanning is the only thing affected.
This is a big rabbit hole of issues I'd never even considered before. Should we be striving to hide our mistakes by making our best guess, or to make a guess that, if wrong, is easy to detect?
The algorithm detected similar patterns and replaced them with references. This led to characters being changed into similar-looking characters that also appeared on the page.
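To make the failure mode concrete, here's a toy sketch (not the real JBIG2 algorithm, and the glyph bitmaps are invented): a symbol-dictionary coder keeps one representative bitmap per pattern and replaces each glyph with a reference to the closest stored one, so a loose similarity threshold silently turns a 6 into an 8.

    # Made-up 3x5 glyph rasters; '6' differs from '8' by a single pixel.
    GLYPHS = {
        "8": "111101111101111",
        "6": "111100111101111",
        "1": "010010010010010",
    }

    def distance(a, b):
        return sum(x != y for x, y in zip(a, b))

    def compress(page, threshold):
        """Store a dictionary of bitmaps and one reference per glyph."""
        dictionary, refs = [], []
        for ch in page:
            bitmap = GLYPHS[ch]
            idx = next((i for i, rep in enumerate(dictionary)
                        if distance(bitmap, rep) <= threshold), None)
            if idx is None:                 # no close-enough pattern stored yet
                dictionary.append(bitmap)
                idx = len(dictionary) - 1
            refs.append(idx)
        return dictionary, refs

    def decompress(dictionary, refs):
        lookup = {bitmap: ch for ch, bitmap in GLYPHS.items()}
        return "".join(lookup[dictionary[i]] for i in refs)

    for threshold in (0, 1):
        print(decompress(*compress("1868", threshold)))
    # threshold 0 reproduces "1868"; threshold 1 prints "1888" because
    # the '6' was matched against the already-stored '8' pattern.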
If we're abandoning accurate reproduction of sound and just making up anything that sounds plausible, there's already a far more efficient codec: plain text.
Assuming 150 wpm and an average of 2 bytes per word (with lossless compression), we get about 5 bytes per second, i.e. 40 bit/s, which makes 2400 bit/s look much less impressive. Add some markup for prosody and it will still be much lower.
This codec also has the great advantage that you can turn off the speech synthesis and just read it, which is much more convenient than listening to a linear sound file.
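To spell out the arithmetic (both numbers are the assumptions above, not measurements):

    words_per_minute = 150
    bytes_per_word = 2                               # assumed average after compression
    bits_per_second = words_per_minute * bytes_per_word * 8 / 60
    print(bits_per_second, 2400 / bits_per_second)   # 40.0 bit/s, 60x below Codec 2 at 2400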
If you have such a codec, it would be worth testing the word error rate on a long sample of audio. e.g. take a few hours of call centre recordings, pass them through each of {your codec, codec2}, and then have a human transcribe each of:
- the original recording
- the audio output from your proposed codec (which presumably does STT followed by TTS)
- the audio output from CODEC2 at 2048
Based on the current state of open source single-language STT models, I would imagine that CODEC2 would be much closer to the original. And if the input audio contains two or more languages, I cannot imagine the output of your codec will be useful at all.
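If anyone does run that comparison, word error rate over the three transcripts is the usual metric; here's a minimal sketch (plain word-level edit distance, skipping the punctuation and normalisation handling a proper evaluation would need):

    def word_error_rate(reference, hypothesis):
        """WER = (substitutions + deletions + insertions) / reference length."""
        ref, hyp = reference.lower().split(), hypothesis.lower().split()
        prev = list(range(len(hyp) + 1))          # edit-distance DP over words
        for i, r in enumerate(ref, 1):
            cur = [i]
            for j, h in enumerate(hyp, 1):
                cur.append(min(prev[j] + 1,               # deletion
                               cur[j - 1] + 1,            # insertion
                               prev[j - 1] + (r != h)))   # substitution/match
            prev = cur
        return prev[-1] / len(ref)

    print(word_error_rate("sold about seventy-seven",
                          "sold about sethenty-seven"))   # ~0.33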
Speech to text is certainly getting better but it makes mistakes. If the transcribed text was sent over the link and then a text to speech spoke at the other end you'd lose one of the great things about codec2 - the voice that comes out is recognisable as it sounds a bit like the person.
A few of us have a contact on Sunday mornings here in Eastern Australia and it's amazing how the ear gets used to the sound and it quickly becomes quite listenable and easy to understand.
Yeah, the main use case for codec2 right now is over ham radio. David Rowe, along with a few others, also developed a couple of modems and a GUI program[1]. On Sunday mornings, around 10AM, they do a broadcast of something from the WIA and answer callbacks.
What you might be able to do is use the text codec as the first pass, then augment the audio with Codec2 or the like to capture the extra information (inflections, accent, etc...), for something in between 2 and 700bps.
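Purely hypothetical numbers, but a rough bit budget for that kind of hybrid might look like:

    text_rate = 40                      # bit/s, the ~150 wpm text estimate upthread
    frame_ms = 40                       # one prosody frame every 40 ms (assumed)
    pitch_bits, energy_bits = 7, 5      # quantised F0 + loudness per frame (assumed)
    prosody_rate = (1000 / frame_ms) * (pitch_bits + energy_bits)
    print(text_rate + prosody_rate)     # 340.0 bit/s total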
One of the very few things I know about audio codecs is that they at least implicitly embody a "psychoacoustic model". The "psycho" is crucial because the human mind is the standard that tells us what we can afford to throw away.
So a codec that aggressively throws away data but still gets good results must somehow embody sophisticated facts about what human minds really care about. Hence "artifacts that are natural".
In the normal codec2 decoding it sounds like "seventy" but muffled and crunchy.
In the wavenet decoding, the voice sounds clearly higher quality and crisp, but the word sounds more like "suthenty". And not because the audio quality makes it ambiguous but it sounds like it's very deliberately pronouncing "suthenty".
It's as if, in trying to enhance and crisp up the sound, it corrected in the wrong direction. It sounds like the compressed data that would otherwise code for a muffled and indistinct "seventy" was interpreted by wavenet but "misheard" in a sense. When wavenet reconstructs the speech, it confidently outputs a much clearer/crisper voice, except it locks onto the wrong speech sounds.
With the standard "muffled/crunchy" decoding, a listener can sort of "hear" this uncertainty. The speech sound is "clearly" indistinct, and we're prompted to do our own correction (in our heads), but also knowing it might be wrong. When the machine learning net does this correction for us, we don't get the additional information of how its guess is uncertain.
This is exactly the sort of artifact I'd expect from this kind of system. As soon as I heard the ridiculously good and crisp audio quality of the wavenet decoder, it was clear that that fidelity simply can't be contained in the encoded bits; that's impossible. It's a great accomplishment and genuinely impressive, but it has to "make up" some of those details, in a sense very similar to image super-resolution algorithms.
I'm just thinking we should perhaps be careful to not get into a situation like the children's "telephone" game, if for some reason the speech gets re/de/re/encoded more than once. Which is of course bad practice, but even if it happens by accident, the wavenet will decode into confident and crisp audio, so it may be hard to notice if you don't expect it.
If audio is encoded and decoded a few times, it's possible that the wavenet will in fact amplify misheard speech sounds into radically different speech sounds, syllables or even words, changing the meaning. Kind of like the "deep dreaming" networks. Sounds like a particularly bad idea for encoding audio books, because small flourishes in wording really can matter.
Edit: I just realised that repeated re/de/re-encoding can in fact happen quite easily if this codec is ever implemented and used in real world phone networks. Many networks use different codecs and re-encoding just has to be done if something is to pass through a particular network.
But the whole thing is ridiculously cool regardless :) And I wonder if they can improve on this problem.
(o=b=>{for(j in a)for(i in a)y=a[i]+j,b?document.write(`<${i*j?'input':'p'} onfocus=value=[o[id]] onblur=o[id]=value;o() id=${y}>`):eval(y+(".value"+o[y]).replace(/[A-Z]\d/g," +$&.value"))})(a="_ABCD")
Or, just stick a regular 3.5mm headphone jack half-way into the two-pronged socket. The contacts line up in such a way that one channel touches both inputs. Just turn up the volume a bit.
On most airlines the announcements are at a fixed volume, independent of the movies. That means you can put the video on full volume and turn it down inline. No more announcements screaming in your ear :) (and better sound quality as a bonus)
Yes, I too travel with both of these. The volume control is convenient, but I originally bought it after breaking a couple of my headphone connectors. Now I break a $5 part instead of a $50 part. (Shures with removable cords are great, but the cord is still $50.)
Your link and the link above are two components I recently purchased after my own experiences. Just funny to realise I wasn't the only one who ran into this...
I deeply urge everyone interested in Go to try playing it in person. Nothing beats a live face-to-face game against someone and then being able to review it on a board afterwards. You'll also find that the people who go to Go clubs are very friendly and generally love to teach beginners.