Visual speech recognition (VSR) aims to recognise the content of speech based on lip movements, without relying on the audio stream. Advances in deep learning and the availability of large audio-visual datasets have led to the development of much more accurate and robust VSR models than ever before. However, these advances are usually due to larger training sets rather than the model design. In this work, we demonstrate that designing better models is as important as using larger training sets. We propose the addition of prediction-based auxiliary tasks to a VSR model and highlight the importance of hyper-parameter optimisation and appropriate data augmentations. We show that such a model works for different languages (English, Mandarin, Spanish, French, Portuguese and Italian) and outperforms all previous methods trained on publicly available datasets by a large margin. It even outperforms models that were trained on non-publicly available datasets containing up to 21 times more data. We furthermore show that using additional training data, even in other languages or with automatically generated transcriptions, results in further improvement.
It is pretty funny because what Tumblr is now missing is high-quality edgy stuff, including content curated by photographers, while bot-generated/reposted porn shows up again as fast as Tumblr's systems kill it.
TFA mentions at least 7000 dick pics, and the 'fitness' tag. So yeah, the purge was awful from both ends. I can only imagine the users feel like they are inhabiting some sort of wasteland when they go beyond their feeds.