I think it's more like "fetching and displaying various image formats is outside the purview of HTML5 canvas".
If you just want to show an image, you use an <img> tag; if you just want to play an audio file, you use <audio>. Canvas and the Web Audio API are for pages that want to make or mix their own images or audio. Though to be fair, HTML/JavaScript does make it easy to load image data from an image tag directly into a canvas; maybe there's a missing parallel for audio.
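For anyone who hasn't done the image-tag-to-canvas thing, here's roughly what it looks like. This is a minimal sketch, not anything canonical: `imageToPixels` is a name I made up, and it assumes the image has already finished loading and is same-origin (or served with CORS headers), otherwise reading the pixels back will throw.

```javascript
// Sketch: copy an already-loaded <img> into a canvas and read its pixels back.
// `imageToPixels` is a hypothetical helper name, not part of any API.
// `img` is assumed to be an HTMLImageElement, `canvas` an HTMLCanvasElement.
function imageToPixels(img, canvas) {
  // Match the canvas to the image's intrinsic size so nothing is cropped or scaled.
  canvas.width = img.naturalWidth;
  canvas.height = img.naturalHeight;

  const ctx = canvas.getContext("2d");

  // The browser has already fetched and decoded the file for the <img>;
  // drawImage just paints those decoded pixels onto the canvas.
  ctx.drawImage(img, 0, 0);

  // Returns an ImageData object whose `data` is a flat RGBA byte array.
  // Throws a SecurityError if the canvas was tainted by a cross-origin image.
  return ctx.getImageData(0, 0, canvas.width, canvas.height);
}
```

In a page you'd call it with something like `imageToPixels(document.querySelector("img"), document.createElement("canvas"))` once the image's `load` event has fired. All the format handling stays with the <img> element; the canvas only ever sees decoded pixels.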
Just looking at that clause makes me think perhaps the Web Audio API should have been called something else.
Can you imagine writing "fetching and displaying various image formats is a bit outside the purview of HTML"?
(I realize that's a bit apples 'n oranges.)