I imagine this came up partly as a result of the recent alpha compositing discussion.
We desperately need some research, based in user studies and using modern display technology, to settle some basic questions:
* What reconstruction filter gives the best results? Is it the same for vector (text) and natural images? By "best" I do mean contrast (sharpness) and lack of visible artifacts.
* For rendering of very thin lines (relevant to text), what gamma curve gives the perception of equal line thickness across the range of subpixel phase? How does it vary with display dpi? (Hint: it's likely not linear luminance)
* What gamma curve yields the perception of equal width of black-on-white and white-on-black thin lines (also relevant for text)? (Hint: likely not linear luminance)
I've seen a number of discussions where people feel they can answer these questions from first principles, and corresponding arguments that doing these things the "correct" way gives results that are less visually appealing than the common assumptions: treating a pixel as a little square (so doing a box filter for reconstruction) and ignoring gamma (so effectively using a perceptual color space for alpha compositing).
I posit that these questions cannot be solved by argumentation. I think user studies might not be very difficult to do; you could probably get "good enough" results by doing online surveys, though this wouldn't pass standards of academic rigor.
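To make the subpixel-phase part of those questions concrete, here is a minimal sketch (my own illustration, assuming a 1-D scanline and plain box-filter coverage) of how the per-pixel coverage of a fixed 1 px line changes with phase; the open question is what coverage-to-intensity mapping makes every phase read as an equally thick line.

```python
import numpy as np

def coverage_1px_line(phase, n_pixels=4):
    """Box-filter coverage of a 1-pixel-wide black line whose left edge
    sits at pixel 1 plus `phase` (0..1) on a short 1-D scanline."""
    left, right = 1.0 + phase, 2.0 + phase
    return np.array([max(0.0, min(i + 1, right) - max(i, left))
                     for i in range(n_pixels)])

for phase in (0.0, 0.25, 0.5):
    # phase 0.0 -> one fully covered pixel; phase 0.5 -> two half-covered
    # pixels. What intensity mapping makes these look equally thick?
    print(f"phase {phase}: coverage {coverage_1px_line(phase)}")
```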
> * What gamma curve yields the perception of equal width of black-on-white and white-on-black thin lines (also relevant for text)? (Hint: likely not linear luminance)
Gamma curves do not affect black and white themselves, only intermediate grays. It’s true that grays are usually used to draw antialiased black and white lines, but we can also think about “ideal” (axis-aligned, pixel-centered, non-antialiased) black and white lines. By framing this as a question about gamma, you’ve implicitly assumed that “ideal” black lines on white would have equal perceived thickness to “ideal” white lines on black.
This is not the case, as typographers have known for decades. The right way to draw perceptually equal-thickness black-on-white and white-on-black lines is to vary the line width (perhaps even by more than a full pixel if the lines are thick enough!). Gamma only comes in afterwards, to help us reproduce the varied widths accurately, and the accurate way to do that is in a linear color space.
I should have clarified that I meant lines with subpixel phase and possibly widths that are not integral numbers of pixels, ie the antialiased case.
Also I totally agree that we need to take into account the perceptual differences between black-on-white and white-on-black even assuming the display technology is perfect. That's one reason doing these studies is not trivial!
The non-antialiased “case” is one spot among the continuum of antialiased “cases”; it doesn’t satisfy totally different perceptual rules. What gamma curve would you propose to correct for the perceptual equality of a 9.3px white-on-black line with a 11.6px black-on-white line? What I’m saying is that you’re presupposing the wrong layer to make the correction (antialiasing or no).
You are correct. Change my third question to, "after applying a correct gamma curve to achieve perceptually uniform line widths in the unipolar case, what is the correspondence of line widths when inverting the polarity to preserve the perception of width matching?" Then you want to break that down into the contribution assuming a perfect display and the effect from lower dpi.
This question is less important than the other two, and plausibly the best place to address the effect is in design, rather than rendering.
As a simple demonstration, here's an image of a 1px white-on-black line and a 1px black-on-white line: https://imgur.com/a/9d9Tu3x
It's axis-aligned and box-filtered so there are no shades of gray, which means gamma is irrelevant. The white-on-black line appears to have a greater thickness than the black-on-white line.
There's a lot of empirical research showing that reading performance is better with positive-polarity (black-on-white) text than with negative-polarity text [1, 2], probably because the higher overall luminance results in a smaller pupil size and thus a sharper retinal image [3]. So, white-on-black lines appear thicker than black-on-white lines because the eye doesn't focus as sharply on them. This is true regardless of which color space blending is performed in.
Given this fact, if one wants to achieve uniform perceptual line thickness for black-on-white and white-on-black text, a more principled approach than messing with color blending would be to vary line width or the footprint of the antialiasing filter based on the luminance of the text (and possibly the background). This is the approach Apple and Adobe have taken for years with stem darkening/font dilation.
One caveat: if viewing this image on a high-dpi display, there will be blurring from image upsampling by the browser. My eyes give the same result as yours: the white line appears thicker than the black one.
Here's the interesting thing about your observation: doing alpha compositing of text in a perceptual space (as opposed to a linear space) results in a thickening of black-on-white and a thinning of white-on-black. So doing gamma "wrong" actually results in better perceptual matching.
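A tiny numeric illustration of that effect (a sketch of my own, assuming a pure power-law gamma of 2.2 rather than the real piecewise sRGB curve, and looking at a single half-covered edge pixel):

```python
GAMMA = 2.2  # assumed pure power law; real sRGB is a piecewise curve

def edge_pixel(coverage, fg, bg, linear):
    """Encoded value of a partially covered edge pixel: `fg` composited
    over `bg` (both given as encoded, i.e. 'perceptual', values)."""
    if linear:
        lin = coverage * fg**GAMMA + (1 - coverage) * bg**GAMMA
        return lin ** (1 / GAMMA)                 # blend in linear light, re-encode
    return coverage * fg + (1 - coverage) * bg    # blend the encoded values directly

# Half-covered edge of a black line on white, and of a white line on black.
print("black-on-white:", edge_pixel(0.5, 0.0, 1.0, True),   # ~0.73
                         edge_pixel(0.5, 0.0, 1.0, False))  # 0.5
print("white-on-black:", edge_pixel(0.5, 1.0, 0.0, True),   # ~0.73
                         edge_pixel(0.5, 1.0, 0.0, False))  # 0.5
# Blending the encoded values gives darker edges in both cases, which reads
# as fatter black strokes and thinner white ones.
```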
Do you have evidence that either Apple or Adobe varies the amount of stem darkening based on text luminance? I've tested Apple (macOS 10.12 most carefully) and have not seen this.
I don't think they actually take luminance into account (although I personally like the idea); I just meant that they solve the problem of "black-on-white text looks too thin" by thickening the text, rather than messing up other parts of the pipeline.
Absolutely, and I think there's a very strong case to be made for that. One of the points I'm trying to make is that you have to solve it somewhere. People who focus narrowly on "correct gamma" often miss this, especially when "incorrect gamma" is also plausibly a workable solution.
I can tell you right now that you will have a very difficult time beating a normalized gauss filter with a diameter of around 2.2 pixels in a general case.
Color and luminance are a separate, orthogonal issue from filtering. I also know that people get away with compositing without converting to linear space, but I suspect any benefit they see is just a matter of getting the color curve they want for free, as opposed to doing a similar correction after something has been composited correctly.
I was working at PDI/DreamWorks during one of the semi-annual investigations into which filter kernel was best, and I was completely blown away by how quickly the lighting supervisor could identify and react to various filters. Gaussian was voted down reliably and repeatedly for being too blurry.
Personally, I like the extra blur (and the extra safety and guarantees) you get with a Gaussian. Back in the NTSC days I discovered that vertically blurring interlaced video made it noticeably clearer and more visible, even though it was softer.
But if you really do spend time with sharper filters, it is true that Gaussian is softer, and some pros really do want sharper images than a Gaussian can provide.
Were all the filters being compared at the same width? At the same width a Gaussian will be softer; at a smaller pixel radius is where you can compare its aliasing against something like Catmull-Rom at the same visual sharpness.
There were a variety of widths, it was one axis of the study. But I don’t remember the details, it is certainly possible you’re talking about something we didn’t test. Personally, at first I couldn’t even see the differences they were discussing.
Having studied graphics and signal processing for a few years in graduate school before that job, I thought I would be good at seeing the differences, and I was a bit shocked how good they were at it, and how not that good I was. :)
Truncate the Gaussian too closely, though, and it's not exactly a Gaussian anymore; you lose its best antialiasing properties. I can totally see how it will be sharper and more comparable to other popular filters. (Normalized & truncated at 1.1 radius is just slightly outside the 1 std dev line, right?)
Gaussian is my personal choice for large format prints of images with extreme aliasing problems.
My experience has likewise been that film DP’s have extremely impressive visual acuity, memory for color, etc. Talented artists often have developed whole sets of skills that the rest of us are unaware are even skills. In the same way a programmer might have thought about cache line false sharing as it affects memory hierarchy throughput, visual art often hides lots of expertise you cannot directly perceive, even as you can sense the quality of the whole.
I like to describe “learning to draw” as “installing a 3d modeling and rendering package on your brain, along with a decent collection of base models to modify”. It’s a complex skill set. If you start animating you get to add in a physics simulation. And you become conscious of so many little things that the layman only notices when you get it wrong.
> Having studied graphics and signal processing for a few years in graduate school before that job, I thought I would be good at seeing the differences, and I was a bit shocked how good they were at it, and how not that good I was. :)
IMO the best place to learn to see these distinctions is in careful photo printing (of the type where you spend >30m per image, and where “printing” here is being used in an old-school sense of “all of the manual steps to take a raw image from the camera and turn it into printed output”).
Spend a few months doing that for a few hours per week and your ability to see artifacts, textural details, fine differences in amount of edge contrast, etc. will shoot up. (Obviously the folks who spend 30 years on this are even better.)
Studying signal processing, optics, psychophysics, etc. is also useful for understanding what you are noticing, but it isn’t seeing practice.
The width of the filter is separate from how the gauss curve is used as a filter (and not everyone does it the same).
Using only one standard deviation cuts off a huge amount of the curve; ideally it would be cut off around the third standard deviation and normalized so that the value just past the cutoff would be zero.
Reeeaallly? I haven't made the images, but my intuition is telling me that text will be noticeably blurry compared with a box filter.
I basically agree with your points regarding compositing in a linear space, except that I suspect that thin black-on-white lines will come out looking thin and spindly.
It sounds like you are talking about text that doesn't move.
If you want a general filter that can give a result without visible aliasing while sacrificing as little sharpness as possible, a 2.2 gauss filter is very hard to beat and I have spent a lot of time trying.
Box filters can be sharper, Lanczos filters can be better for scaling down a final image, etc., but they will alias in a general sense. You might not see it in static text, which is fine.
Also, thin black-on-white lines are an extreme outlier, since what you perceive is relative and the entire image matters. It is more in the realm of optical illusions that play off our relative sensitivity.
A ‘Gaussian’ which extends 1.1 units is a non-standard and not clearly defined thing. Are you picking some fraction of 1.1 as the standard deviation and truncating after? (Often Gaussian filters are truncated after 3 standard deviations, or similar.) Are you multiplying by some other window function? ...
Can you link to a more explicit formal description of what you mean?
When I say 2.2 pixels as the diameter, I mean roughly that a pixel will be made up of all the samples 1.1 pixels or closer to the center of the pixel, weighted by a gauss curve. How that gauss curve is actually used is not formal or strictly defined, but it is usually a curve that goes out to about 3 standard deviations, baked into a LUT and normalized.
A curve that uses more standard deviations would have to be wider in pixels to look similar.
Do you have some code for what you specifically are talking about? Or a formal description?
You have been saying here “this kernel is better than all of the alternatives” but it’s hard to evaluate that kind of claim without knowing precisely what you mean.
“not formal or strictly defined” is not super encouraging.
I never said it was better than all alternatives, I said my experience is that it is difficult to beat in a general case.
Bake a gauss curve out to three standard deviations into a LUT and normalize it. You can look at what PBRT does.
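My reading of that description, as a sketch (the LUT size, the three-sigma cutoff, and the mapping of a 1.1 px radius onto the curve are how I interpret the comments above, not a spec; PBRT's Gaussian filter is the reference worth checking):

```python
import numpy as np

LUT_SIZE = 256
SIGMAS = 3.0     # bake the curve out to about three standard deviations
RADIUS = 1.1     # filter radius in pixels (so a 2.2 px diameter)

# LUT of exp(-x^2 / 2) for x in [0, 3 sigma], shifted so the value just
# past the cutoff would be zero.
_x = np.linspace(0.0, SIGMAS, LUT_SIZE)
_lut = np.exp(-0.5 * _x * _x) - np.exp(-0.5 * SIGMAS * SIGMAS)

def weight(distance_px):
    """Filter weight for a sample `distance_px` from the pixel center."""
    if distance_px >= RADIUS:
        return 0.0
    t = distance_px / RADIUS * (LUT_SIZE - 1)   # map the radius onto the LUT
    return float(np.interp(t, np.arange(LUT_SIZE), _lut))

def filter_pixel(sample_offsets, sample_values):
    """Weighted average of supersamples; offsets are (dx, dy) from the pixel
    center in pixel units. Dividing by the weight sum is the 'normalized' part."""
    w = np.array([weight(np.hypot(dx, dy)) for dx, dy in sample_offsets])
    return float(np.sum(w * np.asarray(sample_values)) / np.sum(w))
```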
Keep in mind that I was replying to someone confused by all the filter choices and was giving him a very solid starting point. This isn't some grandiose claim of scientific exploration; it's experience.
So obviously text is one of the things I care about a lot. I'm willing to accept your filter as being very good for a lot of stuff other than presenting a GUI.
Does this mean that there is not one true right answer for images, as there is for audio[0]? That you should use different reconstruction filters depending on the application? Is it accurate to say that a pixel is not a little square unless you're rendering text, in which case it is?
[0] I know this is a slight oversimplification, if you care deeply about latency you might choose between linear and minimal phase, etc. But for consumer applications it's true enough.
True, but how often do you care about the general case? Most images are representations of real objects, or symbols or diagrams designed for human comprehension. Ignoring cases that in practice are very unlikely to occur lets you optimize for the more common cases. Techniques like resizing in a sigmoidal color space (see http://www.imagemagick.org/Usage/resize/#resize_sigmoidal ) don't have any rigorous mathematical basis, but they're tuned subjectively to have good results in common cases.
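For reference, the sigmoidal-space trick is roughly the following (a sketch using a generic logistic curve and np.interp as a stand-in for a real resampling filter; ImageMagick's actual sigmoidal-contrast function and filters differ in detail):

```python
import numpy as np

def to_sigmoid_space(linear, contrast=6.0, midpoint=0.5):
    """Map linear intensities in [0,1] through a logistic curve so that
    values near black and white are compressed before resampling."""
    lo = 1.0 / (1.0 + np.exp(contrast * midpoint))
    hi = 1.0 / (1.0 + np.exp(contrast * (midpoint - 1.0)))
    s = 1.0 / (1.0 + np.exp(contrast * (midpoint - np.asarray(linear, float))))
    return (s - lo) / (hi - lo)

def from_sigmoid_space(s, contrast=6.0, midpoint=0.5):
    """Inverse of to_sigmoid_space."""
    lo = 1.0 / (1.0 + np.exp(contrast * midpoint))
    hi = 1.0 / (1.0 + np.exp(contrast * (midpoint - 1.0)))
    s = np.asarray(s, float) * (hi - lo) + lo
    return midpoint - np.log(1.0 / s - 1.0) / contrast

def resize_sigmoidal(img_1d, new_len):
    """Resample a 1-D signal in sigmoid space, then map back to linear."""
    s = to_sigmoid_space(img_1d)
    xs = np.linspace(0, len(img_1d) - 1, new_len)
    return from_sigmoid_space(np.interp(xs, np.arange(len(img_1d)), s))
```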
Lanczos interpolation seems to beat a gauss filter for me. It has some overshooting/ringing effects of course, but the improved sharpness is a great tradeoff compared to the blurriness of gauss filtering. Of course, gauss filtering might still play a residual role in analog systems, such as the one that physically displays stuff via a CRT screen.
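For anyone comparing at home, here is the kernel in question (the standard Lanczos definition; the a=3 lobe count and the simple edge-clamped 1-D resampler are just one common choice of mine):

```python
import numpy as np

def lanczos(x, a=3):
    """Lanczos kernel: sinc(x) * sinc(x/a) for |x| < a, else 0. The negative
    lobes are what give the extra sharpness (and the ringing)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

def resample_1d(signal, new_len, a=3):
    """Resample a 1-D signal with Lanczos interpolation (edge-clamped)."""
    signal = np.asarray(signal, dtype=float)
    old_len = len(signal)
    out = np.empty(new_len)
    for j in range(new_len):
        x = j * (old_len - 1) / (new_len - 1)        # source coordinate
        taps = np.arange(int(np.floor(x)) - a + 1, int(np.floor(x)) + a + 1)
        w = lanczos(x - taps, a)
        out[j] = np.sum(w * signal[np.clip(taps, 0, old_len - 1)]) / np.sum(w)
    return out
```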
The perception of equal line thickness may depend on orientation of the lines (https://res.mdpi.com/vision/vision-03-00001/article_deploy/v...), distance from the fovea, distance from where your attention lies (a can of worms even deeper. I don’t think there’s agreement on whether one can attend to more than one visual location at a time, or what ‘attention’ even is), direction of those distances (vertical will almost certainly be different from horizontal, sight could be better or worse in the nasal direction vs the temporal one), light/dark adaptation of the eyes, whether subjects are color-blind, etc.
It's possible this research has been done, but if so I haven't seen it yet. Basically the reason I feel confident asserting it is that I'm talking very specifically about the stimulus produced by modern displays. I know there's some work by Avi Naiman on CRT displays from the very early 90's[0]. There might be some other work done on low-resolution LCD's, but even aside from resolution it's only modern, high quality displays that have good contrast and low dependence on viewing angle (which affects gamma greatly).
Another place to look is Kevin Larson's work on subpixel rendering, which informed Microsoft's ClearType efforts. But that was done mostly around 10 years ago, when displays also were different. A good representative is [1].
Here's another pretty good paper I found[2], but it focuses more on the display technology than the perception side.
So what I'm looking for is adjacent to general psychophysics results on visual perception, but much more specific to what real displays do. That literature is pretty thin on the ground.
[0]: Avi C. Naiman and Walter Makous. Spatial nonlinearities of gray-scale CRT pixels, 1992
There is research about perception of physically printed lines, done by e.g. print photography companies decades ago.
There is plenty of formal signal processing analysis of the aliasing artifacts at different angles created by grids of pixels.
The ImageMagick folks did a bunch of experimentation about resampling filters as used in arbitrary transformations of existing raster images. https://www.imagemagick.org/Usage/filter/nicolas/ – of course what looks best depends significantly on (subjective) preferences and on what the source image is.
So based on this research, what reconstruction filter and gamma curve should I use to render text so it looks good over a range of fonts? :)
Or, perhaps a better posed question. What research-informed model will accurately predict the results of a user study that presents various renderings of antialiased lines on a modern LCD monitor and asks subjects to choose "which line is thicker" types of queries?
Also does the font have variable weight? Are you adjusting it in any other way before rendering? What context were the fonts designed for – did the font designers evaluate their appearance using any specific displays / rendering engines?
> asks subjects to choose "which line is thicker" types of queries
I think you could develop a model for “which line is thicker” given a specific target display without inordinately much trouble; you might have to tweak some parameters for matching particular displays. The harder question is “which line is the right thickness”, especially if you don’t have any correct answers to reference.
We also don’t just care about apparent line thickness but also spatial resolution, aliasing artifacts, ...
A pixel is a picture element. An element of a picture. Hence the name... It turns out that thinking of them as little boxes arranged in rectangular grids is very useful. Because that is how computers deal with them. Not as point samples.
The article reminds me of the many mathematical texts I've read insisting that vectors are not tuples of numbers. That thinking of them as anything other than directions with magnitudes is wrong. Technically, that might be correct, but vectors-as-numbers is much more useful when calculating with them. When you get into more abstract mathematics, and your vectors contain other kinds of algebraic objects, such as polynomials, you are already so accustomed to them that you can think of them as flying burritos if you like.
When I teach graphics programming, I will continue to tell students that pixels are like little boxes.
> The article reminds me of the many mathematical texts I've read insisting that vectors are not tuples of numbers. That thinking of them as anything other than directions with magnitudes is wrong.
I break every mathematical object down into three things:
1. The intuition. Why do we have this concept to begin with? What underlying idea are we trying to capture?
2. The definition. These are the axioms.
3. The implementation. This includes every way to communicate the idea, from natural language words to notation to source code.
Without the intuition, you have nothing but a symbol game. It's hollow. Something with rules and notation but no deeper intuition is, arguably, chess.
Without the definitions, you can't think rigorously about your ideas and you don't know if they lead to internal contradiction. You can dump the axioms without losing the intuition; we did this with set theory at the turn of the previous century, when Russell proved that the previous axioms were inconsistent. We saved set theory without having to abandon the notion, the intuition, of sets entirely.
Without implementation, it's just thought, and you can't communicate with anyone. Moreover, without some intuition, the implementation is meaningless, because you have no cognitive frame to use to interpret it.
So the tuple of numbers is one implementation of a vector. It allows you to communicate some aspects of a vector, but without the underlying idea of what a vector means, what concept we're trying to get across, it's just a list of numbers. They might as well be box scores or something.
There's also the case that there's a bunch of vectors you _can't_ represent as tuples of numbers. A vector with orientation but a magnitude of 0 is a completely valid vector afaik, and you could do things with it like normalizing it to get a unit vector of the same orientation, and it's not representable as a tuple.
The tuple model is extremely useful, but incomplete.
Vectors with different orientations but zero magnitude are all the same vector (the zero vector). Observe that if you add a zero-magnitude vector to a unit vector, the result is just the same unit vector. It follows that the normalization scheme you described can't be a function on vectors (since you could "normalize" two equal vectors and get two different results).
> A vector with orientation but a magnitude of 0 is a completely valid vector afaik
Not by the mathematicians' definition. Maybe somewhere in physics such objects are useful, but if you formalized them you'd get something other than a vector space.
> When I teach graphics programming, I will continue to tell students that pixels are like little boxes.
There’s certainly a sweet spot, where too much theoretical background takes away from learning graphics, and too little could leave students unprepared, or worse, uninterested.
It’s certainly good & fun to go through at least a little sampling theory.
You might also be interested to know the guy who wrote this paper co-founded Pixar. His graphics advice is worth careful consideration.
> It turns out that thinking of them as little boxes arranged in rectangular grids is very useful.
Do keep in mind that the shape and the arrangement are two separate things. Alvy’s paper is talking about the sample shape, but not talking about the arrangement. The grid arrangement is useful, and Alvy would agree.
> Because that is how computers deal with them. Not as point samples.
I’d be cautious drawing that line. I realize you were talking in part about grid arrangements. But computers treat samples however we teach them to. It’s uncommon and unlikely you’re writing a lot of code that truly handles pixels as finite square geometry rather than a point sample. Plus, if you’re teaching things like magnification filtering using bilinear or bicubic, then you’re already treating pixels (or perhaps texels) as point samples.
> I’d be cautious drawing that line. I realize you were talking in part about grid arrangements. But computers treat samples however we teach them to. It’s uncommon and unlikely you’re writing a lot of code that truly handles pixels as finite square geometry rather than a point sample.
I thought about it and came up with a counter-example: subpixel rendering in general and ClearType in particular. The algorithm works by considering the exact arrangement of RGB squares (rectangles actually) to improve the appearance of rendered text. Theoretically, subpixel rendering could increase the horizontal resolution threefold, which was very useful for DPI-starved screens. Font hinting was also used to fit the text into the confines of the limited pixel grid.
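As a toy version of that idea (a sketch of my own, assuming an RGB-striped panel and plain box coverage per stripe, with none of ClearType's actual color filtering or hinting):

```python
import numpy as np

def subpixel_coverage(edge_x, n_pixels=4):
    """Per-channel coverage of a black shape extending from x=0 to `edge_x`
    (in pixel units), on an RGB-striped scanline where each pixel is three
    1/3-px-wide stripes: R, G, B."""
    cov = np.zeros((n_pixels, 3))
    for i in range(n_pixels):
        for c in range(3):                       # R, G, B stripes
            left, right = i + c / 3.0, i + (c + 1) / 3.0
            covered = max(0.0, min(right, edge_x) - left)
            cov[i, c] = covered * 3.0            # fraction of this stripe
    return cov

# An edge at x = 1.5 lands mid-pixel: in pixel 1 the R stripe is fully
# covered, G is half covered, B is clear - a horizontal resolution gain of
# up to 3x, paid for with color fringes that must then be filtered.
print(subpixel_coverage(1.5))
```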
> I thought about it and came up with a counter-example: subpixel rendering in general and ClearType in particular. The algorithm works by considering the exact arrangement of RGB squares (rectangles actually) to improve the appearance of rendered text.
I mostly agree, and see my other top-level comment where I called out LCD panels as having physically square pixels.
It’s a good thought, and it is correct that ClearType is considering the sub pixel arrangement of LCD elements. But also remember the arrangement of pixels isn’t Alvy’s main point, he was mainly trying to convey how to think about the shape of samples (pixels).
Even with ClearType you don’t necessarily want to integrate the sub-sub-samples of an LCD sub-pixel with a box (square) filter.
For sub-pixel rendering in general, box filtering is definitely not the best answer. Though yes, lots of people do it and get away with it all the time when sampling quality is not a high priority. Games are a good example, even as they’re improving. Treating pixels as square when sub-sampling causes ringing artifacts that can never be cured by adding more samples. This is actually a really fun thing to do with a class of graphics students because it’s kind of surprising the first time you really get it. For some nasty antialiasing problems only high quality kernels like a Gaussian will integrate sub-samples without artifacts.
Note I’m not talking about LCD sub-pixels there, just normal supersampling. The Wikipedia article on ClearType calls that “grayscale antialiasing” to distinguish it from LCD red-green-blue subpixels. But IMO that’s a bad name since grayscale antialiasing is still referring to filtering color images.
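To illustrate the difference being described (a sketch with a hypothetical checkerboard shading function whose frequency content sits far above the pixel grid's Nyquist rate; the 8x8 sub-sample grid and sigma are arbitrary choices of mine):

```python
import numpy as np

def integrate_supersamples(px, py, shade, n=8, sigma=0.5):
    """Integrate an n x n grid of sub-samples around pixel (px, py) two ways:
    a box filter (plain average over the pixel's square footprint) and a
    Gaussian weighting that also extends a bit past the square."""
    offs = (np.arange(n) + 0.5) / n * 2.0 - 1.0    # sub-sample offsets, +-1 px
    dx, dy = np.meshgrid(offs, offs)
    vals = shade(px + dx, py + dy)

    inside = (np.abs(dx) <= 0.5) & (np.abs(dy) <= 0.5)
    box = float(vals[inside].mean())                # "pixel is a square"

    w = np.exp(-(dx**2 + dy**2) / (2 * sigma**2))   # Gaussian weights
    gauss = float(np.sum(w * vals) / np.sum(w))
    return box, gauss

# A shading function with detail far above the pixel grid's Nyquist rate -
# the kind of signal where box-integrated supersamples keep aliasing no
# matter how many of them you take.
checker = lambda x, y: (np.floor(8 * x) + np.floor(8 * y)) % 2
print(integrate_supersamples(10.0, 7.0, checker))
```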
> Because [little squares] is how computers deal with them. Not as point samples
This is not accurate. Computers generally represent raster images as arrays of numbers (where each entry in the array is called a “pixel”). There are no literal little squares involved. Some code (much of it mediocre) conceives of those arrays of numbers as representing little squares. Other code does not.
> vectors-as-numbers is much more useful when calculating with them
This is super myopic / parochial.
Mathematicians think of “vectors” as elements of an abstract vector space (i.e. anything with well-defined concepts of scalar multiplication and vector addition over some field). This is useful to them because there are many powerful theorems which work in general for any arbitrary vector space, or sometimes for any vector space over the complex number field, or sometimes for any finite-dimensional vector space, or ....
Physicists think of vectors as directed magnitudes, generally some kind of measurable physical quantity in Euclidean 3-space (or Minkowski space). This is useful because many kinds of combinations and relations of directed magnitudes can be computed without reference to any specific coordinate system.
One possible representation of physicists’ vectors (or certain types of mathematicians’ vectors) is an array of numbers.
But an array of numbers by itself is a completely different type of object than a vector. There are no specific well-defined operations on a generic array of numbers; or rather, depending on what it represents there are a wide variety of operations that might be meaningful or reasonable.
There are many kinds of “calculations” which are completely abstract where thinking of vectors as arrays of numbers is unbelievably obscurantist and counterproductive. Proofs and derivations involving coordinates are almost always extremely cumbersome.
There are even many types of concrete calculations on vectors-represented-as-arrays-of-numbers where the most effective algorithm is to first convert to a different representation.
As a physicist myself, I find I think of vectors more often as a set of independent magnitudes for each of the coordinate system axes, than of a single magnitude with a direction. That's because in most cases, the physics just works independently in each axis. So it makes a lot of sense to treat vectors as sets of numbers.
Most calculations physicists do are the same as pure mathematicians would do. So we use, for example, inner-product and cross-product operators to operate on vectors, without caring about the exact coordinate system used. Only when we have to come up with a final answer to a question like "at what angle does the ball hit the ground, and with what velocity?" will we convert our vectors into a magnitude and direction.
> I find I think of vectors more often as a set of independent magnitudes for each of the coordinate system axes
Interesting. This is what Hestenes calls the “coordinate virus”, http://geocalc.clas.asu.edu/pdf/MathViruses.pdf ; very often any specific coordinate system is not inherent but is some arbitrary addition to the space made for convenience in some particular calculation. It is in my opinion a mistake to think of the coordinates as primary.
> operate on vectors, without caring for the exact coordinate system used
Not really. I didn't say in the first statement that a specific coordinate system was used. In most systems, it doesn't matter how you choose your coordinate system, as long as the basis is orthogonal, treating the coordinates independently of each other will work.
> That thinking of them as anything other than directions with magnitudes is wrong.
Mathematicians like to think of just about everything as vectors, so that sentence seems a bit off.
Anyway, yeah, starting from abstract definitions is a great way to make a student's life hell. After all, there's a reason universities teach linear algebra twice.
This piece is a classic and a must-read for graphics people, but do remember that it was written before LCD displays. Today's pixels actually are little squares to a much greater degree than CRT pixels were in 1995. That doesn't change the theory or truth in Alvy Ray's paper, but it does mean that the perfect reconstruction isn't the same now as it was then.
The more I've thought about it since, the more I think we should represent images with hexagonal pixels (i.e., laid out on a hex grid), and that color images should treat the center points of red, green, and blue subpixels as not being right on top of each other. (the third image in the post shows how they would be arranged, which is actually similar to how they are on some displays)
It would be a little harder to deal with for graphics programmers who are working at a pixel level, or at the least, it would require a bit of relearning. But it makes more sense in so many ways. Hex is just a better way of "circle packing" (as you notice if you arrange a bunch of pennies on a table), and of course real world displays tend to have the red, green and blue subpixels offset from one another anyway. (are there any that don't?)
Obviously it isn't easy to change something like this at this point, but still, I find the idea fascinating, and appealing to the OCD efficiency fan in me.
You might want to look into pentile displays. You are correct in that they provide higher resolution than rectilinear displays (see the controversy tab in [1]), but they have largely fallen out of style because it is impossible to draw a straight line so everything looks blurry upon close inspection.
One cool trick is that you can sprinkle in more efficient white pixels to allow the display to reduce its power consumption.
This is very relevant for image reconstruction in medical imaging systems. When determining how much a beam of radiation is attenuated as it passes through an image, it really matters what representation you use when translating a 2D matrix of material-parameter values into an image. Are the elements boxes, trapezoids (bilinear interpolation), 2D Gaussians, spheres? Each technique has some drawbacks, but for getting down to millimeter precision in scans it matters.
We are so used to seeing the visual presentation of samples as something that looks like a bar diagram that a lot of people think analog sounds better because the curves are smoother.
I wish I could force any audiophiles to watch it before they waste their money on snake oil. I think a similar point can be made about analogue synthesis, e.g. You pay Moog £1500 for what is ultimately a fancy box for a relatively simple analogue circuit.
The problem is that most audiophile behaviour is driven by emotion and psychology, not by facts and evidence. In that respect it's perfectly analogous to religion—and we know how futile it is to challenge religious thinking with facts and evidence.
The underlying frustration is that objective (measurable) and subjective (unmeasurable) improvements in sound quality could ever be placed on equal footing with each other. It just wrecks my brain that people would ever spend a dollar on improvements that have an unmeasurable or negligibly measurable impact on the sound when there's still so much opportunity for substantial, measurable improvements.
If you want to improve your audio, focus on speakers, speaker placement, room acoustics, bass management and in-room calibration. Most everything else is relatively marginal (e.g. fancy amps, fancy DACs) or negligible/unmeasurable (e.g. fancy wires).
For most people with mid+ range audio gear, the number one upgrade they can perform is almost always to add targeted sound absorption to their room, not to change any of the electronics.
I try to explain that to people. Even just the basics like fiddling with your EQ to your liking.
A lot of this analogue woo applies to synthesisers, as mentioned previously. For £500ish you can buy a million core, trillion transistor monster but for £1500 you get a box with some op amps and a (gasp) digital signal path in some cases.
>We are so used to seeing the visual presentation of samples as something that looks like a bar diagram that a lot of people think analog sounds better because the curves are smoother.
Except for a philosophical debate about continuity, isn't that true?
The “stair-steps” you see in your DAW when you zoom up on a digital waveform only exist inside the computer. [...] When digital audio is played back in the Real World, the reconstruction filter doesn’t reproduce those stair-steps – and the audio becomes truly analogue again.
Yeah, it's strange people are thinking the display of a spectrum analyzer is somehow 1:1 with the underlying thing which they are measuring. As if a digital clock with only the hour and minutes displayed implies that seconds don't exist.
No, you put a low-pass filter on the other side of the DAC and it looks equally smooth. Those stair steps are high frequency. This filter is called the image-rejection or reconstruction filter. Once you learn about the frequency domain, you realize how silly most digital-versus-analog debates are and how very few people really understand what is going on.
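A quick numeric way to see this (a sketch; the finite sinc sum below stands in for a real DAC's analog reconstruction filter, and the 48 kHz / 1 kHz numbers are arbitrary):

```python
import numpy as np

FS = 48_000                      # sample rate
F0 = 1_000                       # a 1 kHz tone, well below Nyquist

t_samples = np.arange(64) / FS
samples = np.sin(2 * np.pi * F0 * t_samples)

t_fine = np.linspace(0, t_samples[-1], 4000)
ideal = np.sin(2 * np.pi * F0 * t_fine)

# "Stair-step" view: zero-order hold, i.e. what the zoomed-in DAW display
# suggests and what a raw, unfiltered DAC output resembles.
stairs = samples[np.minimum((t_fine * FS).astype(int), len(samples) - 1)]

# Reconstruction: a sum of sincs, i.e. an ideal low-pass (image rejection)
# filter applied to the sample train. No steps survive it.
recon = np.array([np.sum(samples * np.sinc((tf - t_samples) * FS))
                  for tf in t_fine])

mid = slice(1500, 2500)          # compare away from the truncated ends
print("max stair-step error:", np.abs(stairs - ideal)[mid].max())
print("max reconstruction error:", np.abs(recon - ideal)[mid].max())
```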
Be careful generalizing the audio results to pixels. The central lesson of xiph's work is that people simply cannot hear frequencies above, let's say 20kHz. Therefore, as long as your sampling rate is above the Nyquist limit (and under the assumption the signal chain is linear), any reconstruction filter that passes frequencies through 20kHz is effectively "perfect."
There are two ways this is not true for pixels. First, even for "retina" displays the human visual system can make out spatial frequencies beyond the Nyquist limit of the display (this will vary by viewing distance, so it is more of an issue for young people who can get close to their displays). Second, even assuming perfect gamma, the display must clip at black and white because of physical device limitations. Thus, especially for text rendering, only a reconstruction filter with nonnegative weights is generally useful. Such a reconstruction filter would be an extremely poor choice for audio.
It is true that many of the underlying signal processing principles are the same, and I encourage people to learn and understand those :)
Thanks for the heads up. There is one specific section where he compares pixels to lollipop graphs and that is mainly what I was referring to, I didn't mean to suggest that all the principles in the video apply to graphics in the same way that they apply to audio.
However, an image is not a wave. The sampling theorem applies beautifully to audio, but only to a limited degree to images. Some filters make sense in the frequency domain, and eyes are more sensitive to certain frequencies than others, but it all breaks down on hard edges, which don't behave like square waves.
The problem is that in images the Gibbs effect is visible and annoying (the ringing artifact). If the sampling theorem fully applied, people wouldn't be able to see it, just as they can't hear the difference between square waves shifted by half a sample.
This is mostly only true because our display technology lacks the resolution to reliably saturate the visual system in the same way that our audio technology does.
> rid the world of the misconception that a pixel is a little geometric square.
> The little square model is simply incorrect. It harms.
> I show why it is wrong in general.
Why the bluster?
He makes a decent case for this point in the domain of graphics processing; what he calls "correct image (sprite) computing".
But the narrow focus on computer graphics undermines these broad generalizations. There are many other domains that can be represented in pixel-based data models. Climate, terrain, population and land cover mapping are just a few domains where the use of a pixel as a "little geometric square" is a perfectly viable approach.
Ultimately, if the message is "think about how your data model maps to reality" - I agree. But why the hyperbole? Why shit on an entire model because it doesn't fit for your very specific use case?
Texels and Voxels can be square/cubic, such as in video game applications where it is accepted and exploited as a fundamental esthetic: worlds are textured with tiled mosaics which reveal their square unit when approached closely, and ditto for worlds made of voxels.
A voxel as a sample of a solid, for instance from a computed tomography, where the fidelity of the reconstruction matters, is subject to different requirements.
Both the word “pixel” and the word “voxel” are commonly used in both of these different senses. I believe the word “voxel” was created to be explicitly a 3D analog of a 2D “pixel”.
For example, you can also have “pixel art”, drawings made up of little squares, which only loosely have to do with pixels as samples for a raster image or physical camera detector elements or physical display elements. https://en.wikipedia.org/wiki/Pixel_art
> A Pixel Is Not a Little Square! Microsoft Tech Memo 6
Is this the reason why they forced this horrible bilinear filter on the Windows image viewer for so long (I guess they still do)? It made me crazy; it's so ugly.
No, this was written by Alvy Ray Smith 25 years ago in response to incorrect filtering in computer graphics in general. This has nothing to do with whatever you are seeing.