Unit testing is a means to an end: how do we verify that code is correct the fir...

snidane · on Oct 25, 2022

If you need to maintain some sort of deductive correctness - ie. my inputs are correct and my code is correct therefore my outputs are also correct - you're gonna expose yourself to only a tiny amount of real world problems.

Data engineering is typically closely aligned with business and it's processes are inherently fuzzy. Things are 'correct' as long as no people/quality checks are complaining. There is no deductibe reasoning. No true axioms. No 'correctness'. You can only measure non-quality by how many complaints you have received, but not the actual quality, since it's not a closed deductive system.

Correctness is also defined by somebody downstream from you. What one team considers correct, the other complains about. You don't want to start throwing out good data for one team just because somebody else complained. But many people do. Or typically people coming from SWE into DE tend to, before they learn.