We recently wrote an article (https://l.bit.io/o-cop26) about methane emissions and the COP26 commitment to cut them. While writing that article, we found serious inconsistencies in some of our data sources.
Discussions of data quality and validation in data science tend to end with recommendations for a few data validation checks, such as making sure data come from trusted sources, handling missing values, and investigating outliers. These checks are important, but they won't save an analysis from perfectly formatted data from a trusted source that happen to be wrong for reasons that can't be found in the dataset itself. Even data of apparently good quality can lead to faulty conclusions.
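For concreteness, here is a minimal sketch of the kind of in-dataset checks those recommendations describe, using pandas. The file name, column name, and z-score threshold are assumptions for illustration, not part of our actual pipeline:

```python
import pandas as pd

# Hypothetical emissions table; file and column names are assumptions.
df = pd.read_csv("emissions.csv")

# Missing-value check: count NaNs per column.
print(df.isna().sum())

# Outlier check: flag rows more than 3 standard deviations from the mean.
col = df["methane_emissions"]
z_scores = (col - col.mean()) / col.std()
print(df[z_scores.abs() > 3])
```

Checks like these only examine the dataset in isolation; they say nothing about whether the values themselves are true.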
This article explores that problem through a case study. The U.N. publishes greenhouse gas emissions data supplied each year by parties to the UNFCCC (United Nations Framework Convention on Climate Change). The data are consistent, up to date, and well formatted, and the U.N. is a reliable source of official data. However, there is good reason to believe the data submitted by some countries are not accurate: other trusted data sources show startlingly large differences from the U.N. data. In particular, we found that Russia's methane emissions data were highly inconsistent with the World Resources Institute (WRI) Climate Analysis Indicators Tool (CAIT) data, even though the CAIT data closely match the U.N. data for most other countries.
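One way to surface this kind of cross-source inconsistency is to align the two datasets and flag large divergences. The sketch below assumes hypothetical file and column names and a 25% threshold; it illustrates the approach rather than the exact comparison we ran:

```python
import pandas as pd

# Hypothetical extracts of the two sources; file and column names are assumptions.
un = pd.read_csv("unfccc_methane.csv")   # columns: country, year, ch4_kt
cait = pd.read_csv("cait_methane.csv")   # columns: country, year, ch4_kt

# Align the sources on country and year.
merged = un.merge(cait, on=["country", "year"], suffixes=("_un", "_cait"))

# Relative difference between the sources, taking CAIT as the baseline.
merged["rel_diff"] = (merged["ch4_kt_un"] - merged["ch4_kt_cait"]) / merged["ch4_kt_cait"]

# Flag country-years where the sources diverge by more than 25%.
flagged = merged[merged["rel_diff"].abs() > 0.25]
print(flagged.sort_values("rel_diff", key=lambda s: s.abs(), ascending=False))
```

A flagged divergence only tells you that the sources disagree; deciding which one is closer to the truth requires investigation beyond the data themselves.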