Seriously, a loglog plot? Even in that, there is a seriously wide dispersion to ...

dwaltrip · on June 22, 2023

The linear plot in that Wolfram link is messed up. It doesn't show all the data (caps out at 800 billion GDP). Here's a corrected linear plot, from the script that I linked (commenting out the log-log scaling):

https://ibb.co/9bBgwH8

There is clearly a correlation, even on a linear scale. It's a little messy, but it's undeniably there.

The starting point for this discussion was the relationship between a country's size and population and its power and influence. The correlation between area and GDP demonstrates that there is a meaningful relationship.

Btw, what is your specific complaint about a log-log plot? Country data points for area and GDP span many orders of magnitude, which makes it harder to visualize any patterns on a linear plot.
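
To make the orders-of-magnitude point concrete, here is a minimal sketch using made-up values (not the actual country data). It measures how much of the axis range the smaller half of the points occupies on a linear axis versus a log axis. The function name `axis_fraction_used` and the sample values are assumptions for illustration only.

```python
import math

def axis_fraction_used(values, quantile=0.5, log=False):
    """Fraction of the axis range (min..max) spanned by the smallest `quantile` share of points."""
    vals = sorted(math.log10(v) for v in values) if log else sorted(values)
    lo, hi = vals[0], vals[-1]
    cutoff = vals[int(quantile * (len(vals) - 1))]
    return (cutoff - lo) / (hi - lo)

# Hypothetical values spanning six orders of magnitude (NOT real GDP figures).
sample = [10 ** (k / 10) for k in range(0, 61)]  # 1 ... 1,000,000

linear_share = axis_fraction_used(sample)
log_share = axis_fraction_used(sample, log=True)
print(f"linear axis: the smaller half of the points uses {linear_share:.2%} of the range")
print(f"log axis:    the smaller half of the points uses {log_share:.2%} of the range")
```

On a linear axis the smaller half of such a sample is squeezed into a sliver next to the origin, which is why patterns there are hard to see without log scaling.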

I also don't understand your point about the dispersion. The correlation and trend are pretty clear. No one said the correlation was 99%.

Edit: I've calculated Pearson's correlation coefficient for this data [1]. The result is 0.82, which indicates a strong positive correlation.

[1] https://en.wikipedia.org/wiki/Pearson_correlation_coefficien...
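
For reference, a minimal sketch of what Pearson's r measures: the covariance of the two variables divided by the product of their standard deviations. This is a from-scratch illustration, not the script used above; `pearson_r` and the sample numbers are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # The 1/n factors cancel between numerator and denominator, so raw sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values, NOT real area/GDP data.
area = [1.0, 2.0, 3.0, 4.0, 5.0]
gdp = [2.0, 4.1, 5.9, 8.2, 9.8]
print(pearson_r(area, gdp))
```

The result always lies between -1 and 1. Values near 1 mean the points fall close to an upward-sloping straight line.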

zimpenfish · on June 22, 2023

> The result is 0.82, which indicates a strong positive correlation.

datamash gave me 0.52 for Pearson. Which is "eh, maybe".

erwald · on June 30, 2023

That's weird, are you looking only at the top 10 countries?

I've reproduced dwaltrip's results using World Bank data on 251 countries, and I get a Pearson's r of 0.82 and a p-value of 5.6e-61 (!). I.e. a strong correlation, with high confidence. It makes sense too -- larger countries generally have more people, and more people generally generate more economic activity.

Code if you want to try yourself:

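import pandas as pd
from scipy import stats

# Load the World Bank GDP and land-area CSVs, indexed by country name.
# Note: raw World Bank CSV exports may carry metadata lines above the header; if read_csv fails, try skiprows=4.
gdp = pd.read_csv("~/Downloads/API_NY.GDP.MKTP.CD_DS2_en_csv_v2_5551501.csv").set_index("Country Name")
land_area = pd.read_csv("~/Downloads/API_AG.LND.TOTL.K2_DS2_en_csv_v2_5552158.csv").set_index("Country Name")

# Take the 2020 columns; the assignment aligns rows on the "Country Name" index.
gdp["GDP"] = gdp["2020"]
gdp["Land"] = land_area["2020"]

# Drop rows missing either value, then compute Pearson's r and its p-value.
gdp = gdp.dropna(subset=["GDP", "Land"])
print(stats.pearsonr(gdp.Land, gdp.GDP))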

#+RESULTS: : PearsonRResult(statistic=0.8151313879150333, pvalue=5.621180589722219e-61)
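
The pandas code above relies on two steps that are easy to miss: assigning one indexed column into another frame matches rows by country name, and `dropna` then discards any country missing either value. A rough standard-library equivalent of those two steps, as a sketch only; `align_and_drop_missing` and the placeholder data are hypothetical, not World Bank figures.

```python
def align_and_drop_missing(gdp_by_country, land_by_country):
    """Pair up values by country name, skipping countries missing either value."""
    pairs = {}
    for name, gdp_value in gdp_by_country.items():
        land_value = land_by_country.get(name)  # None if the country is absent
        if gdp_value is None or land_value is None:
            continue
        pairs[name] = (land_value, gdp_value)
    return pairs

# Placeholder names and numbers, NOT World Bank figures.
gdp_2020 = {"A": 100.0, "B": None, "C": 300.0, "D": 50.0}
land_2020 = {"A": 10.0, "B": 20.0, "C": 30.0}

aligned = align_and_drop_missing(gdp_2020, land_2020)
print(sorted(aligned))
```

Here "B" is dropped for a missing GDP value and "D" for a missing land area, so only complete pairs feed into the correlation.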

dwaltrip · on July 10, 2023

Thanks for double checking. I ran out of steam on the convo heh.