Sunday, December 18, 2016

Scatter/Bubble charts and regression analysis: median income and obesity levels

Is there a correlation between median income and obesity? I put numbers from a few data sources (see the full list at the end of the post) into a single Google Docs spreadsheet. Then I created a new grouped bubble chart and imported the data with the CSV/TSV wizard, using the Region/Haplogroup column for grouping.
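For readers who want to reproduce the grouping outside the chart editor, here is a minimal sketch that reads the exported CSV (or TSV) text and buckets (income, obesity) pairs by the grouping column. The header names `MedianIncome`, `Obesity` and `Region/Haplogroup` are assumptions about how the spreadsheet is laid out, and `load_groups` is a hypothetical helper, not part of any chart tool.

```python
import csv
import io
from collections import defaultdict

# Hypothetical header names: the real spreadsheet's columns may be labeled differently.
INCOME_COL = "MedianIncome"
OBESITY_COL = "Obesity"
GROUP_COL = "Region/Haplogroup"


def load_groups(text, delimiter=","):
    """Parse CSV text (or TSV, with a tab delimiter) and bucket
    (income, obesity) pairs by the value of the grouping column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        income, obesity = row.get(INCOME_COL), row.get(OBESITY_COL)
        # Skip countries with a missing value, e.g. no published income survey.
        if not income or not obesity:
            continue
        groups[row[GROUP_COL]].append((float(income), float(obesity)))
    return dict(groups)
```

Countries with an empty income or obesity cell are dropped rather than guessed, which mirrors how countries without data simply do not appear on the chart.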

In the chart editor, I experimented with the new panel that lets you choose the regression method and turn group-by-group regression on and off. Finding a correlation between income and obesity without grouping countries by region is too challenging, so I decided to run regressions within a few clusters of countries instead. I grouped countries by their prevailing haplogroup, so do not be surprised to see Jamaica and Barbados in the African cluster and Mongolia in the Europid cluster. Oceania is missing from this spreadsheet because both income and haplogroup data are lacking, and Switzerland is missing because Gallup did not publish income survey results for it.
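The difference between pooled and group-by-group regression can be sketched as follows: fit one least-squares line per cluster, and separately one line over all points as if no grouping existed. This is a plain ordinary-least-squares sketch under the assumption that each cluster is given as a list of (income, obesity) pairs; `linear_fit` and `fit_groups` are hypothetical names, not the chart editor's internals.

```python
def linear_fit(points):
    """Ordinary least squares for y = a + b*x over (x, y) pairs; returns (a, b)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("x values are all identical")
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_groups(groups):
    """groups maps a cluster name to a list of (x, y) pairs.

    Returns one fitted line per cluster ("group-by-group" regression) and a
    single pooled line over every point (the "no grouping" case).
    Clusters with fewer than two points are skipped.
    """
    per_group = {name: linear_fit(pts) for name, pts in groups.items() if len(pts) >= 2}
    pooled = linear_fit([p for pts in groups.values() for p in pts])
    return per_group, pooled
```

If clusters follow opposite trends, the pooled slope can flatten toward zero even when each cluster's own slope is steep, which is one reason a correlation is hard to see without grouping.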

As expected, linear regression did not give good results. Logarithmic regression looked better, but the countries at the right end of the curve did not fit well. Polynomial regression worked best (the chart is a bit oversized).
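One way to make "did not give good results" and "worked best" measurable is to fit each candidate model by least squares and compare the coefficient of determination (R²). The sketch below is dependency-free; it assumes the polynomial is of degree 2 (the post does not say which degree was used), and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_basis(xs, ys, basis):
    """Least-squares fit of y = sum(c_k * f_k(x)) via the normal equations."""
    rows = [[f(x) for f in basis] for x in xs]
    k = len(basis)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    coef = solve(ata, aty)
    return lambda x: sum(c * f(x) for c, f in zip(coef, basis))


def r_squared(xs, ys, predict):
    """1 - SS_res / SS_tot for a fitted predictor."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


MODELS = {
    "linear": [lambda x: 1.0, lambda x: x],
    "logarithmic": [lambda x: 1.0, math.log],  # requires every x > 0
    "polynomial (degree 2)": [lambda x: 1.0, lambda x: x, lambda x: x * x],  # assumed degree
}


def compare_models(xs, ys):
    """Return the R² of each candidate model fitted to the same data."""
    return {name: r_squared(xs, ys, fit_basis(xs, ys, basis)) for name, basis in MODELS.items()}
```

Two caveats apply. Because the linear model is a special case of the quadratic one, the polynomial fit can never score a lower R² on the same data, so a higher score alone does not prove it is the better explanation. Also, incomes in the tens of thousands make the x² column large; rescaling (for example, to thousands of dollars) keeps the normal equations well conditioned.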

The pattern is clear: the obesity level grows with median income up to the point where higher taxes kick in and drive it back down. This is NOT scientific research, of course, but the chart demonstrates the regression feature well.
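If the polynomial trendline is assumed to be of degree 2 (the post does not state the degree), the income level at which fitted obesity peaks follows from setting the derivative to zero. Here $x$ is median income, $\hat{y}$ the fitted obesity level, and $c_0, c_1, c_2$ the fitted coefficients; these symbols are introduced only for this sketch.

```latex
\hat{y}(x) = c_0 + c_1 x + c_2 x^2, \qquad
\frac{d\hat{y}}{dx} = c_1 + 2 c_2 x = 0
\;\Longrightarrow\;
x^{*} = -\frac{c_1}{2 c_2}
```

The turning point $x^{*}$ is a maximum only when $c_2 < 0$, that is, when the fitted parabola opens downward, which is the rise-then-fall shape described above.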

Tools used: Microsoft Excel, online chart editor

