Sunday, December 18, 2016

Scatter/Bubble charts and regression analysis: median income and obesity levels

Is there a correlation between median income and obesity? I have put some numbers from a few data sources (see full list in the end of the post) in a single GoogleDocs spreadsheet. I have created a new grouped bubble chart and imported the data using CSV/TSV wizard, using Region/Haplogroup column for grouping:

Friday, September 2, 2016

Importing CSV/TSV data: SUM and AVG aggregate functions

These are accountant's favourite aggregate functions. Consider a sample data spreadsheet from Tableau slightly massaged to expose order year, quarter and weekday explicitly - here is the Google spreadsheet we will be working on. About 8000 order lines spanning a few years of an online superstore sales activity.

Quarterly sales by province
Let's create a Line chart with totals for each quarter by province (it happened to be a Canadian superstore).

1. Login to charte.ca and create a new Line chart.

2. In the Data panel, click "Import" (you can find it right under the data grid).

3. On wizard screen #1, paste data from the spreadsheet (we will re-use this data in all examples):


4. On Wizard screen #2, select series (Province), category(Year-Quarter) and value (Sales) columns:


5. On Wizard screen #3 (filtering), leave all fields blank - we do not need any filtering.

6. On wizard screen #4, select SUM aggregate function:


7. On Wizard screen #5 (sorting), leave default values in all fields and click "Done".

8. As a result, you will get this data grid and chart:

Wednesday, August 31, 2016

Importing CSV/TSV data: COUNT aggregate function

Refresh your data journalism skills. Let's look at San Francisco police dept incident database for the years 2003-2013 available here: data.sfgov.org. For convenience, let's use a snapshot of it stored as Google spreadsheet. It contains about 15000 records, one record per incident.

Totals
Let's create a Pie chart with totals for each crime category.

1. Login to charte.ca and create a new Pie chart.

2. In the Data panel, click "Import" (you can find it right under the data grid).

3. On wizard screen #1, paste data from the spreadsheet (we will re-use this data in all examples):


4. On Wizard screen #2, select series (crime category), category(NONE) and value (NONE) columns:


5. On Wizard screen #3 (filtering), leave all fields blank - we do not need any filtering.

6. On wizard screen #4, keep aggregate function selection at COUNT, so data import routine simply counts correspondent records.

7. On Wizard screen #5 (sorting), leave default values in all fields and click "Done"

8. As a result, you will get this data grid and chart:


Totals by year
Naturally, now we are curious how crime picture progressed through the years. Initial steps are the same, but the chart type is Line now.

Monday, August 29, 2016

Importing CSV/TSV data: Tableau example

From now on, charte.ca allows you to import flat comma- or tab-separated data. Usually, CSV/TSV data is just a set of samples that are nor suitable for displaying in a Pie, Bar or Line charts. To overcome this limitation, powerful software packages offer additional functionality like Excel pivot tables. charte.ca allows you to build a pivot table in a few easy steps.

This example shows how you can import data from a Public Tableau visualization. Fortunately, Tableau has a CSV export feature and we will take advantage of it.

1. Login to charte.ca and create a new Regular Line chart.

2. In the Data panel, click "Import" (you can find it right under the data grid).

3. On wizard screen #1, paste data from 



Please note the data we are importing is flat indeed - series, categories and values are just raw data columns. On the next step, we will define the pivot table structure.

4. On Wizard screen #2, select series, category and value columns:


Please keep in mind that in charte.ca editor, series become grid columns, and categories become grid rows. 

5. On Wizard screen #3 (filtering), leave all fields blank - we do not need any filtering.

6. Nothing to select on Wizard screen #4 (aggregate function).

7. On Wizard screen #5 (sorting), leave default values in all fields and click "Done"

8. As a result, you will get this data grid and line chart:

Please note that pivot tables make sense only for Pie, Bar, or Line charts. Scatter or Bubble charts are based on flat data, so there is no need to perform this pivot table magic.

You have probably noticed that this example does not involve any aggregation: every series/category pair in the flat data points to only one value. If this is not the case, and our flat data samples have to be grouped, aggregate function should be used. But this is a good subject for a separate discussion.

Tuesday, May 10, 2016

Kentucky Derby serial winners

Yesterday I spent some time crunching Wikipedia numbers for Kentucky Derby and playing with the recent version of charte.ca. The idea was to mine for jockeys, trainers and owners who won the race more than once and present this data in a convenient way. Here is what I got.
Click-and-hold on a bar or a legend item to drill down to the details.