Wednesday, August 31, 2016

Importing CSV/TSV data: COUNT aggregate function

Refresh your data journalism skills. Let's look at San Francisco police dept incident database for the years 2003-2013 available here: data.sfgov.org. For convenience, let's use a snapshot of it stored as Google spreadsheet. It contains about 15000 records, one record per incident.

Totals
Let's create a Pie chart with totals for each crime category.

1. Login to charte.ca and create a new Pie chart.

2. In the Data panel, click "Import" (you can find it right under the data grid).

3. On wizard screen #1, paste data from the spreadsheet (we will re-use this data in all examples):


4. On Wizard screen #2, select series (crime category), category(NONE) and value (NONE) columns:


5. On Wizard screen #3 (filtering), leave all fields blank - we do not need any filtering.

6. On wizard screen #4, keep aggregate function selection at COUNT, so data import routine simply counts correspondent records.

7. On Wizard screen #5 (sorting), leave default values in all fields and click "Done"

8. As a result, you will get this data grid and chart:


Totals by year
Naturally, now we are curious how crime picture progressed through the years. Initial steps are the same, but the chart type is Line now.



1. On Wizard screen #2, select series (crime category), category(year) and value (NONE) columns:


2. On Wizard screen #3 (filtering), leave all fields blank - we do not need any filtering.

3. Click through wizards screens #4 and #5 and get this data grid and chart:

Seasonal drinking
It would be nice to see if there is a seasonal pattern in DRUNKINNESS offense.

1. On Wizard screen #2, select series (year), category(month) and value (NONE) columns:


2. On Wizard screen #3 (filtering), select crime category - DRUNKINNESS.


3. Click through wizards screens #4 and #5 and get this data grid and chart:


Apparently, we did not discover any really interesting patterns here. No spikes on big holiday months. Maybe just a crackdown on drunkards in May-Oct 2013.

Incident resolution
Finally, it would be nice to learn how SF police deals with the offenders. Let's add incident resolution to the picture.

1. On Wizard screen #2, select series (resolution), category(year) and value (NONE) columns:


2. On Wizard screen #3 (filtering), select crime category - DRUNKINNESS (apparently, this type of offense is the focus of our little research) and a few types of resolution that are common for this type of offence: "ARREST,BOOKED", "NONE", or "ARREST, CITED".



3. Click through wizards screens #4 and #5 and get this data grid and chart:


Apparently, during the 2015 crackdown, police officers were instructed to take serious offenders into custody, give warnings to peaceful random folks, and use less citations.

As you see, in all these examples we used only one kind of aggregate functions - COUNT. Read about AVG and SUM functions here.

No comments:

Post a Comment