Spreadsheets and the environment - intermediate level

Analyzing toxic sites

A tipsheet by David Poulson, associate director,
Knight Center for Environmental Journalism

Download a printable version of this tipsheet

Computer-assisted reporting yields great stories on the environment beat.  Regulators and others record lots of environmental data for enforcement or to monitor trends.  Here is an example of how CAR can sharpen environmental reporting:

Go to www.michigan.gov/deq, the Web site for the Michigan Department of Environmental Quality.  Click “online services” at the top, and then “Part 201 Site Search” from the list.

This interface searches for information on Michigan sites of environmental contamination. Part 201 refers to a section of state law requiring that these sites be listed.

Put in the name of a county or city and examine the result. That’s pretty handy!  But it’s also a bit limiting. What if you wanted to know which site is the greatest hazard in the state or how many sites are in Wayne County?  For those kinds of questions you need the data behind the interface.  Often in such situations you need to file a FOIA to get the data. In this case, you can grab it all online.

Click on the link that says “Download the Part 201 Site List in MS Excel” and save it to your desktop.  Now open the file in Excel. Notice that the column headers provide the same information that your online search produced. Let’s clean it up a bit before doing some analysis.

Notice the “score” column. The higher the score, the greater the risk. A 48 is the highest score possible. Let’s remove “out of 48” from each cell.  Hit control f to produce the find and replace box and click the replace tab. After “Find what:” insert “out of 48” without the quote marks. Leave the “Replace with:” box blank. Click on “replace all.”

For this exercise we don’t need the geographic information contained in columns K through O. Highlight them by putting the cursor on the letter K and dragging across to the letter O. Now click edit/delete.

Notice that the columns are often too narrow for the information they contain.  If you want to see more, go to the line separating the letters. Click and drag to widen it, or double click and the column will expand to accommodate the widest cell in that column.

Scroll down the records with the scroll bar on the right. How many sites are listed? Don’t forget to subtract for the first row that contains your column headers.

Now let’s figure the average score of Michigan’s toxic sites. Type the word “average” in a cell in column H that is two spaces below the data.  In the cell to the right, type this formula: =average(I2:I2946).  I2946 is the last cell in this example. Your last cell may be different. If so, insert that number instead.

Hit enter and Excel will average all the numbers in the I column.  Doesn’t that beat entering nearly 3,000 numbers into a calculator?

On the next line figure the median – the middle value – with a formula that says =median(I2:I2946).

Now we’ll sort the list so that we can figure out the location of the sites with the highest scores.  Click anywhere in the main data block. Simultaneously hit shift, control, asterisk to highlight all of the contiguous cells.  If a data point is separated from the block, it won’t be highlighted.  In this case we highlighted all of the information in the spreadsheet except the average and median.

Now click data at the top of the sheet and go to sort.

We want to sort by score. Make sure header row is checked, and select score from the drop down box. We want to know the highest score first, so click descending and hit OK.

The whole database is now sorted from highest to lowest score. If you had sorted by county or by site name, Excel would have sorted the database by reverse (descending) alphabetical order by those columns.

Find the local angle

This is good stuff, but it doesn’t tell us anything about what’s happening locally.  (This example assumes your readership is in Ingham, Clinton and Eaton counties; you’ll want to select counties relevant to your own publication.)

Go to data, but this time click filter and then auto filter.

Arrows appear to the right of each column header.  Click the one next to county and select custom.

We want to filter records where county equals Ingham or county equals Clinton.  When the filter box appears, insert those names and make sure that the “or” (not the “and”) is selected. Hit OK and now you’ve got a database of records affiliated with just Ingham and Clinton counties.

Let’s save it for analysis.  With the new spreadsheet highlighted (shift, control, asterisk), hit edit/copy (or control c.) Now go to insert at the top of the sheet and select worksheet.  

When it appears, click on square A1 and then edit/paste to paste your data. 

At the bottom of the spreadsheet are two tabs. One has the original spreadsheet (miSites) and the other has the one you just created.  Double click on the tab you just created (sheet 1) and change the name to “local sites.”

Click on the miSites tab. This time we’ll retrieve the Eaton County data. Click on the drop down arrow next to county.  But instead of custom, just select Eaton from the list.

Select the newly filtered data (but not the column names) by clicking and dragging.  Copy this data (edit/copy or control c) and move it into the “Local sites” spreadsheet by pasting (edit/paste or control v) it into the first vacant cell in the first column – just below the data that’s already there.

If you brought the column headers with you, it’s no big deal. Get rid of them by clicking the number of the row that contains them. Go to edit/delete and they’ll disappear.

The payoff

Now you’ve got a local subset of the state database.  Sort to find the sites with the highest scores. What are the average and median local scores? How do those scores compare with the statewide figures?

How about sorting by “score date” to find out when the last time some of your local sites were scored.  How come it’s been so long?

Answers to such questions are a good start for a local story with statewide context.  At a minimum, you now know something that you’ll want to talk to the DEQ about.

And if you do a similar analysis annually, you can report whether new sites were added or old sites were cleaned up or improved during the previous year.

Further analysis

Go back to the sheet with all the sites (MiSites) to figure out how many are contaminated with toluene. Highlight the data (shift, control, asterisk) and turn the filter back on by going to data/filter/auto filter.

Click on the arrow next to pollutant.  Select custom filter.  Fill in the blanks to search for records where pollutant contains toluene. You could do this for any pollutant.

Now let’s find out if any of the sites are owned by Michigan State University.  Go to data/filter/show all.

Highlight your data and go to data/filter.  Click the arrow next to name and select custom. Filter for records that contain Michigan State University OR MSU.

Notice that the source of two of the MSU sites is listed as “Colleges & Universities.”

By refiltering the data for sources that are “College & Universities” you could find contaminated sites affiliated with the state’s institutions of higher education. What other categories of ownership might you investigate?

Looking more deeply with pivot tables

You can analyze a database even more deeply with spreadsheets by using pivot tables.  Let’s figure out which county has the most sites.

Go to the miSites sheet and click data/filter to click the auto filter off.  Highlight all of your data. Go to Data and click on “PivotTable and PivotChart report”

We’re analyzing an Excel document so click next to accept the default.

Click next again to verify the range of data. It should be right because you already highlighted it.

Now click on layout.

Visualize the perfect chart for listing the number of sites. Each county would list the total sites in that county.  From the buttons on the right, click and drag county name into the row field. Now click and drag county name into the data area. It will change to “count of county name.”  Click OK.

Click Finish to create a new worksheet.

Now you’ve got a list of counties and a count of the number of sites in each one.

To put them in order, click on the first number under total. Go to data/sort and click descending in the box that appears.

Click OK and the number of sites should be in descending order. (A short cut is to click the button that says Z to A). Note that this data is now under a new tab of your spreadsheet.

Go to your local sheet and, using the same technique, figure out which cities in your readership area have the most sites.

Back to Spreadsheets and the environment

Back to top

line