|
|
|
|

Using Spreadsheet Pivot Tables to Analyze the
U.S. EPA Toxic Release Inventory
A tipsheet by Russ Clemings, Fresno Bee database reporter
(This article first appeared in the SEJournal)
Download a printable version of this tipsheet
Spreadsheet programs like Microsoft Excel have lots of tools to help you organize data and find patterns. One of the most versatile is the cross-tab, or what Excel calls a "pivot table." With a pivot table, you can quickly compute sums, counts, averages and other characteristics for two or more categories of data at once. This article will show how, using data from the U.S. EPA Toxic Release Inventory. It is aimed at users who are already familiar with Excel basics.
The data used here is for California and can be downloaded
There are almost 200 columns and 5,000 rows in the resulting spreadsheet, and making sense of them in their raw state is almost impossible. But a pivot table can make things easier. This exercise uses Excel 2000; the steps may differ slightly in other versions. Check the "help" for details.
To start creating a pivot table, define the entire block of data from cell A1 to cell HA4808. Then go to the menu bar at the top of the screen and select "data" followed by "pivot table and pivot chart report." Click next on steps 1 and 2, then "finish" on step 3. The program then creates a new sheet with a dialog labelled "PivotTable" and some cryptic instructions about where to "drop" various fields.
For this exercise, we'll keep things simple. The "PivotTable" dialog allows you to scroll through a list of fields to use in your table. Let your mouse hover on the abbreviated versions and the full names should appear. We're going to summarize fugitive air emissions—chemicals that escape to the air via routes other than a stack—by county.

First, click on the "facility county" field—it should be in the center of the third row—and drag it into the area labelled "drop column fields here." Then, scroll down to the "chemical name" field, which should be in the center of the 10th row, and drag it to the "drop row fields here" region.
We've now told the computer to summarize the data by chemical and county, but we have not yet told it where the data—the amount of emissions—is located. So scroll down until you find the field "fugitive air emissions—total release pounds" and drag it over to "drop data items here."
As soon as you do that, your previously empty pivot table is populated with data. But it's still huge—more than 200 rows deep and 60 wide, with one column for each county in California. (Well, not really. About a dozen are missing. Is that a story? Maybe.) But let's say you're really interested only in six central California counties—Fresno, Kings, Madera, Mariposa, Merced and Tulare. You can hide the rest by clicking on the little down-arrow next to the "facility county" field name and unchecking all of the counties you don't want.

Now you've got a useful starting point to explore the data. Take a look at cell A3. It tells you what the numbers on your chart represent. If it says "count of fugitive air emissions—total release pounds," then it's telling you how many sources released each chemical. If you'd rather know the total amounts of those releases, then right-click on the cell, pick "field settings" from the resulting menu, and click on "sum" in the "summarize by" box. Then click OK.
Experiment with the "field settings" dialog and you'll see other options. "Average" tells you the average (mean) emissions for all sources of each chemical. "Max" gives you the amount released by the largest single source.
One thing that you might want to do is find the chemical with the largest total emissions in a given county. You can do that by sorting the pivot-table data. First, use the "field settings" dialog to make sure you're displaying the sum of emissions. Then click anywhere in the data area of the column you want to sort and select "data/sort/descending" on the menu bar (or just click the little icon that has a "z" on top of an "a" next to an arrow). If you clicked in the "Fresno" column, the table should now say that ammonia is the most heavily released chemical, followed by styrene.
Which sources are releasing these chemicals? That question can be answered by double-clicking on the number you're interested in—the cell, in other words, where the chemical name and county name intersect. For styrene in Fresno County, this should be cell B6, and the amount listed should be "20174" pounds. Double click there and a new sheet will open with all of the data for the sources—there are two—of that chemical.
Keep in mind here that some pollution sources may not appear in the TRI inventory. For example, by far the largest sources of ammonia in central California are dairy farms, which release ammonia from cattle wastes, and farms that use ammonia for fertilizer. Neither appears here.
A different technique lets you see just selected parts of the hidden data for each chemical. Go back to your pivot table and double-click on the chemical name. A dialog appears that lists all of the available fields.
Let's say you just want the names of the sources for styrene, and you don't care about the rest of the data. Double-click on the word "styrene" and pick "facility name" from the resulting dialog. Click "OK" and a new column appears with the name of each source for all of the counties in your table. To hide that column and move on, just right-click on the "facility name" label (in cell B4) and select "hide."
This lesson has just scratched the surface of what you can do with pivot tables. For more, see the Excel help topic "About PivotTable reports: interactive data analysis."
Back to Spreadsheets and the environment
Back to top