Getting under the hood: Our methodology

How we investigated schools near sources of traffic pollution nationwide



Trucks rumble along the road outside Hawkins Street School in Newark, New Jersey, as a family waits to cross.

Jamie Smith Hopkins / The Center for Public Integrity

We’re all exposed to unhealthy traffic pollutants, but people who spend a lot of time on or very near higher-traffic roads get more. The Center for Public Integrity and Reveal from The Center for Investigative Reporting teamed up to look at the schools across the country that sit within 500 feet of busy roads.

We picked that distance because, in general, studies suggest that the biggest daytime exposures are within the first 500 feet from the road (though some studies have found elevated levels farther out, such as roughly 900 to 1,000 feet). California’s school-siting law, which aims to keep new schools away from freeways and other major routes, uses 500 feet as the area of concern.

The California law focuses on very heavily traveled roads, but there’s no true dividing line between bad and OK. Some studies have found health effects among people near roads with at least 10,000 vehicles a day, which includes routes with a tiny fraction of the traffic on an L.A. freeway. In fact, because steady speeds produce less pollution than acceleration, vehicles on highways that aren’t plagued by stop-and-go congestion are cleaner than they are on lower-speed roads with traffic lights and stop signs. And a road that draws diesel trucks, particularly old trucks, could be worse than a higher-traffic route with only cars.

We tried to account for these complexities with our traffic thresholds. We ended up defining a “busy road” as one with average daily traffic of at least 30,000 vehicles, or 500 or more trucks and at least 10,000 total vehicles.
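The threshold above reduces to a two-branch test. Here is a minimal Python sketch of it; the function and argument names are our own illustrative choices, not field names from the FHWA files, and the example counts are made up.

```python
def is_busy_road(total_aadt: int, truck_aadt: int) -> bool:
    """Return True if a road segment meets the 'busy road' definition.

    total_aadt: average daily traffic, all vehicles (hypothetical name)
    truck_aadt: average daily truck traffic (hypothetical name)
    """
    heavy_overall = total_aadt >= 30_000
    # Lower overall volume still counts if trucks are a substantial share.
    truck_heavy = truck_aadt >= 500 and total_aadt >= 10_000
    return heavy_overall or truck_heavy


# Illustrative segments with invented counts: (total vehicles, trucks).
examples = {
    "urban freeway": (120_000, 8_000),
    "arterial with trucks": (15_000, 600),
    "arterial, few trucks": (15_000, 200),
    "quiet street": (3_000, 600),
}
for name, (total, trucks) in examples.items():
    print(name, is_busy_road(total, trucks))
```

Only the first two example segments qualify: the arterial needs 500 trucks to count, and the quiet street falls below the 10,000-vehicle floor no matter how many trucks use it.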

We used schools data tracked by the National Center for Education Statistics, part of the U.S. Department of Education. It includes latitude and longitude for every school, along with information ranging from the type of school to demographic details about the student body. The most recent full dataset from the NCES is for the 2014-15 school year.

Our traffic data came from the Federal Highway Administration, which has average daily traffic figures for total vehicles as well as trucks on roads across the country — not just highways, but also local roads. We used 2014 traffic data for every state except Iowa. Highway administration data wasn't available in 2014 for that state, so we used 2015 data instead.

Staffers at both agencies answered a lot of questions for us, from how the school geocoding was done (the NCES tries to put the coordinates on top of a school building whenever possible) to how the FHWA distinguishes trucks from cars (sensors in the roads, manual counts, estimates from the states).

We also received help from numerous academic researchers. People who conducted studies of schools near major routes and shared their expertise include Sergey Grinshpun with the University of Cincinnati, Gregory Wellenius of Brown University and Ryan Allen at Simon Fraser University.

Other academics who offered advice on a wide range of related issues include Julian Marshall and Matthew Bechle at the University of Washington, Steve Hankey at Virginia Tech, Dr. Janet Phoenix at the George Washington University, Nicky Sheats at Thomas Edison State University, Andrea Ferro at Clarkson University, Marc Serre at the University of North Carolina at Chapel Hill, Jonathan Buonocore at Harvard University, Julia Heck at UCLA and Stuart Batterman at the University of Michigan.

Some news organizations have covered this issue in their regions, including InvestigateWest’s excellent Exhausted at School series in Seattle, but we came across none that crunched the data nationally. Here’s why: It’s a headache. You can individually verify that the school locations are accurate and each record in the database is in fact a school when you’re looking at hundreds of sites in a city. You can’t do it one by one when you’re working with a dataset of just over 100,000 entries. 

If a school’s coordinates are off by even a few dozen yards, it could appear to be within 500 feet of a road that it actually isn’t, or farther away than it actually is. The location for each school is the equivalent of the pinpoint on a Google map, rather than the boundaries encompassing the entire property, so there’s not a lot of wiggle room.
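To see why a few dozen yards matter, here is a minimal sketch of a point-to-road distance check. It is not the geospatial workflow the team used (the methodology doesn’t describe the software); it assumes a road is a list of (latitude, longitude) vertices and uses a flat-earth approximation that is reasonable over a few thousand feet. All coordinates are hypothetical.

```python
import math

# Mean Earth radius (about 6,371 km) expressed in feet.
EARTH_RADIUS_FT = 6_371_000 / 0.3048


def to_local_feet(lat, lon, lat0, lon0):
    """Project (lat, lon) to x/y feet on a flat plane centered at (lat0, lon0)."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_FT
    y = math.radians(lat - lat0) * EARTH_RADIUS_FT
    return x, y


def point_to_segment_ft(px, py, ax, ay, bx, by):
    """Shortest distance from point P to segment AB on the flat plane."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: A and B coincide
        return math.hypot(px - ax, py - ay)
    # Position of P's projection along AB, clamped to the segment's ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_road_ft(school, road):
    """school: (lat, lon). road: list of at least two (lat, lon) vertices."""
    lat0, lon0 = school
    pts = [to_local_feet(lat, lon, lat0, lon0) for lat, lon in road]
    # The school sits at the origin (0, 0) of the local plane.
    return min(point_to_segment_ft(0.0, 0.0, *a, *b) for a, b in zip(pts, pts[1:]))


def within_buffer(school, road, buffer_ft=500):
    return distance_to_road_ft(school, road) <= buffer_ft


# Hypothetical east-west road about 450 feet north of a school.
offset_deg = math.degrees(450 / EARTH_RADIUS_FT)
road = [(40.0 + offset_deg, -75.01), (40.0 + offset_deg, -74.99)]
print(round(distance_to_road_ft((40.0, -75.0), road)))  # 450

# Nudge the school's pin 100 feet (about 33 yards) south.
shifted = (40.0 - math.degrees(100 / EARTH_RADIUS_FT), -75.0)
print(within_buffer((40.0, -75.0), road), within_buffer(shifted, road))  # True False
```

Moving the pin about 33 yards flips the school from inside to outside the 500-foot buffer, which is the sensitivity that made location accuracy so important.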

The NCES dataset also includes entries that wouldn’t make sense for us to count in a story about K-12 schools educating kids close to traffic: online-only schools, adult-education sites and a host of programs that school districts recorded as schools for reasons we’re not certain of.

Reveal’s Eric Sagara and the Center’s Jamie Smith Hopkins and Chris Zubak-Skees spent several months verifying the data. Here’s what we did to improve its accuracy:

● We checked a random sample of schools showing up within 500 feet of busy roads and a random sample of schools geocoded a bit farther away, to see whether geocoding issues would lead to over- or undercounting of higher-traffic schools. (Justin Scoggins, a data-verification expert who is data manager at the University of Southern California’s Program for Environmental and Regional Equity, recommended this step.) What this suggested: More than 90 percent of schools that are supposedly within 500 feet of busy roads really are. Meanwhile, schools that are closer to those roads than they appear — that is, they seem to be more than 500 feet away but are actually less than 500 feet — outnumber the schools that are farther than they appear. That gave us confidence that we’re not overstating the problem.

● All told, we eyeballed the locations of hundreds of schools, which allowed us to make fixes where necessary and gave us an understanding of the issues on the ground. When adjusting a school’s coordinates, we put them on a building rather than, say, the playground, to be consistent with what NCES tries to do.

● Sometimes NCES is better at locating a school, and sometimes Google is. By comparing locations with the California School Campus Database, which provides mostly accurate school boundaries in that state, we found that using Google’s geocoding service to locate a school’s address, and then using Google’s coordinates when those were available with so-called rooftop accuracy, improved the location accuracy for many schools. That’s what we ultimately did for the entire country. (The Center’s Zubak-Skees, who worked through this issue, also conducted the geospatial analysis of schools and roads in the first place to determine what’s close to what.)

● We set to work figuring out which schools (and non-schools masquerading as schools) should not be counted. Online-only schools are supposed to flag themselves as such, but some don’t, so we ultimately excluded schools with “online,” “virtual” or “distance” in their names in addition to those that properly identified themselves as not teaching kids on site. Also kicked out: pre-K-only sites, adult-education sites, schools flagged as “future” or “closed” or “inactive,” locations with “program” in their names (other than a handful that our verification efforts showed really were schools), homeschool-support sites and homebound programs for ill students. We also didn’t count schools with fewer than 20 total students, which is smaller than the average size of a single classroom, as a way of further weeding out sites that really aren’t schools at all.

● It’s not unusual for districts to build several schools on the same property, but we were concerned that some of those clusters might not accurately reflect where the schools are located. We checked larger clusters across the country to verify whether the schools are there, as well as whether the coordinates reflect where on the property they sit. We cast a particularly close eye on clusters whose addresses matched their district headquarters address.
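The random-sample check in the first bullet can be sketched in a few lines of Python. This is an illustration, not the team’s code: the 1,000-foot outer edge of the comparison band, the sample size and the field names (`dist_ft`, `truly_near`) are assumptions, and the `truly_near` verdict stands in for a person checking each school on a map.

```python
import random


def draw_review_samples(schools, near_ft=500, band_ft=1000, n=100, seed=42):
    """Randomly sample schools inside the buffer and in a band just beyond it.

    Each school is a dict with a 'dist_ft' key (its computed distance to the
    nearest busy road). band_ft and n are illustrative choices.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    near = [s for s in schools if s["dist_ft"] <= near_ft]
    band = [s for s in schools if near_ft < s["dist_ft"] <= band_ft]
    return rng.sample(near, min(n, len(near))), rng.sample(band, min(n, len(band)))


def summarize_review(near_reviewed, band_reviewed, near_total, band_total):
    """Scale manual-review results up to the full groups.

    Reviewed records carry 'truly_near' (bool), set by a human reviewer.
    Both samples must be non-empty.
    """
    confirmed = sum(1 for s in near_reviewed if s["truly_near"])
    hidden = sum(1 for s in band_reviewed if s["truly_near"])
    share_confirmed = confirmed / len(near_reviewed)
    return {
        "share_confirmed": share_confirmed,
        # Look near but are actually farther away (would overstate the problem).
        "est_farther_than_they_appear": (1 - share_confirmed) * near_total,
        # Look farther but are actually within the buffer (would understate it).
        "est_closer_than_they_appear": hidden / len(band_reviewed) * band_total,
    }
```

Scaling each sample’s error rate by the size of its group is what makes the two kinds of misplacement comparable, since the band beyond 500 feet may hold more schools than the buffer itself.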
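The coordinate rule in the third bullet (use Google’s point only when it has rooftop accuracy, otherwise keep NCES’s) might look like this in Python. It assumes the response has already been fetched from Google’s Geocoding API and parsed into a dict; it reads the API’s `status`, `results`, `geometry`, `location` and `location_type` fields, while the function name and return labels are hypothetical.

```python
def choose_coordinates(nces_latlon, google_response):
    """Return ((lat, lon), source_label) for one school.

    google_response is an already-parsed Geocoding API JSON response (a dict).
    Google's point wins only when it is rooftop-accurate; otherwise keep NCES.
    """
    results = google_response.get("results") or []
    if google_response.get("status") == "OK" and results:
        geometry = results[0]["geometry"]
        if geometry.get("location_type") == "ROOFTOP":
            loc = geometry["location"]
            return (loc["lat"], loc["lng"]), "google_rooftop"
    return tuple(nces_latlon), "nces"


# Hypothetical responses: one rooftop match, one approximate match.
rooftop = {"status": "OK", "results": [{"geometry": {
    "location": {"lat": 40.7301, "lng": -74.1721}, "location_type": "ROOFTOP"}}]}
approx = {"status": "OK", "results": [{"geometry": {
    "location": {"lat": 40.73, "lng": -74.17}, "location_type": "APPROXIMATE"}}]}
print(choose_coordinates((40.7299, -74.1718), rooftop))
print(choose_coordinates((40.7299, -74.1718), approx))
```

Falling back to NCES for anything less precise than a rooftop match keeps an approximate geocode (for example, one centered on a ZIP code or street) from replacing a pin that NCES may have placed on the building itself.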
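The exclusion rules in the fourth bullet amount to a filter applied to every record. Below is a minimal sketch; every field name (`self_reported_virtual`, `status`, `site_type`, `enrollment`) is hypothetical, since the real NCES files use their own columns, and the allowlist stands in for the “program” sites that verification showed were real schools.

```python
# Hypothetical field names and codes; the real NCES files use their own.
NAME_KEYWORDS = ("online", "virtual", "distance")
EXCLUDED_STATUSES = {"future", "closed", "inactive"}
EXCLUDED_TYPES = {"pre_k_only", "adult_ed", "homeschool_support", "homebound"}
MIN_ENROLLMENT = 20


def keep_school(rec, program_allowlist=frozenset()):
    """Return True if a record should count as an in-person K-12 school."""
    name = rec["name"].lower()
    if rec.get("self_reported_virtual"):
        return False  # properly flagged as not teaching kids on site
    if any(word in name for word in NAME_KEYWORDS):
        return False  # unflagged online programs caught by name (substring match)
    if rec.get("status") in EXCLUDED_STATUSES:
        return False
    if rec.get("site_type") in EXCLUDED_TYPES:
        return False
    if "program" in name and rec["id"] not in program_allowlist:
        return False  # allowlist holds "program" sites verified as real schools
    # Missing enrollment is treated as 0 here, which also excludes the record.
    return (rec.get("enrollment") or 0) >= MIN_ENROLLMENT
```

Plain substring matching is the bluntest option; it would also catch words that merely contain a keyword, which is one reason a manual spot check of what gets dropped is worthwhile.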
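One way to surface the clusters described in the last bullet is to group records by a normalized street address and then flag any group whose address matches its district’s headquarters. This Python sketch is an assumption about how such a check could work, not the team’s procedure; the field names, the three-school minimum and the crude normalization (which won’t equate “Street” with “St”) are all illustrative.

```python
from collections import defaultdict


def normalize_address(addr: str) -> str:
    """Uppercase, drop periods and commas, collapse whitespace."""
    cleaned = addr.upper().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def find_clusters(schools, min_size=3):
    """Group schools sharing a normalized address; keep groups of min_size+."""
    groups = defaultdict(list)
    for s in schools:
        groups[normalize_address(s["address"])].append(s)
    return {addr: g for addr, g in groups.items() if len(g) >= min_size}


def hq_clusters(clusters, district_hq):
    """Keep clusters whose address matches a member's district headquarters.

    district_hq maps a (hypothetical) district id to its headquarters address.
    """
    flagged = {}
    for addr, group in clusters.items():
        if any(normalize_address(district_hq.get(s["district_id"], "")) == addr
               for s in group):
            flagged[addr] = group
    return flagged
```

Clusters flagged this way are candidates for a closer look, since a school listed at the central office address may simply have inherited the district’s location rather than its own.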

We didn’t exclude schools for not fully filling out their demographic data — giving the number of students in certain racial categories (say, white and black) but not the number of students in others (say, Pacific Islander). NCES staffers told us that it should be safe to consider these missing data points as “zero.” They don’t have a reason to believe there’s something fundamentally wrong with the numbers reported for those schools that would require invalidating them.
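Treating a missing category as zero is a one-line rule in practice. This sketch assumes records store the seven federal race/ethnicity counts under hypothetical key names, and it computes a non-white share only to show the rule in use; the article doesn’t state the cutoff it used for “predominantly minority.”

```python
# The seven federal race/ethnicity categories; these key names are hypothetical.
RACE_FIELDS = ("american_indian", "asian", "black", "hispanic",
               "pacific_islander", "white", "two_or_more")


def race_counts(rec):
    """Treat a missing (None or absent) category count as zero."""
    return {field: rec.get(field) or 0 for field in RACE_FIELDS}


def nonwhite_share(rec):
    """Share of students outside the white category; None if no students reported."""
    counts = race_counts(rec)
    total = sum(counts.values())
    if total == 0:
        return None
    return (total - counts["white"]) / total


partial = {"white": 50, "black": 30, "pacific_islander": None}
print(nonwhite_share(partial))  # 0.375
```

Here the school reported 50 white and 30 black students and left the other categories blank, so the blanks count as zero and the share is 30 of 80 students.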

Our checks eliminated a little over 10,000 schools from our tally, bringing the total to roughly 90,000. And you know what? After all our efforts, the trends we found were the same ones that popped up with the raw data. Comforting and annoying.

Reveal’s Sagara then conducted a regression analysis to get a better understanding of what makes a school more likely to be near a busy road. Bottom line: Being in a big city. That might seem obvious, but there are plenty of schools near substantial traffic that aren’t in big cities, so this analysis was important for zeroing in on the key reason that predominantly minority schools are near these roads at a markedly higher rate than predominantly white schools. (Why people live where they do, and how much traffic they’re exposed to, continues to be influenced by decades-old decisions about which neighborhoods to lend in and which to cut through when building major routes, as our story describes.)
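The methodology doesn’t say what kind of regression Reveal ran. For a yes-or-no outcome such as “within 500 feet of a busy road,” logistic regression is one common choice, so here is a minimal pure-Python sketch of it fit by gradient descent. The single big-city feature and the 20 schools are invented toy data, not the team’s variables or results.

```python
import math


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit logistic regression by batch gradient descent on mean log-loss.

    X: list of feature rows; y: 0/1 labels (1 = near a busy road).
    Returns [intercept, coef_1, coef_2, ...].
    """
    n = len(X)
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "near"
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Toy data: 10 big-city schools (8 near a busy road), 10 others (2 near).
X = [[1]] * 10 + [[0]] * 10
y = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
intercept, big_city = fit_logistic(X, y)
print(round(intercept, 2), round(big_city, 2))  # -1.39 2.77
```

With one binary feature, the fitted coefficient is just the log of the odds ratio: big-city odds of 8 to 2 versus 2 to 8 elsewhere is a ratio of 16, and ln 16 is about 2.77. A real analysis would add further predictors so that the big-city effect is estimated while holding them constant.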

If you’re wondering whether your child’s school falls within 500 feet of a busy road, check out our interactive data tool. You can enter any address, school or not, and see if it’s by a road that meets our traffic threshold.
