Methodology for 'Cracking the Codes'

A glimpse into the data analysis for the "Cracking the Codes" investigation



For this series, the Center for Public Integrity and Palantir Technologies analyzed Medicare claims data obtained from the Centers for Medicare and Medicaid Services (CMS).

For privacy purposes and other reasons, the Center was limited to a 5 percent sample of national Medicare Part B data. The sample contains claims for medical procedures, such as doctor office visits and emergency room procedures, and is used mainly by researchers and consultants. Beyond the limitations of sampling, the data show only the quarter in which a procedure was performed, not actual dates. In addition, a permanent federal injunction against the Department of Health and Human Services prevents data users from naming individual doctors who received payment for the claims. Some physicians subsequently contacted by the Center agreed to discuss their billing practices.

For the upcoding analysis, the Center and Palantir used a subset of the data submitted by physicians, hospitals and clinics from 1999 to 2008, the last year available at the time the data were acquired. The year 2002 was not included in the data, and any results for that year are imputed by averaging 2001 and 2003 data. In addition, the Center and Palantir used the formulas and modifier values that CMS publishes to determine reimbursement amounts, including those for facility fees and co-payments. Finally, Medicare Utilization reports published by CMS were used to look at specific billing codes for 2009 and 2010.
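The imputation for the missing year can be illustrated with a minimal sketch. It assumes yearly totals are held in a simple mapping from year to value; the function name and the sample figures are hypothetical and are not drawn from the CMS data.

```python
def impute_missing_year(values_by_year, missing_year):
    """Fill one missing year with the mean of the years on either side.

    values_by_year: dict mapping year -> numeric total (hypothetical layout).
    """
    before = values_by_year[missing_year - 1]
    after = values_by_year[missing_year + 1]
    filled = dict(values_by_year)  # leave the caller's data untouched
    filled[missing_year] = (before + after) / 2
    return filled


# Illustrative numbers only
totals = {2001: 100.0, 2003: 120.0}
filled = impute_missing_year(totals, 2002)
```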

To calculate the possible taxpayer costs of upcoding, the Center and Palantir analyzed 14 sets of Current Procedural Terminology Evaluation and Management (E and M) codes published by the American Medical Association and used by most providers when filing their claims. Within each set are three to five billing codes that carry varying levels of Medicare reimbursement, based on the complexity of the treatment and the time spent by the doctor. We focused on a set of 84 million claims from office visits for established patients and five million emergency department visits in which E and M codes were billed, as well as 12 other E and M categories. Denied claims were excluded from the analysis.

From those data subsets, we calculated costs from 2001 through 2008 for each code and compared trends within each of the 14 E and M groups. Data from 2009 and 2010 for some E and M code groups were added from the utilization reports. Using 2001 as a baseline, we calculated each code's percentage of the total billing in its group, giving a decade-long trend line for a code in comparison with the other codes in its group. Then the 2001 ratio was applied to each subsequent year, and dollar amounts were adjusted for inflation. This allowed for comparisons of the actual trends to hypothetical trends if 2001 ratios had remained constant. The differences between the actual inflation-adjusted dollar amounts and the 2001-based projected dollar amounts were summed.
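The baseline comparison can be sketched roughly as follows. This is one reading of the steps above, not the Center's actual code: it assumes shares are measured in inflation-adjusted dollars within a single E and M group, and the code labels and figures are hypothetical.

```python
def baseline_shares(dollars_by_code):
    """Each code's share of its group's total billing in the base year."""
    total = sum(dollars_by_code.values())
    return {code: amount / total for code, amount in dollars_by_code.items()}


def excess_over_baseline(dollars_by_year, base_year=2001):
    """Sum, per code, of actual minus projected dollars across later years.

    dollars_by_year: dict of year -> {code: inflation-adjusted dollars}
    for one code group (assumed layout).
    """
    shares = baseline_shares(dollars_by_year[base_year])
    excess = {code: 0.0 for code in shares}
    for year, dollars_by_code in dollars_by_year.items():
        if year == base_year:
            continue
        group_total = sum(dollars_by_code.values())
        for code, share in shares.items():
            # What the code would have billed had its base-year share held
            projected = share * group_total
            excess[code] += dollars_by_code.get(code, 0.0) - projected
    return excess


# Illustrative numbers only
example = {
    2001: {"level_3": 50.0, "level_4": 40.0, "level_5": 10.0},
    2003: {"level_3": 30.0, "level_4": 50.0, "level_5": 20.0},
}
excess = excess_over_baseline(example)
```

Under this reading, a positive value means a code took a larger share of its group's billing than it did in 2001. Across all codes in a group the differences cancel out, so the figure is meaningful for individual codes, such as the most expensive ones.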

To look at trends in age among Medicare patients, the age at the time of a claim was averaged over geography, hospital or E and M code as needed. To protect patient privacy, the CMS data provided only age ranges: under 65, 65-69, 70-74, 75-79, 80-84, and over 85. The under-65 age group typically represents exceptionally sick individuals with end-stage renal disease and was excluded from the analysis. The midpoint values of the remaining age buckets (67, 72, 77, 82, and 87 for those over 85) were used to calculate the average age.
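As a minimal sketch of the age calculation, the average can be computed as a claim-weighted mean of the bucket values given above. The bucket labels and the input layout are assumptions made for illustration.

```python
# Representative age per bucket, as listed in the text; "under 65" is omitted
# on purpose because that group was excluded from the analysis.
AGE_BUCKET_VALUES = {
    "65-69": 67,
    "70-74": 72,
    "75-79": 77,
    "80-84": 82,
    "over 85": 87,
}


def average_age(claims_by_bucket):
    """Claim-weighted average age; returns None if no usable claims."""
    weighted_sum = 0
    claim_count = 0
    for bucket, n_claims in claims_by_bucket.items():
        if bucket not in AGE_BUCKET_VALUES:
            continue  # skips "under 65" and any unknown label
        weighted_sum += AGE_BUCKET_VALUES[bucket] * n_claims
        claim_count += n_claims
    if claim_count == 0:
        return None
    return weighted_sum / claim_count


# Illustrative numbers only
avg = average_age({"under 65": 10, "65-69": 1, "over 85": 1})
```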

A geographical analysis revealed the nationwide trend of higher E and M billing. Claims were grouped by county and state, according to the beneficiary’s residence, and visualized with heat maps to show geospatial and temporal trends of billing codes. A heat scale was applied, with light red indicating a low percentage and dark red indicating a high percentage of claims billed at the highest two codes for office visits and emergency department visits.

In addition to the nationwide trends, hospitals, physicians, and counties with especially high rates of billing for the most expensive codes were examined in detail. E and M claims were aggregated by hospital, physician, or county, excluding those buckets that fell below a threshold for the minimum number of claims per year (50 claims per year for physicians, 100 for counties, and 100 for hospitals). Physicians who billed 50 percent, 75 percent, 90 percent, or 100 percent of claims at the highest two codes for a given year were analyzed for patterns of geography and specialty. Billing information was integrated with hospital affiliation, ownership, and electronic health-record use information to analyze patterns of billing within group practices and hospital chains.
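The aggregation and threshold step might look something like the sketch below. The minimum claim counts come from the text; the record layout, identifiers, and code labels are hypothetical.

```python
from collections import defaultdict

# Minimum claims per year needed to keep a bucket, as stated in the text
MIN_CLAIMS_PER_YEAR = {"physician": 50, "county": 100, "hospital": 100}


def top_code_share(claims, kind, top_codes):
    """Share of each entity's claims billed at the highest codes in one year.

    claims: iterable of (entity_id, code) pairs for a single year (assumed).
    kind: "physician", "county" or "hospital".
    top_codes: set holding the two highest codes in the group.
    """
    totals = defaultdict(int)
    top = defaultdict(int)
    for entity_id, code in claims:
        totals[entity_id] += 1
        if code in top_codes:
            top[entity_id] += 1
    threshold = MIN_CLAIMS_PER_YEAR[kind]
    # Entities below the threshold are dropped rather than reported
    return {e: top[e] / n for e, n in totals.items() if n >= threshold}


# Illustrative data: doctor A has 60 claims, half at the top two codes;
# doctor B has only 10 claims and falls below the threshold.
sample = (
    [("A", "level_5")] * 20 + [("A", "level_4")] * 10 + [("A", "level_2")] * 30
    + [("B", "level_5")] * 10
)
shares = top_code_share(sample, "physician", {"level_4", "level_5"})
```

Filtering on the resulting shares (for example, at 0.5, 0.75, 0.9 or 1.0) would then pick out the physician groups described above.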

Results from the 5 percent sample were multiplied by 20 to give a national scope to analyzed trends, an accepted survey research technique. However, even with a sample this large, it is impossible to account for all types of errors in the data. This means all calculations are estimates, are rounded, and must be considered imprecise. The Center and Palantir used accepted rounding practices. For analysis of specific doctors and some of their coding practices, as opposed to billing totals, sums were not multiplied by 20 and were reported only as they appear in the sample. When faced with a potential range of costs, we chose the smallest amount to keep estimates conservative. Dollar amounts were also adjusted for inflation to prevent over-estimation, with rising costs indexed to 2001, the base year in the analysis.
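The scaling and the conservative choice within a range amount to simple arithmetic, shown here as a small sketch. The function names and figures are hypothetical, and no rounding rule is applied because the text does not specify one.

```python
SAMPLE_MULTIPLIER = 20  # a 5 percent sample is one-twentieth of all claims


def national_estimate(sample_total):
    """Scale a total from the 5 percent sample up to a national figure."""
    return sample_total * SAMPLE_MULTIPLIER


def conservative_estimate(candidate_sample_totals):
    """Given a range of possible costs, keep the smallest, then scale it."""
    return national_estimate(min(candidate_sample_totals))


# Illustrative numbers only
estimate = conservative_estimate([1_500, 1_800, 2_100])
```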
