Mobility Report Scatterplot Matrix

For this project, I used a scatterplot matrix built with D3.js (version 5) to visualize data from the "Mobility Report Cards: The Role of Colleges in Intergenerational Mobility" project, written by Raj Chetty, John Friedman, Emmanuel Saez, Nicholas Turner, and Danny Yagan in December 2017.
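As a rough illustration of what a scatterplot matrix involves, the sketch below lays out one cell per pair of columns and maps data values to pixel positions. It is plain JavaScript rather than the project's actual D3 code. The helper names (`matrixCells`, `linearScale`), the cell size, and the snake_case column names are assumptions made for the example. In D3 itself, `d3.scaleLinear` would normally take the place of the hand-written scale.

```javascript
// Sketch only: helper names, sizes, and column names are hypothetical.

// Build one cell per (row, column) pair of variables. Each cell plots
// xColumn on the horizontal axis against yColumn on the vertical axis.
function matrixCells(columns, cellSize, padding) {
  const cells = [];
  columns.forEach((yColumn, row) => {
    columns.forEach((xColumn, col) => {
      cells.push({
        xColumn,
        yColumn,
        row,
        col,
        // Pixel offset of the cell's top-left corner inside the matrix.
        x: col * (cellSize + padding),
        y: row * (cellSize + padding),
      });
    });
  });
  return cells;
}

// Minimal linear scale: maps a value in [d0, d1] onto [r0, r1].
// Assumes d0 !== d1.
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Assumed column names; three variables give a 3 x 3 matrix.
const cells = matrixCells(["par_median", "k_median", "mr_kq5_pq1"], 150, 10);

// The pixel range is flipped for the vertical axis because SVG y grows downward.
const yScale = linearScale([0, 100], [150, 0]);

console.log(cells.length, cells[1], yScale(50));
```

Each cell can then be drawn as its own small scatterplot, with one point per college.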

About Data

The original dataset for this project is the “Baseline Cross-Sectional Estimates of Child and Parent Income Distributions by College” dataset (also referred to as “Online Data Table 2”). The direct links are:



Data Description

This table reports the baseline estimates of parents’ and children’s income distributions by college. The authors calculate college-level values as means over students in the 1980, 1981 and 1982 birth cohorts. When data for a college from any of these cohorts are incomplete, they use data from 1983 and 1984 cohorts to obtain an estimate. See the codebook for Online Data Table 1 for definitions of income variables, college attendance, and other details.

In some cases, multiple colleges share a single identifier in the tax records (e.g., the campuses of the University of Massachusetts or University of Illinois systems). In these cases (denoted by multi = 1 in the dataset), the authors report averages over all students attending the relevant group of colleges. For these groups of colleges, they assign locations based on the location of the largest school/campus within the group.
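To make the grouping rule concrete, here is a minimal sketch rather than the authors' actual procedure. An average over all students in a group equals the count-weighted mean of the per-campus averages, and the group's location is taken from the campus with the largest count. The campus records, the field names (`count`, `k_median`, `state`), and the `summarizeGroup` helper are all hypothetical and use made-up numbers.

```javascript
// Sketch only: campus records and field names are invented for illustration.

// Collapse several campuses that share one identifier into a single record.
function summarizeGroup(campuses, field) {
  const totalCount = campuses.reduce((sum, c) => sum + c.count, 0);
  // Weight each campus average by its number of students.
  const weightedSum = campuses.reduce((sum, c) => sum + c.count * c[field], 0);
  // Location comes from the largest campus in the group.
  const largest = campuses.reduce((best, c) => (c.count > best.count ? c : best));
  return {
    multi: 1,
    count: totalCount,
    [field]: weightedSum / totalCount,
    state: largest.state,
  };
}

const group = summarizeGroup(
  [
    { name: "Campus A", count: 3000, k_median: 40000, state: "AA" },
    { name: "Campus B", count: 1000, k_median: 48000, state: "BB" },
  ],
  "k_median"
);

console.log(group);
```

A plain average of the two campus values would give 44000. The count-weighted version sits closer to the larger campus.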

This data was accessed on Feb 26, 2020. In it, the authors characterize intergenerational income mobility at each college in the United States using data for over 30 million college students from 1999-2013. The dataset has 36 columns, each capturing a different attribute, as listed below:

  1. Super Opeid
  2. Name of college (or college group)
  3. Type
  4. Tier
  5. Name of college tier
  6. Iclevel
  7. Region
  8. State
  9. Cz: Commuting zone ID
  10. Czname: Commuting zone name
  11. Cfips: Combined state and county fips code
  12. County
  13. Multi: Indicator that equals 1 if multiple colleges (IPEDS Unit IDs) are grouped in this Super OPEID
  14. Count: Average number of kids per cohort
  15. Female: Fraction female among kids
  16. K Married: Fraction of kids married in 2014
  17. Mr Kq5 Pq1: Mobility rate (joint probability of parents in bottom quintile and child in top quintile of the income distribution)
  18. Mr Ktop1 Pq1: Upper-tail mobility rate (joint probability of parents in bottom quintile and child in top 1% of the income distribution)
  19. Par Mean: Mean parental income
  20. Par Median: Median parent household income (rounded to nearest $100)
  21. Par Rank: Mean parental income rank
  22. Par Q[PARQUINT]: Fraction of parents in an income quintile [PARQUINT]. 1 is the bottom quintile and 5 is the top.
  23. Par Top[PCTILE]pc: Fraction of parents in the top percentile [PCTILE]. For instance, par_toppt1pc refers to parents in the top 0.1% of the income distribution.
  24. K Rank: Mean kid earnings rank
  25. K Mean: Mean kid earnings
  26. K Median: Median child individual earnings in 2014 (rounded to the nearest $100)
  27. K Median Nozero: Median child individual earnings among positive earners in 2014 (rounded to the nearest $100)
  28. K 0inc: Fraction of kids with zero labor earnings
  29. K Q[KIDQUINT]: Fraction of kids in an income quintile [KIDQUINT]. 1 is the bottom quintile and 5 is the top.
  30. K Top[PCTILE]pc: Fraction of kids in the top percentile [PCTILE]. For instance, top1pc refers to children in the top 1% of the income distribution.
  31. K Rank Cond Parq[PARQUINT]: Mean kid earnings rank conditional on parent in quintile [PARQUINT]
  32. Kq[KIDQUINT] Cond Parq[PARQUINT]: Probability of kid in quintile [KIDQUINT] conditional on parent in quintile [PARQUINT]
  33. Ktop1pc Cond Parq[PARQUINT]: Probability of kid in top 1% conditional on parent in quintile [PARQUINT]
  34. K Married Cond Parq[PARQUINT]: Fraction of kids married conditional on parent in quintile [PARQUINT]
  35. Shareimputed: Share of count-weighted data that was imputed using information from the 1983-84 cohorts
  36. Imputed: Indicator if any data for that super_opeid was imputed
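Several of these columns are related by basic probability. The mobility rate (column 17) is a joint probability, so it should equal the fraction of parents in the bottom quintile (column 22) multiplied by the probability that a kid reaches the top quintile given a bottom-quintile parent (column 32). The sketch below checks that identity on one row. The snake_case field names (`par_q1`, `kq5_cond_parq1`, `mr_kq5_pq1`) are assumed spellings of the columns listed above, and the row values are made up for illustration, not taken from the dataset.

```javascript
// Sketch only: field names are assumed and the row values are invented.

// P(parent in Q1 and kid in Q5) = P(parent in Q1) * P(kid in Q5 | parent in Q1)
function mobilityRate(row) {
  return row.par_q1 * row.kq5_cond_parq1;
}

// Compare the computed rate with the reported one, allowing for rounding
// in the published values.
function isConsistent(row, tolerance = 1e-3) {
  return Math.abs(mobilityRate(row) - row.mr_kq5_pq1) <= tolerance;
}

const exampleRow = { par_q1: 0.1, kq5_cond_parq1: 0.5, mr_kq5_pq1: 0.05 };

console.log(mobilityRate(exampleRow), isConsistent(exampleRow));
```

The same pattern would apply to the upper-tail mobility rate (column 18) using the top 1% conditional probability (column 33).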

This data doesn’t have an official license, but the authors make their datasets publicly available so that other researchers and practitioners can use them in their own work. Check out this link for more information about public usage.

