The original dataset for this project is the “Baseline Cross-Sectional Estimates of Child and Parent Income Distributions by College” dataset (also referred to as “Online Data Table 2”). The direct links are:
This table reports the baseline estimates of parents’ and children’s income distributions by college. The authors calculate college-level values as means over students in the 1980, 1981 and 1982 birth cohorts. When data for a college from any of these cohorts are incomplete, they use data from 1983 and 1984 cohorts to obtain an estimate. See the codebook for Online Data Table 1 for definitions of income variables, college attendance, and other details.
In some cases, multiple colleges share a single identifier in the tax records (e.g., the campuses of the University of Massachusetts or University of Illinois systems). In these cases (denoted by multi = 1 in the dataset), they report averages over all students attending the relevant group of colleges. For these groups of colleges, locations are assigned based on the location of the largest school/campus within the group.
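Because rows with multi = 1 describe a group of campuses rather than a single college, it can be useful to separate them before any campus-level analysis. The following is a minimal sketch, not the authors' code. It assumes the table is available as CSV with lowercase column names `super_opeid`, `name`, and `multi`; the sample rows and the helper `split_by_multi` are hypothetical.

```python
import csv
import io

# Made-up sample rows for illustration only; the real file has many more columns.
# Column names (super_opeid, name, multi) are assumed to be lowercase.
SAMPLE_CSV = """super_opeid,name,multi
1,Example College A,0
2,Example University System,1
3,Example College B,0
"""


def split_by_multi(rows):
    """Split rows into (single-college rows, grouped-college rows) using the multi flag."""
    single, grouped = [], []
    for row in rows:
        # multi == 1 means several colleges share this Super OPEID
        if int(row["multi"]) == 1:
            grouped.append(row)
        else:
            single.append(row)
    return single, grouped


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
single_rows, grouped_rows = split_by_multi(rows)
print(len(single_rows), "single-college rows;", len(grouped_rows), "grouped rows")
```

For grouped rows, the reported values are averages over the whole group and the location is that of the largest campus, so they should not be read as statistics for any one campus.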
This data was accessed on Feb 26, 2020. The authors characterize intergenerational income mobility at each college in the United States using data for over 30 million college students from 1999-2013. The dataset has 36 columns, each described below:
- Super Opeid
- Name of college (or college group)
- Name of college tier
- Cz: Commuting zone ID
- Czname: Commuting zone name
- Cfips: Combined state and county fips code
- Multi: Indicator that equals 1 if multiple colleges (IPEDS Unit IDs) are grouped in this Super OPEID
- Count: Average number of kids per cohort
- Female: Fraction female among kids
- K Married: Fraction of kids married in 2014
- Mr Kq5 Pq1: Mobility rate (joint probability of parents in bottom quintile and child in top quintile of the income distribution)
- Mr Ktop1 Pq1: Upper-tail mobility rate (joint probability of parents in bottom quintile and child in top 1% of the income distribution)
- Par Mean: Mean parental income
- Par Median: Median parent household income (rounded to nearest $100)
- Par Rank: Mean parental income rank
- Par Q[PARQUINT]: Fraction of parents in an income quintile [PARQUINT]. 1 is the bottom quintile and 5 is the top.
- Par Top[PCTILE]pc: Fraction of parents in the top percentile [PCTILE]. For instance, par_toppt1pc refers to parents in the top 0.1% of the income distribution.
- K Rank: Mean kid earnings rank
- K Mean: Mean kid earnings
- K Median: Median child individual earnings in 2014 (rounded to the nearest $100)
- K Median Nozero: Median child individual earnings among positive earners in 2014 (rounded to the nearest $100)
- K 0inc: Fraction of kids with zero labor earnings
- K Q[KIDQUINT]: Fraction of kids in an income quintile [KIDQUINT]. 1 is the bottom quintile and 5 is the top.
- K Top[PCTILE]pc: Fraction of kids in the top percentile [PCTILE]. For instance, k_top1pc refers to children in the top 1% of the income distribution.
- K Rank Cond Parq[PARQUINT]: Mean kid earnings rank conditional on parent in quintile [PARQUINT]
- Kq[KIDQUINT] Cond Parq[PARQUINT]: Probability of kid in quintile [KIDQUINT] conditional on parent in quintile [PARQUINT]
- Ktop1pc Cond Parq[PARQUINT]: Probability of kid in top 1% conditional on parent in quintile [PARQUINT]
- K Married Cond Parq[PARQUINT]: Fraction of kids married conditional on parent in quintile [PARQUINT]
- Shareimputed: Share of count-weighted data that was imputed using information from the 1983-84 cohorts
- Imputed: Indicator if any data for that super_opeid was imputed
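Several of the columns above are related by basic probability. The mobility rate is defined as a joint probability, so it should equal the fraction of parents in the bottom quintile multiplied by the probability that a kid reaches the top quintile given a bottom-quintile parent. The sketch below checks that relationship for one row. It assumes the lowercase column names `par_q1`, `kq5_cond_parq1`, and `mr_kq5_pq1`; the row values are hypothetical and the tolerance is an arbitrary allowance for rounding in the published figures.

```python
import math


def mobility_rate(par_q1, kq5_cond_parq1):
    """Joint probability: P(parent in Q1) * P(kid in Q5 | parent in Q1)."""
    return par_q1 * kq5_cond_parq1


def is_consistent(row, tol=1e-3):
    """Return True if the reported mobility rate matches the product of its parts.

    `tol` is an assumed absolute tolerance for rounding, not a documented value.
    """
    expected = mobility_rate(float(row["par_q1"]), float(row["kq5_cond_parq1"]))
    return math.isclose(expected, float(row["mr_kq5_pq1"]), abs_tol=tol)


# Hypothetical row, not taken from the dataset:
# 10% of parents in the bottom quintile, 30% of their kids reach the top quintile.
example_row = {"par_q1": 0.10, "kq5_cond_parq1": 0.30, "mr_kq5_pq1": 0.03}
computed = mobility_rate(example_row["par_q1"], example_row["kq5_cond_parq1"])
print("computed mobility rate:", round(computed, 4))
print("consistent with reported value:", is_consistent(example_row))
```

The same pattern applies to the upper-tail mobility rate, using the top-1% conditional probability in place of the top-quintile one.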
This dataset has no official license, but the authors publish their datasets publicly so that other researchers and practitioners can use them to support their own work. For more information about public usage, see https://opportunityinsights.org/data/