Housing Affordability Methodology, 2020: A Bayesian Approach to Predicting Incomes

Chris Glynn

Past attempts at quantifying housing affordability suffered from problems of both timeliness and specificity: current housing costs were divided by income data that was years old, and broad median household income data was aggregated across two groups, homeowners and renters, with very different income profiles.

Traditional housing affordability measures based on income data from the American Community Survey (ACS) do not provide clear insight into the current state of affordability because comprehensive ACS data are published at a lag: data for 2019, for example, are only now being published, in late 2020. And while current income data are available through the Bureau of Labor Statistics (BLS), they are also aggregated and do not distinguish between renter household income and homeowner household income. BLS data do provide significant detail on industry-specific incomes, but industry-specific income data present their own challenges for estimating renter and homeowner incomes separately, since the mix of renters and homeowners in each industry may vary significantly across metropolitan areas.

Zillow's new statistical models aim to address these challenges by predicting renter and homeowner incomes forward from their last published observation and applying them to current housing costs. We publish this housing affordability data, measured as the share of a typical renter's income spent on typical area rent and the share of a typical homeowner's income spent on a mortgage for the typical local home, for 100 metropolitan areas and the United States as a whole.

Our models use ACS data, the Zillow Observed Rent Index (ZORI), the Zillow Home Value Index (ZHVI) and earnings data from the Bureau of Labor Statistics. The final output (in this example, for renters) is based on this simple equation (for homeowners, the numerator is instead derived from the observed Zillow Home Value Index for a given month):

$$\text{Rent affordability} = \frac{\text{estimated monthly rent}}{\text{estimated monthly renter income}}$$

The estimated monthly rent in the numerator is a weighted average of the reported contract rent in ACS microdata and Zillow's Observed Rent Index (ZORI). The weights are the percentage of renters who reported moving in the last 12 months (reported via the MIGRATE1 variable in ACS microdata). Our estimated monthly rent is:
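One way to write this weighted average is sketched below. It rests on an assumption that the text does not state explicitly: the mover share $w_t$ is applied to ZORI, on the reasoning that recent movers are more likely to pay current market rents, and the remaining share is applied to ACS contract rent. The symbol names are illustrative.

$$\widehat{\text{rent}}_t = w_t \cdot \text{ZORI}_t + (1 - w_t) \cdot \text{ACS contract rent}_t$$

With purely hypothetical numbers, a mover share of $w_t = 0.25$, a ZORI of \$1,600 and an ACS contract rent of \$1,400 would give $0.25 \times 1600 + 0.75 \times 1400 = 1450$ dollars per month. If the weights were assigned the other way around, the two terms would simply swap.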

Our goal is to compute the most timely measure of rent paid by the typical renter in each metro, and we include both ZORI and ACS data to leverage the strengths of different sources of information.

We use a Bayesian statistical model to predict ACS renter incomes in months where they are not directly observed: both the months between the annual releases from 2014 to 2018 and the months since the last release in 2018. The Bayesian model provides the statistical machinery to incorporate multiple sources of information and to update predictions as new data are released. It depends on observed ACS renter income and observed BLS income data. From a statistical standpoint, our main objective is to compute the posterior predictive distribution:

$$p\left(y_{1:T} \mid y_{\text{obs}},\, x_{1:T}\right)$$

  • $y_{1:T}$ is monthly renter income from January of 2014 to September of 2020.

  • $y_{\text{obs}}$ is the subsequence of renter incomes corresponding to ACS observations reported annually for 2014–2018.

  • $x_{1:T}$ is the observed BLS monthly earnings data.

We model $x_t$, the BLS-reported income in month $t$, as the sum of latent renter income $y_t$ and a latent factor $\delta_t$, the metro-specific deviation of general earnings from renter earnings: $x_t = y_t + \delta_t$. The latent factor $\delta_t$ follows a dynamic linear model with a second-order polynomial structure.
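For reference, a second-order polynomial dynamic linear model is commonly written as a level that evolves with a slowly changing slope. The sketch below uses that textbook form, writing $\delta_t$ for the latent factor. The slope $\lambda_t$, the noise terms and their variances are illustrative assumptions, and the exact specification of the model described above may differ:

$$\begin{aligned}
\delta_t &= \delta_{t-1} + \lambda_{t-1} + \omega_{1,t}, & \omega_{1,t} &\sim N(0, \sigma_1^2) \\
\lambda_t &= \lambda_{t-1} + \omega_{2,t}, & \omega_{2,t} &\sim N(0, \sigma_2^2)
\end{aligned}$$

Under this form the gap between general earnings and renter earnings can drift and change direction gradually over time rather than being fixed at a constant offset.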

The model for latent renter income $y_t$ has three components:

  1. intercept $\alpha$, the expected renter income in December 2013 ($t = 0$)

  2. global trend $\beta$, the expected monthly increase in renter incomes

  3. a flexible function $f(t)$ that models nonlinear deviations in renter income from the linear regression $\alpha + \beta t$

Function $f$ follows a Gaussian process prior distribution and has a very important property for this application: the uncertainty intervals in predicted renter incomes widen as the time elapsed since the last observed renter income increases. Overall, this latent renter income model is:

$$y_t = \alpha + \beta t + f(t), \qquad f \sim \mathcal{GP}\left(0,\, k(t, t')\right)$$

where $k(t, t')$ denotes the covariance function of the Gaussian process.
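The widening of the intervals can be illustrated with a small, self-contained sketch. Everything specific in it is an assumption made for illustration: the squared-exponential covariance, the 12-month length scale, the noise level, the normalized amplitude of 1, and the placement of the five annual observations at months 12 through 60. It computes only the posterior standard deviation of the flexible function at a few months at and after the last observation.

```python
import math

def rbf(t1, t2, amp=1.0, length=12.0):
    """Squared-exponential covariance between two time points (in months)."""
    return amp**2 * math.exp(-0.5 * ((t1 - t2) / length) ** 2)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def gp_predictive_sd(t_obs, t_new, noise=0.1):
    """Posterior standard deviation of f(t_new) given noisy observations at t_obs."""
    k_oo = [[rbf(a, b) + (noise**2 if i == j else 0.0)
             for j, b in enumerate(t_obs)] for i, a in enumerate(t_obs)]
    k_no = [rbf(t_new, a) for a in t_obs]
    v = solve(k_oo, k_no)  # K_oo^{-1} k_no
    var = rbf(t_new, t_new) - sum(k * w for k, w in zip(k_no, v))
    return math.sqrt(max(var, 0.0))

# Hypothetical observation times: one annual value at months 12, 24, ..., 60
t_obs = [12, 24, 36, 48, 60]
t_query = (60, 66, 72, 81)  # the last observed month, then months further out
sds = [gp_predictive_sd(t_obs, t) for t in t_query]
for t, sd in zip(t_query, sds):
    print(f"month {t}: predictive sd of f = {sd:.3f}")
```

The standard deviations are in units of the prior amplitude, not dollars. They grow with distance from the last observation and approach the prior amplitude, which is the behavior the paragraph above describes.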

Model parameters are estimated with a custom Markov chain Monte Carlo algorithm, and we compute full predictive probability distributions of renter incomes and rent affordability rather than single point estimates. Given that these estimates are statistical in nature, we believe quantifying the uncertainty in our affordability calculations is an important part of the analysis, and predictive intervals provide additional context to the state of housing affordability in the United States.
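As a rough sketch of how predictive draws turn into an interval for affordability, the example below pushes each income draw through the rent-to-income ratio and reads off empirical quantiles. The draws are simulated from a normal distribution as a stand-in for real MCMC output, and the rent figure, income level and 90% interval width are all hypothetical choices, not values from our analysis.

```python
import random

def interval(draws, lower=0.05, upper=0.95):
    """Empirical (lower, median, upper) quantiles of a list of draws."""
    ordered = sorted(draws)

    def pick(q):
        return ordered[min(int(q * len(ordered)), len(ordered) - 1)]

    return pick(lower), pick(0.5), pick(upper)

random.seed(0)
# Stand-in for MCMC output: hypothetical predictive draws of monthly renter income (USD)
income_draws = [random.gauss(4000.0, 250.0) for _ in range(5000)]
monthly_rent = 1450.0  # hypothetical estimated monthly rent (USD)

# Each income draw yields one affordability draw, giving a full distribution
share_draws = [monthly_rent / income for income in income_draws]
low, median, high = interval(share_draws)
print(f"rent share of income: median {median:.1%}, 90% interval ({low:.1%}, {high:.1%})")
```

Reporting the interval alongside the median makes it visible when an apparent change in affordability is small relative to the uncertainty in predicted incomes.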

We follow a similar process to predict homeowners' incomes based on ACS and BLS data. Monthly costs for homeowners assume the buyer purchases a home valued at the typical level for a given metro (as measured by the Zillow Home Value Index), puts down 20%, and takes out a 30-year, fixed-rate mortgage at prevailing rates (as measured by Freddie Mac). We also make adjustments for local taxes and homeowner insurance costs.
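The mortgage portion of that monthly cost follows the standard fixed-rate amortization formula, sketched below. The function name, the way taxes and insurance enter (a flat annual tax rate on home value and a flat annual premium), and all input values are hypothetical. The exact form of the tax and insurance adjustments is not specified here.

```python
def monthly_owner_cost(home_value, annual_rate, down_share=0.20, years=30,
                       annual_tax_rate=0.0, annual_insurance=0.0):
    """Monthly principal and interest on a fixed-rate loan, plus optional taxes and insurance."""
    principal = home_value * (1.0 - down_share)  # amount financed after the down payment
    n = years * 12                               # number of monthly payments
    r = annual_rate / 12.0                       # monthly interest rate
    if r == 0:
        payment = principal / n
    else:
        payment = principal * r / (1.0 - (1.0 + r) ** -n)
    # Assumed adjustment: tax as a share of home value, insurance as a flat annual premium
    return payment + home_value * annual_tax_rate / 12.0 + annual_insurance / 12.0

# Hypothetical inputs: a $260,000 home and a 3% mortgage rate
pi_only = monthly_owner_cost(260_000, 0.03)
with_extras = monthly_owner_cost(260_000, 0.03, annual_tax_rate=0.01, annual_insurance=1_200)
print(f"principal and interest: ${pi_only:,.2f}; with taxes and insurance: ${with_extras:,.2f}")
```

Dividing the resulting monthly cost by predicted monthly homeowner income gives the homeowner counterpart of the renter affordability share.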

The post Housing Affordability Methodology, 2020: A Bayesian Approach to Predicting Incomes appeared first on Zillow Research.