Projects

Below is a list of some of the projects I have contributed to.

Language Support for US Elections

Section 203(b) of the Voting Rights Act (VRA) requires certain jurisdictions to make voting materials available in languages other than English. The US Census Bureau determines whether a jurisdiction (e.g., a county) is subject to these requirements based on the rates of limited English proficiency and illiteracy within language minority groups (LMGs) in that jurisdiction. Since 2011, Section 203 determinations have been based on estimates from small-area models, which compensate for small sample sizes in some areas. I have made several contributions to research that aims to improve these models. Examples include:
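To illustrate the kind of rule these determinations feed into, the sketch below encodes a simplified reading of the Section 203(b) trigger for a single LMG: the group's limited-English-proficient (LEP) voting-age citizens must number more than 5% of the jurisdiction's voting-age citizens or more than 10,000, and the group's illiteracy rate must exceed the national rate. The `LMGEstimate` class, its fields, and `is_covered` are hypothetical names. The sketch omits the statute's separate provision for Indian reservations and treats point estimates as exact, whereas the actual determinations rely on the Census Bureau's model-based estimates.

```python
from dataclasses import dataclass


@dataclass
class LMGEstimate:
    """Hypothetical estimates for one language minority group in one jurisdiction."""
    voting_age_citizens: int   # all voting-age citizens in the jurisdiction
    lep_lmg_citizens: int      # voting-age citizens in the LMG who are LEP
    illiteracy_rate: float     # illiteracy rate for the group (assumed input)


def is_covered(est: LMGEstimate, national_illiteracy_rate: float) -> bool:
    # Size test (simplified): more than 5% of voting-age citizens,
    # or more than 10,000 people, are LEP members of this LMG.
    large_enough = (
        est.lep_lmg_citizens > 0.05 * est.voting_age_citizens
        or est.lep_lmg_citizens > 10_000
    )
    # Illiteracy test: the group's rate must exceed the national rate.
    return large_enough and est.illiteracy_rate > national_illiteracy_rate
```

The strict inequalities follow the statute's "more than" wording; the national rate is left as a parameter rather than hard-coded.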

The Ranking Project

It is common to see listicles ranking cities, states, or countries on one criterion or another. These rankings are typically derived from survey data and do not account for the uncertainty in the ranks that arises from sampling error in the underlying estimates. The Ranking Project aims to find more effective ways to understand and communicate this uncertainty to the public. As a member of the Ranking Project team, I have contributed backend code for the following research visualizations of the uncertainty in rankings of US states based on data from the American Community Survey (ACS):
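One common way to make ranking uncertainty visible is to simulate plausible values of each estimate from its sampling distribution, rank every simulated draw, and report an interval of ranks for each unit. The sketch below assumes independent, normally distributed sampling errors with known standard errors, which ignores any correlation between estimates. `rank_intervals` is a hypothetical helper for illustration, not the project's backend code, and it uses simple empirical percentiles.

```python
import random


def rank_intervals(estimates, std_errors, n_sims=5000, level=0.90, seed=0):
    """Return a (low, high) rank interval per unit; rank 1 is the largest value."""
    rng = random.Random(seed)
    k = len(estimates)
    simulated = [[] for _ in range(k)]  # simulated ranks for each unit
    for _ in range(n_sims):
        # Draw one plausible value per unit, assuming normal sampling error.
        draws = [rng.gauss(m, s) for m, s in zip(estimates, std_errors)]
        # Order units from largest to smallest draw and record each unit's rank.
        order = sorted(range(k), key=lambda i: draws[i], reverse=True)
        for rank, unit in enumerate(order, start=1):
            simulated[unit].append(rank)
    alpha = (1 - level) / 2
    intervals = []
    for ranks in simulated:
        ranks.sort()
        # Simple empirical percentiles of the simulated ranks.
        low = ranks[int(alpha * (n_sims - 1))]
        high = ranks[int((1 - alpha) * (n_sims - 1))]
        intervals.append((low, high))
    return intervals
```

Units whose rank intervals overlap heavily, such as two states with nearly equal estimates and sizable standard errors, cannot be reliably ordered from the survey data alone.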

Generalizing Tsao & Wright’s Maximum Ratio

Researchers often have multiple estimates of the same unknown quantity but do not know whether all of them are accurate. To address this concern, Tsao and Wright (1983) proposed a method called the “maximum ratio test” that flags a group of estimates when at least one of them is “too far” from the truth. The method requires very few assumptions but can be difficult to interpret. This project addresses that problem by generalizing the maximum ratio test to higher-dimensional versions for cases where a parameter and its estimates are vectors, and by proposing heuristics that help interpret the results in practical contexts. The project is heavily inspired by Wright (2013).
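The scalar idea can be sketched under a relative-error formulation, which is our assumption rather than a restatement of the original paper. If the truth θ is positive and every estimate satisfies |x_i - θ| ≤ pθ for some 0 ≤ p < 1, then all estimates lie in [(1 - p)θ, (1 + p)θ], so the ratio of the largest to the smallest estimate is at most (1 + p)/(1 - p). An observed ratio r above that bound therefore means at least one estimate is off by more than p. Inverting the bound shows that, for any positive θ, some estimate has relative error of at least (r - 1)/(r + 1). The function names below are hypothetical, and the sketch covers only the scalar case, not the vector generalization this project studies.

```python
def max_ratio(estimates):
    # Ratio of the largest to the smallest estimate; the sketch assumes positive values.
    if min(estimates) <= 0:
        raise ValueError("this sketch assumes strictly positive estimates")
    return max(estimates) / min(estimates)


def flags_inaccuracy(estimates, p):
    """True if the estimates cannot all lie within relative error p of any positive truth."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return max_ratio(estimates) > (1 + p) / (1 - p)


def min_worst_relative_error(estimates):
    # Smallest p for which some positive truth keeps every estimate within relative error p.
    r = max_ratio(estimates)
    return (r - 1) / (r + 1)
```

For example, estimates of 90 and 110 give r ≈ 1.22, so whatever the truth is, at least one of them is off by at least 10%.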

Disclosure Avoidance and Privacy Protection

Drawing valid statistical inferences from privacy-protected data has been the subject of rigorous research at the US Census Bureau and other statistical agencies. The need for statistical methods that work with perturbed or synthetic data, rather than the original data, arises because the original microdata are sometimes too sensitive to be released. In such cases, a statistical agency may instead release a synthetic version of the original data that hides any confidential or sensitive parts.
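As a small illustration of why inference from perturbed data needs adjustment, suppose (hypothetically) that an agency releases each value with independent Gaussian noise of known standard deviation. The sample mean stays unbiased, but the naive sample variance is inflated by roughly the noise variance, so a valid estimate subtracts it back out. The `perturb` mechanism and `corrected_variance` helper are illustrative assumptions; real disclosure avoidance methods, including synthetic data generation, are considerably more involved.

```python
import random
import statistics


def perturb(values, noise_sd, seed=0):
    # Hypothetical release mechanism: add independent Gaussian noise to each record.
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_sd) for v in values]


def corrected_variance(released, noise_sd):
    # Var(released) is approximately Var(original) + noise_sd**2 when the noise is
    # independent of the data, so subtract the known noise variance (floored at zero).
    return max(statistics.variance(released) - noise_sd ** 2, 0.0)
```

Treating the released values as if they were the original data would overstate the spread of the underlying population by about `noise_sd ** 2`.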

Price Index Research

I have contributed to research that studies novel price indices and other methods for measuring differences in purchasing power, using computer-generated retail scanner datasets to compensate for the lack of sufficiently detailed pricing data in the national accounts.
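For context, a conventional starting point for comparing price levels between two places or periods with scanner data is the Jevons index: the geometric mean of price relatives over items observed in both. The sketch below implements that textbook index, not the novel indices this research studies; `jevons_index` is a hypothetical name, and any item names or prices used with it are made up.

```python
import math


def jevons_index(prices_base, prices_compare):
    """Geometric mean of price relatives over items priced in both places or periods."""
    # Only matched items (present in both price lists) enter the index.
    common = prices_base.keys() & prices_compare.keys()
    if not common:
        raise ValueError("no matched items to compare")
    log_sum = sum(math.log(prices_compare[item] / prices_base[item]) for item in common)
    return math.exp(log_sum / len(common))
```

For example, if milk doubles in price while bread is unchanged, the index is √2 ≈ 1.41, suggesting the matched items cost about 41% more in the comparison place or period, before any weighting or quality adjustment.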