Effects of Community Structure on Respondent-Driven Sampling

Abstract: Respondent-driven sampling (RDS) is a recently introduced, and now widely used, technique for estimating disease prevalence in hidden populations. The sample is collected through a form of snowball sampling in which current sample members recruit future sample members. We re-interpret respondent-driven sampling as Markov chain Monte Carlo (MCMC) importance sampling, and examine the effects of community structure and recruitment methodology on the variance of RDS estimates. Past work on RDS has assumed that the variance of RDS estimates is primarily affected by segregation between healthy and infected individuals. We examine an illustrative model to show that considering this network feature in isolation, while it is important, tends to significantly underestimate the effects of community structure on RDS estimates. We also show that variance is increased by a sample design feature that allows sample members to recruit multiple future sample members. Our observations are further substantiated by network data collected as part of the National Longitudinal Study of Adolescent Health. This is joint work with Matthew Salganik.


Biography: Sharad Goel is a member of the Microeconomics and Social Systems Group at Yahoo! Research. He received his PhD in Applied Mathematics from Cornell University, and has held positions as a research fellow in the math departments of Stanford University and the University of Southern California. Sharad works at the interface of economics, statistics and computer science, and is particularly interested in problems of collective behavior.


Sharad Goel