In recent years, there has been an explosion in the availability of network data, and a corresponding explosion in studies using such data. The number of statistical methods for network data has also increased dramatically. Most of these methods, however, assume that the network is fully observed. In practice, this is often not the case; some network members or ties may not be observed, or the observed network may be only a sampled portion of a larger network. I am a statistician, and most of my work has been aimed at developing methods for partially observed network data. I have worked on sampling and missing data issues for classes of statistical network models. Much of my recent work has been aimed at improving inference from data collected through respondent-driven sampling (RDS), a particularly widely used and challenging form of network sampling. RDS is designed to elicit information from hard-to-reach human populations, and is often used in high-risk populations such as sex workers and drug users. Sampling begins with a small convenience sample of "seeds." Each participant is then given a small number of uniquely identified coupons to pass to other members of the target population, making them eligible for participation. Sampling proceeds in this manner until the desired sample size is reached. This strategy has been effective at recruiting large, diverse samples from challenging populations in which other methods have been unsuccessful. Inference from the resulting samples, however, can be quite challenging. My work involves understanding and elucidating these challenges, and developing improved statistical methods to address them.
Social network analysis