Computational Social Science Institute

University of Massachusetts Amherst

Lecture series generously sponsored by Yahoo!

Videos of some CSSI seminars are available.

Fall 2011

Krista J. Gile, UMass Amherst

Inference from Partially-Observed Social Network Data
Friday, September 23, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 150/151

Abstract: Human populations are often connected by social networks of relations. Such social networks may be either of direct interest to researchers or useful in designing sampling strategies through which to sample population members. Most existing strategies for statistical inference focus on cases where the full social network is observed.

In this talk, we present three cases in which the social network is only partially observed. In the first, inference is focused on understanding the social connections of adolescents in a high school, in a setting in which data are missing because some adolescents did not complete the survey. In the second, inference is focused on estimating features of disease transmission over a network of potentially disease-transmitting contacts, observed through a public health strategy known as 'contact tracing,' an intervention based on following high-risk links from known infected persons. In the final example, inference is focused on estimating disease prevalence, the summary of an individual-level characteristic, from a variant of link-tracing network sampling known as 'Respondent-Driven Sampling.'

Bio: Krista J. Gile is Assistant Professor of Statistics in the Department of Mathematics and Statistics at UMass Amherst. She holds a PhD in Statistics from the University of Washington. Before coming to UMass, she was a Postdoctoral Prize Research Fellow at Nuffield College, Oxford. Her research focuses on developing statistical methodology for social and behavioral science research, particularly related to making inference from partially-observed social network structures. Most of her current work is focused on understanding the strengths and limitations of data sampled with link-tracing designs such as snowball sampling, contact tracing, and respondent-driven sampling.

The Small World of Al Capone: Power and Influence in Embedded Social Networks [video]
Friday, September 30, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 150/151

Abstract: Chicago’s Prohibition-era syndicate represents one of the most studied criminal enterprises, and at its core is the mythical Al Capone, the organizational maestro of a massive criminal network that permeated the legitimate and political worlds. To date, most of this research has relied on historical and cultural analyses, all of which have reinforced the structural and cultural importance of Al Capone and his associates. This paper re-examines the Capone-era mob through a new analytical lens: social network analysis. Using a unique relational dataset created by coding more than 3,000 pages of primary documents, this paper examines the precise ways in which the criminal networks associated with Al Capone overlapped with political and union networks. These new data and the use of social network analysis mark this study as perhaps the first to (quite literally) map the small world of organized crime in Chicago and, in so doing, offer a structural analysis of the expansiveness of criminal, political, and union networks. The findings reveal a series of overlapping social networks of more than 1,200 individuals with more than 3,500 ties among and between them. The majority of these ties and individuals are in a single large network with only a handful of individuals (fewer than 100) acting as links between the criminal, political, and union worlds. Findings compare the “structural signatures” (i.e., the various network permutations) of organized crime figures that have been deemed “important” from more traditional historical and cultural analyses with those deemed important from structural analysis. The implications of this research for our understanding of Al Capone as well as its relevance for the study of organized crime are also discussed.

Bio: Andrew Papachristos is a Robert Wood Johnson Health & Society Scholar at Harvard and an Assistant Professor of Sociology at the University of Massachusetts, Amherst. He is a researcher and policy analyst of urban neighborhoods, street gangs, violent crime, gun violence, and social networks. His writing has appeared in Foreign Policy, The American Journal of Sociology, Criminology & Public Policy, and several edited volumes and peer-reviewed journals. Papachristos received his Ph.D. from the University of Chicago.

Stratification in the Early Stages of Mate Choice [video]
Friday, October 7, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 150/151

Abstract: Sociologists have long studied mate choice patterns to understand the shape of stratification systems. Romantic pairing involves intimacy and trust, and is therefore a prime indicator of the extent to which members of different social groupings (race/ethnicity, social class, education, religion) accept each other as social equals. The majority of this literature focuses on marriage, given the commitment marriage implies and the availability of nationally representative data. In my dissertation, I examine the opposite end of the relationship spectrum: the initial screening and sorting process whereby strangers consider each other as potential mates; express interest in some subset of this population but not others; and find that this interest is or is not reciprocated. This beginning stage in mate choice is particularly important for our understanding of social boundaries because personality factors are likely to matter less and social characteristics to matter more. Yet because these initial forays into relationships are typically unobserved, we know very little about whom people consider as potential mates in the first place. I ask the following questions, corresponding to three empirical chapters: First, how do individuals from different status backgrounds vary in the types of strategies that they pursue and the degree of success that they achieve? Second, what underlying dynamics of homophily, competition, and gender asymmetry give rise to observed patterns of interaction, and under what circumstances do some of these boundaries break down? Third, how do strategies as well as preferences vary at different stages of selection, and at what point is homogeneity created? To answer these questions, I use detailed longitudinal data from a popular online dating site. These data are particularly useful for the study of social inequality not only due to the unique quantity and nature of information that is available, but also because online dating has become one of the primary ways that singles meet and marry today.

Bio: Kevin is a Ph.D. Candidate in the Department of Sociology at Harvard University and a fellow at the Berkman Center for Internet & Society. Over the past several years he has overseen the development of a new cultural, multiplex, and longitudinal social network dataset using data from Facebook. This dataset has given rise to a number of collaborative projects exploring the intersection of social networks, cultural tastes (with Jason Kaufman and Marco Gonzalez), race/ethnicity (with Andreas Wimmer), and online privacy. Other current projects include a comparative study of culture in action in the context of contemporary tattooing; an analysis of reciprocity and dominance in a gang homicide network (with Andrew Papachristos); and an exploration of the "structure of activism" based on the Save Darfur campaign (with Jens Meierhenrich). His dissertation examines stratification in the early stages of mate choice using data from a popular online dating site.

Sinan Aral, New York University

Content and Causality in Influence Networks
Friday, October 14, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 150/151

Abstract: Many of us are interested in whether "networks matter." Whether in the spread of disease, the diffusion of information, the propagation of social contagions, the effectiveness of viral marketing, or the magnitude of peer effects in a variety of settings, two key questions must be answered before we can understand whether networks matter: 1) how the content that flows through networks affects the patterns of outcomes we see across nodes and 2) whether the statistical relationships we observe can be interpreted causally. Sinan will review what we know and where research might go with respect to content and causality in networks. He will provide two examples, one from each area, to structure the discussion: one from an analysis of email networks and the information content that flows through them at a mid-sized executive recruiting firm, and the other from a randomized field experiment on a popular social networking website that tests the effectiveness of "viral product design" strategies in creating peer influence and social contagion among the 1.4 million friends of 9,687 experimental users.

Bio: Sinan Aral is an Assistant Professor and Microsoft Faculty Fellow at the NYU Stern School of Business. His research focuses on social contagion and measuring and managing how information diffusion in massive social networks affects information worker productivity, consumer demand and viral marketing. This research has won numerous awards including the Microsoft Faculty Fellowship (2010), the PopTech Science and Public Leaders Fellowship (2010), an NSF Early Career Development (CAREER) Award (2009), the Best Overall Paper Award at the International Conference on Information Systems (ICIS) (in both 2006 and 2008), the ICIS Best Paper in IT Economics Award (2006), the ICIS Best Paper in IT Business Value Research Award (2006), the ACM SIGMIS Best Dissertation Award (2007), and the IBM Faculty Award (2009). Sinan has been a Fulbright Scholar, serves on the board of SocialAmp, a social commerce startup that enables targeting and peer referral in social media networks, and is currently an organizer of the Workshop on Information in Networks (WIN). His work has been published in leading journals such as the American Journal of Sociology, IEEE Intelligent Systems, Information Systems Research, Management Science, Marketing Science, the Proceedings of the National Academy of Sciences (PNAS), Science, Organization Science, the Harvard Business Review and the Sloan Management Review, and has been mentioned in popular press outlets such as the Economist, the New York Times, Businessweek, Wired and CIO Magazine. Sinan is a Phi Beta Kappa graduate of Northwestern University and holds master's degrees from the London School of Economics and Harvard University. He received his PhD from MIT. You can follow him on Twitter @sinanaral.

Web as a Laboratory for Studying Humanity [video]
Friday, October 21, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 150/151

Abstract: With an increasing amount of social interaction taking place in on-line settings, we are accumulating massive amounts of data about phenomena that were once essentially invisible to us: the collective behavior and social interactions of hundreds of millions of people. Analyzing this massive data computationally offers enormous potential both to address long-standing scientific questions and to harness and inform the design of future social computing applications: What are emerging ideas and trends? How is information created, and how does it flow and mutate as it is passed from node to node like an epidemic? How will a community or a social network evolve in the future? We discuss how a computational perspective can be applied to questions involving the structure of online networks and the dynamics of information flows through such networks, including analysis of massive data as well as mathematical models that seek to abstract some of the underlying phenomena.

Bio: Jure Leskovec is an assistant professor of Computer Science at Stanford University where he is a member of the Info Lab and the AI Lab. His research focuses on mining and modeling large social and information networks, their evolution, and diffusion of information and influence over them. Problems he investigates are motivated by large scale data, the Web and on-line media. He has received six best paper awards, an ACM KDD dissertation award, and a Microsoft Research Faculty Fellowship, and has appeared in IEEE Intelligent Systems magazine's "AI's 10 to Watch". Jure also holds three patents. Before joining Stanford, Jure spent a year as a postdoctoral researcher at Cornell University. He completed his Ph.D. in computer science at Carnegie Mellon University in 2008. Jure has authored the Stanford Network Analysis Platform (SNAP), a general purpose network analysis and graph mining library that easily scales to massive networks with hundreds of millions of nodes and billions of edges.

Benjamin Marlin, UMass Amherst

Recommender Systems and the Missing at Random Assumption: Why Big Data Doesn't Always Equal Good Science
Friday, October 28, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 150

Abstract: In a recommender system, the members of an on-line community rate items like books or movies. The goal of the system is to leverage this preference data to make personalized recommendations to individual community members. The study of recommender systems has many similarities to branches of computational social science including social network analysis and involves many of the same basic components including statistical modeling and the analysis of "big data". This talk will focus on how the natural interaction between people and recommender systems can result in big data sets that invalidate the fundamental assumptions underlying the treatment of missing values in standard statistical models (the missing at random assumption). I will present an experiment conducted in conjunction with Yahoo! Music to collect ratings for randomly selected items and describe how we leverage this small but highly focused data set to study the impact of non-random missing data on the performance of statistical rating prediction models.

Bio: Benjamin Marlin is an assistant professor in the Department of Computer Science at the University of Massachusetts Amherst. He was previously a fellow of both the Pacific Institute for the Mathematical Sciences and the Killam Trusts at the University of British Columbia where he was based in the Laboratory for Computational Intelligence in the Department of Computer Science. Benjamin completed his PhD in machine learning in the Department of Computer Science at the University of Toronto.

Community Detection in Multislice Networks [video]
Friday, November 4, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 150/151

Abstract: Network science is an interdisciplinary endeavor with methods and applications drawn from across the natural, social, and information sciences. A prominent problem in network science is the algorithmic detection of tightly connected groups of nodes known as communities. While community detection has been used successfully in a number of applications, its use has been largely limited to the study of single, static networks. Through the study of dynamical processes on networks, we developed a generalized framework of quality functions for partitions that allows for the study of community structure in multislice networks, which are combinations of networks coupled by identification of each node in one network slice to itself in other slices. This framework allows studies of community structure in a general setting encompassing networks that change over time, have multiple types of links, and have multiple community scales. No prior knowledge about community detection in networks will be assumed for this presentation.

Bio: After a childhood spent mostly in Minnesota, Peter moved east to attend college at Cornell University where he majored in Engineering Physics. He then took a Churchill Scholarship to study in the Cavendish Laboratory at Cambridge, earning an M.Phil. in Physics. Returning to the States, he continued his studies at Princeton, leading to an M.A. and Ph.D. in Applied and Computational Mathematics. Following a postdoctoral instructorship in applied mathematics at MIT and a tenure-track assistant professorship in Mathematics at Georgia Tech, Peter moved to Chapel Hill to join the Department of Mathematics and the Institute for Advanced Materials, Nanoscience and Technology at UNC.

Seven Deadly Sins of Contemporary Quantitative Political Analysis [video]
Friday, November 18, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 150/151

Abstract: A combination of technological change, methodological drift and a certain degree of intellectual sloth and sloppiness, particularly with respect to philosophy of science, has allowed contemporary quantitative political analysis to accumulate a series of dysfunctional habits that have rendered a great deal of contemporary research more or less scientifically useless. The cure for this is not to reject quantitative methods (and the cure is most certainly not a postmodernist nihilistic rejection of all systematic methods) but rather to return to some fundamentals, and take on some hard problems rather than expecting to advance knowledge solely through the ever-increasing application of fast-twitch muscle fibers to computer mice. In the original paper, presented at the American Political Science Association meetings in 2010 and subsequently one of the most frequently downloaded papers from the meeting, these "seven deadly sins" are identified as: 1. Kitchen sink models that ignore the effects of collinearity; 2. Pre-scientific explanation in the absence of prediction; 3. Reanalyzing the same data sets until they scream; 4. Using complex methods without understanding the underlying assumptions; 5. Interpreting frequentist statistics as if they were Bayesian; 6. Linear statistical monoculture at the expense of alternative structures, particularly those in machine learning; 7. Confusing statistical controls and experimental controls. [The talk will focus on topics 2, 5, and 6.] The answer to these problems is solid, thoughtful, original work driven by an appreciation of both theory and data. Not postmodernism. The talk will close with a review of how we got to this point from the perspective of 17th through 20th century philosophy of science, and provide suggestions for changes in philosophical and pedagogical approaches that might serve to correct some of these problems.

Bio: Before coming to Penn State, Professor Schrodt was a professor of political science at the University of Kansas and at Northwestern University in Illinois, where he helped develop Northwestern's programs on Mathematical Methods in the Social Sciences and the multidisciplinary program in international studies. Dr. Schrodt has also taught at the Naval Postgraduate School in Monterey, California, the American University in Cairo, the University of California at Davis, Bir Zeit University in the West Bank, and spent a year at the University of Lancaster (England) on a NATO Postdoctoral fellowship. Dr. Schrodt's major areas of research are formal models of political behavior, with an emphasis on international politics, and political methodology. His current research focuses on predicting political change using statistical and pattern recognition methods. He teaches a variety of courses in international relations, with an emphasis on international conflict, and U.S. defense policy. Dr. Schrodt has published more than 75 articles in political science journals including International Studies Quarterly, Journal of Conflict Resolution, Foreign Policy Analysis and the American Political Science Review. Additionally, his Kansas Event Data System computer program won the "Outstanding Computer Software Award" from the American Political Science Association in 1995.

The Evolution and Formation of Amicus Curiae Networks [video]
Friday, December 2, 2011 • 12:30PM–2PM • Lunch provided
Computer Science Building, Room 150/151

Abstract: We investigate two age-old questions of interest group behavior: how have interest group coalition strategies changed over time, and which factors determine whether interest groups work together? Through the creation of a new network measure of interest group coalitions based on cosigner status on United States Supreme Court amicus curiae briefs, we illuminate the central players and overall characteristics of this dynamic network from 1930 to the present day. We also model the attribute homophily and structure of the network. We find assortative mixing of interest groups based on policy area, region, size, and other business characteristics.

Bio: Janet M. Box-Steffensmeier (Ph.D., Texas, 1993), Vernal Riffe Professor of Political Science, pursues research and teaching interests in American politics (legislative politics, public opinion, and voting behavior) and in methodology (time series, event history, and network analysis). She has published articles in the American Political Science Review, American Journal of Political Science, Journal of Politics, Political Analysis, and Legislative Studies Quarterly. She is the author of Event History Modeling: A Guide for Social Scientists, published by Cambridge University Press. She is currently working on a Monte Carlo project to evaluate the treatment of heterogeneity in event history models, which is partially funded by the National Science Foundation. Other funded research includes projects on the use of Blue Slips by Senators to oppose court nominations. She has twice received the Gosnell Award for the best work in political methodology and the Emerging Scholar Award of the Elections, Public Opinion, and Voting Behavior Section of the American Political Science Association. She is the former treasurer of the American Political Science Association and President-Elect of the Midwest Political Science Association.

Exploring Clustering Structure in Ranking Data [video]
Friday, December 9, 2011 • 12:30PM–2PM • Lunch provided

Abstract: Cluster analysis is concerned with finding homogeneous groups in a population. Model-based clustering methods provide a framework for developing clustering methods through the use of statistical models. This approach allows for uncertainty to be quantified using probability and for the properties of a clustering method to be understood on the basis of a well-defined statistical model. Mixture models provide a basis for many model-based clustering methods. Ranking data arise when judges rank some or all of a set of objects. Examples of ranking data include voting data from elections that use preferential voting systems (e.g., PR-STV) and customer preferences for products in marketing applications. A mixture of experts model is a mixture model in which the model parameters are functions of covariates. We explore the use of mixture of experts models in cluster analysis, so that clustering can be better understood. The choice of how and where covariates enter the mixture of experts model has implications for the clustering performance and the interpretation of the results. The use of covariates in clustering is demonstrated on examples from studying voting blocs in elections and examining customer segments in marketing. This work was completed in collaboration with Claire Gormley.

Bio: Brendan Murphy is Associate Professor of Statistics in the School of Mathematical Sciences & the Complex and Adaptive Systems Laboratory at University College Dublin, Ireland. His research focuses on statistical model-based approaches to clustering, classification and network modeling. He is interested in applications of statistical methods in social science, food science and bioinformatics. He completed his PhD in Statistics at Yale University and has since held positions at Trinity College Dublin and the University of Washington. He currently serves as associate editor for the Annals of Applied Statistics and Statistics and Computing and review editor for Statistical Analysis & Data Mining.
