Five College Statistics Program


Yeazel co-authors paper on pollen seasonality

As part of her work following her Junior Year Abroad program in Cordoba, Spain, Linnea Yeazel (SC ’13) undertook the statistical analysis of pollen seasonality for olive trees in southern Spain.  Her collaborators at the University of Cordoba, in conjunction with Smith College Professor Esteban Montserrat, found that the olive reproductive cycle is changing considerably, likely due to climate change. The paper is now published in Science of the Total Environment. 

Amherst College hosts Sports Analytics Conference

The conference, held on Sunday, April 13, will bring together experts in the field of sports analytics, professors, and students to learn about and discuss the increasing role analytics plays in the world of athletics. The conference includes presentations by guest speakers Chris Anderson and David Sally, authors of "The Numbers Game," and UMass statistician Gregory Matthews. Ben Baumer of Smith College will also participate, and Amherst College students will present their own research in the field of sports analytics.

Horton co-authors alcohol intervention paper published in JAMA

A new paper published in the Journal of the American Medical Association was co-authored by Five College Statistician Nick Horton. The paper describes the results of a randomized trial conducted in New Zealand designed to measure the effectiveness of web-based screening and intervention for alcohol use. A webcast is now available. 

Balasubramanian paper published in Royal Statistical Society Journal

Five College Statistician Raji Balasubramanian’s paper on variable importance in matched case–control studies in settings of high dimensional data was recently published in Journal of the Royal Statistical Society: Series C (Applied Statistics). 

Abstract: We propose a method for assessing variable importance in matched case–control investigations and other highly stratified studies characterized by high dimensional data (p>>n). In simulated and real data sets, we show that the algorithm proposed performs better than a conventional univariate method (conditional logistic regression) and a popular multivariable algorithm (random forests) that does not take the matching into account. The methods are applicable to wide ranging, high impact clinical studies including metabolomic, proteomic studies and neuroimaging analyses, such as those assessing stroke and Alzheimer's disease. The methods proposed have been implemented in a freely available R library (

New Major in Statistics at Amherst College

To meet the educational needs of our students and address the growing demand and interest in the area of Statistics, Amherst College has created a new major in Statistics as part of a newly renamed Department of Mathematics and Statistics.  The new major will help students develop the capacity to turn data into information that can be used to guide decision-making.  The new major consists of foundational courses in mathematics and computer science, along with a series of introductory, intermediate and advanced courses in statistics, culminating in a capstone course (Advanced Data Analysis) and a comprehensive evaluation of a project.  More information can be found at the department's website or in this article in the student newspaper.

Pioneers in Civic Data: Breaking into the Open(Data) and other lessons on approaching a new frontier

Announcing the Five College DataFest

We are happy to announce that the Five College DataFest, sponsored in part by the Five College Statistics Program, will be held the weekend of March 29th and 30th at the University of Massachusetts in Amherst. DataFest is a nationally coordinated competition that challenges undergraduates working in teams of up to five to extract meaningful insights from a rich and complex data set. A number of prizes will be awarded, including: "Best in Show", "Best Visualization", and "Best Use of External Data". Previous DataFests held at UCLA and Duke University have drawn large numbers of students, and we are hoping for a great turnout from the Five Colleges: Amherst, Hampshire, Mount Holyoke, and Smith Colleges, as well as UMass. Sponsorship opportunities are available -- please contact Ben Baumer or Andrew Bray for more information. 

Inferring Causation without Randomization: A matched design to assess the number of embryos to transfer during in vitro fertilization 

  • Cassandra Pattanayak, Wellesley College
  • Monday, September 23rd -- talk at 4:00pm with refreshments at 3:30pm
  • Amherst College, Seeley Mudd room 206

Transferring one rather than two embryos during in vitro fertilization has been endorsed as a way to reduce multiple birth rates, but no large-scale randomized trial has evaluated the impact of the number of embryos transferred on birth outcomes. This presentation describes the design of a non-randomized study that parallels a hypothetical randomized experiment to examine the effect of single versus double embryo transfer. Using national surveillance data from the Centers for Disease Control and Prevention, single and double embryo cycles were paired on estimated propensity scores to create matched treated and control groups that are as similar on the observed background covariates as if the number of embryos transferred had been randomly assigned. This example illustrates a general framework for drawing causal rather than associative inferences from non-randomized studies, and the crucial role of checking balance between treatment and control groups on key background covariates is emphasized. 
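The pairing and balance-checking steps described in this abstract can be illustrated with a minimal Python sketch. It is not the speaker's method: the abstract does not say which matching algorithm was used, so greedy one-to-one nearest-neighbor matching within a caliper is assumed here, and the unit ids, propensity scores, ages, and caliper value are all made-up illustrative data. The propensity scores are taken as already estimated.

```python
import math
from statistics import mean, variance


def greedy_match(treated, control, caliper=0.05):
    """Pair each treated unit with the nearest unused control unit.

    treated, control: dicts mapping unit id -> estimated propensity score.
    A pair is kept only if the scores differ by at most `caliper`.
    Greedy nearest-neighbor matching is an assumption for illustration.
    """
    available = dict(control)
    pairs = []
    # Visit treated units in score order so the result is deterministic.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


def standardized_mean_difference(x_treated, x_control):
    """Difference in covariate means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2)
    return (mean(x_treated) - mean(x_control)) / pooled_sd


# Hypothetical data: "s" = single-embryo cycles, "d" = double-embryo cycles.
single = {"s1": 0.30, "s2": 0.55, "s3": 0.80}
double = {"d1": 0.28, "d2": 0.50, "d3": 0.52, "d4": 0.95, "d5": 0.10}
age = {"s1": 31, "s2": 34, "s3": 29,
       "d1": 32, "d2": 38, "d3": 35, "d4": 41, "d5": 27}

pairs = greedy_match(single, double)

# Balance on one background covariate (age) before and after matching.
smd_before = standardized_mean_difference(
    [age[u] for u in single], [age[u] for u in double])
smd_after = standardized_mean_difference(
    [age[t] for t, _ in pairs], [age[c] for _, c in pairs])
print(pairs, round(smd_before, 3), round(smd_after, 3))
```

A standardized mean difference closer to zero after matching suggests the matched groups are more similar on that covariate. In practice every key background covariate would be checked this way, not just one.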

Big Data:  A perspective on their current uses and potential future uses by the Federal Statistical System

  • Mike Horrigan, Associate Commissioner for Prices and Living Conditions, Bureau of Labor Statistics, Washington, DC
  • Monday, September 30th -- talk at 4:30pm with tea at 4:00pm
  • Smith College Ford Hall room 240

This talk will explore the world of big data in terms of how they are currently used by the Federal Statistical system and explore possible ways in which big data sources may be leveraged in the future.   In an era of declining real budgets for the Federal Statistical agencies, big data are often seen as an efficient and economical way to replace or supplement existing data collection programs.  However, the blending of existing Federal data series collected using established statistical survey practices with big data sources that are not necessarily representative samples of a larger universe frame poses some significant challenges to the Federal statistical system, especially in terms of the quality tradeoffs we may be making.  We also face challenges in maintaining our goal of methodological transparency when the potential biases of some big data sources are not always well understood.  The talk begins with an attempt to define big data.  I then present the results (to date) of an environmental scan we are conducting on the uses of big data across Federal statistical agencies as well as a scan of big data uses in academia and private business.  The remainder of the talk addresses the issue of the potential future uses of big data, developing a perspective based on existing frameworks for judging the quality of economic statistics as well as looking at the statistical issues associated with blending survey based data with big data sources.  I am particularly interested in your thoughts on the statistical issues and potential solutions to such issues posed by blended statistics.  I end the seminar with concluding thoughts on the future of big data in the Federal Statistical system based on my role as a Director of several major statistical survey programs.

This talk reflects the current status of a project being sponsored by the American Economic Association Data Subcommittee on Big Data, for which I am a co-chair along with Ana Aizcorbe of the Bureau of Economic Analysis.  The goal of the project is to report on the current and potential uses of big data across the Federal Statistical system.

The talk is part of the activities of the International Year of Statistics and is sponsored by the Departments of Mathematics and Statistics as well as Economics at Smith College and co-sponsored by the Five College Statistics Program and the Boston Chapter of the American Statistical Association.

For more information about the talk, contact Katherine Halvorsen (413-585-3874).  More information about Mike Horrigan can be found here:

Hack for Western Mass

On the weekend of June 1st, 2013, web and software developers, designers, community organizers, and other folks from all over Western Mass will gather to tackle local challenges with technology -- sponsored in part by the Five College Statistics Program. 

International Year of Statistics Talk

Mark Hansen, Columbia University, Tuesday, March 12th, 2013, Ford Hall 240 Smith College.

Position announcement

Postdoctoral Associate in statistics at the Five-College consortium (Amherst, Hampshire, Mount Holyoke, and Smith Colleges and the University of Massachusetts Amherst), 3 Years.

Application Deadline: 2012/12/03
