Five College Statistics Program


Announcing the Five College DataFest 2015!

We are pleased to announce that the Five College DataFest, sponsored in part by the Five College Statistics Program, will be held the weekend of March 27 and 29 at the University of Massachusetts Amherst. DataFest is a nationally coordinated competition that challenges undergraduates working in teams of up to five to extract meaningful insights from a rich and complex data set. A number of prizes will be awarded. Sponsorship opportunities are available; please contact Ben Baumer or Andrew Bray for more information.

ESPN publishes an article by Ben Baumer about the use of statistics in baseball

Ben Baumer of Smith College published an article for ESPN about which baseball teams are using sabermetrics (statistical analysis of baseball data) to improve their performance. He was also interviewed on ESPN, where he discussed this work.

HackEbola at UMass aids fight against West African epidemic

An article from the Daily Hampshire Gazette:

From across the world, young mathematicians, biologists and other scholars at the University of Massachusetts Amherst crunched numbers this weekend in hopes of aiding the fight against the Ebola epidemic in West Africa.

“It has to start somewhere,” said Andrew Smith, 26, a fourth-year doctoral candidate in organismic and evolutionary biology at UMass.

Smith was among more than 50 students from the Five Colleges who took part in a HackEbola event in the John W. Lederle Research Center at UMass planned from Friday evening to Sunday afternoon.

UMass searching for 2nd position in Biostatistics

The Biostatistics Program at the University of Massachusetts/Amherst is seeking talented applicants qualified for an assistant or associate professor position.  Under exceptional circumstances, highly qualified candidates at other ranks may receive consideration.  Individuals with demonstrated potential for or experience in developing an extramurally funded research program in biostatistical methods, with application to areas such as big data, bioinformatics, clinical trials, survey research, or epidemiology/public health are encouraged to apply. We will consider applications from individuals without a degree in biostatistics or statistics if they have research experience in the areas mentioned above.

Please see the ad below for the full details. 

Bray pens article on DataFest in Amstat News

ASA President Nat Schenker writes: A June Amstat News article by Robert Gould, Benjamin Baumer, Mine Çetinkaya-Rundel, and Andrew Bray described DataFest, an annual Big Data analysis competition for college students. The ASA board, at its April meeting, approved a proposal from the DataFest organizers to make the ASA the national headquarters for DataFest. For this month’s President’s Corner, I invited Andrew Bray, postdoctoral research associate at the University of Massachusetts, Amherst, to write about student perspectives on DataFest. As a UCLA graduate student, Andrew helped Rob Gould, DataFest’s founder, organize the first few UCLA events. This year, he and Ben Baumer organized the inaugural Five College DataFest.

Five College Guide to R and RStudio available

Students in a number of Five College institutions are using R and RStudio, and a guide to using this powerful (and free) system is now available at

Meetup brings useR happenings to Pioneer Valley

The most recent meeting of the Western Mass Data Science, Stats, and R Meetup brought the latest and greatest from the useR! conference to statisticians of all stripes in the Pioneer Valley. Over drinks at the Amherst Brewing Company, Nick Reich of UMass described the latest iteration of Hadley Wickham's dplyr package, and Andrew Bray of Mt. Holyoke College presented ggvis, RStudio's attempt to bring dynamic data visualizations to R.

Stoudt SC'15 wins First Prize in Statistics in Sports Undergraduate Research Competition

Sara Stoudt of Smith College was awarded First Prize in the 2014 Statistics in Sports Undergraduate Research Competition held at the Joint Statistical Meetings in Boston in August. The competition was sponsored by the Statistics in Sports section of the American Statistical Association and was open to all undergraduate students doing research in sports analytics. Stoudt received a $250 cash prize for her entry: The Perfect Bracket: Machine Learning in NCAA Basketball.

UMass School of Public Health searching for tenure-track biostatistician

The Biostatistics Program seeks a tenure track faculty (open rank) with demonstrated experience in developing an extramurally funded research program in biostatistical methods, with application to areas such as clinical trials, survey research, epidemiology/public health, and addressing analysis needs for application in big data including medical informatics, neuroimaging, bioinformatics, and genomics.

Please see the full description in the file attached below. 

Five College students recognized in Amstat News for DataFest

Several Five College students were featured in a recent article about DataFest published in the Amstat News. The inaugural Five College DataFest took place at UMass in late March. The article describes how DataFest has spread to multiple locations across the country in 2014. 

Yeazel co-authors paper on pollen seasonality

As part of her work following her Junior Year Abroad program in Cordoba, Spain, Linnea Yeazel (SC ’13) undertook the statistical analysis of pollen seasonality for olive trees in southern Spain. Her collaborators at the University of Cordoba, in conjunction with Smith College professor Esteban Montserrat, found that the olive reproductive cycle is changing considerably, likely due to climate change. The paper is now published in Science of the Total Environment.

Amherst College hosts Sports Analytics Conference

The conference, held Sunday, April 13, will bring together experts in the field of sports analytics, professors and students to learn and discuss the increasing role analytics plays in the world of athletics. The conference includes presentations by guest speakers Chris Anderson and David Sally, authors of The Numbers Game, and UMass statistician Gregory Matthews. Ben Baumer of Smith College will also participate. Amherst College students will also be presenting their personal research in the field of sports analytics.

Horton co-authors alcohol intervention paper published in JAMA

A new paper published in the Journal of the American Medical Association was co-authored by Five College statistician Nick Horton. The paper describes the results of a randomized trial conducted in New Zealand designed to measure the effectiveness of web-based screening and intervention for alcohol use. A webcast is now available.

Balasubramanian paper published in Royal Statistical Society Journal

Five College statistician Raji Balasubramanian’s paper on variable importance in matched case–control studies in settings of high dimensional data was recently published in Journal of the Royal Statistical Society: Series C (Applied Statistics). 

Abstract: We propose a method for assessing variable importance in matched case–control investigations and other highly stratified studies characterized by high dimensional data (p>>n). In simulated and real data sets, we show that the algorithm proposed performs better than a conventional univariate method (conditional logistic regression) and a popular multivariable algorithm (random forests) that does not take the matching into account. The methods are applicable to wide ranging, high impact clinical studies including metabolomic, proteomic studies and neuroimaging analyses, such as those assessing stroke and Alzheimer's disease. The methods proposed have been implemented in a freely available R library.

New Major in Statistics at Amherst College

To meet the educational needs of their students and address the growing demand and interest in the area of statistics, Amherst College has created a new major in Statistics as part of a newly renamed Department of Mathematics and Statistics. The new major will help students develop the capacity to turn data into information that can be used to guide decision-making. The new major consists of foundational courses in mathematics and computer science, along with a series of introductory, intermediate and advanced courses in statistics, culminating in a capstone course (Advanced Data Analysis) and a comprehensive evaluation of a project. More information can be found at the department's website or in this article in the student newspaper.

Pioneers in Civic Data: Breaking into the Open(Data) and other lessons on approaching a new frontier

Announcing the Five College DataFest

We are happy to announce that the Five College DataFest, sponsored in part by the Five College Statistics Program, will be held the weekend of March 29 and 30 at the University of Massachusetts Amherst. DataFest is a nationally coordinated competition that challenges undergraduates working in teams of up to five to extract meaningful insights from a rich and complex data set. A number of prizes will be awarded, including "Best in Show," "Best Visualization," and "Best Use of External Data." Previous DataFests held at UCLA and Duke University have drawn large numbers of students, and we are hoping for a great turnout from the Five Colleges: Amherst, Hampshire, Mount Holyoke, and Smith Colleges, as well as UMass. Sponsorship opportunities are available; please contact Ben Baumer or Andrew Bray for more information.

Inferring Causation without Randomization: A matched design to assess the number of embryos to transfer during in vitro fertilization 

  • Cassandra Pattanayak, Wellesley College
  • Monday, September 23, talk at 4:00 pm with refreshments at 3:30 pm
  • Amherst College, Seeley Mudd room 206

Transferring one rather than two embryos during in vitro fertilization has been endorsed as a way to reduce multiple birth rates, but no large-scale randomized trial has evaluated the impact of the number of embryos transferred on birth outcomes. This presentation describes the design of a non-randomized study that parallels a hypothetical randomized experiment to examine the effect of single versus double embryo transfer. Using national surveillance data from the Centers for Disease Control and Prevention, single and double embryo cycles were paired on estimated propensity scores to create matched treated and control groups that are as similar on the observed background covariates as if the number of embryos transferred had been randomly assigned. This example illustrates a general framework for drawing causal rather than associative inferences from non-randomized studies, and the crucial role of checking balance between treatment and control groups on key background covariates is emphasized. 

Big Data:  A perspective on their current uses and potential future uses by the Federal Statistical System

  • Mike Horrigan, Associate Commissioner for Prices and Living Conditions, Bureau of Labor Statistics, Washington, DC
  • Monday, September 30, talk at 4:30 pm with tea at 4:00 pm
  • Smith College Ford Hall room 240

This talk will explore the world of big data in terms of how they are currently used by the Federal Statistical system and explore possible ways in which big data sources may be leveraged in the future. In an era of declining real budgets for the Federal Statistical agencies, big data are often seen as an efficient and economical way to replace or supplement existing data collection programs. However, the blending of existing Federal data series collected using established statistical survey practices with big data sources that are not necessarily representative samples of a larger universe frame poses some significant challenges to the Federal statistical system, especially in terms of the quality tradeoffs we may be making. We also face challenges in maintaining our goal of methodological transparency when the potential biases of some big data sources are not always well understood. The talk begins with an attempt to define big data. I then present the results (to date) of an environmental scan we are conducting on the uses of big data across Federal statistical agencies as well as a scan of big data uses in academia and private business. The remainder of the talk addresses the issue of the potential future uses of big data, developing a perspective based on existing frameworks for judging the quality of economic statistics as well as looking at the statistical issues associated with blending survey based data with big data sources. I am particularly interested in your thoughts on the statistical issues and potential solutions to such issues posed by blended statistics. I end the seminar with concluding thoughts on the future of big data in the Federal Statistical system based on my role as a Director of several major statistical survey programs.

This talk reflects the current status of a project being sponsored by the American Economic Association Data Subcommittee on Big Data, for which I am a co-chair along with Ana Aizcorbe of the Bureau of Economic Analysis. The goal of the project is to report on the current and potential uses of big data across the Federal Statistical system.

The talk is part of the activities of the International Year of Statistics and is sponsored by the Departments of Mathematics and Statistics as well as Economics at Smith College and co-sponsored by the Five College Statistics Program and the Boston Chapter of the American Statistical Association.

For more information about the talk, contact Katherine Halvorsen (khalvors@smith.edu, 413-585-3874). More information about Mike Horrigan can be found here:

Hack for Western Mass

On the weekend of June 1, 2013, web and software developers, designers, community organizers, and other folks from all over Western Mass gathered to tackle local challenges with technology at the event Hack for Western Mass. The event was sponsored in part by the Five College Statistics Program.

International Year of Statistics Talk

Mark Hansen, Columbia University, Tuesday, March 12, 2013, Ford Hall 240 Smith College.

Position announcement

Postdoctoral Associate in statistics at the Five-College consortium (Amherst, Hampshire, Mount Holyoke, and Smith Colleges and the University of Massachusetts Amherst), 3 Years.

Application Deadline: 2012/12/03

File attachments: