Five College Consortium

Five College Statistics Program


Upcoming Events

Women in Data Science (WiDS) livestream at Mount Holyoke College

Friday, February 3, 2017,  11:00 am to 5:30 pm, Gamble Auditorium, 50 College St, South Hadley, MA 

Mount Holyoke College will be hosting a Women in Data Science (WiDS) livestream on Feb 3. Stanford has set up an international Women in Data Science event with regional hosts around the globe, and Mount Holyoke is one of the hosts. The livestream has a very impressive lineup, and there will also be a local panel. The agenda and link to register for the event are at:

Talk: Activity Monitors: Some interesting data and challenges

John Staudenmayer, Department of Mathematics and Statistics, University of Massachusetts, Amherst

Friday, February 3, 2017,  12:00 PM to 1:00 PM,  Seeley Mudd Room 206, 31 Quadrangle Drive, Amherst, MA

Abstract: A surprising number of people use some sort of activity monitor to assess aspects of daily physical activity, sedentary behavior, and sleep. This non-technical talk will have two parts. In the first part, I will describe how the monitors work. More specifically, this will include an introduction to the data that the monitors collect, an overview of the types of statistical models and algorithms that the monitors apply to those data, and a summary of how those algorithms are calibrated. In the second part of the talk, I will discuss aspects of activity monitoring in more detail, including: (1) the sedentary sphere, (2) step detection methods, and (3) what makes a good calibration? (This is based on joint work with a number of people, especially Professor Patty Freedson, Department of Kinesiology at UMass-Amherst.)

Recurring Events

Past Events


Christine Zhang 

Los Angeles Times,  Knight-Mozilla OpenNews Fellow

Tuesday, December 6, 2016, 4:30 PM to 5:30 PM, McConnell 103 (Auditorium), Smith College. Light refreshments will be served at 4 pm preceding the talk.

Title:  Data Journalism at the Los Angeles Times

Abstract: Data journalism could be either the future of news or just another fad. Although the philosophy of data journalism is nothing new, practices have changed with time and technology. Christine Zhang ’09, a Knight-Mozilla OpenNews Fellow at the LA Times, will present what she's learned about finding stories in numbers.

Two talks by Jessica Utts

Professor of Statistics, University of California Irvine, President of the American Statistical Association 

Monday, October 17, 4:30pm, Amherst College, Seeley Mudd 206

Title:  Communicating the Value of Statistics

Abstract: Statistical methods impact almost every facet of daily life, from the health care we receive to the entertainment options available to us. Knowledge of the basic concepts of statistics can help everyone make better decisions. How can we convey the importance of understanding statistical results to health care workers, public policy decision-makers, educators, and the myriad other professionals who could benefit from that understanding? We need to communicate the role and value of statistical thinking for everyone, whether they are professional consumers of statistical results or simply could benefit from tools for making better decisions in daily life. In this talk I will discuss some things the American Statistical Association (ASA) is doing to help, and give suggestions and encouragement for the members of the audience to do what they can to help as well.

Tuesday, October 18, 4:30pm, McConnell 103 (Auditorium), Smith College

Title:  Understanding p-values and the Controversy Surrounding Them

Abstract: Most researchers and journals rely heavily on p-values for determining whether the results of a study are worthy of publication. But recently p-values have come under attack, and one social science journal has gone as far as banning their use for papers submitted to the journal. These developments led the American Statistical Association (ASA) to release a statement titled “Statement on Statistical Significance and P-values” with six principles underlying the proper use and interpretation of p-values and statistical significance. In this talk I will present the ASA’s six principles and discuss what p-values really measure, some pitfalls related to their use, and what steps you can take to make sure your use of them is appropriate.

Yoshua Bengio
Professor of Computer Science and Operations Research, Université de Montréal

Thursday, September 29, 2:00pm, UMass, CS 150/151 (Reception at 1:40 pm)

Title:  Improving the memory capability of recurrent networks

Abstract: Since the 90s we have known about the fundamental challenge in training a parametrized dynamical system such as a recurrent network to capture long-term dependencies. The notion of stable memory is crucial in understanding this issue, and is behind the LSTM and GRU architectures, as well as the recent work on networks with an external memory. We present several new ideas exploring how to further expand the reach of recurrent architectures, improve their training and scale up their memory, in particular to model language-related data and better capture semantics for question answering, machine translation and dialogue.

Events during the 2015-2016 academic year

Three talks on statistics and neuroscience by Rob Kass

Robert E. Kass
Professor, Department of Statistics and Machine Learning Department
Interim Co-Director, Center for the Neural Basis of Cognition
Carnegie Mellon University

Monday, April 11, 4:00 p.m., UMass, 16th floor of Lederle Graduate Research Tower Room 1634, Amherst, MA

Title: Statistics and Bayesian Inference in Neuroscience

Abstract: Experimenters are typically adept at applying standard statistical techniques, while computational neuroscientists are capable of formulating mathematically sophisticated data analytic methods to attack novel problems in data analysis. Yet, in many situations, statisticians proceed differently than those without formal training in statistics. What is different about the way statisticians approach problems? I will give you my thoughts on this subject, and will illustrate with examples, including a new extension of Bayesian control of false discoveries, applied to neural synchrony detection across a network of interacting spiking neurons. I will conclude with some related comments on scientific reproducibility, illustrating them with an experiment in which brain signals were used to run a robotic device.

Tuesday, April 12, 12:10 p.m., McConnell Auditorium, Smith College, Northampton, MA

Title: Statistical Ideas in Neuroscience

Abstract: The brain sciences seek to discover mechanisms by which neural activity is generated, thoughts are created, and behavior is produced. Major advances have been based on careful consideration of data, which often involves statistics, especially when the phenomenon involves subtle variations in the signal, relative to noise. In addition, since the beginnings of neuroscience, it has proven useful to describe many aspects of brain activity by incorporating ideas from statistics. I will review these points in the context of visual perception, and will include a popular theory of what might be going on in the brain when we "pay attention to" a visual stimulus. I will also make some comments on scientific reproducibility, illustrating them with an experiment in which brain signals were used to run a robotic device.

Tuesday, April 12, 4:00 p.m., Adele Simmons Hall Room 112, Hampshire College

Title:  Neuroscience in the Age of Big Data

Abstract: New technologies are creating exciting opportunities in neuroscience, but they are also posing new Big Data analytic challenges. Successful solutions will combine high-powered computational algorithms with fundamental statistical principles for taming the inherent variability in brain-based data. One of the major approaches to Big Data, the Bayesian approach, is also relevant to neuroscience from a theoretical perspective because it helps capture the idea that evolution has driven the brain to perform optimally. I will review these points, focusing especially on visual perception, and will include a popular notion of what might be going on in the brain when we "pay attention," as in paying attention to a visual stimulus. I will also make some comments on scientific reproducibility, illustrating them with an experiment in which brain signals were used to run a robotic device.

DataFest 2016

Friday, April 1, 6:30PM, to Sunday, April 3, 4pm. University of Massachusetts, Amherst, MA

DataFest is a nationally-coordinated undergraduate competition in which teams of up to 5 students work over a weekend to extract insight from a rich and complex data set. The mission of DataFest is to expose undergraduate students to challenging questions with immediate real-world significance that can be addressed through data analysis. Apart from developing data analysis and team building skills, students can win cash prizes, fame, glory, or some combination thereof… and will get a free t-shirt! More information can be found at:

There are also a number of tutorials to help you prepare for DataFest. A list of these tutorials can be found on the Pioneer Valley and Five Colleges Statistics and Data Science Meetup page.

UMass students only: an informational/organizational meeting will be held on Tuesday, March 1st from 4-4:30pm in LGRC A201 or sign up here:

Hampshire students only: an informational/organizational meeting will be held on Monday, February 29th at 7pm in ASH 222 or sign up here: 

Protein Data Analysis at Scale

Aaron Coburn, Amherst College

Monday, February 22nd, 4:30pm, Amherst College, Seeley Mudd 206 (refreshments at 4:00pm in Seeley Mudd 208)

Abstract: The R programming environment is used by researchers and students alike to perform all manner of statistical analysis. This talk will describe a project at Amherst College that is using R to simulate and analyze protein structure perturbation, an area of study that has wide-ranging implications for many branches of medical research. While writing the code for these simulations in R is relatively straightforward, running the code on a single machine can be prohibitively time-consuming. In this situation, the typical approach is to rewrite the software so that it can be deployed on a Hadoop- or MPI-based cluster.

With this project, we took a different approach and used Spark, one of the newer distributed computational platforms. The primary advantage of Spark in this case is its tight integration with the R execution environment, meaning the R code did not need to be substantially changed in order to run across a cluster of worker machines. Using the protein simulation project as a case study, this talk will explore how Spark allows data scientists to make use of existing R-based code both to explore very large datasets and to run highly parallelized computations.


Friday February 5th, 6:00 p.m., to Sunday February 7th at 1pm. UMass, 16th floor of Lederle Graduate Research Tower Room 1634, Amherst, MA

The Graduate Researchers interested in Data club (GRiD) at UMass Amherst is partnering with the Pioneer Valley Transit Authority (PVTA) to host the first HackPVTA event. In this event, students of the Five Colleges will have a chance to play with data from the PVTA. More information is available at:

Statistical Analysis of Big Genetics and Genomics Data

Xihong Lin, Harvard T.H. Chan School of Public Health

Monday, November 16, 4:30 p.m., Amherst College, Seeley Mudd 206 -- refreshments to precede talk at 4 p.m. in Seeley Mudd 208. 

RSVP by Tuesday, November 10, 2015, to

Abstract: The Human Genome Project, in conjunction with the rapid advance of high-throughput technology, has transformed the landscape of health science research. The genetic and genomic era provides an unprecedented promise of understanding the genetic underpinnings of complex diseases or traits, studying gene-environment interactions, predicting disease risk, improving prevention and intervention, and advancing precision medicine. A large number of genome-wide association studies conducted in the last ten years have identified over 1,000 common genetic variants that are associated with many complex diseases and traits. Massive targeted, whole exome and whole genome sequencing data as well as different types of -omics data have become rapidly available in the last few years. These massive genetic and genomic data present many exciting opportunities as well as challenges in data analysis and result interpretation. They also call for more interdisciplinary knowledge and research, e.g., in statistics, machine learning, data curation, molecular biology, genetic epidemiology and clinical science. In this talk, I will discuss analysis strategies for some of these challenges, including rare variant analysis of whole-genome sequencing association studies, analysis of multiple phenotypes (pleiotropy), and integrative analysis of different types of genetic and genomic data.

Data Visualization at The New York Times

Amanda Cox is a graphics editor at The New York Times

Friday December 4th, 4:30 p.m., Smith College, Ford Hall Room 240, 100 Green Street, Northampton, MA

Abstract: Amanda Cox is a graphics editor at The New York Times, where she makes charts and maps for the paper and its website. A 2002 graduate of St. Olaf College, she worked at the Federal Reserve Board and earned a master's degree in statistics from the University of Washington before joining the Times in 2005. She was part of a team that won a National Design Award in 2009 and received the Excellence in Statistical Reporting Award from the American Statistical Association in 2012.

Statistics and Policy Making: The Case of Greece

Andreas Georgiou, Former Head of Greek Statistics Office

Wednesday, November 11, 8:00 p.m., Amherst College, Converse Hall, Cole Assembly Room

Abstract: The talk will focus on the importance of statistics in effective policy making and good governance and will relate Georgiou's five years of working as the head of the Greek Statistical Office. In 2010, Georgiou was appointed the head of the Greek Office, after serving as an economist for 21 years at the IMF. His discoveries regarding the Greek deficit led him to the center of a political firestorm that included personal felony charges that could have led to a life in prison and enormous fines.


Data Visualization with D3

Dana Udwin and Deirdre Fitzpatrick

Thursday, November 12, 7:00 p.m., Smith College, McConnell 103

Abstract: Data visualizations empower individuals of all backgrounds to explore complex, multivariate data, an area where traditional statistical analyses can fall short.

Dana Udwin and Deirdre Fitzpatrick will present a motivating use case from their work at MassMutual Data Labs in Amherst and walk through creating data visualizations from a beginner's perspective. In the process, they will introduce web coding basics and useful JavaScript libraries, including D3, C3, dc, and Crossfilter, that make such visualizations simple to build and host online.

Colloquium on "Causal inference: identifying subgroups by their response to treatment"

Sarah Anoke, Department of Biostatistics, Harvard T.H. Chan School of Public Health  

Thursday, October 22, 4:30 p.m., Amherst College, Seeley Mudd 206

Abstract: Causal inference is a field of statistics focused on measuring a particular type of relationship between two variables. Referring to these two variables as the ‘treatment’ and the ‘outcome’, we consider the value that an individual’s outcome would take if the treatment was present, and the value that the individual’s outcome would take if the treatment was absent. The difference in these two potential outcomes is the treatment effect. Every individual has their own individual treatment effect (ITE). But because only one of these two potential outcomes is observable, ITEs cannot be estimated from observed data. To overcome this problem, the average outcome among a group of individuals unexposed to treatment is subtracted from the average outcome among a group of individuals exposed to treatment, yielding an average treatment effect (ATE). It is of interest to identify subgroups for which the subgroup-specific ATE is very different from the overall ATE. Knowing only the overall ATE is arguably misleading; if the treatment is a drug, for example, we would prefer to know that the drug has no effect within women but a dramatic effect within men. How, then, can the data tell us which subgroups respond particularly well or poorly to treatment, without advance knowledge of these subgroups?
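As a rough illustration of the quantities named in the abstract, the sketch below computes an overall ATE and subgroup-specific ATEs on simulated data. It is not the speaker's method: the data, the effect sizes (2.0 for men, 0.0 for women), and the names `simulate` and `average_treatment_effect` are all assumptions made for this example, and the subgroups are known in advance here, which is exactly the knowledge the talk assumes is missing.

```python
import random
from statistics import mean


def average_treatment_effect(records):
    """Mean outcome among the treated minus mean outcome among the untreated."""
    treated = [r["outcome"] for r in records if r["treated"]]
    control = [r["outcome"] for r in records if not r["treated"]]
    return mean(treated) - mean(control)


def simulate(n=4000, seed=0):
    """Simulate a randomized study (hypothetical data, not from the talk)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        sex = rng.choice(["female", "male"])
        treated = rng.random() < 0.5             # randomized assignment
        effect = 2.0 if sex == "male" else 0.0   # assumed true effects
        outcome = rng.gauss(0.0, 1.0) + (effect if treated else 0.0)
        records.append({"sex": sex, "treated": treated, "outcome": outcome})
    return records


records = simulate()
overall_ate = average_treatment_effect(records)
subgroup_ate = {
    sex: average_treatment_effect([r for r in records if r["sex"] == sex])
    for sex in ("female", "male")
}

print(f"overall ATE: {overall_ate:.2f}")
for sex, ate in subgroup_ate.items():
    print(f"{sex} ATE: {ate:.2f}")
```

Under these assumptions the overall ATE lands near the midpoint of the two subgroup effects, which is why it describes neither subgroup well.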

Colloquium on "Statistical tools and challenges for monitoring migratory birds"

Emily Silverman, Statistician, Division of Migratory Bird Management, U.S. Fish & Wildlife Service

Friday, October 30th, 2015, 2:00 p.m., Amherst College, Frost Library 211

Abstract: Federal management of migratory birds began 100 years ago, when the United States signed the 1916 Convention for the Protection of Migratory Birds with Great Britain (for Canada). These protections were codified in the Migratory Bird Treaty Act (MBTA) of 1918, which now covers over 800 species of birds, and stands as one of the earliest U.S. environmental laws. The evolution of management approaches since the MBTA has led to the development of monitoring programs and quantitative methods in wildlife science. I will present the history of bird monitoring and statistical methods for population assessment and will discuss new approaches, challenges, and how a solid understanding of statistical concepts is essential for informed management. Drawing on examples from my own work, I will highlight the interdisciplinary skills needed to operate effectively as a scientist and statistician in a resource management agency. As our ability to collect information about the natural world expands in an increasingly digital world, the need for innovative, technically adept wildlife scientists is growing.

Events during the 2014-2015 academic year

  • On Sunday, February 15 from 1 PM – 4 PM, Amherst College will be hosting a Sports Analytics Forum in the Cole Assembly Room on campus. The day will consist of several guest speakers who work and/or do research in the field of Sports Analytics, in addition to a handful of research presentations by students. This event is free and open to the public. If you would like an in-depth look into the day, please visit: Sports Analytics Forum Details.
    • Sunday, February 15 from 1 PM – 4 PM, Amherst College, Cole Assembly Room in Converse Hall, Amherst, MA.

  • Taylor Arnold, PhD of AT&T Labs will talk on: Oh the Places You'll Go: The Surprising Complexity of Statistics' Most Basic Model. Abstract: A Bernoulli distribution describes a process which has only two outcomes. Despite the simplicity, it is used in a wide array of applications including a simple coin toss, the outcome of a sporting event, weather models, and the winner of an election. A careful analysis of the Bernoulli distribution also raises a number of theoretical questions leading to topics such as Bayesian inference, minimaxity and estimator theory. Fortunately, the relatively uncomplicated nature of the Bernoulli model allows such questions to be studied without the advanced mathematical machinery required for a more general treatment. This talk will explore these practical and theoretical considerations, with a focus towards the ‘big questions’ which continue to guide modern statistical research. Two real datasets, from baseball and medicine, will be used throughout to guide the discussion.
    • Monday, February 16, 4:30PM, Amherst College, Seeley Mudd building Rm. 206. 

  • DataFest 2015: DataFest is a nationally-coordinated undergraduate competition in which teams of up to 5 students work over a weekend to extract insight from a rich and complex data set. The mission of DataFest is to expose undergraduate students to challenging questions with immediate real-world significance that can be addressed through data analysis. Apart from developing data analysis and team building skills, students can win cash prizes, fame, glory, or some combination thereof… and will get a free t-shirt!
    • Friday, March 27, 6:30PM, to Sunday, March 29, 4pm. University of Massachusetts, Amherst, MA.

  • Five College DataFest
  • New England Statistics Symposium
  • HackEbola, sponsored by Graduate Researchers in Data (GRiD), will take place at UMass over the weekend before Thanksgiving, Nov 21-23, 2014. The event will bring students, faculty, and other professionals together to work on learning more about Ebola by analyzing real-time data on the outbreak and response. For more information, check out this website:
  • 2014 Alice Ambrose Lazerowitz-Thomas Tymoczko Dinner and Lecture: "Predictive Accuracy and the Bayesian Approach to Inductive Inference," James M. Joyce, Cooper Harold Langford Collegiate Professor of Philosophy and Statistics, University of Michigan. Thursday, December 4, 2014, Smith College at 5:30pm.

  • Ben Alamar, Sports Analytics Consultant and Researcher, currently works for ESPN, is the author of "Sports Analytics: A Guide for Coaches, Managers, and Other Decision Makers," and was the founding editor of the Journal of Quantitative Analysis in Sports. Wednesday, November 5, 2014, 7:30PM, Amherst College, Lewis-Sebring Commons (desserts will be served).

  • Graduate study in Biostatistics: a panel discussion with faculty and current students from the Center for Statistical Sciences at Brown University. 
    Come hear about opportunities for graduate study in biostats (both MA and PhD) from Prof. Christopher Schmid and two current students at the Center for Statistical Sciences. Tuesday, October 21st, 2014, 4:30 pm, Amherst College, Seeley Mudd 206
  • "A Crash Course in Shiny via ShinyHelper," Jay (John) Emerson from Yale has offered to give a crash course in Shiny, a straightforward means to create interactive web applications in R and RStudio. Wednesday, Sept. 18, 4 pm, Seeley Mudd Room 207, Amherst College
  • "Inferring Causation without Randomization: A matched design to assess the number of embryos to transfer during in vitro fertilization," a talk by Cassandra Pattanayak, Wellesley College. Monday, Sept. 23, 4 pm, Seeley Mudd 206, Amherst College
  • "A beginner's guide to using SQLite with R: database usage for fun and profit," Nick Horton will be giving a beginner's guide to using SQLite with R, in the context of the Data Expo 2009 airline delays dataset. This includes 150,000,000 rows corresponding to every commercial flight in the US from 1987 to 2012. Thursday, Sept. 26, 7 pm, National Priorities Project, 243 King Street, Northampton, MA
  • "Big Data:  A perspective on their current uses and potential future uses by the Federal Statistical System," a talk by Mike Horrigan, Associate Commissioner for Prices and Living Conditions, Bureau of Labor Statistics. Monday, Sept. 30, 4:30 pm, Ford Hall 240, Smith College
  • "Turning a group of programming newbies into R users," Andy Smith. Wednesday, Oct. 2, 7 pm, Room 222, Morrill Science Center, UMass
  • "openWAR: An Open Source System for Overall Player Performance in Major League Baseball," with Ben Baumer and Greg Matthews.  Thursday, Oct. 17, 4:30 pm, Room TBD, UMass
  • "Mapping Spatial Dynamics of Yellowtail Flounder on the Northeast Shelf with R," a talk by Megan O'Connor and Carl Dunham. 
    • Tuesday, Oct. 22, 7 pm, McConnell Room B05, Smith College
  • "Business Analytics Research at IBM," a talk by Bonnie Ray of the IBM T.J. Watson Research Center.
    • Thursday, November 14, 4 pm, Seeley Mudd 206, Amherst College, refreshments at 3:30 pm. 
  • "TBA," a talk by Brianna Heggeseth, Williams College
    • Monday, November 18, 4 pm, Lederle Graduate Research Tower 1634, UMass, tea at 3:45 pm.
  • "Taking a Passion for Statistics to the Classroom, into the MOOC World and Back Again," by Lisa Dierker, Wesleyan University. 
    • Monday, March 24, 4:30 pm, Amherst College, Seeley Mudd 206 -- refreshments to precede talk at 4 pm in Seeley Mudd 208
  • "STATS4STEM: Online resources to help students (and instructors!) learn stats," a talk in which Eric Simoneau (Boston Latin) and Neil Heffernan (Worcester Polytechnic Institute) will discuss their successful ASSISTments and STATS4STEM projects.
    • Monday, September 22, 4:30 pm, Amherst College, Seeley Mudd 206 -- refreshments to precede talk at 4 pm in Seeley Mudd 208