Sem: Time Series

Offered as MTH 391 and SDS 391. A time series is a dataset whose observations have a sequential order. The primary objective of time series analysis is to develop mathematical models that characterize the relationships among the observations in such data. Topics in this course include perspectives from linear regression (time-dependent covariates or errors), nonparametric techniques (smoothing, moving averages, nearest neighbors), and time domain models (autoregressive and moving average models, and their extensions).

Mathematical Statistics

Offered as MTH 320 and SDS 320. An introduction to the mathematical theory of statistics and to the application of that theory to the real world. Discussions include functions of random variables, estimation, likelihood and Bayesian methods, hypothesis testing and linear models. Prerequisites: a course in introductory statistics, MTH 212 and MTH 246, or equivalent. Enrollment limited to 20.

Sem:T-Disability,Inclusn&Data

Students learn the social model of disability and critical disability theory as well as research design and process, and work on a research project analyzing public data on disability inclusion. The statistical methods covered in this course may include logistic regression, multivariate analysis and factor analysis. Students are expected to submit their final projects to a journal, conference or competition by the end of the semester. Prerequisite: SDS 201, SDS 220 or ECO 220. Restrictions: Juniors and seniors only. Enrollment limited to 12. Instructor permission required.

Modeling for Machine Learning

In the era of “big data,” statistical models are becoming increasingly sophisticated. This course begins with linear regression models and introduces students to a variety of techniques for learning from data, as well as principled methods for assessing and comparing models. Topics include the bias-variance trade-off, resampling and cross-validation, linear model selection and regularization, classification and regression trees, bagging, boosting, random forests, support vector machines, generalized additive models, principal component analysis, unsupervised learning and k-means clustering.

Multiple Regression

(Formerly MTH 291/SDS 291). Theory and applications of regression techniques: linear and nonlinear multiple regression models, residual and influence analysis, correlation, covariance analysis, indicator variables and time series analysis. This course includes methods for choosing, fitting, evaluating and comparing statistical models and analyzes data sets taken from the natural, physical and social sciences. Students who have completed SDS 100 in a previous semester need not repeat it. Corequisite: SDS 100.

Research Design & Analysis

(Formerly MTH/SDS 290). A survey of statistical methods needed for scientific research, including planning data collection and data analyses that provide evidence about a research hypothesis. The course can include coverage of analyses of variance, interactions, contrasts, multiple comparisons, multiple regression, factor analysis, causal inference for observational and randomized studies and graphical methods for displaying data. Special attention is given to analysis of data from student projects such as theses and special studies. Statistical software is used for data analysis.

Program/Data Science: Python

This course covers the skills and tools needed to process, analyze and visualize data in Python and work on collaborative projects. Topics include functional and object-oriented programming in Python, data wrangling in Pandas, visualization in Matplotlib and seaborn, as well as creating a reproducible workflow: debugging, testing and documenting programs, and effectively using version control. The major goal for the course is to create a viable, open-source Python package like those in the Python Package Index (PyPI). Prerequisites: SDS 192 and CSC 110. Enrollment limited to 40.

Programming Data Science: R

This course is not about data analysis; rather, students learn the R programming language at a deep level. Topics may include data structures, control flow, regular expressions, functions, environments, functional programming, object-oriented programming, debugging, testing, version control, documentation, literate programming, code review and package development. The major goal for the course is to contribute to a viable, collaborative, open-source, publishable R package. Prerequisites: SDS 192 and CSC 110, or equivalent. Enrollment limited to 40.

Colq:Data Science, Movies

Movies tell stories with data and about data. How is the understanding of data, data science, and the power of data science influenced and reinforced by popular media? Students explore the social, ethical, and cultural dimensions of data and data science using contemporary film and TV shows. Through close reading of visual media, students develop critical thinking about data provenance, data integrity, and the social stakes of data science.