Data Science

Faculty

E. A. Rundensteiner, The William Smith Dean's Professor, Computer Science and Program Director, Data Science Program. Ph.D., University of California, Irvine, 1992. Big data systems, big data analytics, visual analytics, machine learning/deep learning, health analytics, AI and fairness.
F. Emdad, Teaching Professor, Data Science Program. Ph.D., Colorado State University, 2007. Business analytics, computational and applied mathematics.
T. Ghoshal, Assistant Teaching Professor, Data Science Program. Ph.D., University of Mississippi, 2020. Feature Engineering, Deep Learning, and Natural Language Processing.
L. Harrison, Associate Professor, Computer Science Department. Ph.D., University of North Carolina, 2013. Data visualization, visual analytics, human-computer interaction.
X. Kong, Associate Professor, Computer Science Department, Ph.D., University of Illinois, 2014. Data mining and big data analysis, with emphasis on addressing the data variety issues in biomedical research and social computing, and healthcare analytics.
N. Kordzadeh, Assistant Professor of Information Systems, WPI Business School. Ph.D., University of Texas at San Antonio. Organizational and individual adoption and use of social media in healthcare; business intelligence and analytics with an emphasis on algorithmic fairness and ethical decision-making.
K. Lee, Associate Professor, Computer Science Department, Ph.D., Texas A&M University, 2013. Big data analytics and mining, social computing, and cybersecurity over large-scale networked information systems such as the Web, social media and crowd-based systems.
Y. Li, Associate Professor, Computer Science Department, Ph.D., University of Minnesota, 2013. Ph.D., BUPT, Beijing, China, 2009. Data mining and artificial intelligence with applications in urban computing, smart transportation, and human mobility analysis.  
X. Liu, Associate Professor, Computer Science Department. Ph.D., Syracuse University, 2011. Natural language processing, deep learning, information retrieval, data science, and computational social sciences.
O. Mangoubi, Assistant Professor, Mathematical Sciences Department. Ph.D., Massachusetts Institute of Technology, 2016. Optimization, machine learning, statistical algorithms.
F. Murai, Assistant Professor, Data Science Program, Computer Science Department. Ph.D., University of Massachusetts Amherst, 2016. Application of mathematical modeling, statistics and machine learning to computer, informational and social networks.
C. Ngan, Assistant Teaching Professor, Data Science Program, Ph.D., George Mason University, 2013. Time Series Analysis, Decision Guidance and Support Systems.
R. C. Paffenroth, Associate Professor, Mathematical Sciences Department, Ph.D., University of Maryland, 1999. Large scale data analytics, statistical machine learning, compressed sensing, network analysis.
C. Ruiz, Professor, Computer Science Department. Ph.D., University of Maryland, 1996. Data mining, machine learning, artificial intelligence, health, clinical medicine.
R. Shraga, Assistant Professor, Data Science Program, Computer Science Department. Ph.D., Technion - Israel Institute of Technology, 2020. Database systems, data discovery and integration, applied machine/deep Learning, human-in-the-loop, human-AI collaboration, information retrieval.
D. M. Strong, Professor and Department Head, WPI Business School. Ph.D., Carnegie Mellon University, 1988. Healthcare and business data analytics, computing applications in organizations.
A. C. Trapp, Associate Professor, WPI Business School. Ph.D., University of Pittsburgh, 2011. Mathematical optimization and analytics with applications to benefit society, focusing on improving outcomes of vulnerable populations.
R. Zekavat, Professor, Physics Department. Ph.D., Colorado State University, 2002. Statistical Signal Processing, Sensor Data Analysis and Machine Learning.
J. Zou, Associate Professor, Mathematical Sciences Department. Ph.D., University of Connecticut, 2009. Financial time series and spatial statistics with applications to epidemiology, public health and climate change.

Collaborative Faculty

E. O. Agu, Professor, Computer Science Department. Ph.D., University of Massachusetts, 2001. Mobile and ubiquitous health, machine and deep learning applications, and computer graphics.
A. Arnold, Assistant Professor; Ph.D., Case Western University, 2014. Mathematical biology, Bayesian inference, parameter estimation in biological systems.
D. Brown, III, Professor and Department Head, Department of Electrical and Computer Engineering. Ph.D., Cornell University, 2000. Communication systems and networking, signal processing, information theory.
S. Djamasbi, Professor, WPI Business School. Ph.D., University of Hawaii, 2004. Management information systems.
L. Fichera, Assistant Professor; Ph.D., University of Genoa/Italian Institute of Technology. Continuum robotics, medical robotics, surgical robotics, image-guided surgery, laser-based surgery, medical devices.
T. Guo, Assistant Professor, Computer Science; Ph.D., University of Massachusetts Amherst, 2016. Distributed systems, cloud computing, data-intensive systems.
N. T. Heffernan, Professor, Computer Science Department and Co-Director, Learning Sciences and Technologies. Ph.D., Carnegie Mellon University, 2001. Educational data mining, machine learning applied to educational contexts, A/B testing.
X. Huang, Professor, Department of Electrical and Computer Engineering. Ph.D., Virginia Tech, 2001. Reconfigurable computing, ubiquitous computing and RFID.
D. Korkin, Professor in Computer Science, and BCB Program Director; Ph.D., University of New Brunswick, Canada, 2003. Big data analytics in life sciences, machine learning and its applications, visualization of complex biological data, network science, bioinformatics and personalized medicine.
R. Neamtu, Associate Teaching Professor, Computer Science; Ph.D., Worcester Polytechnic Institute.
R. Y. Lui, Professor; Ph.D., University of Minnesota, 1981. Mathematical biology, partial differential equations.
D. Reichman, Assistant Professor; Ph.D., Weizmann Institute, 2014. Algorithms, Machine Learning, Artificial Intelligence.
A. Sales, Assistant Professor, Mathematical Sciences. Ph.D., University of Michigan, 2013. Methods for causal inference using administrative or high-dimensional data, especially in education.
H. Walker, Professor Emeritus, Mathematical Sciences. Ph.D., New York University, 1970. Computational mathematics, numerical methods for systems of linear and nonlinear equations.
J. R. Whitehill, Assistant Professor, Computer Science Department. Ph.D., University of California, San Diego, 2012. Machine learning, crowdsourcing, automated teaching, human behavior recognition.
Z. Wu, Associate Professor, Mathematical Sciences. Ph.D., Yale University, 2009. Big data statistical analytics, bioinformatics.
Z. Zhang, Associate Professor; Ph.D., Brown University, 2014, Shanghai University, 2011. Numerical analysis, scientific computing, computational and applied mathematics, uncertainty quantification.

Adjunct Faculty

Mohamed Eltabakh, Associate Professor, Computer Science Department. Ph.D., Purdue University, 2010. Database management systems and information management, query processing and optimization, indexing techniques, scientific data management, and big data analytics.
Feifan Liu, Assistant Professor, UMass Chan Medical School, Department of Population and Quantitative Health Sciences. Ph.D. Health sciences, natural language processing, deep learning.

Faculty Research

Our faculty work in many areas related to Data Science, including:

  • Big data and high performance analytics
  • Bioinformatics and genomic databases
  • Business intelligence and predictive analytics
  • Cybersecurity analytics
  • Cryptography and data security
  • Educational data mining
  • Financial decision making
  • Healthcare data analytics
  • Internet big data analysis
  • Large-scale data management and infrastructures
  • Machine learning, data mining & knowledge discovery
  • Natural language processing
  • Signal processing and information theory
  • Social media analytics
  • Statistical learning
  • Visual and numerical analysis of large data sets

Program of Study

The WPI Data Science (DS) program offers graduate studies toward M.S., B.S./M.S., and Ph.D. degrees, as well as a Certificate in Data Science. The program educates professionals, Data Scientists, with interdisciplinary skills in analytics, computing, statistics, and business intelligence. Key skills include the ability to recognize problems that can be solved with data analytics, apply the appropriate technologies to a given data problem, and communicate those solutions effectively to relevant stakeholders. Our faculty, together with our industrial partners, provide students with the resources and opportunities to engage in practical, purpose-driven projects, formal course work, and mentored interdisciplinary research work. The program requires advanced, in-depth course work in business, innovation, data analytics, computing, and statistical foundations. It is designed to provide focused study in an area of interest to the student, ranging from general data analytics, computing, mathematical analytics, and business analytics to specialized concentrations in financial analytics, healthcare analytics, biomedical analytics, analytics for sustainability, and learning sciences, among others. Because of their interdisciplinary perspective, our graduates will have a clear competitive advantage over professionals who are trained in a single discipline, such as business administration, statistics, or computer science, and who are seeking to work in the data analytics industry. As such, they will be poised to become leaders in Data Science, helping to formalize and realize its vision.

The graduate degree program in Data Science is designed to produce the future generation of data scientists who are proficient in their ability to:

  • Assess the suitability of, apply, and advance state-of-art data analytics tools and methods from data analysis, statistics, data mining, data management, computational thinking, big data algorithms, and visualization to bring about transformative solutions to important real-world problems across a number of domains.
  • Bring to bear their integrative, interdisciplinary knowledge and skills in the core disciplines central to Data Science (Computing, Statistics, and Business) to understand and then to explain analytics results and their applicability and validity to those responsible for solving real-world problems.
  • Serve as visionary leaders and project managers in data analytics, with the technical and professional knowledge and skills needed for the current and future career demands of data scientists working on impactful projects.

Admissions Requirements

Students applying to the graduate degree program in Data Science (DS) are expected to have a bachelor’s degree with a strong quantitative and computational background, including coursework in programming, data structures, algorithms, univariate and multivariate calculus, linear algebra, and introductory statistics. Students with bachelor’s degrees in computer science, mathematics, business, engineering, and quantitative sciences would typically qualify if they meet the above background requirements. A strong applicant who is missing necessary data science background may be admitted with the expectation that he or she will take the Data Science transition courses as needed, which include CS 5007 if missing programming and algorithms background, and DS 517 if missing statistics background. Credits for these transition courses count towards the M.S. degree. Students applying to the Certificate in Data Science are expected to meet the same qualifications described above.

Affiliated Departments and Programs

This is a joint program administered by the Computer Science Department, Mathematical Sciences Department, and the WPI Business School. Closely affiliated departments also include the Social Science and Policy Studies Department, Learning Sciences and Technologies Program, Bioinformatics and Computational Biology Program, and the Electrical and Computer Engineering Department. The Data Science faculty comprises faculty members who are interested in Data Science graduate education and research and who hold advanced degrees.

Industrial Relationships

In collaboration with WPI’s Corporate and Professional Education, the Data Science faculty work with industrial, government and academic partners who serve on an Advisory Board to help shape the WPI Data Science program and its offerings to assure its continued relevancy. In addition, these Advisory Board members provide input on industrial hiring needs, offer projects and internships to Data Science students, and serve as employers of our graduates.

Classes

DS/CS 541: Deep Learning

Credits 3.0

This course will offer a mathematical and practical perspective on artificial neural networks for machine learning. Students will learn about the most prominent network architectures, including multilayer feedforward neural networks, convolutional neural networks (CNNs), auto-encoders, recurrent neural networks (RNNs), and generative adversarial networks (GANs). This course will also teach students the optimization and regularization techniques used to train them, such as backpropagation, stochastic gradient descent, dropout, pooling, and batch normalization. Connections to related machine learning techniques and algorithms, such as probabilistic graphical models, will be explored. In addition to understanding the mathematics behind deep learning, students will also engage in hands-on course projects. Students will have the opportunity to train neural networks for a wide range of applications, such as object detection, facial expression recognition, handwriting analysis, and natural language processing.

Prerequisites

Machine Learning (CS 539), and knowledge of Linear Algebra (such as MA 2071) and Algorithms (such as CS 2223).

DS/CS 547: Information Retrieval

Credits 3.0

This course introduces the theory, design, and implementation of text-based and Web-based information retrieval systems. Students learn the key concepts and models relevant to information retrieval and natural language processing on large-scale corpora such as the Web and social systems. Topics include the vector space model, crawling, indexing, web search, ranking, recommender systems, embeddings, and language models.

Prerequisites

Statistical learning at the level of DS 502/MA 543 and programming skills at the level of CS 5007.

DS/ECE 577: Machine Learning in Cybersecurity

Machine Learning has proven immensely effective in a diverse set of applications. This trend has reached a new high with the application of Deep Learning in virtually any application domain. This course studies the applications of Machine Learning in the subdomain of Cybersecurity by introducing a variety of case studies, including anomaly detection in networks and computing, side-channel analysis, user authentication, and biometrics. These case studies are discussed in detail in class, and further examples of potential applications of Machine Learning techniques, including Deep Learning, are outlined. The course has a strong hands-on component: students are given datasets from specific security applications and are required to perform simulations.

DS/MA 517: Mathematical Foundations for Data Science

Credits 3.0

The foci of this class are the essential statistics and linear algebra skills required for Data Science students. The class builds the foundation for theoretical and computational abilities of the students to analyze high dimensional data sets. Topics covered include Bayes’ theorem, the central limit theorem, hypothesis testing, linear equations, linear transformations, matrix algebra, eigenvalues and eigenvectors, and sampling techniques, including Bootstrap and Markov chain Monte Carlo. Students will use these techniques while engaging in hands-on projects with real data.

Prerequisites

Some knowledge of integral and differential calculus is recommended.

DS 501: Introduction to Data Science

Credits 3.0
Introduction to Data Science provides an overview of Data Science, covering a broad selection of key challenges in and methodologies for working with big data. Topics to be covered include data collection, integration, management, modeling, analysis, visualization, prediction and informed decision making, as well as data security and data privacy. This introductory course is integrative across the core disciplines of Data Science, including databases, data warehousing, statistics, data mining, data visualization, high performance computing, cloud computing, and business intelligence. Professional skills, such as communication, presentation, and storytelling with data, will be fostered. Students will acquire a working knowledge of data science through hands-on projects and case studies in a variety of business, engineering, social sciences, or life sciences domains. Issues of ethics, leadership, and teamwork are highlighted.
Prerequisites

None beyond meeting the Data Science admission criteria.

DS 502/MA 543: Statistical Methods for Data Science

Credits 3.0
Statistical Methods for Data Science surveys the statistical methods most useful in data science applications. Topics covered include predictive modeling methods (including multiple linear regression and time series), data dimension reduction, discrimination and classification methods, clustering methods, and committee methods. Students will implement these methods using statistical software.
Prerequisites

DS 517/MA 517, statistics at the level of MA 2611 and MA 2612, and linear algebra at the level of MA 2071.

DS 503/CS 585: Big Data Management

Credits 3.0
Big Data Management deals with emerging applications in science and engineering disciplines that generate and collect data at unprecedented speed, scale, and complexity that need to be managed and analyzed efficiently. This course introduces the latest techniques and infrastructures developed for big data management including parallel and distributed database systems, map-reduce infrastructures, scalable platforms for complex data types, stream processing systems, and cloud-based computing. Query processing, optimization, access methods, storage layouts, and energy management techniques developed on these infrastructures will be covered. Students are expected to engage in hands-on projects using one or more of these technologies.
Prerequisites

A beginning course in databases at the level of CS 4432 or equivalent knowledge, and programming experience.

DS 504/CS 586: Big Data Analytics

Credits 3.0
Big Data Analytics addresses the obstacle that innovation and discoveries are no longer hindered by the ability to collect data, but by the ability to summarize, analyze, and discover knowledge from the collected data in a scalable fashion. This course covers computational techniques and algorithms for analyzing and mining patterns in large-scale datasets. Techniques studied address data analysis issues related to data volume (scalable and distributed analysis), data velocity (high-speed data streams), data variety (complex, heterogeneous, or unstructured data), and data veracity (data uncertainty). Techniques include mining and machine learning techniques for complex data types, and scale-up and scale-out strategies that leverage big data infrastructures. Real-world applications using these techniques, for instance social media analysis and scientific data mining, are selectively discussed. Students are expected to engage in hands-on projects using one or more of these technologies.
Prerequisites

A beginning course in databases and a beginning course in data mining, or equivalent knowledge, and programming experience.

DS 551: Reinforcement Learning

Credits 3.0

Reinforcement Learning is an area of machine learning concerned with how agents take actions in an environment with the goal of maximizing some notion of “cumulative reward”. The problem, due to its generality, is studied in many disciplines and applied in many domains, including robotics and industrial automation, marketing, education and training, health and medicine, text, speech, dialog systems, and finance, among many others. In this course, we will cover topics including Markov decision processes, reinforcement learning algorithms, value function approximation, actor-critics, policy gradient methods, representations for reinforcement learning (including deep learning), and inverse reinforcement learning. The course project(s) will require the implementation and application of many of the algorithms discussed in class.

DS 595: Special Topics in Data Science

Credits 3.0
Special Topics in Data Science is a course offering that will cover a topic of current interest in detail. This serves as a flexible vehicle to provide a one-time offering of topics of current interest as well as to offer new topics before they are made into a permanent course.
Prerequisites

Will vary with topic.

DS 596: Independent Study

Credits 3.0
Independent Study, as the name suggests, is a course that allows a student to study a chosen topic in Data Science under the guidance of a faculty member affiliated with the Data Science program. The student must produce a written report to satisfy the course requirement.

DS 597: Directed Research

Credits 3.0
Directed Research study, conducted under the guidance of a faculty member affiliated with the Data Science Program, investigates the challenges and techniques central to data science, and aims to develop novel approaches and techniques towards solving these challenges. The student who chooses this course must produce a written report to fulfil the course requirement.

DS 598: Graduate Qualifying Project

Credits 3.0
This 3-credit graduate qualifying project, done in teams, can be taken a second time for credit with permission by the instructor, up to a total of 6 credits. The project is to be carried out in cooperation with a sponsor or industrial partner. It must be overseen by a faculty member affiliated with the Data Science Program. This offering integrates theory and practice of Data Science, and includes the utilization of tools and techniques acquired in the Data Science Program. In addition to a written report, this project must be presented in a formal presentation to faculty of the Data Science program and sponsors. Professional development skills, such as communication, teamwork, leadership, and collaboration, along with storytelling, will be practiced.
Prerequisites

DS students should have completed at least 24 credits of the DS MS degree, or have the consent of the instructor, before starting the GQP project class. DS students seeking to take this course a second time for credits, up to a total of 6 credits, must get the instructor’s approval. Non-DS students must get the instructor’s approval before taking this course for any number of credits.

DS 599: Master's Thesis in Data Science

Credits 9.0

The Master’s Thesis in Data Science consists of a research and development project worth a minimum of 9 graduate credit hours and is advised by a faculty member affiliated with the Data Science Program. A thesis proposal must be approved by the DS Program Review Board and the student’s advisor before the student can register for more than three thesis credits. The student must satisfactorily complete a written thesis document and present the results to the DS faculty in a public presentation.

DS 699: Dissertation Research

Credits 3.0
Intended for doctoral students admitted to candidacy wishing to obtain research credit toward their dissertations.
Prerequisites

Consent of Dissertation Advisor

DS 5900: Data Science Internship

Credits 0.0
The internship is an elective-credit option designed to provide an opportunity to put into practice the principles studied in previous Data Science courses. Internships will be tailored to the specific interests of the student. Each internship must be carried out in cooperation with a sponsoring organization, generally from off campus, and must be approved and advised by a core faculty member in the Data Science program. The internship must include proposal, design, and documentation phases. Following the internship, the student will report on his or her internship activities in a mode outlined by the supervising faculty member. Students are limited to counting a maximum of 3 internship credits towards their degree requirements for the M.S. degree in Data Science. We expect a full-time graduate student to take on only part-time internship work (20 hours or fewer per week) during the regular academic semester, while a full-time internship of 40 hours per week is appropriate during the summer semester as long as the student does not take a full class load at the same time. Internship credit cannot be used towards a Certificate in Data Science. The internship may not be completed at the student’s current place of employment.
Prerequisites

Registration for internship credit requires prior approval and signature by the academic advisor.
