Data Science
Faculty
E. A. Rundensteiner, The William Smith Dean's Professor, Computer Science and Program Head, Data Science Program. Ph.D., University of California, Irvine, 1992. Big data systems, big data analytics, visual analytics, machine learning/deep learning, health analytics, AI and fairness.
F. Emdad, Teaching Professor, Data Science Program. Ph.D., Colorado State University, 2007. Business analytics, computational and applied mathematics.
T. Ghoshal, Assistant Teaching Professor, Data Science Program. Ph.D., University of Mississippi, 2020. Feature Engineering, Deep Learning, and Natural Language Processing.
L. Harrison, Associate Professor, Computer Science Department. Ph.D., University of North Carolina, 2013. Data visualization, visual analytics, human-computer interaction.
X. Kong, Associate Professor, Computer Science Department, Ph.D., University of Illinois, 2014. Data mining and big data analysis, with emphasis on addressing the data variety issues in biomedical research and social computing, and healthcare analytics.
N. Kordzadeh, Assistant Professor of Information Systems, WPI Business School. Ph.D., University of Texas at San Antonio. Organizational and individual adoption and use of social media in healthcare; business intelligence and analytics with an emphasis on algorithmic fairness and ethical decision-making.
K. Lee, Associate Professor, Computer Science Department, Ph.D., Texas A&M University, 2013. Big data analytics and mining, social computing, and cybersecurity over large-scale networked information systems such as the Web, social media and crowd-based systems.
Y. Li, Associate Professor, Computer Science Department, Ph.D., University of Minnesota, 2013. Ph.D., BUPT, Beijing, China, 2009. Data mining and artificial intelligence with applications in urban computing, smart transportation, and human mobility analysis.
X. Liu, Associate Professor, Computer Science Department. Ph.D., Syracuse University, 2011. Natural language processing, deep learning, information retrieval, data science, and computational social sciences.
O. Mangoubi, Assistant Professor, Mathematical Sciences Department. Ph.D., Massachusetts Institute of Technology, 2016. Optimization, machine learning, statistical algorithms.
F. Murai, Assistant Professor, Data Science Program, Computer Science Department. Ph.D., University of Massachusetts, Amherst, 2016. Application of mathematical modeling, statistics and machine learning to computer, informational and social networks.
C. Ngan, Assistant Teaching Professor, Data Science Program, Ph.D., George Mason University, 2013. Time Series Analysis, Decision Guidance and Support Systems.
R. C. Paffenroth, Associate Professor, Mathematical Sciences Department, Ph.D., University of Maryland, 1999. Large scale data analytics, statistical machine learning, compressed sensing, network analysis.
C. Ruiz, Professor, Computer Science Department. Ph.D., University of Maryland, 1996. Data mining, machine learning, artificial intelligence, health, clinical medicine.
R. Shraga, Assistant Professor, Data Science Program, Computer Science Department. Ph.D., Technion - Israel Institute of Technology, 2020. Database systems, data discovery and integration, applied machine/deep learning, human-in-the-loop, human-AI collaboration, information retrieval.
D. M. Strong, Professor and Department Head, WPI Business School. Ph.D., Carnegie Mellon University, 1988. Healthcare and business data analytics, computing applications in organizations.
A. C. Trapp, Associate Professor, WPI Business School. Ph.D., University of Pittsburgh, 2011. Mathematical optimization and analytics with applications to benefit society, focusing on improving outcomes of vulnerable populations.
R. Zekavat, Professor, Physics Department, Ph.D., Colorado State University, 2002. Statistical Signal Processing, Sensor Data Analysis and Machine Learning.
J. Zou, Associate Professor, Mathematical Sciences Department. Ph.D., University of Connecticut, 2009. Financial time series and spatial statistics with applications to epidemiology, public health and climate change.
Collaborative Faculty
E. O. Agu, Professor, Computer Science Department. Ph.D., University of Massachusetts, 2001. Mobile and ubiquitous health, machine and deep learning applications, and computer graphics.
A. Arnold, Assistant Professor; Ph.D., Case Western University, 2014. Mathematical biology, Bayesian inference, parameter estimation in biological systems.
D. Brown, III, Professor and Department Head, Department of Electrical and Computer Engineering. Ph.D., Cornell University, 2000. Communication systems and networking, signal processing, information theory.
S. Djamasbi, Professor, WPI Business School. Ph.D., University of Hawaii, 2004. Management information systems.
L. Fichera, Assistant Professor; Ph.D., University of Genoa/Italian Institute of Technology. Continuum robotics, medical robotics, surgical robotics, image-guided surgery, laser-based surgery, medical devices.
T. Guo, Assistant Professor, Computer Science; Ph.D., University of Massachusetts Amherst, 2016. Distributed systems, cloud computing, data-intensive systems.
N. T. Heffernan, Professor, Computer Science Department and Co-Director Learning Sciences and Technologies. Ph.D., Carnegie Mellon University, 2001. Educational data mining, machine learning applied to educational contexts, A/B testing.
X. Huang, Professor, Department of Electrical and Computer Engineering. Ph.D., Virginia Tech, 2001. Reconfigurable computing, ubiquitous computing and RFID.
D. Korkin, Professor in Computer Science, and BCB Program Director; Ph.D., University of New Brunswick, Canada, 2003. Big data analytics in life sciences, machine learning and its applications, visualization of complex biological data, network science, bioinformatics and personalized medicine.
R. Neamtu, Associate Teaching Professor, Computer Science; Ph.D., Worcester Polytechnic Institute.
R. Y. Lui, Professor; Ph.D., University of Minnesota, 1981. Mathematical biology, partial differential equations.
D. Reichman, Assistant Professor; Ph.D., Weizmann Institute, 2014. Algorithms, machine learning, artificial intelligence.
A. Sales, Assistant Professor, Mathematical Sciences. Ph.D., University of Michigan, 2013. Methods for causal inference using administrative or high-dimensional data, especially in education.
H. Walker, Professor Emeritus, Mathematical Sciences. Ph.D., New York University, 1970. Computational mathematics, numerical methods for systems of linear and nonlinear equations.
J. R. Whitehill, Assistant Professor, Computer Science Department. Ph.D., University of California, San Diego, 2012. Machine learning, crowdsourcing, automated teaching, human behavior recognition.
Z. Wu, Associate Professor, Mathematical Sciences. Ph.D., Yale University, 2009. Big data statistical analytics, bioinformatics.
Z. Zhang, Associate Professor; Ph.D., Brown University, 2014, Shanghai University, 2011. Numerical analysis, scientific computing, computational and applied mathematics, uncertainty quantification.
Adjunct Faculty
Mohamed Eltabakh, Associate Professor, Computer Science Department. Ph.D., Purdue University, 2010. Database management systems and information management, query processing and optimization, indexing techniques, scientific data management, and big data analytics.
Feifan Liu, Assistant Professor, UMass Chan Medical School, Department of Population and Quantitative Health Sciences, Ph.D. Health sciences, natural language processing, deep learning.
Faculty Research
Our faculty work in many areas related to Data Science, including:
- Big data and high performance analytics
- Bioinformatics and genomic databases
- Business intelligence and predictive analytics
- Cybersecurity analytics
- Cryptography and data security
- Educational data mining
- Financial decision making
- Healthcare data analytics
- Internet big data analysis
- Large-scale data management and infrastructures
- Machine learning, data mining & knowledge discovery
- Natural language processing
- Signal processing and information theory
- Social media analytics
- Statistical learning
- Visual and numerical analysis of large data sets
Program of Study
The WPI Data Science (DS) program offers graduate studies toward M.S., B.S./M.S., and Ph.D. degrees, as well as a Certificate in Data Science. The program educates professionals to become data scientists with interdisciplinary skills in analytics, computing, statistics, and business intelligence. Key skills include the ability to recognize problems that can be solved with data analytics, apply the appropriate technologies to a given data problem, and communicate those solutions effectively to relevant stakeholders. Our faculty, together with our industrial partners, provide students with the resources and opportunities to engage in practical, purpose-driven projects, formal course work, and mentored interdisciplinary research work. The program requires advanced, in-depth course work in business, innovation, data analytics, computing, and statistical foundations. It is designed to provide focused study in an area of interest to the student, ranging from general data analytics, computing, mathematical analytics, and business analytics to specialized concentrations in financial analytics, healthcare analytics, biomedical analytics, analytics for sustainability, and learning sciences, among others. Because of their interdisciplinary perspective, our graduates will have a clear competitive advantage in the data analytics industry over professionals who are trained in a single discipline, such as business administration, statistics, or computer science. As such, they will be poised to become leaders in Data Science, helping to formalize and realize its vision.
The graduate degree program in Data Science is designed to produce the future generation of data scientists who are proficient in their ability to:
- Assess the suitability of, apply, and advance state-of-the-art data analytics tools and methods from data analysis, statistics, data mining, data management, computational thinking, big data algorithms, and visualization to bring about transformative solutions to important real-world problems across a number of domains.
- Bring to bear their integrative, interdisciplinary knowledge and skills in the core disciplines central to Data Science (Computing, Statistics, and Business) to understand and then to explain analytics results and their applicability and validity to those responsible for solving real-world problems.
- Serve as visionary leaders and project managers in data analytics, with the technical and professional knowledge and skills needed for the current and future career demands of data scientists working on impactful projects.
Admissions Requirements
Students applying to the graduate degree program in Data Science (DS) are expected to have a bachelor’s degree with a strong quantitative and computational background, including coursework in programming, data structures, algorithms, univariate and multivariate calculus, linear algebra, and introductory statistics. Students with bachelor’s degrees in computer science, mathematics, business, engineering, and quantitative sciences would typically qualify if they meet the above background requirements. A strong applicant who is missing necessary data science background may be admitted with the expectation that he or she will take the Data Science transition courses as needed, which include CS 5007 if missing programming and algorithms background, and DS 517 if missing statistics background. Credits for these transition courses count towards the M.S. degree. Students applying to the Certificate in Data Science are expected to meet the same qualifications described above.
Affiliated Departments and Programs
This is a joint program administered by the Computer Science Department, Mathematical Sciences Department, and the WPI Business School. Closely affiliated departments also include the Social Science and Policy Studies Department, Learning Sciences and Technologies Program, Bioinformatics and Computational Biology Program, and the Electrical and Computer Engineering Department. The Data Science faculty comprises faculty members who are interested in Data Science graduate education and research and who hold advanced degrees.
Industrial Relationships
In collaboration with WPI’s Graduate & Professional Studies, the Data Science faculty work with industrial, government and academic partners who serve on an Advisory Board to help shape the WPI Data Science program and its offerings to ensure its continued relevancy. In addition, these Advisory Board members provide input on industrial hiring needs, offer projects and internships to Data Science students, and serve as employers of our graduates.
Other Data Science Programs:
- Certificate in Data Science, Certificate
- M.S. in Data Science, Master of Science
- Ph.D. in Data Science, Ph.D.
Classes
CS 551/DS 551: Reinforcement Learning
Reinforcement Learning is an area of machine learning concerned with how agents take actions in an environment with a goal of maximizing some notion of “cumulative reward”. The problem, due to its generality, is studied in many disciplines, and applied in many domains, including robotics and industrial automation, marketing, education and training, health and medicine, text, speech, dialog systems, finance, among many others. In this course, we will cover topics including: Markov decision processes, reinforcement learning algorithms, value function approximation, actor-critics, policy gradient methods, representations for reinforcement learning (including deep learning), and inverse reinforcement learning. The course project(s) will require the implementation and application of many of the algorithms discussed in class.
Machine Learning (CS 539), statistical learning at the level of DS 502/MA 543, and programming skills at the level of CS 5007.
CS 552/DS 552: Generative Artificial Intelligence
Generative Artificial Intelligence (Gen-AI) is a class of machine learning models that generate new data (text, images, faces, voice, artwork) that is nearly indistinguishable from the equivalent real data typically generated by humans. These models are trained on realistic example data sets from the real world. This course covers the underlying fundamentals of generative models. It also introduces the design and modeling of some of the modern generative models: Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Diffusion models, ChatGPT, and Large Language Models, to name a few. Several applications will be discussed, ranging from image generation for engineering or science applications to the utilization of generated data for data augmentation in AI systems. Ethical concerns about the dangers of these generative technologies, ranging from misinformation and bias to data ownership, are reviewed.
Core artificial intelligence classes, such as machine learning and deep learning, or equivalent background is highly recommended.
CS 553/DS 553: Machine Learning Development and Operations
This course teaches students the computational skills required in the fields of Artificial Intelligence (AI) and Data Science. As data-driven decision-making and AI applications continue to transform industries, proficiency in programming and machine learning tools is important. In this course, you will develop a strong foundation in programming languages commonly used in AI and Data Science (such as Python). This course will cover the development, debugging, deployment, and subsequent monitoring phases of models in end-to-end pipelines core to machine learning systems. You will also familiarize yourself with popular libraries, frameworks and debugging on IDEs, such as PyCharm, PyTorch, scikit-learn, and/or pandas. Possible topics may include practice code development with a copilot as well as deployment of models on a cloud computing environment. The student will engage in hands-on projects to practice their programming skills to solve real-world AI and Data Science problems.
Basic understanding of programming concepts, and preferably some Python knowledge.
CS 554/DS 554: Natural Language Processing
Natural Language Processing (NLP) is an interdisciplinary field at the intersection of artificial intelligence, linguistics, and computer science, dedicated to enabling computers to understand, interpret, and generate human language. NLP underpins advancements in human-computer interaction, information retrieval, sentiment analysis, chatbots, and a multitude of other applications. The course may cover a wide range of topics, including language modeling, sequence-to-sequence architectures, sentiment analysis, machine translation, and advanced techniques for natural language understanding and generation, providing a comprehensive foundation for NLP expertise.
Programming skills at the level of CS 5007.
CS 555/DS 555: Responsible Artificial Intelligence
Artificial Intelligence (AI) algorithms have a significant impact on people’s lives. In this course, we discuss social responsibility around data privacy, bias in data and decision-making, policies as guardrails, fairness and transparency in the context of applying AI algorithms. Case studies considering societal challenges caused by AI technologies may include AI-based hiring recommendations stemming from societal biases present in training datasets, AI-empowered self-driving cars behaving in a dangerous manner when encountering atypical road conditions, digital health applications inadvertently revealing private patient information, or large language models like ChatGPT generating incorrect or harmful responses. This course also studies AI-based algorithmic solutions to some of these challenges. These include the design of robust machine learning algorithms with constraints to ensure fairness, privacy, and safety. Strategies for how to apply these methods to design safe and fair AI are introduced. Topics may include min-max optimization with applications to training machine learning models robust to adversarial attacks, stochastic methods for preserving privacy of sensitive data, and multi-agent machine learning models for reducing algorithmic bias and polarization in recommender systems.
Machine Learning at the graduate level, undergraduate level (CS 4342), or equivalent knowledge.
CS 594/DS 594: Graduate Qualifying Project in Artificial Intelligence
This 3-credit graduate qualifying project, typically done in teams, provides a capstone experience in applying Artificial Intelligence skills to a real-world problem. It will be carried out in cooperation with an industrial sponsor, and is approved and overseen by a core or collaborative faculty member in the Artificial Intelligence Program. This offering integrates theory and practice of Artificial Intelligence, and includes the application of tools and techniques acquired in the Artificial Intelligence Program to a real-world problem. In addition to a written report, this project must be presented in a formal presentation to faculty of the AI program and sponsors. Professional development skills, such as communication, teamwork, leadership, and collaboration, will be practiced. This course is a degree requirement for the Master of Science in Artificial Intelligence (MS-AI) and may not be taken before completion of 21 credits in the program. Students outside the MS-AI program must get the instructor’s approval before enrolling.
Completion of at least 24 credits of the AI degree, or consent of the instructor. With permission of the instructor, the GQP can be taken a second time for a total of 6 credits.
DS/CS 541: Deep Learning
This course will offer a mathematical and practical perspective on artificial neural networks for machine learning. Students will learn about the most prominent network architectures including multilayer feedforward neural networks, convolutional neural networks (CNNs), auto-encoders, recurrent neural networks (RNNs), and generative-adversarial networks (GANs). This course will also teach students optimization and regularization techniques used to train them, such as backpropagation, stochastic gradient descent, dropout, pooling, and batch normalization. Connections to related machine learning techniques and algorithms, such as probabilistic graphical models, will be explored. In addition to understanding the mathematics behind deep learning, students will also engage in hands-on course projects. Students will have the opportunity to train neural networks for a wide range of applications, such as object detection, facial expression recognition, handwriting analysis, and natural language processing.
Machine Learning (CS 539), and knowledge of Linear Algebra (such as MA 2071) and Algorithms (such as CS 2223).
DS/CS 547: Information Retrieval
This course introduces the theory, design, and implementation of text-based and Web-based information retrieval systems. Students learn the key concepts and models relevant to information retrieval and natural language processing on large-scale corpora such as the Web and social systems. Topics include the vector space model, crawling, indexing, web search, ranking, recommender systems, embeddings, and language models.
Statistical learning at the level of DS 502/MA 543 and programming skills at the level of CS 5007.
DS/ECE 577: Machine Learning in Cybersecurity
Machine Learning has proven immensely effective in a diverse set of applications. This trend has reached a new high with the application of Deep Learning in virtually any application domain. This course studies the applications of Machine Learning in the subdomain of Cybersecurity by introducing a range of case studies including anomaly detection in networks and computing, side-channel analysis, user authentication, and biometrics. These case studies are discussed in detail in class, and further examples of potential applications of Machine Learning techniques including Deep Learning are outlined. The course has a strong hands-on component: students are given datasets of specific security applications and are required to perform simulations.
DS/MA 517: Mathematical Foundations for Data Science
The foci of this class are the essential statistics and linear algebra skills required for Data Science students. The class builds the foundation for theoretical and computational abilities of the students to analyze high dimensional data sets. Topics covered include Bayes’ theorem, the central limit theorem, hypothesis testing, linear equations, linear transformations, matrix algebra, eigenvalues and eigenvectors, and sampling techniques, including Bootstrap and Markov chain Monte Carlo. Students will use these techniques while engaging in hands-on projects with real data.
Some knowledge of integral and differential calculus is recommended.
DS 501: Introduction to Data Science
None beyond meeting the Data Science admission criteria.
DS 502/MA 543: Statistical Methods for Data Science
DS 517/MA 517, statistics at the level of MA 2611 and MA 2612, and linear algebra at the level of MA 2071.
DS 503/CS 585: Big Data Management
A beginning course in databases at the level of CS 4432 or equivalent knowledge, and programming experience.
DS 504/CS 586: Big Data Analytics
A beginning course in databases and a beginning course in data mining, or equivalent knowledge, and programming experience.
DS 595: Special Topics in Data Science
Prerequisites will vary with topic.
DS 596: Independent Study
DS 597: Directed Research
DS 598: Graduate Qualifying Project
DS students should have completed at least 24 credits of the DS MS degree, or have the consent of the instructor, before starting the GQP project class. DS students seeking to take this course a second time for credit, up to a total of 6 credits, must get the instructor’s approval. Non-DS students must get the instructor’s approval before taking this course for any number of credits.
DS 599: Master's Thesis in Data Science
The Master’s Thesis in Data Science consists of a research and development project worth a minimum of 9 graduate credit hours and is advised by a faculty member affiliated with the Data Science Program. A thesis proposal must be approved by the DS Program Review Board and the student’s advisor, before the student can register for more than three thesis credits. The student must satisfactorily complete a written thesis document, and present the results to the DS faculty in a public presentation.
DS 699: Dissertation Research
Consent of Dissertation Advisor
DS 5006: Machine Learning for Engineering and Science Applications
This course surveys the application of data science (DS) and machine learning (ML) to problems arising in engineering and the sciences. While DS and ML have profoundly affected domains such as image understanding and natural language processing, ML has seen comparatively less impact in chemistry, physics, chemical engineering, electrical engineering, and many other important application domains. Topics covered will include predictive modeling, feature engineering, and model assessment, with a particular focus on the small-data limit. We will analyze and apply algorithms with wide applicability in engineering and sciences including classic techniques such as multiple linear regression and random forests, and state-of-the-art techniques such as deep neural networks.
The intention is for the class to be accessible to a wide audience in disciplines outside of Computer Science and Data Science, though some basic background in topics such as statistics or linear algebra, and the ability to learn Python programming at a basic level, would be helpful.
DS 5900: Data Science Internship
Registration for internship credit requires prior approval and signature by the academic advisor.
ECE 556/CS 556/DS 556: On-Device Deep Learning
Deep Learning, a core of modern Artificial Intelligence, is rapidly expanding to resource-constrained devices, including smartphones, wearables, and intelligent embedded systems for improving response time, privacy, and reliability. This course focuses on bringing these powerful deep-learning applications from central data centers and large GPUs to distributed ubiquitous systems. On-Device Deep Learning is an interdisciplinary topic at the intersection of artificial intelligence and ubiquitous systems, dedicated to enabling computing on edge devices. This course includes a wide range of topics related to deep learning in resource-constrained settings including pruning and sparsity, quantization, neural architecture search, knowledge distillation, on-device training and transfer learning, distributed training, gradient compression, federated learning, efficient data movement and accelerator design, dynamic network inference, and advanced compression and approximation techniques for enabling on-device deep neural network inference and training. This course provides a comprehensive foundation for cutting-edge “tinyML” expertise.
Students should have an introductory undergraduate-level or graduate-level background in machine learning and deep neural networks.