Data are increasingly cheap and ubiquitous. We are now digitizing historical data and collecting myriad new types of data from web logs, mobile devices, sensors, instruments, and transactions. At the same time, new technologies are emerging to organize, analyze, and make sense of this avalanche of data in order to improve decision making and create commercial and social value. The rise of big data and data science has the potential to deepen our understanding of phenomena ranging from physical and biological systems to human social and economic behavior. Accordingly, there is significant and growing demand for data-savvy professionals in businesses, public agencies, and nonprofits, as reflected in rapidly rising salaries for data engineers, data scientists, statisticians, and data analysts. Data science is the study of the generalizable extraction of knowledge from data, and the key word is science. It draws on diverse elements and builds on techniques and theories from many fields, including mathematics, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition, visualization, and data warehousing, with the goal of extracting useful knowledge from data and creating data products. This course provides a comprehensive introduction to data science. It covers the foundational principles of data science, including statistical inference and exploratory data analysis, machine learning algorithms, data visualization, and big data. Concepts are explained in the context of real-life examples. The course includes hands-on exercises using open-source software platforms.
Prerequisite: IS 633 or experience in database design and query processing.