Overview
Skills & Tools
- Use Python to mine datasets and predict patterns.
Production Standard
- Build statistical models — regression and classification — that generate usable information from raw data.
The Big Picture
- Master the basics of machine learning and harness the power of data to forecast what’s next.
Curriculum
Unit 1: Programming Basics
What is Data Science
- Describe course syllabus and establish the classroom environment.
- Answer the questions: "What is Data Science? What roles exist in Data Science?"
- Define the workflow, tools and approaches data scientists use to analyze data.
Your Development Environment
- Navigate through directories using the command line
- Use git and GitHub to share repositories
Python Foundations
- Conduct arithmetic and string operations in Python
- Assign variables
- Implement loops and conditional statements
- Use Python to clean and edit datasets
Unit 2: Research Design and Exploratory Data Analysis
Exploratory Data Analysis
- Use DataFrames and Series to read data
- Rename, remove, combine, select, and join data
- Identify and handle null and missing values
Experiments and Hypothesis Testing
- Determine causality and sampling bias
- Test a hypothesis using a sample case study
- Validate your findings using statistical analysis (p-values, confidence intervals)
Data Visualization in Python
- Define key principles of data visualization
- Create line plots, bar plots, histograms and box plots using Seaborn and Matplotlib
Statistics in Python
- Use NumPy and Pandas libraries to analyze datasets using basic summary statistics
- Create data visualizations (scatter plots, scatter matrices, line graphs, box plots, and histograms) to discern characteristics and trends in a dataset
- Identify a normal distribution within a dataset using summary statistics and visualization
Unit 3: Foundations of Data Modeling
Linear Regression
- Define data modeling and linear regression
- Differentiate between categorical and continuous variables
- Use the scikit-learn library to build a linear regression model on a dataset that meets the linearity assumption
Evaluating Model Fit
- Define regularization, bias, and error metrics
- Evaluate model fit by using loss functions including mean absolute error, mean squared error, and root mean squared error
- Select regression methods based on fit and complexity
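As a rough illustration of the three loss functions named above, here is a minimal plain-Python sketch. The function name `regression_errors` and the sample values are hypothetical and chosen only for illustration; in the course itself these metrics would more likely come from scikit-learn.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Residual = actual value minus predicted value, one per observation.
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error
    return mae, mse, rmse


# Hypothetical values, made up for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 12.0]
mae, mse, rmse = regression_errors(actual, predicted)
print(mae, mse, rmse)
```

Because MSE squares each residual, the single large miss (9 vs. 12) dominates it, while MAE weights every miss equally. RMSE brings MSE back to the units of the target variable.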
KNN and Classification
- Define a classification model
- Build a K-Nearest Neighbors model using the scikit-learn library
- Evaluate and tune a model by using metrics such as classification accuracy/error
Logistic Regression
- Build a logistic regression classification model using the scikit-learn library
- Describe the sigmoid function, odds, and odds ratios and how they relate to logistic regression
- Evaluate a model using metrics such as classification accuracy/error, confusion matrix, ROC/AUC curves, and loss functions
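The relationship between the sigmoid function, odds, and odds ratios mentioned above can be sketched in a few lines of plain Python. The coefficients `intercept` and `slope` below are hypothetical, not fitted to any data. The point is only that a one-unit increase in the feature multiplies the odds by `exp(slope)`.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Convert a probability into odds: p / (1 - p)."""
    return p / (1.0 - p)


# Hypothetical logistic regression coefficients, for illustration only.
intercept, slope = -1.0, 0.5

# Predicted probabilities at feature values x = 2 and x = 3.
p_at_2 = sigmoid(intercept + slope * 2.0)
p_at_3 = sigmoid(intercept + slope * 3.0)

# Odds ratio for a one-unit increase in the feature.
odds_ratio = odds(p_at_3) / odds(p_at_2)
print(p_at_2, p_at_3, odds_ratio, math.exp(slope))
```

The printed odds ratio matches `math.exp(slope)` up to floating-point error, which is why logistic regression coefficients are often reported as exponentiated odds ratios.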
Unit 4: Machine Learning
Decision Trees and Random Forest
- Describe the difference between classification and regression trees and how to interpret these models
- Explain and communicate the tradeoffs of decision trees vs regression models
- Build decision trees and random forests using the scikit-learn library
Working with API Data
- Access public APIs and get information back
- Read and write data in JSON
- Use the requests library
Natural Language Processing
- Demonstrate how to tokenize natural language text using NLTK
- Categorize and tag unstructured text data
- Explain how to build a text classification model using NLTK
Working with Time Series Data
- Explain how time series data differs from other data and how to account for it
- Create rolling means and plot time series data using the Pandas library
- Perform autocorrelation on time series data
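To make rolling means and autocorrelation concrete, here is a small plain-Python sketch. It is a stand-in for the Pandas functionality named above, not the Pandas API itself; the helper names `rolling_mean` and `autocorrelation` and the sample series are hypothetical.

```python
import math


def rolling_mean(values, window):
    """Mean of every full run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[end - window:end]) / window
        for end in range(window, len(values) + 1)
    ]


def autocorrelation(values, lag):
    """Pearson correlation between the series and a copy shifted by `lag`."""
    if lag < 1 or lag > len(values) - 2:
        raise ValueError("lag must leave at least two overlapping points")
    later, earlier = values[lag:], values[:-lag]
    mean_later = sum(later) / len(later)
    mean_earlier = sum(earlier) / len(earlier)
    cov = sum((a - mean_later) * (b - mean_earlier) for a, b in zip(later, earlier))
    var_later = sum((a - mean_later) ** 2 for a in later)
    var_earlier = sum((b - mean_earlier) ** 2 for b in earlier)
    if var_later == 0 or var_earlier == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return cov / math.sqrt(var_later * var_earlier)


# Hypothetical series, made up for illustration only.
trend = [2, 4, 6, 8, 10, 12]         # steadily increasing
alternating = [1, -1, 1, -1, 1, -1]  # flips sign at every step

smoothed = rolling_mean(trend, 3)
trend_ac = autocorrelation(trend, 1)
alternating_ac = autocorrelation(alternating, 1)
print(smoothed, trend_ac, alternating_ac)
```

A steadily trending series is strongly positively correlated with its own recent past, while a series that flips at every step is strongly negatively correlated at lag 1. That dependence between neighboring observations is what sets time series apart from data whose rows can be treated as independent.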
Final Presentations
- Deliver a final presentation to peers, the instructor, and guest panelists, who will identify strengths and areas for improvement
FAQs
Why is this course relevant today?
Given the prevalence of technology and the amount of data available online about users, products, and the content we generate, businesses could make far better-informed decisions if this data were analyzed more deeply using data science. The data science course provides the tools, methods, and practical experience to enable you to make accurate predictions about data, which ultimately leads to better decision-making in business and the use of smarter technology (think recommendation systems or targeted ads).
What practical skill sets can I expect to have upon completion of the course?
This course will provide you with technical skills in machine learning, algorithms, and data modeling, which will allow you to make accurate predictions about your data. You will create your models using Python, so you will gain a good grasp of this programming language. Furthermore, you will learn how to parse and clean your data, which can take up to 70% of your time as a data scientist.
Whom will I be sitting next to in this course?
Individuals who have a strong interest in manipulating large data sets, finding patterns in data, and making predictions.
Are there any prerequisites?
- A basic understanding of statistics
- A basic understanding of variables, functions, and lists in Python
School Notes:
For students enrolling in 12-week part-time and immersive classes, we do not recommend booking more than one class at a time.