Intro to Machine Learning
Udacity
What you'll learn on the course
Approx. 10 weeks
Assumes 6 hrs/wk (work at your own pace)
Course Summary
Machine Learning is a first-class ticket to the most exciting careers in data analysis today. As data sources proliferate along with the computing power to process them, going straight to the data is one of the most straightforward ways to quickly gain insights and make predictions.
Machine learning brings together computer science and statistics to harness that predictive power. It’s a must-have skill for all aspiring data analysts and data scientists, or anyone else who wants to wrestle all that raw data into refined trends and predictions.
This is a class that will teach you the end-to-end process of investigating data through a machine learning lens. It will teach you how to extract and identify useful features that best represent your data, a few of the most important machine learning algorithms, and how to evaluate the performance of your machine learning algorithms.
This course is also a part of our Data Analyst Nanodegree.
Why Take This Course?
In this course, you’ll learn by doing! We’ll bring machine learning to life by showing you fascinating use cases and tackling interesting real-world problems like self-driving cars. For your final project you’ll mine the email inboxes and financial data of Enron to identify persons of interest in one of the greatest corporate fraud cases in American history.
When you finish this introductory course, you’ll be able to analyze data using machine learning techniques, and you’ll also be prepared to take our Data Analyst Nanodegree. We’ll get you started on your machine learning journey by teaching you how to use helpful tools, such as pre-written algorithms and libraries, to answer interesting questions.
Prerequisites and Requirements
To succeed in this course, you must be proficient in Python programming and comfortable with basic statistics. If you need a refresher on any of these topics, you can check out these courses:
Intro to Computer Science (You should know basic data structures and control statements, and be able to write and import functions.)
One additional course that would be nice to have is Intro to Data Science, as this will get you familiar with scientific problem-solving. However, completion of that class isn't required for success. We will also use a tiny bit of git, which you can also learn about on Udacity.
One thing that we don’t require is previous exposure to machine learning. If you’re a machine learning beginner, you’re in the right place.
See the Technology Requirements for using Udacity.
What Will I Learn?
Projects
P5: Identify Fraud from Enron Email
Play detective and put your machine learning skills to use by building an algorithm to identify Enron employees who may have committed fraud, based on the public Enron financial and email dataset.
Syllabus
You’ll learn how to start with a question and/or a dataset, and use machine learning to turn them into insights.
Lessons 1-4: Supervised Classification
Naive Bayes: We jump in headfirst, learning perhaps the world’s greatest algorithm for classifying text.
Support Vector Machines (SVMs): One of the top 10 algorithms in machine learning, and a must-try for many classification tasks. What makes it special? The ability to generate new features independently and on the fly.
Decision Trees: Extremely straightforward, often just as accurate as an SVM but (usually) way faster. The launch point for more sophisticated methods, like random forests and boosting.
Lesson 5: Datasets and Questions
Behind any great machine learning project is a great dataset that the algorithm can learn from. We were inspired by a treasure trove of email and financial data from the Enron corporation, which would normally be strictly confidential but became public when the company went bankrupt in a blizzard of fraud. Follow our lead as we wrestle this dataset into a machine-learning-ready format, in anticipation of trying to predict cases of fraud.
Lessons 6 and 7: Regressions and Outliers
Regressions are some of the most widely used machine learning algorithms, and rightly share prominence with classification. What’s a fast way to make mistakes in regression, though? Have troublesome outliers in your data. We’ll tackle how to identify and clean away those pesky data points.
Lesson 8: Unsupervised Learning
K-Means Clustering: The flagship algorithm when you don’t have labeled data to work with, and a quick method for pattern-searching when approaching a dataset for the first time.
Lessons 9-12: Features, Features, Features
Feature Creation: Taking your human intuition about the world and turning it into data that a computer can use.
Feature Selection: Einstein said it best: make everything as simple as possible, and no simpler. In this case, that means identifying the most important features of your data.
Principal Component Analysis: A more sophisticated take on feature selection, and one of the crown jewels of unsupervised learning.
Feature Scaling: Simple tricks for making sure your data and your algorithm play nicely together.
Learning from Text: More information is in text than any other format, and there are some effective but simple tools for extracting that information.
Lessons 13-14: Validation and Evaluation
Training/testing data split: How do you know that what you’re doing is working? You don’t, unless you validate. The train-test split is simple to do, and the gold standard for understanding your results.
Cross-validation: Take the training/testing split and put it on steroids. Validate your machine learning results like a pro.
Precision, recall, and F1 score: After all this data-driven work, quantify your results with metrics tailored to what is most important to you.
Lesson 15: Wrapping it all Up
We take a step back and review what we’ve learned, and how it all fits together.
Projects
Mini-project at the end of each lesson
Final project: searching for signs of corporate fraud in Enron data
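To make the evaluation metrics named in Lessons 13-14 a little more concrete, here is a minimal sketch of how precision, recall, and F1 score could be computed for a fraud-style classifier. It is an illustration only: the function name precision_recall_f1 is hypothetical, the labels are made up (they are not Enron data), and the course may well rely on a library routine rather than hand-written code like this.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the class labeled `positive`.

    Hypothetical helper for illustration; not taken from the course materials.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: flagged and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: flagged but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: positive cases the classifier missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when nothing is flagged or nothing is positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels: 1 = person of interest, 0 = everyone else.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy example the classifier flags three people, two of them correctly, and misses one true positive, so precision and recall both come out to 2/3. A classifier that never flags anyone would still be right on 7 of these 10 labels, yet its recall would be zero, which is one reason these metrics can be more informative than plain accuracy when the class of interest is rare.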