Schedule

Please check here for the latest course schedule of activities.

Week

Session

Date

Day

Topic

Summary

Assignment

Due

1

1

08/31

Mon

Course Overview & Introduction to the Data Science Lifecycle

This class provides a high-level overview of the course and introduces the basic concepts and approaches used. Link

This introductory assignment covers the basics of loading files from a variety of formats and updating several types of objects. It also introduces the concept of packages. Starter

09/10

1

2

09/03

Thu

Python Basics

This lecture discusses the general strategic impact of data, open data, data encoding, data provenance, and data wrangling, including merging, aggregation, and filtering. The continued introduction to coding covers conditionals, loops, functions, missing values, filtering, and group-by. We will also introduce a basic Kaggle model for the Titanic dataset. Link

2

09/07

Mon

Labor Day – no class

2

3

09/08

Tue

Python Basics (first in-person class; Tuesday follows Monday schedule)

This lecture discusses the general strategic impact of data, open data, data encoding, data provenance, and data wrangling. The continued introduction to coding covers NumPy and pandas. Link

This assignment requires you to become familiar with a variety of Python data structures (sets, lists, dictionaries) as well as packages (NumPy, pandas). Starter

09/17

2

4

09/10

Thu

Python conditionals, loops, functions, aggregating.

Further practice applying Python basics to data. Link

3

5

09/14

Mon

Python conditionals, loops, functions, aggregating (continued)

Further practice applying Python basics to data. Link

In this assignment we create a few functions and our first simple model. Starter

09/24

3

6

09/17

Thu

Python visualization, data manipulation, and feature creation

Introduction to visualization, APIs, web scraping, and feature creation/extraction. The general goal is to get students to the point where they can start doing some data manipulation and use code they haven’t written themselves (packages, functions). Link

4

7

09/21

Mon

Python visualization, data manipulation, and feature creation (continued)

Introduction to visualization, APIs, web scraping, and feature creation/extraction. The general goal is to get students to the point where they can start doing some data manipulation and use code they haven’t written themselves (packages, functions). Link

Some exercises with visualization and web scraping. Starter

10/01

4

8

09/24

Thu

Overview of Modeling

We examine the basic classes of learning: supervised, unsupervised, and reinforcement. We also examine overfitting, how cross-validation is used to detect it, and how hyperparameters are tuned to optimize models. Link

5

9

09/28

Mon

Overview of Classification

We examine the basic classes of learning: supervised, unsupervised, and reinforcement. We also examine overfitting, how cross-validation is used to detect it, and how hyperparameters are tuned to optimize models. Link

Manipulating data Starter

10/12

5

10

10/01

Thu

Overview of Classification

We examine the basic classes of learning: supervised, unsupervised, and reinforcement. We also examine overfitting, how cross-validation is used to detect it, and how hyperparameters are tuned to optimize models. Link

6

11

10/05

Mon

Python and Regression

Regression models are similarly a major type of machine learning application. In this Link

6

12

10/08

Thu

Python and Regression

Lab/homework Link

7

10/12

Mon

Columbus Day – no class

7

13

10/15

Thu

Unsupervised Models

Unsupervised models are frequently used to partition data into subpopulations or to generate features. Link

8

14

10/19

Mon

Midterm Exam

Midterm. Available 8:00 AM EST. Due Midnight. Link

8

15

10/22

Thu

Time Series Analysis

Time series and panel data differ from other data and require a different approach. Here we cover some of the basics. Link

9

16

10/26

Mon

Time Series Analysis

Time series and panel data differ from other data and require a different approach. Here we cover some of the basics. Link

Unsupervised Starter

11/05

9

16

10/26

Mon

Time Series Analysis

Time series and panel data differ from other data and require a different approach. Here we cover some of the basics. Link

Project: first 3 sections

11/08

9

17

10/29

Thu

Text and NLP

The goal of this class is to investigate basic concepts surrounding text mining. Link

Midterm-Correction: Correct your midterm so that it passes all of the tests. Starter

11/12

10

18

11/02

Mon

Text and NLP

The goal of this class is to investigate basic concepts surrounding text mining. Link

Deep Learning Excel Lab Assignment

11/16

10

19

11/05

Thu

Introduction to Deep Learning

Deep learning with TensorFlow Link

Final Project Presentation

12/06

10

19

11/05

Thu

Introduction to Deep Learning

Deep learning with TensorFlow Link

Final Project

12/13

11

20

11/09

Mon

Introduction to Deep Learning

Deep learning with TensorFlow Link

11

21

11/12

Thu

Introduction to Deep Learning

Deep learning with TensorFlow Link

12

22

11/16

Mon

Image Data and Deep Learning

Image data is different and deep learning has transformed the ability of machines to process image data. In this lecture we will get an overview of image processing and deep learning techniques. Link

12

23

11/19

Thu

NLP and Deep Learning

NLP data and deep learning Link

13

24

11/23

Mon

R and Machine Learning

The goal is to get you familiar with Spark and the general big data infrastructure. Link

13

11/26

Thu

Thanksgiving

Advanced tools for model search

14

25

11/30

Mon

Big Data

The goal is to get you familiar with Spark and the general big data infrastructure. Link

14

26

12/03

Thu

Open project questions.

The goal is to answer questions you have encountered and weren’t sure about. Link

15

27

12/07

Mon

Final Presentations

Please aim for a 5-7 min presentation covering key insights from EDA and modeling, with a focus on modeling. Link

15

28

12/10

Thu

Final Presentations

Please aim for a 5-7 min presentation covering key insights from EDA and modeling, with a focus on modeling. Link

17

29

12/15

Tue

Final Exam

The final exam will be comprehensive. Tuesday, 12/15, 11:30-2:30. Link