10. Kaggle Baseline

10.1. Running Code using Kaggle Notebooks

Kaggle uses Docker to provide a fully functional environment for hosting data science competitions.
You can download this notebook and run it locally, or view the published version and fork it.
Kaggle is also an excellent resource for learning analytics: it offers a number of toy examples for understanding data science, as well as competitions built on real problems faced by top companies.

!wget https://raw.githubusercontent.com/rpi-techfundamentals/spring2019-materials/master/input/train.csv
!wget https://raw.githubusercontent.com/rpi-techfundamentals/spring2019-materials/master/input/test.csv

import numpy as np
import pandas as pd

# On Kaggle, input data files are available in the "../input/" directory.
# Here the wget commands above saved them to the working directory,
# so we read them from there into Pandas DataFrames.
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

10.2. `train` and `test` set on Kaggle

The train file contains a wide variety of information that might be useful in understanding whether each passenger survived. It also includes a record of whether each passenger survived.
The test file contains all of the columns of the train file except the survival record. Our goal is to predict whether the individuals in the test file survived.

train.head()

test.head()
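One way to confirm how the two files relate is to compare their column sets. The sketch below uses two tiny stand-in frames (`train_demo` and `test_demo`, with made-up rows) so that it runs on its own; with the real data, the same comparison would be applied to `train` and `test`.

```python
import pandas as pd

# Tiny stand-in frames with made-up rows; only the column names follow the Titanic files.
train_demo = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Sex": ["male", "female", "female"],
})
test_demo = pd.DataFrame({
    "PassengerId": [4, 5],
    "Sex": ["male", "female"],
})

# Columns present in train but missing from test: this should be the target only.
missing_in_test = set(train_demo.columns) - set(test_demo.columns)
print(missing_in_test)
```

If anything other than the target column shows up here, the two files do not line up the way the description says.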

10.3. Baseline Models: No Survivors

The Titanic problem is one of classification, and the simplest baseline, predicting the same class (all 0 or all 1) for every row, is often an appropriate starting point.
Think of the baseline as the simplest model you can come up with; it gives you a reference point for judging how well your later models are working.
Even if you aren't familiar with the history of the tragedy, a quick look at the Wikipedia page shows that the majority of people (68%) died.
As a result, our baseline model will predict no survivors.

test["Survived"] = 0

submission = test.loc[:,["PassengerId", "Survived"]]

submission.head()
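A constant prediction scores exactly the share of rows that belong to the predicted class, so the majority class can also be read off the training labels. The following is a minimal sketch on made-up labels (`survived_demo` is a hypothetical stand-in); with the real data you would use `train["Survived"]` instead.

```python
import pandas as pd

# Made-up labels for illustration only; with the real data use train["Survived"].
survived_demo = pd.Series([0, 0, 1, 0, 1, 0, 0, 1], name="Survived")

# Share of each class; the larger share is what a constant prediction would score.
class_share = survived_demo.value_counts(normalize=True)
majority_class = class_share.idxmax()
baseline_accuracy = class_share.max()

print(majority_class, baseline_accuracy)
```

Any later model should beat this number on the same data to be worth the extra complexity.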

10.4. Write to CSV

The code below writes your DataFrame to a CSV file.

submission.to_csv('everyone_dies.csv', index=False)
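Before uploading, it can help to read the file back and confirm that it has the expected shape. The sketch below assumes a small stand-in frame (`submission_demo`, with illustrative IDs) and writes to a temporary folder; the same check would apply to the real `everyone_dies.csv`.

```python
import os
import tempfile

import pandas as pd

# Stand-in submission with illustrative IDs; the real one comes from the test set.
submission_demo = pd.DataFrame({"PassengerId": [892, 893, 894], "Survived": [0, 0, 0]})

# Write to a temporary folder so the example does not touch your working directory.
out_path = os.path.join(tempfile.mkdtemp(), "everyone_dies.csv")
submission_demo.to_csv(out_path, index=False)

# Read the file back: index=False means only the two data columns are stored.
check = pd.read_csv(out_path)
print(list(check.columns), check.shape)
```

Without `index=False`, pandas would also write the row index as an extra unnamed column, which is not one of the two columns the submission is built from.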

10.5. Download from Colab

Working in Colab requires you to download files via a Google-specific package.

from google.colab import files
files.download('everyone_dies.csv')
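The `google.colab` package is only importable inside Colab, so the cell above fails when the notebook is run locally or on Kaggle. A possible guard is sketched below; `download_if_colab` is a hypothetical helper name, not part of any library, and it assumes the file was already written by the earlier cell.

```python
import os


def download_if_colab(path):
    """Trigger a browser download in Colab; elsewhere just report where the file is."""
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        print("Not running in Colab; file is at", os.path.abspath(path))
        return False
    files.download(path)
    return True


in_colab = download_if_colab("everyone_dies.csv")
```

Outside Colab the file is already on your local disk, so no download step is needed.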

10.6. The First Rule of Shipwrecks

You may have seen it in a movie or read it in a novel: "women and children first" has at its roots something that could provide our first model.
Now let's recode the prediction based on whether the passenger was a man or a woman.
We use conditionals to select the rows of interest (for example, where test['Sex'] == 'male') and recode the appropriate column.

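# We could code this as Survived, but doing so would overwrite our earlier prediction.
# Instead, let's code it as PredGender.

test.loc[test['Sex'] == 'male', 'PredGender'] = 0
test.loc[test['Sex'] == 'female', 'PredGender'] = 1
# The new column is created as float; cast it to int so the file holds 0/1 rather than 0.0/1.0.
# This assumes every row has Sex equal to 'male' or 'female' (no missing values).
test['PredGender'] = test['PredGender'].astype(int)
test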

submission = test.loc[:, ['PassengerId', 'PredGender']]
# But we have to change the column name.
# Option 1: submission.columns = ['PassengerId', 'Survived']
# Option 2: the rename method (returns a new DataFrame rather than modifying a slice in place).
submission = submission.rename(columns={'PredGender': 'Survived'})
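Because the training file includes the survival record, the same gender rule can be scored before anything is submitted. The sketch below uses made-up rows (`train_demo` is a hypothetical stand-in), so the numbers it prints say nothing about the real data; with the real `train` DataFrame the steps would be the same.

```python
import pandas as pd

# Made-up rows for illustration only; with the real data use the train DataFrame.
train_demo = pd.DataFrame({
    "Sex": ["male", "female", "male", "female", "male", "male"],
    "Survived": [0, 1, 0, 1, 1, 0],
})

# Survival rate by sex: the mean of a 0/1 column is the share of 1s.
rate_by_sex = train_demo.groupby("Sex")["Survived"].mean()

# Apply the same rule as above (female -> 1, male -> 0) and score it.
pred_gender = train_demo["Sex"].map({"male": 0, "female": 1})
rule_accuracy = (pred_gender == train_demo["Survived"]).mean()

print(rate_by_sex)
print(rule_accuracy)
```

Comparing `rule_accuracy` with the accuracy of the no-survivors baseline shows whether the gender rule is an improvement on the training data.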

10.7. Write Out and then Download your File

Try your first submission to Kaggle!

submission.to_csv('women_survive.csv', index=False)

from google.colab import files
files.download('women_survive.csv')

MGMT 4190/6560 Introduction to Machine Learning Applications @Rensselaer

Introduction to Python - Kaggle Baseline

rpi.analyticsdojo.com
