Streaming Analysis

Introduction

In this notebook, I take a look at my streaming viewing history from my Hulu, Netflix, and Prime Video accounts. I merge these datasets with an additional dataset that contains more information about the streaming titles, such as run time, genres, and IMDb score. This lets me gain more insight into my streaming history.

Changes to the Project Plan

I turned in my Project Plan when I only had my Netflix data. Since then, I've been able to obtain my Hulu and Prime Video data.

My original idea was to compare my streaming data to the top streaming titles, but I wasn't able to find a dataset that worked with my data.

Project Challenges

I realized (too late for this project) that Netflix and Prime Video create a new "watch event" every time a stream is paused and played again. As a result, individual titles in my data have artificially high watch counts. I decided not to remove the duplicates automatically because doing so would also have removed genuine rewatches; I would rather go through the datasets myself to determine which rows to delete.
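One possible way to approach the pause/resume problem is to treat a watch event as a continuation when it starts shortly after the previous event for the same title. The sketch below only illustrates that idea and is not code from the notebooks. The `Title` column name, the 30-minute threshold, and the helper name `collapse_watch_events` are assumptions, and the sample rows are made up.

```python
import pandas as pd


def collapse_watch_events(df, gap_minutes=30):
    """Keep one row per viewing session of a title.

    A row counts as a continuation of the previous row for the same title
    when it starts within `gap_minutes` of it. The threshold is an
    assumption and would need tuning against real data.
    """
    df = df.sort_values(["Title", "Date Watched"]).reset_index(drop=True)
    # Time since the previous event of the same title (NaT for the first one).
    gap = df.groupby("Title")["Date Watched"].diff()
    # A new session starts at the first event or after a long enough gap.
    is_new_session = gap.isna() | (gap > pd.Timedelta(minutes=gap_minutes))
    return df[is_new_session].reset_index(drop=True)


# Made-up sample data for illustration only.
events = pd.DataFrame(
    {
        "Title": ["Show A", "Show A", "Show A", "Show B"],
        "Date Watched": pd.to_datetime(
            [
                "2023-01-01 20:00",
                "2023-01-01 20:10",  # resumed after a pause
                "2023-01-02 21:00",  # genuine rewatch the next day
                "2023-01-01 22:00",
            ]
        ),
    }
)
sessions = collapse_watch_events(events)
print(sessions)
```

A fixed threshold can still merge a genuine rewatch that starts soon after the first viewing, so a manual review of borderline rows would remain useful.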

To combine the three viewing histories, I had to remove more data than I would have liked. Hulu had the best data because it did not count each pause as a new "watch event", but it came split over multiple PDFs and would have required far more time to clean. For the sake of time, I stuck with the basic data from each streaming service.

I struggled to find an additional dataset that worked with my combined streaming viewing history data. In the end, I stuck with the one that gave me the largest merged dataset. I know I could have used just the three streaming datasets, but I wanted the challenge of merging additional data.

Datasets Used

Personal Streaming Viewing History

Hulu
Netflix
Prime Video

Additional dataset

Kaggle - Netflix, Movies, and Popularity

How to Run this Project

Using a Virtual Environment

Clone this repo: git clone https://github.com/istarlet/streaming_analysis.git
Create a new folder in the cloned repo called datasets
Download the datasets here and add them to the datasets folder (see note below)
cd into the cloned project folder
Install virtualenv if you don't already have it installed: pip install virtualenv
Activate the virtual environment (see instructions here)
Install the requirements: pip install -r requirements.txt
Run these Jupyter Notebook files first: Hulu, IMDb, Netflix, Prime Video. Then run the main project notebook, Streaming Data.

(Note: If you downloaded the dataset as a .zip file, make sure to add the individual datasets to the new datasets folder and not the folder they were zipped in.)
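Before running the notebooks, a quick check can confirm that the CSV files sit directly inside the datasets folder rather than in a nested unzipped folder. This is a minimal sketch, not part of the repo. The file names are the four CSVs this README lists under Feature 1, and the helper name `missing_datasets` is an assumption.

```python
from pathlib import Path

# File names as listed in this README; the folder name follows the setup steps.
EXPECTED_FILES = [
    "AshleyViewingActivity.csv",
    "DigitalPrimeVideoViewingHistory.csv",
    "HuluViewingHistoryUpdated.csv",
    "titles.csv",
]


def missing_datasets(folder="datasets"):
    """Return the expected CSV files that are not directly inside `folder`."""
    root = Path(folder)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing from datasets/:", ", ".join(missing))
    else:
        print("All datasets found.")
```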

Activating a Virtual Environment

On Mac/Linux

Open the Terminal and create a virtual environment with the command python3 -m venv virtual-env

Activate the virtual environment with the command source virtual-env/bin/activate

On Windows

Open the Command Prompt and create a virtual environment with the command python -m venv virtual-env

Activate the virtual environment with the command virtual-env\Scripts\activate.bat

Deactivate the Virtual Environment

Type deactivate

Python packages used in this project:

datetime
matplotlib
pandas
seaborn

Project Requirements

1. Loading data - Feature 1

Read TWO data files (JSON, CSV, Excel, etc.).

I read in four CSV files.

AshleyViewingActivity.csv
DigitalPrimeVideoViewingHistory.csv
HuluViewingHistoryUpdated.csv
titles.csv
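Reading the files could look roughly like the sketch below, which loads every CSV in a folder into a pandas DataFrame. It assumes the files live in the datasets folder described in the setup steps; the helper name `load_datasets` is hypothetical, and the notebooks may read each file individually instead.

```python
from pathlib import Path

import pandas as pd


def load_datasets(folder="datasets"):
    """Read every CSV in `folder` into a DataFrame, keyed by file name stem."""
    return {path.stem: pd.read_csv(path) for path in sorted(Path(folder).glob("*.csv"))}


if __name__ == "__main__":
    frames = load_datasets()
    for name, frame in frames.items():
        # Print the shape of each dataset as a quick sanity check.
        print(name, frame.shape)
```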

2. Clean and operate on the data while combining them - Feature 2

Clean your data and perform a pandas merge with your two data sets, then calculate some new values based on the new data set.

I cleaned each dataset in its own Jupyter Notebook.

Hulu, IMDb, Netflix, Prime Video

I then concatenated the Hulu, Netflix, and Prime Video datasets together.

I then merged my new combined streaming dataset with the IMDb dataset.
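The concatenate-then-merge steps can be sketched as follows. The tiny DataFrames are made-up stand-ins for the cleaned data, and the column names (`Title`, `Service`, `title`, `runtime`, `imdb_score`) as well as the inner join are assumptions rather than the exact choices made in the notebooks.

```python
import pandas as pd

# Made-up stand-ins for the cleaned per-service viewing histories.
hulu = pd.DataFrame({"Title": ["Show A"], "Service": ["Hulu"]})
netflix = pd.DataFrame({"Title": ["Movie B"], "Service": ["Netflix"]})
prime = pd.DataFrame({"Title": ["Show C"], "Service": ["Prime Video"]})

# Stack the three histories into one table with a fresh index.
streaming = pd.concat([hulu, netflix, prime], ignore_index=True)

# Made-up stand-in for the additional titles dataset.
titles = pd.DataFrame(
    {"title": ["Show A", "Movie B"], "runtime": [45, 120], "imdb_score": [8.1, 7.4]}
)

# An inner merge keeps only viewing rows whose title also appears in the
# titles data, so unmatched titles (here "Show C") drop out.
merged = streaming.merge(titles, left_on="Title", right_on="title", how="inner")
print(merged)
```

With an inner join, the size of the merged result depends on how many titles match by name, which is why the choice of additional dataset affects how much viewing history survives the merge.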

I added new columns to the merged dataset by extracting the day, month, and hour from the "Date Watched" column.
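Extracting date parts from "Date Watched" might look like the sketch below. The new column names (`Day`, `Month`, `Hour`) and the choice of day and month names over numbers are assumptions, and the sample timestamps are made up.

```python
import pandas as pd

# Made-up sample rows; the real column comes from the merged dataset.
df = pd.DataFrame({"Date Watched": ["2023-01-01 20:15:00", "2023-07-04 09:30:00"]})

# Parse the strings into timestamps so the .dt accessor is available.
df["Date Watched"] = pd.to_datetime(df["Date Watched"])

df["Day"] = df["Date Watched"].dt.day_name()      # weekday name
df["Month"] = df["Date Watched"].dt.month_name()  # month name
df["Hour"] = df["Date Watched"].dt.hour           # hour of day, 0 to 23
print(df)
```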

3. Visualize / Present your data - Feature 3

Make 3 matplotlib or seaborn (or another plotting library) visualizations to display your data.

4. Best practices - Feature 4

Utilize a virtual environment and include instructions in your README on how the user should set one up

I created a virtual environment with instructions in the How to Run this Project section.

5. Interpretation of your data - Feature 5

Annotate your code with markdown cells in Jupyter Notebook, write clear code comments, and have a well-written README.md.

In my Jupyter Notebooks, I annotated my code with markdown cells and wrote clear code comments. I have included a README.md.
