- 27th Jun 2024
- 09:41 am
In this project, our ML experts will help you use real-world COVID-19 data to train machine learning models that predict future numbers of total COVID-19 cases and total COVID-19 deaths in Canada. While you have up-to-date COVID-19 data from the entire world at your disposal, the objective is to make accurate predictions for Canada as a whole. You are also expected to explore external data sources that you think can help with the predictions. Keep in mind, however, that you need to include in your submission everything required to run your code.
Unlike the assignments, you are asked to tell a data-driven story in a Jupyter notebook. While making accurate predictions through machine learning is an important part of this project, you will also need to clearly and convincingly communicate which information is most predictive, any patterns you identified in the data, your feature engineering strategy, and justification for your machine learning model design. Your notebook should tell the story step-by-step using both text and code, just like any good Jupyter notebook tutorial you would find online.
Data Source
You will be provided the Our World in Data COVID-19 dataset. This dataset is aggregated for you and available here: Our World In Data. Their GitHub is available here: OWID. Navigate to the download button at the bottom of the figure and select the full dataset. Data is only as good as its ground truth, and for unaggregated data you can use the publicly available COVID-19 dataset from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, which you can find in this GitHub repository. Navigate to “COVID-19/csse_covid_19_data/csse_covid_19_time_series/” and you will find the following data files that you can use for this project:
- time_series_covid19_confirmed_global.csv
- time_series_covid19_deaths_global.csv
These data files are updated daily (around midnight UTC), but some of the source data is no longer refreshed daily, so there may be no newly reported cases or deaths except in week-sized jumps. Every day, a new column is added to both data files.
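Because each day appends a new date column, a common first step is reshaping this wide layout into a tidy time series and differencing the cumulative counts to get daily new cases. A minimal sketch, using a small synthetic frame that mimics the JHU column layout (the real file has many more provinces and dates):

```python
import pandas as pd

# Synthetic stand-in for the JHU wide layout: metadata columns, then one column per date.
wide = pd.DataFrame({
    "Province/State": ["Alberta", "Ontario"],
    "Country/Region": ["Canada", "Canada"],
    "Lat": [53.9333, 51.2538],
    "Long": [-116.5765, -85.3232],
    "1/22/20": [0, 0],
    "1/23/20": [1, 2],
    "1/24/20": [3, 5],
})

# Melt the date columns into rows, then sum the provinces to a national series.
long = wide.melt(
    id_vars=["Province/State", "Country/Region", "Lat", "Long"],
    var_name="date", value_name="cumulative",
)
long["date"] = pd.to_datetime(long["date"], format="%m/%d/%y")
canada = long.groupby("date")["cumulative"].sum().sort_index()

# Daily new cases are the first difference of the cumulative series;
# the first day has no predecessor, so it keeps its cumulative value.
new_cases = canada.diff().fillna(canada)
print(new_cases.tolist())
```

The same reshape works for the deaths file, since both CSVs share the metadata-columns-plus-date-columns structure.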
You should include every file your notebook needs to run in your submission. Make sure to pull the most recent data available as close to submission as possible. Optionally, you are free to use data from other external sources if you think they can help with the predictions; just make sure your notebook executes without any issues and that any external data is included as part of your submission.
Prediction Output
Your code should print the predictions to the screen in a table and save them to a separate csv file (file name: prediction.csv) in the following format:
Canada | April 18 | April 19 | April 20
---|---|---|---
Total Cases | | |
Total Deaths | | |
The above table should be populated with the predicted cumulative numbers of confirmed cases and deaths, respectively.
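One way to assemble output in this layout is a two-row DataFrame that is both printed and written to `prediction.csv`. A minimal sketch with made-up placeholder numbers (your model's predictions would replace them):

```python
import pandas as pd

# Placeholder values -- in practice these come from the trained model.
pred = pd.DataFrame(
    [[3_600_000, 3_610_000, 3_620_000],   # predicted cumulative confirmed cases
     [38_500, 38_550, 38_600]],           # predicted cumulative deaths
    index=["Total Cases", "Total Deaths"],
    columns=["April 18", "April 19", "April 20"],
)
pred.index.name = "Canada"

print(pred)                      # print the table to the screen
pred.to_csv("prediction.csv")    # save in the required file name
```

Keeping the row and column labels exactly as specified makes the saved CSV match the required format.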
Machine Learning Models - DATA 622 / MDCH 615 - Get Assignment Solution
Please note that this is a sample assignment solved by our Machine Learning Programmers. These solutions are intended for research and reference purposes only. If you learn any concepts by going through the report and code, our Python Tutors will be very happy.
- Option 1 - To download the complete solution along with Code, Report and screenshots - Please visit our Python Assignment Sample Solution page
- Option 2 - Reach out to our Python Tutors to get online tutoring related to this assignment and get your doubts cleared
- Option 3 - You can check the partial solution for this assignment in this blog below
Free Assignment Solution - Machine Learning Models - DATA 622 / MDCH 615
{
"cells": [
{
"cell_type": "markdown",
"id": "3fa692f0",
"metadata": {},
"source": [
"# Importing liblaries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "1ace2aa8",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_5903/2901762111.py:6: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n",
" from pandas import MultiIndex, Int64Index\n",
"/home/acer/anaconda3/envs/tf-2.0/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.\n",
" from pandas import MultiIndex, Int64Index\n"
]
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import datetime\n",
"from pandas import MultiIndex, Int64Index\n",
"\n",
"%matplotlib inline\n",
"\n",
"import xgboost as xgb\n",
"from sklearn.preprocessing import MinMaxScaler\n",
"from sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "markdown",
"id": "cedbcc69",
"metadata": {},
"source": [
"# Loading dataset"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "708cbf3f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"
Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 4/4/22 | 4/5/22 | 4/6/22 | 4/7/22 | 4/8/22 | 4/9/22 | 4/10/22 | 4/11/22 | 4/12/22 | 4/13/22 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
40 | Alberta | Canada | 53.9333 | -116.5765 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 540733 | 540733 | 546247 | 546247 | 546247 | 546247 | 546247 | 546247 | 546247 | 552403 |
41 | British Columbia | Canada | 53.7267 | -127.6476 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 356858 | 357500 | 357758 | 357974 | 357974 | 357974 | 357974 | 357974 | 357974 | 357974 |
42 | Diamond Princess | Canada | 0.0000 | 0.0000 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
43 | Grand Princess | Canada | 0.0000 | 0.0000 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 |
44 | Manitoba | Canada | 53.7609 | -98.8139 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 135214 | 135214 | 135214 | 136573 | 136573 | 136573 | 136573 | 136573 | 136573 | 136573 |
"
5 rows × 817 columns
\n","
"
],
"text/plain": [
" Province/State Country/Region Lat Long 1/22/20 1/23/20 \\\n",
"40 Alberta Canada 53.9333 -116.5765 0 0 \n",
"41 British Columbia Canada 53.7267 -127.6476 0 0 \n",
"42 Diamond Princess Canada 0.0000 0.0000 0 0 \n",
"43 Grand Princess Canada 0.0000 0.0000 0 0 \n",
"44 Manitoba Canada 53.7609 -98.8139 0 0 \n",
"\n",
" 1/24/20 1/25/20 1/26/20 1/27/20 ... 4/4/22 4/5/22 4/6/22 4/7/22 \\\n",
"40 0 0 0 0 ... 540733 540733 546247 546247 \n",
"41 0 0 0 0 ... 356858 357500 357758 357974 \n",
"42 0 0 0 0 ... 0 0 0 0 \n",
"43 0 0 0 0 ... 13 13 13 13 \n",
"44 0 0 0 0 ... 135214 135214 135214 136573 \n",
"\n",
" 4/8/22 4/9/22 4/10/22 4/11/22 4/12/22 4/13/22 \n",
"40 546247 546247 546247 546247 546247 552403 \n",
"41 357974 357974 357974 357974 357974 357974 \n",
"42 0 0 0 0 0 0 \n",
"43 13 13 13 13 13 13 \n",
"44 136573 136573 136573 136573 136573 136573 \n",
"\n",
"[5 rows x 817 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"confirmed = pd.read_csv('time_series_covid19_confirmed_global.csv')\n",
"\n",
"# we select only rows where country is Canada\n",
"\n",
"confirmed_canada = confirmed[confirmed['Country/Region'] == 'Canada']\n",
"\n",
"confirmed_canada.head()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "4c2e444d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"
Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 4/4/22 | 4/5/22 | 4/6/22 | 4/7/22 | 4/8/22 | 4/9/22 | 4/10/22 | 4/11/22 | 4/12/22 | 4/13/22 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
40 | Alberta | Canada | 53.9333 | -116.5765 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 4074 | 4074 | 4104 | 4104 | 4104 | 4104 | 4104 | 4104 | 4104 | 4141 |
41 | British Columbia | Canada | 53.7267 | -127.6476 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3002 | 3002 | 3002 | 3004 | 3004 | 3004 | 3004 | 3004 | 3004 | 3004 |
42 | Diamond Princess | Canada | 0.0000 | 0.0000 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
43 | Grand Princess | Canada | 0.0000 | 0.0000 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
44 | Manitoba | Canada | 53.7609 | -98.8139 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1744 | 1744 | 1744 | 1751 | 1751 | 1751 | 1751 | 1751 | 1751 | 1751 |
"
5 rows × 817 columns
\n","
"
],
"text/plain": [
" Province/State Country/Region Lat Long 1/22/20 1/23/20 \\\n",
"40 Alberta Canada 53.9333 -116.5765 0 0 \n",
"41 British Columbia Canada 53.7267 -127.6476 0 0 \n",
"42 Diamond Princess Canada 0.0000 0.0000 0 0 \n",
"43 Grand Princess Canada 0.0000 0.0000 0 0 \n",
"44 Manitoba Canada 53.7609 -98.8139 0 0 \n",
"\n",
" 1/24/20 1/25/20 1/26/20 1/27/20 ... 4/4/22 4/5/22 4/6/22 4/7/22 \\\n",
"40 0 0 0 0 ... 4074 4074 4104 4104 \n",
"41 0 0 0 0 ... 3002 3002 3002 3004 \n",
"42 0 0 0 0 ... 1 1 1 1 \n",
"43 0 0 0 0 ... 0 0 0 0 \n",
"44 0 0 0 0 ... 1744 1744 1744 1751 \n",
"\n",
" 4/8/22 4/9/22 4/10/22 4/11/22 4/12/22 4/13/22 \n",
"40 4104 4104 4104 4104 4104 4141 \n",
"41 3004 3004 3004 3004 3004 3004 \n",
"42 1 1 1 1 1 1 \n",
"43 0 0 0 0 0 0 \n",
"44 1751 1751 1751 1751 1751 1751 \n",
"\n",
"[5 rows x 817 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"death = pd.read_csv('time_series_covid19_deaths_global.csv')\n",
"\n",
"# we select only rows where country is Canada\n",
"death_canada = death[death['Country/Region'] == 'Canada']\n",
"\n",
"death_canada.head()"
]
},
{
"cell_type": "markdown",
"id": "cd9b0b16",
"metadata": {},
"source": [
"# Data exploration"
]
},
{
"cell_type": "markdown",
"id": "ee0b3e5d",
"metadata": {},
"source": [
"Time-Series of Confirmed Cases in Mainland Canada\n",
"\n",
"First, a list called provinces_list needs to be extracted from the selected rows, and then be concatenated with the category (e.g. _Confirmed), in order to differentiate from the other two categories (e.g. _Recovered and _Deaths)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a904791b",
"metadata": {},
"outputs": [],
"source": [
"provinces_list = confirmed[confirmed['Country/Region'] == 'Canada'].iloc[:,0:1].T.values.tolist()[0]\n",
"\n",
"map_output = map(lambda x: x + '_Confirmed', provinces_list)\n",
"list_map_output = list(map_output)\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "e1b83316",
"metadata": {},
"source": [
"\n",
"\n",
"Next, let's remove the first five rows from the DataFrame df (which are the row#, Province/State, Country/Region, Unnamed:2, and Unnamed:3 columns, and are not needed for time-series charting), specify the index to the matrix, and perform a Transpose to have the date_time index shown as row indices.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "6a058b80",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"
date_time | Alberta_Confirmed | British Columbia_Confirmed | Diamond Princess_Confirmed | Grand Princess_Confirmed | Manitoba_Confirmed | New Brunswick_Confirmed | Newfoundland and Labrador_Confirmed | Northwest Territories_Confirmed | Nova Scotia_Confirmed | Nunavut_Confirmed | Ontario_Confirmed | Prince Edward Island_Confirmed | Quebec_Confirmed | Repatriated Travellers_Confirmed | Saskatchewan_Confirmed | Yukon_Confirmed |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4/9/22 | 546247 | 357974 | 0 | 13 | 136573 | 48197 | 40336 | 11552 | 65991 | 3531 | 1176866 | 28375 | 992646 | 13 | 133256 | 3940 |
4/10/22 | 546247 | 357974 | 0 | 13 | 136573 | 48197 | 40336 | 11552 | 65991 | 3531 | 1176866 | 28375 | 992646 | 13 | 133256 | 3940 |
4/11/22 | 546247 | 357974 | 0 | 13 | 136573 | 48197 | 40336 | 11552 | 65991 | 3531 | 1176866 | 28375 | 992646 | 13 | 133256 | 3940 |
4/12/22 | 546247 | 357974 | 0 | 13 | 136573 | 48197 | 40336 | 11552 | 65991 | 3531 | 1176866 | 28375 | 992646 | 13 | 133256 | 3940 |
4/13/22 | 552403 | 357974 | 0 | 13 | 136573 | 48197 | 41587 | 11682 | 65991 | 3531 | 1216491 | 30604 | 1007003 | 13 | 133256 | 4000 |
"
"
],
"text/plain": [
"date_time Alberta_Confirmed British Columbia_Confirmed \\\n",
"4/9/22 546247 357974 \n",
"4/10/22 546247 357974 \n",
"4/11/22 546247 357974 \n",
"4/12/22 546247 357974 \n",
"4/13/22 552403 357974 \n",
"\n",
"date_time Diamond Princess_Confirmed Grand Princess_Confirmed \\\n",
"4/9/22 0 13 \n",
"4/10/22 0 13 \n",
"4/11/22 0 13 \n",
"4/12/22 0 13 \n",
"4/13/22 0 13 \n",
"\n",
"date_time Manitoba_Confirmed New Brunswick_Confirmed \\\n",
"4/9/22 136573 48197 \n",
"4/10/22 136573 48197 \n",
"4/11/22 136573 48197 \n",
"4/12/22 136573 48197 \n",
"4/13/22 136573 48197 \n",
"\n",
"date_time Newfoundland and Labrador_Confirmed \\\n",
"4/9/22 40336 \n",
"4/10/22 40336 \n",
"4/11/22 40336 \n",
"4/12/22 40336 \n",
"4/13/22 41587 \n",
"\n",
"date_time Northwest Territories_Confirmed Nova Scotia_Confirmed \\\n",
"4/9/22 11552 65991 \n",
"4/10/22 11552 65991 \n",
"4/11/22 11552 65991 \n",
"4/12/22 11552 65991 \n",
"4/13/22 11682 65991 \n",
"\n",
"date_time Nunavut_Confirmed Ontario_Confirmed \\\n",
"4/9/22 3531 1176866 \n",
"4/10/22 3531 1176866 \n",
"4/11/22 3531 1176866 \n",
"4/12/22 3531 1176866 \n",
"4/13/22 3531 1216491 \n",
"\n",
"date_time Prince Edward Island_Confirmed Quebec_Confirmed \\\n",
"4/9/22 28375 992646 \n",
"4/10/22 28375 992646 \n",
"4/11/22 28375 992646 \n",
"4/12/22 28375 992646 \n",
"4/13/22 30604 1007003 \n",
"\n",
"date_time Repatriated Travellers_Confirmed Saskatchewan_Confirmed \\\n",
"4/9/22 13 133256 \n",
"4/10/22 13 133256 \n",
"4/11/22 13 133256 \n",
"4/12/22 13 133256 \n",
"4/13/22 13 133256 \n",
"\n",
"date_time Yukon_Confirmed \n",
"4/9/22 3940 \n",
"4/10/22 3940 \n",
"4/11/22 3940 \n",
"4/12/22 3940 \n",
"4/13/22 4000 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = confirmed[confirmed['Country/Region'] == 'Canada'].iloc[:,5:].fillna(0)\n",
"df.index = pd.Index(list_map_output, name='date_time')\n",
"df = df.T\n",
"\n",
"df.tail()\n"
]
},
{
"cell_type": "markdown",
"id": "2901a04a",
"metadata": {},
"source": [
"Also, we would need to standardize the date_time string (esp. that the year should be represented as XXXX instead of XX), and then to convert it from a string type to a datetime type:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "461c6216",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['1/23/20', '1/24/20', '1/25/20', '1/26/20', '1/27/20', '1/28/20',\n",
" '1/29/20', '1/30/20', '1/31/20', '2/1/20',\n",
" ...\n",
" '4/4/22', '4/5/22', '4/6/22', '4/7/22', '4/8/22', '4/9/22', '4/10/22',\n",
" '4/11/22', '4/12/22', '4/13/22'],\n",
" dtype='object', length=812)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.index"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a079e948",
"metadata": {},
"outputs": [],
"source": [
"df.index = pd.to_datetime(df.index, format='%m/%d/%y', exact = False)"
]
},
{
"cell_type": "markdown",
"id": "5377a9a9",
"metadata": {},
"source": [
"\n",
"\n",
"If the datetime conversion is successful, use the following cell to validate and check how many rows of datetime records are in the dataframe.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a80e2806",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataframe shape: (812, 16)\n",
"Number of hours between start and end dates: 19465.0\n"
]
}
],
"source": [
"print(\"Dataframe shape: \", df.shape)\n",
"time_diff = (df.index[-1] - df.index[0])\n",
"print(\"Number of hours between start and end dates: \", time_diff.total_seconds()/3600 + 1)"
]
},
{
"cell_type": "markdown",
"id": "67a85269",
"metadata": {},
"source": [
"\n",
"\n",
"The following will achieve three different plots:\n",
"\n",
" Plotting all the time series on one axis (line-plot)\n",
" Plotting them all on separate subplots to see them more clearly (sharing the x axis)\n",
" Plotting all the time series on one axis (scatterplot)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "2e6ea6e2",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df.plot(figsize=(15,10.5), title='Plotting all the time series on one axis (line-plot)').legend(loc='upper left')\n",
"plt.xlabel('Date Time'); \n",
"text = plt.ylabel('Num of Cases')"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "3447b39d",
"metadata": {},
"outputs": [
{
"data": {
"image/png":
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"ax_array = df.plot(subplots=True, figsize=(15,18))\n",
"for ax in ax_array:\n",
" ax.legend(loc='upper left')\n",
"plt.xlabel('Date Time'); plt.ylabel('Num of Cases')\n",
"text = plt.title('Plotting all time-series on separate subplots (sharing the x axis)', pad=\"-180\", y=2.0, loc=\"center\")\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "f574ee09",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df.plot(y = list_map_output, linestyle = ':', linewidth=4, \n",
" figsize = (15,10.5), grid = True,\n",
" title = \"Plotting all time series on one axis (scatterplot)\").legend(loc='upper left')\n",
"plt.xlabel('Date Time')\n",
"text = plt.ylabel('Num of Cases')"
]
},
{
"cell_type": "markdown",
"id": "35fb08e5",
"metadata": {},
"source": [
"From the three plots shown above, we can tell that within Canada, Quebec province has the largest number of confirmed COVID-19 cases, preceeded by Ontario province, and Alberta."
]
},
{
"cell_type": "markdown",
"id": "a5c90844",
"metadata": {},
"source": [
"### Death data visualization\n",
"Now the confirmed COVID-19 cases for each province in Canada are shown as above, we are to define a function make_plot that is to help plot other countries/regions, not only for the confirmed cases, but also Deaths cases"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "c5b18787",
"metadata": {},
"outputs": [],
"source": [
"def make_plot(df, country_name, category = \"Confirmed\", ref_df = death):\n",
"\n",
" if 'Country/Region' in df.columns:\n",
" provinces_list = df[df['Country/Region'] == country_name].iloc[:,0:1].T.values.tolist()[0]\n",
" else:\n",
" provinces_list = df[df['Country_Region'] == country_name].iloc[:,6:7].T.values.tolist()[0]\n",
" \n",
" map_output = map(lambda x: x + '_' + category, provinces_list)\n",
" list_map_output = list(map_output)\n",
"\n",
" if 'Country/Region' in df.columns:\n",
" death = df[df['Country/Region'] == country_name].iloc[:,5:].fillna(0)\n",
" else:\n",
" death = df[df['Country_Region'] == country_name].iloc[:,11:].fillna(0)\n",
" \n",
" death.index = pd.Index(list_map_output, name='date_time')\n",
" death = death.loc[:, ~death.columns.str.contains('^Unnamed')]\n",
" death = death.T\n",
" death.index = pd.to_datetime(death.index, format='%m/%d/%y', exact = False)\n",
" \n",
" width_multiplier = death.shape[1]/5\n",
"\n",
" death.plot(figsize=(15,2*width_multiplier), \n",
" title='Plotting all the time series on one axis (line-plot)').legend(loc='upper left')\n",
" plt.xlabel('Date Time'); plt.ylabel('Num of Cases')\n",
" \n",
" ax_array = death.plot(subplots=True, figsize=(15,3*width_multiplier))\n",
" for ax in ax_array:\n",
" ax.legend(loc='upper left')\n",
" plt.xlabel('Date Time'); plt.ylabel('Num of Cases')\n",
" text = plt.title('Plotting all time-series on separate subplots (sharing the x axis)', pad=\"-120\",\n",
" y=2.0, loc=\"center\")\n",
" \n",
" death.plot(y=list_map_output, linestyle=':', linewidth=4, \n",
" grid=True, figsize=(15,2*width_multiplier),\n",
" title=\"Plotting all time series on one axis (scatterplot)\").legend(loc='upper left')\n",
" plt.xlabel('Date Time'); plt.ylabel('Num of Cases')\n",
" \n",
" return death"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "bb2abcc8",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df_death_canada = make_plot(death, \"Canada\", \"Death\")"
]
},
{
"cell_type": "markdown",
"id": "9495b021",
"metadata": {},
"source": [
"# We use OWID dataset to make more clear analysis and prediction"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "471583d0",
"metadata": {},
"outputs": [],
"source": [
" \n",
"owid_data = pd.read_csv('https://covid.ourworldindata.org/data/owid-covid-data.csv')\n",
"\n",
"# we select rows where is canada\n",
"\n",
"owid_data_canada = owid_data[owid_data['location'] == 'Canada']\n",
"\n",
"owid_data_canada.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "336b496a",
"metadata": {},
"outputs": [],
"source": [
"canada = owid_data_canada.copy()\n",
"\n",
"canada.columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "15a62635",
"metadata": {},
"outputs": [],
"source": [
"canada.index = canada['date']\n",
"\n",
"canada.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d47f2be",
"metadata": {},
"outputs": [],
"source": [
"canada.index = pd.to_datetime(canada.index, format = '%Y-%m-%d', exact = False)\n",
"\n",
"canada.index\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f0ce213",
"metadata": {},
"outputs": [],
"source": [
"canada.head()"
]
},
{
"cell_type": "markdown",
"id": "a38a7997",
"metadata": {},
"source": [
"# We fill nan values with 0 "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "59097b7c",
"metadata": {},
"outputs": [],
"source": [
"# replace all NaN values with 0 across the whole dataframe\n",
"canada = canada.fillna(0)\n",
"\n",
"canada.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89df1a65",
"metadata": {},
"outputs": [],
"source": [
"canada.tail()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f2c9f53",
"metadata": {},
"outputs": [],
"source": [
"canada.columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "552a8c6b",
"metadata": {},
"outputs": [],
"source": [
"canada.corr()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "16d03646",
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize = (20, 20))\n",
"sns.heatmap(canada.corr())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3c6d9d6",
"metadata": {},
"outputs": [],
"source": [
"# we select only important columns from dataset such as new_cases, new_deaths etc.\n",
"\n",
"canada_1 = canada[['total_cases', 'total_deaths', 'new_cases', 'new_deaths', 'new_cases_smoothed', 'new_deaths_smoothed', 'new_tests', 'icu_patients', 'hosp_patients', 'positive_rate', 'new_vaccinations', 'stringency_index', 'excess_mortality_cumulative_absolute', 'excess_mortality']]\n",
"\n",
"canada_1"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0e81c5cf",
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize = (8, 6))\n",
"sns.heatmap(canada_1.corr())"
]
},
{
"cell_type": "markdown",
"id": "3d30c71a",
"metadata": {},
"source": [
"### Splitting data to train and test"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c2d0e53",
"metadata": {},
"outputs": [],
"source": [
"train = canada_1[canada_1.index < pd.to_datetime(\"2022-03-20\", format='%Y-%m-%d')]\n",
"test = canada_1[canada_1.index > pd.to_datetime(\"2022-03-20\", format='%Y-%m-%d')]\n",
"\n",
"print(train.shape, test.shape)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "695df6dc",
"metadata": {},
"outputs": [],
"source": [
"y_train = train[['new_cases', 'new_deaths']]\n",
"y_test = test[['new_cases', 'new_deaths']]\n",
"\n",
"del train['new_cases']\n",
"del train['new_deaths']\n",
"del test['new_cases']\n",
"del test['new_deaths']\n",
"\n",
"scaler = MinMaxScaler()\n",
"x_train = scaler.fit_transform(train.values)\n",
"x_test = scaler.transform(test.values)\n",
"\n",
"print(x_train.shape, x_test.shape)"
]
},
{
"cell_type": "markdown",
"id": "8cc486fc",
"metadata": {},
"source": [
"# XGBRegressor Algorithm\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9756445",
"metadata": {},
"outputs": [],
"source": [
"from xgboost import XGBRegressor\n",
"\n",
"rf = XGBRegressor(n_estimators = 1500 , max_depth = 15, learning_rate=0.1)\n",
"rf.fit(x_train, y_train['new_cases'])\n",
"cases_pred = rf.predict(x_test)\n",
"\n",
"rf = XGBRegressor(n_estimators = 1500 , max_depth = 15, learning_rate=0.1)\n",
"rf.fit(x_train, y_train['new_deaths'])\n",
"fatalities_pred = rf.predict(x_test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "87a6c631",
"metadata": {},
"outputs": [],
"source": [
"prediction_cases = [test['total_cases'][-3] + cases_pred[-3], test['total_cases'][-2] + cases_pred[-2], test['total_cases'][-1] + cases_pred[-1]]\n",
"\n",
"prediction_deaths = [test['total_deaths'][-3] + cases_pred[-3], test['total_deaths'][-2] + cases_pred[-2], test['total_deaths'][-1] + cases_pred[-1]]\n",
"\n",
"\n",
"prediction = pd.DataFrame([prediction_cases, prediction_deaths], columns=['18th April', '19th April', '20th April'], index = ['Total Cases', 'Total deaths'])\n",
"prediction.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d1bfa4b5",
"metadata": {},
"outputs": [],
"source": [
"# saving dataframe\n",
"\n",
"prediction.to_csv('prediction.csv')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "00432790",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Get the best assignment help and tutoring services for COVID-19 Cases in Canada Using ML Models - DATA 622 / MDCH 615 from our experts now!
About The Author - Jeff Hen
Jeff Hen, a seasoned Data Scientist with advanced degrees in Computer Science and Data Analytics, specializes in machine learning and predictive modeling. In this project, Jeff will guide you in using real-world COVID-19 data to train models predicting future COVID-19 cases and deaths in Canada. With a focus on data-driven storytelling and effective communication, Jeff's expertise will help you master feature engineering, model selection, and data visualization in a compelling Jupyter notebook format.