
- 28th Jun 2024
- 18:24 pm
This project aims to build a polynomial regression model that uncovers hidden relationships in a large amount of data. With our experts' guidance, you'll master the intricacies of polynomial regression and gain hands-on experience with real-world applications. Our comprehensive tutorials walk you through every step, from data preprocessing to model evaluation.

Suppose we work in the human resources department of a company. We recently interviewed a candidate who is very competent and meets our job requirements, so we would like to offer him a position. Before we do, we need to decide on his salary. He has more than 20 years of working experience and has been a Region Manager for 2 years. He claims that his current salary is AUD 160,000 and that our offer should not be lower than that. We need to verify whether this claim is plausible. We have a salary structure table for the current market that lists position, level, and salary. By fitting an accurate model to that table, we can find the hidden relationships in the data, check whether the claimed salary is realistic, and make an appropriate offer. The same model can then be used to predict his future salary and to verify the salaries claimed by other candidates.
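As a rough illustration of the idea, the sketch below fits a straight line and a quadratic to a small salary table and compares both estimates with the claimed AUD 160,000. The table is synthetic (generated from `40000 + 2000 * level**2`), and the candidate level of 6.5 is an assumption made for this example only; neither comes from the assignment data.

```python
import numpy as np

# Hypothetical salary table: position level -> salary in AUD.
# The values are synthetic (40000 + 2000 * level**2), not the assignment data.
levels = np.arange(1, 11, dtype=float)
salaries = 40_000 + 2_000 * levels**2


def fit_and_predict(degree, level):
    """Fit a polynomial of the given degree and predict the salary at `level`."""
    coeffs = np.polyfit(levels, salaries, deg=degree)
    return float(np.polyval(coeffs, level))


claimed_salary = 160_000
candidate_level = 6.5  # assumed level for the candidate, for illustration only

linear_estimate = fit_and_predict(1, candidate_level)
# The quadratic fit recovers the synthetic formula, so this is about 124,500.
quadratic_estimate = fit_and_predict(2, candidate_level)

print(f"Linear estimate:    {linear_estimate:,.0f}")
print(f"Quadratic estimate: {quadratic_estimate:,.0f}")
print("Claim is above the quadratic estimate:", claimed_salary > quadratic_estimate)
```

On a curved salary scale like this one, the straight line overestimates mid-level salaries, which is why a polynomial term can matter when checking a claim.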
Build Polynomial Regression Model Using Python - Free Assignment Solution
Please note that this is a sample assignment solved by our Python Programmers. These solutions are intended to be used for research and reference purposes only. If you can learn any concepts by going through the reports and code, then our Python Tutors would be very happy.
- Option 1 - To download the complete solution along with Code, Report and screenshots - Please visit our Python Assignment Sample Solution page
- Option 2 - Reach out to our Python Tutors to get online tutoring related to this assignment and get your doubts cleared
- Option 3 - You can check the partial solution for this assignment in this blog below
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "polynomial.ipynb",
"provenance": [],
"collapsed_sections": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "aHny0bvYZ_94"
},
"source": [
"__Importing Packages__"
]
},
{
"cell_type": "code",
"metadata": {
"id": "jlrl6sSRHACM"
},
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt \n",
"import seaborn as sns \n",
"\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
],
"execution_count": 1,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "SIzORG9fZ_-B"
},
"source": [
"__Importing the Dataset__"
]
},
{
"cell_type": "code",
"metadata": {
"id": "TUH-MWc0H1nD",
"outputId": "3254fdf4-d5f4-445d-cedf-ee29365e6c65",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 223
}
},
"source": [
"data = pd.read_csv(\"Employee_Salaries_2020.csv\")\n",
"print(\"Our dataset has {} rows and {} columns.\".format(data.shape[0], data.shape[1]))\n",
"data.columns = [col.replace(\" \", \"_\") for col in data.columns]\n",
"data.head()"
],
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Our dataset has 9958 rows and 8 columns.\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Department Department_Name Division Gender \\\n",
"0 ABS Alcohol Beverage Services Wholesale Administration F \n",
"1 ABS Alcohol Beverage Services Administrative Services F \n",
"2 ABS Alcohol Beverage Services Administration M \n",
"3 ABS Alcohol Beverage Services Wholesale Operations F \n",
"4 ABS Alcohol Beverage Services Administration F \n",
"\n",
" Base_Salary 2020_Overtime_Pay 2020_Longevity_Pay Level \n",
"0 78902.0 199.17 0.00 18 \n",
"1 35926.0 0.00 4038.91 16 \n",
"2 167345.0 0.00 0.00 M2 \n",
"3 90848.0 0.00 5717.68 21 \n",
"4 78902.0 205.16 2460.24 18 "
]
},
"metadata": {},
"execution_count": 2
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2Ly3MjiOZ_-M"
},
"source": [
"## 3. Exploratory Data Analysis"
]
},
{
"cell_type": "code",
"source": [
"for col in data.columns:\n",
"    print(\"{} has {} unique values.\".format(col, data[col].nunique()))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "CCPPmSDrIuw9",
"outputId": "bb3ebb28-a774-41b8-8d92-37b9033253f4"
},
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Department has 40 unique values.\n",
"Department_Name has 40 unique values.\n",
"Division has 605 unique values.\n",
"Gender has 2 unique values.\n",
"Base_Salary has 3233 unique values.\n",
"2020_Overtime_Pay has 5650 unique values.\n",
"2020_Longevity_Pay has 677 unique values.\n",
"Level has 73 unique values.\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "YUb_fnW7Z_-T",
"outputId": "9ff03d82-28d0-4e18-97cc-364a037e62e8",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"data.info()"
],
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 9958 entries, 0 to 9957\n",
"Data columns (total 8 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Department 9958 non-null object \n",
" 1 Department_Name 9958 non-null object \n",
" 2 Division 9958 non-null object \n",
" 3 Gender 9958 non-null object \n",
" 4 Base_Salary 9958 non-null float64\n",
" 5 2020_Overtime_Pay 9958 non-null float64\n",
" 6 2020_Longevity_Pay 9958 non-null float64\n",
" 7 Level 9958 non-null object \n",
"dtypes: float64(3), object(5)\n",
"memory usage: 622.5+ KB\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "_D1hmgsIZ_-Y",
"outputId": "d9784986-c9c5-41cc-d773-05c313e6891f",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 300
}
},
"source": [
"data.describe()"
],
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Base_Salary 2020_Overtime_Pay 2020_Longevity_Pay\n",
"count 9958.000000 9958.000000 9958.000000\n",
"mean 78771.464060 5182.163123 923.572259\n",
"std 30153.168916 11062.665975 2043.593190\n",
"min 11147.240000 0.000000 0.000000\n",
"25% 56994.082500 0.000000 0.000000\n",
"50% 75290.000000 414.995000 0.000000\n",
"75% 94668.000000 5394.387500 0.000000\n",
"max 280000.000000 141998.220000 12471.840000"
]
},
"metadata": {},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iBHtgslkZ_-g"
},
"source": [
"## Data Visualization"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6NAlBUHbZ_-h"
},
"source": [
"__Distribution of Features__"
]
},
{
"cell_type": "code",
"metadata": {
"id": "RDGzTzBWZ_-j",
"outputId": "b5d5d1b0-d807-4fb6-ed17-dd954d6aac16",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 351
}
},
"source": [
"fig, ax = plt.subplots()\n",
"fig.set_size_inches(15, 5)\n",
"sns.countplot(x=\"Gender\", data=data, palette=\"Set2\", ax=ax)"
],
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
""
]
},
"metadata": {},
"execution_count": 6
},
{
"output_type": "display_data",
"data": {
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "ma62TIFRZ_-0",
"outputId": "3f587110-9cad-4126-bc74-d5c4135285dd",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 982
}
},
"source": [
"for col in [\"Base_Salary\", \"2020_Overtime_Pay\", \"2020_Longevity_Pay\"]:\n",
"    fig, ax = plt.subplots()\n",
"    fig.set_size_inches(15, 5)\n",
"    # histplot with kde=True replaces the deprecated sns.distplot\n",
"    sns.histplot(data[col], kde=True, color=\"g\", ax=ax)"
],
"execution_count": 7,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "4PhH3FzSZ__c",
"outputId": "e61bf2ca-03f7-4620-f088-2c7577e45501",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 470
}
},
"source": [
"# Heatmap showing the correlation between the numeric variables\n",
"fig, ax = plt.subplots(figsize=(8, 8))\n",
"plt.title(\"Correlation Plot\")\n",
"# Restrict to numeric columns; the text columns cannot be correlated\n",
"corr = data.select_dtypes(include=\"number\").corr()\n",
"sns.heatmap(corr, cmap=sns.diverging_palette(220, 10, as_cmap=True),\n",
"            square=True, ax=ax, annot=True, linewidths=5)\n",
"plt.show()"
],
"execution_count": 8,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jqjkLnH0Z__k"
},
"source": [
"## 4. Preprocessing"
]
},
{
"cell_type": "code",
"source": [
"data[\"Level\"] = pd.to_numeric(data[\"Level\"], errors='coerce')"
],
"metadata": {
"id": "MTwqv1Ox8tlP"
},
"execution_count": 9,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "KJQtulvgZ__l"
},
"source": [
"### 4.1 Check for Missing / NAN values:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "G7uFHGqSZ__l",
"outputId": "83d6f34c-14ce-42a2-bdd3-31d08078e284",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"data.isnull().any()"
],
"execution_count": 10,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Department False\n",
"Department_Name False\n",
"Division False\n",
"Gender False\n",
"Base_Salary False\n",
"2020_Overtime_Pay False\n",
"2020_Longevity_Pay False\n",
"Level True\n",
"dtype: bool"
]
},
"metadata": {},
"execution_count": 10
}
]
},
{
"cell_type": "code",
"source": [
"data[\"Level\"] = data[\"Level\"].fillna(data[\"Level\"].mean())"
],
"metadata": {
"id": "5tdb03JP8bH_"
},
"execution_count": 11,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Removing duplicate features:"
],
"metadata": {
"id": "EuadnPs89SUA"
}
},
{
"cell_type": "code",
"source": [
"data = data.drop(\"Department\", axis=1)"
],
"metadata": {
"id": "MvQCkTSL9Ycf"
},
"execution_count": 12,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "enllG8mZZ__r"
},
"source": [
"### 4.2 Label Encoding:\n",
"\n",
"Encoding the categorical columns __'Department_Name', 'Division', 'Gender'__ as integer labels."
]
},
{
"cell_type": "code",
"metadata": {
"id": "9LtwOVk2Z__s",
"outputId": "05c501c3-3903-439d-d89a-5673dcd9d303",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"from sklearn import preprocessing\n",
"# Label-encode the categorical columns\n",
"category_col = [\"Department_Name\", \"Division\", \"Gender\"]\n",
"\n",
"labelEncoder = preprocessing.LabelEncoder()\n",
"\n",
"# creating a map of all the numerical values of each categorical labels.\n",
"mapping_dict={}\n",
"for col in category_col:\n",
" data[col] = labelEncoder.fit_transform(data[col])\n",
" le_name_mapping = dict(zip(labelEncoder.classes_, labelEncoder.transform(labelEncoder.classes_)))\n",
" mapping_dict[col]=le_name_mapping\n",
"print(mapping_dict)"
],
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"{'Department_Name': ..."
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JcC7IFt9Z__y"
},
"source": [
"### 4.3 Preparing X and y using pandas"
]
},
{
"cell_type": "code",
"metadata": {
"id": "zVoNJ5KaPhxs"
},
"source": [
"X= data.drop(\"Base_Salary\", axis=1)\n",
"\n",
"y = data[\"Base_Salary\"]\n",
"\n",
"X_cols = X.columns"
],
"execution_count": 14,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "QlLnhNvFZ__3"
},
"source": [
"### 4.4 Standardization\n",
"Standardize features by removing the mean and scaling to unit standard deviation"
]
},
{
"cell_type": "code",
"metadata": {
"id": "I8s8xD2RQtkl"
},
"source": [
"from sklearn.preprocessing import StandardScaler\n",
"scaler = StandardScaler().fit(X)\n",
"X = scaler.transform(X)\n",
"X = pd.DataFrame(X)\n",
"X.columns = X_cols"
],
"execution_count": 15,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "ElkofqFQZ__8",
"outputId": "f7a5f958-5f89-4d67-ffeb-b5d20dddc8c6",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
}
},
"source": [
"X.head()"
],
"execution_count": 16,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Department_Name Division Gender 2020_Overtime_Pay 2020_Longevity_Pay \\\n",
"0 -2.026675 1.474195 -1.197549 -0.450456 -0.451958 \n",
"1 -2.026675 -1.697593 -1.197549 -0.468461 1.524518 \n",
"2 -2.026675 -1.713803 0.835039 -0.468461 -0.451958 \n",
"3 -2.026675 1.479599 -1.197549 -0.468461 2.346039 \n",
"4 -2.026675 -1.713803 -1.197549 -0.449914 0.751982 \n",
"\n",
" Level \n",
"0 -0.137591 \n",
"1 -0.597672 \n",
"2 0.000000 \n",
"3 0.552530 \n",
"4 -0.137591 "
]
},
"metadata": {},
"execution_count": 16
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oiTEPIsfaAAJ"
},
"source": [
"### 4.5 Splitting Data into train and test sample."
]
},
{
"cell_type": "code",
"metadata": {
"id": "JzPWyDFXRJS0"
},
"source": [
"# Splitting data into train and test sample using 70% data for training and 30% data for testing\n",
"\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)"
],
"execution_count": 17,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "Y4YCMsnFaAAT"
},
"source": [
"### 5.2 Model Implementation"
]
},
{
"cell_type": "code",
"source": [
"from sklearn.preprocessing import PolynomialFeatures\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.metrics import r2_score\n",
"\n",
"# Fit one model per polynomial degree and record its R-squared on the test split\n",
"grid_dict = {}\n",
"for n in range(1, 10):\n",
"    poly_reg = PolynomialFeatures(degree=n)\n",
"    X_poly = poly_reg.fit_transform(X_train)\n",
"    pol_reg = LinearRegression()\n",
"    pol_reg.fit(X_poly, y_train)\n",
"    # transform (not fit_transform): reuse the expansion fitted on the training data\n",
"    y_pred_test = pol_reg.predict(poly_reg.transform(X_test))\n",
"    grid_dict[n] = r2_score(y_test, y_pred_test)\n"
],
"metadata": {
"id": "LQU8Ds0u-Wrl"
},
"execution_count": 18,
"outputs": []
},
{
"cell_type": "code",
"source": [
"degree= max(grid_dict, key=grid_dict.get)\n",
"print(\"Degree:\", degree)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "hbkmpdqgIGns",
"outputId": "aff4f2e1-f63f-472b-d991-066ce4077956"
},
"execution_count": 19,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Degree: 4\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"poly_reg = PolynomialFeatures(degree=degree)\n",
"X_poly = poly_reg.fit_transform(X_train)\n",
"pol_reg = LinearRegression()\n",
"pol_reg.fit(X_poly, y_train)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "O45LAn6VIA6I",
"outputId": "0eb4e93d-0d11-4402-974d-c1a7c5c0827c"
},
"execution_count": 20,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"LinearRegression()"
]
},
"metadata": {},
"execution_count": 20
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "JCvS82R9aAAZ",
"outputId": "f00c95ab-8630-4ada-b195-5f33565f8a64",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"print(\"The Intercept for the given Polynomial Regression is =\", pol_reg.intercept_)\n"
],
"execution_count": 21,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The Intercept for the given Polynomial Regression is = 841611195339.8003\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "5HV8d998aAAd",
"outputId": "cfff2b41-fece-4e70-e49a-a598d62c4436",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"source": [
"# coef_ has one entry per expanded polynomial term, so take the names from the transformer\n",
"feature_names = poly_reg.get_feature_names_out(X_cols)\n",
"print(\"First coefficients are as follows:\")\n",
"for name, coef in zip(feature_names[:6], pol_reg.coef_[:6]):\n",
"    print(name, \"=\", coef)"
],
"execution_count": 22,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"First coefficients are as follows:\n",
"1 = -127466994738.19742\n",
"Department_Name = 4664310879340.713\n",
"Division = 151521913191246.72\n",
"Gender = 4014533335978.362\n",
"2020_Overtime_Pay = -4495710386194.0\n",
"2020_Longevity_Pay = 9635260390100.555\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "QpXFmNIXaAAi"
},
"source": [
"# Making predictions; transform reuses the expansion already fitted on X_train\n",
"y_pred_test = pol_reg.predict(poly_reg.transform(X_test))\n",
"y_pred_train = pol_reg.predict(poly_reg.transform(X_train))"
],
"execution_count": 23,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "YY4SOmv2aABw"
},
"source": [
"### 6.2 Model Evaluation using the R-squared value"
]
},
{
"cell_type": "code",
"source": [
"from sklearn.metrics import r2_score\n",
"\n",
"print(\"R-Squared on test data:\",r2_score(y_test, y_pred_test))\n",
"print(\"R-Squared on train data:\",r2_score(y_train, y_pred_train))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "kKTeyjm5_Brz",
"outputId": "9336ca80-4ff3-4eb1-94eb-afbb415a7198"
},
"execution_count": 24,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"R-Squared on test data: 0.5286852655594967\n",
"R-Squared on train data: 0.5359957716306178\n"
]
}
]
}
]
}
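The notebook above fits `StandardScaler` on the full dataset before splitting and chooses the polynomial degree by its R-squared on the test split, so the reported test score is not fully independent of the model selection. One possible alternative, sketched here under assumptions, is to wrap scaling, polynomial expansion, and regression in a scikit-learn `Pipeline` and pick the degree with cross-validation on the training split only. The data below is synthetic because `Employee_Salaries_2020.csv` is not included here, and the step names (`scale`, `poly`, `reg`) and the degree grid are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the salary data: two features, quadratic signal plus noise.
rng = np.random.default_rng(42)
X = rng.uniform(-2, 2, size=(400, 2))
y = 50_000 + 8_000 * X[:, 0] ** 2 + 3_000 * X[:, 1] + rng.normal(0, 500, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Every step is fitted on training folds only, so no test information leaks in.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures()),
    ("reg", LinearRegression()),
])

# Choose the degree by 5-fold cross-validation on the training split.
search = GridSearchCV(
    pipeline,
    param_grid={"poly__degree": [1, 2, 3, 4]},
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)

best_degree = search.best_params_["poly__degree"]
test_r2 = search.score(X_test, y_test)  # the test split is used once, at the end

print("Best degree:", best_degree)
print("Test R-squared:", round(test_r2, 3))
```

With this layout the test split is touched only for the final score, and calling `search.predict(new_rows)` applies the same scaling and expansion that were learned during training.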
About The Author - Ross Jen
Ross Jen, a seasoned data scientist specializing in predictive modeling and data analysis, will guide you through this project. With extensive experience in building and implementing machine learning models, Ross excels at uncovering hidden patterns in complex datasets. In this project, Ross will assist you in constructing a polynomial regression model to analyze salary structures and predict appropriate offers for new hires. You will learn how to handle real-world data, engineer features, and validate model predictions to ensure accurate and fair salary decisions.