- 16th Jul 2024
- 17:26
In this task, you will load and explore a training dataset to understand its structure and key features. By analyzing feature importance using two distinct approaches, you'll gain insights into predicting the status of an air conditioner. You will then create three supervised machine learning models and evaluate their performance. Additionally, you'll build and compare ensemble models, optimizing and justifying your choices throughout the process.
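The "two distinct approaches" to feature importance can be sketched on a synthetic stand-in dataset (the feature counts and model below are illustrative, not the assignment's actual data): one approach reads impurity-based importances off a fitted tree ensemble, the other measures permutation importance on held-out data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the air-conditioner dataset
X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Approach 1: impurity-based importances (computed during training, fast, can favour high-cardinality features)
impurity_imp = model.feature_importances_

# Approach 2: permutation importance (model-agnostic, measured on held-out data)
perm_imp = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=0)

print(np.argsort(impurity_imp)[::-1])               # feature indices ranked by impurity importance
print(np.argsort(perm_imp.importances_mean)[::-1])  # feature indices ranked by permutation importance
```

Comparing the two rankings is useful because they can disagree: impurity importance reflects how often a feature was used to split, while permutation importance reflects how much test performance drops when the feature is shuffled.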
- Report performance score using a suitable metric on the test data. Is it possible that the presented result is an underfitted or overfitted one? Justify.
- Justify different design decisions for each ML model used to answer this question.
- Have you optimised any hyper-parameters for each ML model? What are they? Why have you done that? Explain.
- Finally, make a recommendation based on the reported results and justify it.
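A common way to address the hyper-parameter question above is a cross-validated grid search. The sketch below tunes a KNN classifier on synthetic data; the parameter grid and scoring choice are illustrative assumptions, not values from the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Tune the neighbour count and distance weighting with 5-fold cross-validation
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={'n_neighbors': [3, 5, 7, 9], 'weights': ['uniform', 'distance']},
    scoring='f1',  # F1 is a reasonable choice when the positive class is rare
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Reporting `best_params_` alongside the cross-validated score also gives you the justification the question asks for: each candidate value was compared on the same folds, so the chosen setting is the one that generalised best on the training data.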
Given the same training and test data, build three ensemble models for predicting air conditioner status
- When do you want to use ensemble models over other ML models?
- What are the similarities or differences between these models?
- Is there any preferable scenario for using a specific model among the set of ensemble models?
- Write a report comparing performances of models built in question 5 and 6. Report the best method based on model complexity and performance.
Is it possible to build an ensemble model using ML classifiers other than decision trees? If yes, explain with an example.
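Yes: bagging and voting ensembles accept arbitrary base estimators, not just decision trees. A minimal sketch on synthetic data (the base learners here are chosen purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging over KNN instead of the default decision tree
bag = BaggingClassifier(KNeighborsClassifier(), n_estimators=10, random_state=0)

# Soft voting over two non-tree classifiers (both expose predict_proba)
vote = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)), ('knn', KNeighborsClassifier())],
    voting='soft',
)

for name, clf in [('bagged KNN', bag), ('soft voting', vote)]:
    clf.fit(X_train, y_train)
    print(name, round(accuracy_score(y_test, clf.predict(X_test)), 3))
```

Boosting is more restrictive, since AdaBoost requires a base estimator that supports sample weights, but bagging and voting work with essentially any classifier.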
Building Classification Models - SIT720 - Get Assignment Solution
Please note that this is a sample assignment solved by our Python Programmers. These solutions are intended for research and reference purposes only; if going through the report and code helps you learn the underlying concepts, our Python Tutors will be glad to have helped.
- Option 1 - To download the complete solution along with Code, Report and screenshots - Please visit our Python Assignment Sample Solution page
- Option 2 - Reach out to our Python Tutors to get online tutoring related to this assignment and get your doubts cleared
- Option 3 - You can check the partial solution for this assignment in this blog below
Free Assignment Solution - Building Classification Models - SIT720 Machine Learning
{
"cells": [
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.linear_model import LogisticRegression\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib import rcParams\n",
"from xgboost import XGBClassifier\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.metrics import classification_report\n",
"from sklearn.metrics import confusion_matrix\n",
"from sklearn.ensemble import AdaBoostClassifier\n",
"from sklearn.ensemble import GradientBoostingClassifier\n",
"from sklearn.ensemble import ExtraTreesClassifier"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"rcParams['figure.figsize'] = 14, 7\n",
"rcParams['axes.spines.top'] = False\n",
"rcParams['axes.spines.right'] = False"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Exploration"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Unnamed: 0 0\n",
"load 0\n",
"ac 0\n",
"hourofday 0\n",
"dayofweek 0\n",
"dif 0\n",
"absdif 0\n",
"max 0\n",
"var 0\n",
"entropy 0\n",
"nonlinear 0\n",
"hurst 0\n",
"dtype: int64\n"
]
}
],
"source": [
"data_train=pd.read_csv(\"ac_train_data.csv\")\n",
"print(data_train.shape)\n",
"print(data_train.isnull().values.any())\n",
"print(data_train.isnull().sum())\n",
"\n",
"data_train=data_train.drop(['Unnamed: 0'],axis = 1)"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"# train/test split and scaling (target column assumed to be 'ac')\n",
"X = data_train.drop(['ac'], axis=1)\n",
"y = data_train['ac']\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
"scaler = StandardScaler()\n",
"X_train_scaled = scaler.fit_transform(X_train)\n",
"\n",
"# boosting / tree-based feature importance calculation\n",
"model = XGBClassifier()\n",
"model.fit(X_train_scaled, y_train)\n",
"importances = pd.DataFrame(data={\n",
"    'Attribute': X_train.columns,\n",
"    'Importance': model.feature_importances_\n",
"})\n",
"importances = importances.sort_values(by='Importance', ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Supervised Method Results"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [],
"source": [
"#knn result\n",
"knn = KNeighborsClassifier(n_neighbors=7)\n",
"knn.fit(X_train, y_train)\n",
"predictions_knn = knn.predict(X_test)\n",
"\n",
"\n",
"print(classification_report(y_test, predictions_knn))\n",
"print(confusion_matrix(y_test, predictions_knn))"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.95 0.93 0.94 96221\n",
" 1 0.38 0.46 0.42 9319\n",
"\n",
" accuracy 0.89 105540\n",
" macro avg 0.66 0.69 0.68 105540\n",
"weighted avg 0.90 0.89 0.89 105540\n",
"\n",
"[[89214 7007]\n",
" [ 5035 4284]]\n"
]
}
],
"source": [
"# decision tree result\n",
"model = DecisionTreeClassifier()\n",
"model.fit(X_train, y_train)\n",
"predictions = model.predict(X_test)\n",
"print(classification_report(y_test, predictions))\n",
"print(confusion_matrix(y_test, predictions))"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [],
"source": [
"# logistic regression result (third supervised model; reconstructed sketch, the original cell contents were lost)\n",
"logreg = LogisticRegression(max_iter=1000)\n",
"logreg.fit(X_train, y_train)\n",
"predictions_lr = logreg.predict(X_test)\n",
"print(classification_report(y_test, predictions_lr))\n",
"print(confusion_matrix(y_test, predictions_lr))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ensemble result"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.95 0.93 0.94 96221\n",
" 1 0.41 0.48 0.44 9319\n",
"\n",
" accuracy 0.89 105540\n",
" macro avg 0.68 0.71 0.69 105540\n",
"weighted avg 0.90 0.89 0.90 105540\n",
"\n",
"[[89714 6507]\n",
" [ 4814 4505]]\n"
]
}
],
"source": [
"#adaboost\n",
"adb_clf = AdaBoostClassifier(n_estimators=100, random_state=42)\n",
"adb_clf.fit(X_train, y_train)\n",
"predicts_adaboost=adb_clf.predict(X_test)\n",
"print(classification_report(y_test, predicts_adaboost))\n",
"print(confusion_matrix(y_test, predicts_adaboost))"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" [ 4786 4533]]\n"
]
}
],
"source": [
"# gradient boosting\n",
"grad_clf = GradientBoostingClassifier(n_estimators=100, random_state=42)\n",
"grad_clf.fit(X_train, y_train)\n",
"predicts_gradboost = grad_clf.predict(X_test)\n",
"print(classification_report(y_test, predicts_gradboost))\n",
"print(confusion_matrix(y_test, predicts_gradboost))"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.94 0.96 0.95 96221\n",
" 1 0.45 0.36 0.40 9319\n",
"\n",
" accuracy 0.90 105540\n",
" macro avg 0.69 0.66 0.67 105540\n",
"weighted avg 0.90 0.90 0.90 105540\n",
"\n",
"[[92021 4200]\n",
" [ 5939 3380]]\n"
]
}
],
"source": [
"# ExtraTrees classifier\n",
"ext_clf = ExtraTreesClassifier(n_estimators=100, random_state=42)\n",
"ext_clf.fit(X_train, y_train)\n",
"predicts_ext = ext_clf.predict(X_test)\n",
"print(classification_report(y_test, predicts_ext))\n",
"print(confusion_matrix(y_test, predicts_ext))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Get the best Building Classification Models - SIT720 Machine Learning assignment help and tutoring services from our experts now!
About The Author - Sneha Mishra
Sneha Mishra is a proficient data scientist specializing in machine learning and predictive modeling. With a strong background in data analysis, Sneha is adept at exploring and preparing datasets for various applications. She has significant experience in evaluating feature importance and building robust ML models. Her expertise includes optimizing hyper-parameters and implementing ensemble methods to enhance model performance.