Machine Learning - SIT720 - Assignment Solution

19th Jul 2024
18:10

1. Reproduce the RMSE results presented in Table 3 of the provided article using Python modules and packages (including your own scripts or customised code). Write a report summarising the dataset, the ML methods used, the experimental protocol and the results, including any variations. While reproducing the results:

You should use the same set of features used by the authors.
You should use the same classifier with the exact parameter values.
You should use the same training/test splitting approach as the authors.

N.B.

If some of the ML methods are not covered in the current unit, consider them HD tasks, i.e., based on the knowledge gained in the unit you should be able to find the necessary packages and modules to reproduce the results.
If you run into issues reproducing the results, or find subtle variations caused by implementation differences between Python packages and modules, an appropriate explanation of them will be considered during evaluation of your submission.
Similarly, variation in results due to the randomness of data splitting will also be considered during evaluation, based on your explanation.
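
One way to document the split-related variation mentioned above is to repeat the experiment over several random seeds and report the mean and spread of the test RMSE. The sketch below is only an illustration under stated assumptions: the data is synthetic (it does not read forestfires.csv), the helper names `rmse` and `rmse_over_splits` are hypothetical, and the SVR uses scikit-learn defaults rather than the parameter values from the article.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR


def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rmse_over_splits(X, y, n_repeats=10, test_size=0.4):
    """Fit an SVR on several random splits and collect the test RMSE of each."""
    scores = []
    for seed in range(n_repeats):
        # A fixed random_state per repeat makes every split reproducible
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        model = SVR()  # assumption: default hyperparameters, not the article's
        model.fit(X_tr, y_tr)
        scores.append(rmse(y_te, model.predict(X_te)))
    return np.array(scores)


# Synthetic stand-in data so the sketch runs without the real dataset
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = X_demo[:, 0] + 0.1 * rng.normal(size=200)

scores = rmse_over_splits(X_demo, y_demo)
print(f"RMSE over {scores.size} splits: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

Reporting the standard deviation alongside the mean gives the reader a sense of how much of any gap to the published numbers could be explained by the split alone.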

2. Design and develop your own ML solution for this problem. The proposed solution should be different from all approaches mentioned in the provided article. This does not mean that you must choose a new ML algorithm. You can develop a novel solution by changing the feature selection approach or the parameter optimisation process of the ML methods used, by using different ML methods, by adding regularisation, or by combining these. In other words, the proposed system should be substantially different from the methods presented in the article, but the difference is not limited to a change of ML method. Compare your RMSE result with those of the methods reported in the article. Summarise your solution design and outcomes in your report. The report should include:

Motivation behind the proposed solution.
How the proposed solution is different from existing ones.
Detailed description of the model, including all parameters, so that any reader can implement your model.
Description of experimental protocol.
Evaluation metrics.
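
As one possible direction for the second task, the parameter optimisation process can be changed by wrapping feature scaling and an SVR in a pipeline and tuning it with a cross-validated grid search scored on RMSE. This is a minimal sketch, not the solution from the complete notebook: the data is synthetic, the grid values are arbitrary placeholders, and whether such a setup is different enough from the article's methods is for you to argue in your report.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Scaling lives inside the pipeline so it is re-fitted on each training fold,
# which avoids leaking validation-fold statistics into the model
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("svr", SVR(kernel="rbf")),
])

# Placeholder grid (assumption): replace with values motivated in the report
param_grid = {
    "svr__C": [0.1, 1.0, 10.0],
    "svr__epsilon": [0.01, 0.1],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximises, hence "neg"
    cv=5,
)

# Synthetic stand-in data so the sketch runs without forestfires.csv
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = X_demo[:, 0] - 0.5 * X_demo[:, 1] + 0.1 * rng.normal(size=120)

search.fit(X_demo, y_demo)
best_rmse = -search.best_score_  # flip the sign back to a plain RMSE
print(search.best_params_, round(best_rmse, 3))
```

Listing the final `best_params_` in the report covers the requirement that a reader can re-implement the model with all of its parameters.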

Free Assignment Solution - Machine Learning - SIT720
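
The partial notebook below uses the columns `Log-area`, `month_encoded` and `day_encoded`, but the cells that create them are not included. The sketch here shows one way such columns could be produced. It is an assumption, not the notebook's actual code: the target is taken to be log10(area + 1), which is consistent with the notebook undoing the transform with `10**` and with an area of 0.0 mapping to a `Log-area` of 0.0, and the toy rows in `df_demo` are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame with made-up area values (assumption: not rows from forestfires.csv)
df_demo = pd.DataFrame({
    "month": ["mar", "oct", "oct", "mar", "mar"],
    "day": ["fri", "tue", "sat", "fri", "sun"],
    "area": [0.0, 0.0, 9.0, 99.0, 0.0],
})

# Assumed transform: log10(area + 1), so that area == 0 maps to 0
df_demo["Log-area"] = np.log10(df_demo["area"] + 1)

# LabelEncoder assigns integer codes in alphabetical order of the labels,
# so fitting on all twelve months gives mar -> 7 and oct -> 10
months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
days = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

month_enc = LabelEncoder().fit(months)
day_enc = LabelEncoder().fit(days)
df_demo["month_encoded"] = month_enc.transform(df_demo["month"])
df_demo["day_encoded"] = day_enc.transform(df_demo["day"])

# Inverse of the assumed transform, back to hectares
area_back = 10 ** df_demo["Log-area"] - 1
print(df_demo)
```

Note that label codes impose an arbitrary ordering on months and days; one-hot encoding (the notebook also imports `OneHotEncoder`) is a common alternative when that ordering is not wanted.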

{
"cells": [
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "PnjP1EoYHF97"
},
"outputs": [],
"source": [
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "Lmgp6uqGHxqc"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import OneHotEncoder\n",
"from sklearn.preprocessing import LabelEncoder\n",
"from sklearn.svm import SVR\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.model_selection import GridSearchCV\n",
"%matplotlib inline\n",
"plt.style.use('default')"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "-FLt6M0oIAJF"
},
"outputs": [],
"source": [
"df = pd.read_csv('forestfires.csv')"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "vMpT9BzJI768"
},
"outputs": [
{
"data": {
"text/plain": [
" X Y month day FFMC DMC DC ISI temp RH wind rain area \\\n",
"0 7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 0.0 \n",
"1 7 4 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 0.0 \n",
"2 7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0.0 \n",
"3 8 6 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 0.0 \n",
"4 8 6 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 0.0 \n",
"\n",
" Log-area month_encoded day_encoded \n",
"0 0.0 7 0 \n",
"1 0.0 10 5 \n",
"2 0.0 10 2 \n",
"3 0.0 7 0 \n",
"4 0.0 7 3 "
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['day_encoded'] = enc.transform(df['day'])\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "XKLLWXwsPF7p"
},
"outputs": [],
"source": [
"test_size = 0.4"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "z7fSc3iyPVSB"
},
"outputs": [],
"source": [
"X_data = df.drop(['area','Log-area','month','day'],axis = 1)\n",
"y_data = df['Log-area']\n",
"X_train, X_test, y_train, y_test = train_test_split(X_data,y_data, test_size = test_size )\n",
"y_train = y_train.values.reshape(y_train.size,1)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "Wxh4ve9QPo-o"
},
"outputs": [],
"source": [
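"def rec(m, n, tol):\n",
"    # Percentage of predictions whose back-transformed error (ha) is within tol\n",
"    # np.asarray accepts lists, Series and arrays; ravel() flattens column vectors\n",
"    m = np.asarray(m).ravel()\n",
"    n = np.asarray(n).ravel()\n",
"    l = m.size\n",
"    percent = 0\n",
"    for i in range(l):\n",
"        # 10** maps the log-scale values back to the original area scale\n",
"        if np.abs(10**m[i] - 10**n[i]) <= tol:\n",
"            percent += 1\n",
"    return 100 * (percent / l)"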
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "vyfoIoLuThW4"
},
"outputs": [],
"source": [
"# Define the max tolerance limit for REC curve x-axis\n",
"# For this problem this represents the absolute value of error in the prediction of the outcome i.e. area burned\n",
"tol_max=20"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "TWRe4fG8Tv5M"
},
"outputs": [
{
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"\n",
"plt.title(\"Histogram of prediction errors\\n\",fontsize=18)\n",
"plt.xlabel(\"Prediction error ($ha$)\",fontsize=14)\n",
"plt.grid(True)\n",
"plt.hist(10**(a.reshape(a.size,))-10**(y_test),bins=50)"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "HbpEdRtZi4OL"
},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"rec_SVR=[]\n",
"for i in range(tol_max):\n",
" rec_SVR.append(rec(a,y_test,i))\n",
"\n",
"plt.figure(figsize=(5,5))\n",
"plt.title(\"REC curve for the Support Vector Regressor\\n\",fontsize=15)\n",
"plt.xlabel(\"Absolute error (tolerance) in prediction ($ha$)\")\n",
"plt.ylabel(\"Percentage of correct prediction\")\n",
"plt.xticks(list(range(0, tol_max + 1, 5)))\n",
"plt.ylim(-10,100)\n",
"plt.yticks([i*20 for i in range(6)])\n",
"plt.grid(True)\n",
"plt.plot(range(tol_max),rec_SVR)"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "ForestFire.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
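
The REC curve in the last notebook cell calls `rec` once per tolerance, with a Python loop over the samples each time. The same curve can be computed in one step with NumPy broadcasting. This is a sketch under the same assumption as the notebook, namely that predictions and targets are on a log10 scale and are mapped back with `10**`; the function name `rec_curve` and the demo values are hypothetical.

```python
import numpy as np


def rec_curve(y_pred_log, y_true_log, tol_max=20):
    """Return tolerances 0..tol_max-1 and the % of predictions within each."""
    pred = np.asarray(y_pred_log, dtype=float).ravel()
    true = np.asarray(y_true_log, dtype=float).ravel()
    # Absolute error after undoing the (assumed) log10 transform
    abs_err = np.abs(10.0 ** pred - 10.0 ** true)
    tolerances = np.arange(tol_max)
    # Rows index tolerances, columns index samples
    within = abs_err[None, :] <= tolerances[:, None]
    return tolerances, 100.0 * within.mean(axis=1)


# Demo values chosen so the back-transformed errors are about 0.5, 2.5, 10.5 and 100
pred_demo = np.log10([1.5, 3.5, 11.5, 101.0])
true_demo = np.log10([1.0, 1.0, 1.0, 1.0])

tolerances, accuracy = rec_curve(pred_demo, true_demo, tol_max=12)
print(dict(zip(tolerances.tolist(), accuracy.tolist())))
```

The returned arrays can be passed straight to `plt.plot(tolerances, accuracy)` in place of the `rec_SVR` list built in the loop.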

About The Author - Dr. Neha Patel

Dr. Neha Patel is a seasoned machine learning expert with extensive experience in reproducing and validating research results in data science. Specializing in Python programming, she excels in implementing and comparing complex machine learning models. Her proficiency in working with various datasets, feature selection approaches, and parameter optimization processes has enabled her to consistently deliver accurate and insightful analyses.
