Programming for Data Science - EP0302

Programming for Data Science - EP0302 - Assignment Solution

23rd Jul 2024
17:59

In this assignment you are required to choose at least three datasets from the Housing and Development Board (HDB). You are encouraged to choose datasets that are interrelated and support a central theme of investigation. Your Jupyter notebook(s) should include the following:

Your name and the title of your data analysis
The questions you want to answer to gain deeper insights into the chosen datasets, so that you can produce an interesting analysis of them
A list of URLs of all the datasets you have used
For each dataset, Python code that uses the NumPy package to extract useful statistical or summary information about the data and the Matplotlib package to produce visualizations that explain the data.
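As a sketch of the NumPy requirement above, the snippet below loads a CSV with np.genfromtxt, groups the rows by one column and reports the minimum and maximum of a numeric column (with the label where each occurs) plus its quartiles, which is the kind of summary the notebook's desc functions print. It assumes a CSV with a single header row; the toy rows are made up (only the column names are borrowed from the Q1 renting-out dataset), and the helper name summarize_by is hypothetical.

```python
import io

import numpy as np

# Toy rows shaped like the renting-out approvals CSV used in Q1.
# The numbers are made up for illustration; they are not HDB figures.
CSV_TEXT = """quarter,flat_type,no_of_approvals
2007-Q1,3-room,10
2007-Q1,4-room,20
2007-Q2,3-room,15
2007-Q2,4-room,25
2007-Q3,3-room,30
2007-Q3,4-room,5
"""


def summarize_by(data, group_field, value_field, label_field):
    """Summarise value_field separately for each distinct group_field value.

    Returns {group: (min, label at min, max, label at max, Q1, median, Q3)}.
    """
    summary = {}
    for group in np.unique(data[group_field]):
        rows = data[data[group_field] == group]  # rows of this group only
        values = rows[value_field]
        labels = rows[label_field]
        i_min, i_max = np.argmin(values), np.argmax(values)
        q1, median, q3 = np.quantile(values, [0.25, 0.5, 0.75])
        summary[str(group)] = (
            int(values[i_min]), str(labels[i_min]),
            int(values[i_max]), str(labels[i_max]),
            float(q1), float(median), float(q3),
        )
    return summary


# names=True reads the field names from the header row; dtype=None infers a
# type per column, so no_of_approvals is already numeric (no astype needed).
data = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", names=True,
                     dtype=None, encoding=None)

for flat_type, stats in summarize_by(data, "flat_type", "no_of_approvals",
                                     "quarter").items():
    print(flat_type, stats)
```

Reading the header with names=True is one way to avoid slicing off the header row by hand and converting each column with astype(int); the notebook instead reads every cell as a string and converts afterwards, which also works.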

Free Assignment Solution - Programming for Data Science - EP0302

{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "EP0302_Programming_for_Data_Science.ipynb",
"provenance": [],
"collapsed_sections": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"Q1 Data URL: \n",
"\n",
"https://data.gov.sg/dataset/number-of-renting-out-of-flat-approvals-by-flat-type-quarterly\n",
"\n",
"Q2 Data URL:\n",
"\n",
"https://data.gov.sg/dataset/key-stats-since-1960-demand-for-rental-and-sold-flats\n",
"\n",
"Q3 Data URL:\n",
"\n",
"https://data.gov.sg/dataset/flats-constructed\n",
"\n",
"For each dataset, we extract a descriptive summary and explore the relationships between its features through different plots (the plot types are listed in the questionnaire).\n",
"\n",
"**Challenges and Explanation of the Program**\n",
"\n",
"We wrote several helper functions for this assignment: `desc` (prints a descriptive summary of a dataset), `stackedBarplot` (draws a stacked bar plot), and `boxPlot`, `barplot`, `lineplot`, `histPlot` and `scatterPlot` (each draws the corresponding plot).\n",
"\n",
"The `desc` function is nearly the same for every question. The main challenge was customizing the plots, because only NumPy (not pandas or other scientific packages) was allowed, so the calculations, while not difficult, were lengthy.\n",
"\n",
"Finally, it was difficult to choose the datasets and to decide which summaries and plots to show for each of them."
],
"metadata": {
"id": "MXL9HBF4QlEk"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"metadata": {
"id": "E3MKVESBP-iL"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Q1"
],
"metadata": {
"id": "x--6qiOAp-Rw"
}
},
{
"cell_type": "code",
"source": [
"def desc(data, c, t):\n",
"    # Report the lowest and highest value in the last column (number of approvals)\n",
"    # for flat type t, together with the quarter in which each occurred.\n",
"    values = data[:, -1].astype(int)\n",
"    M, Mi = np.max(values), np.argmax(values)\n",
"    m, mi = np.min(values), np.argmin(values)\n",
"    qM, qm = data[Mi, 0], data[mi, 0]\n",
"    S = 'The lowest ' + c + ' of type \"' + t + '\" is ' + str(m) + ' for quarter ' + qm + '\\n'\n",
"    S += 'The highest ' + c + ' of type \"' + t + '\" is ' + str(M) + ' for quarter ' + qM + '\\n'\n",
"    return S\n",
"\n",
"def stackedBarplot(x, y, x1):\n",
"    # x: quarter labels, y: one array of approvals per flat type, x1: flat-type names.\n",
"    # Each series is drawn on top of the running total of the series before it.\n",
"    color = ['r', 'b', 'y', 'k', 'g', 'c']\n",
"    plt.figure(figsize=(16,8))\n",
"    bottom = np.zeros(len(x))\n",
"    for i in range(len(y)):\n",
"        plt.bar(x, y[i], bottom=bottom, color=color[i])\n",
"        bottom += y[i]\n",
"    plt.xlabel(\"Quarter\")\n",
"    plt.ylabel(\"Number of Approvals\")\n",
"    plt.legend(x1)\n",
"    plt.xticks(rotation=90)\n",
"    plt.title(\"Number of Approvals for different Flat types in different Quarters\")\n",
"    plt.show()\n",
"\n",
"def boxPlot(x, y):\n",
"    # One box per flat type; x holds the flat-type names used as tick labels.\n",
"    fig = plt.figure(figsize=(10, 7))\n",
"    ax = fig.add_axes([0, 0, 1, 1])\n",
"    ax.boxplot(y)\n",
"    plt.xticks(np.arange(1, len(y)+1), x)\n",
"    plt.xlabel(\"Flat Types\")\n",
"    plt.ylabel(\"Number of Approvals\")\n",
"    plt.title(\"Number of Approvals for different Flat types - Boxplot\")\n",
"    plt.show()"
],
"metadata": {
"id": "yq9_HrXsxMx6"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"data = np.genfromtxt(\"renting-out-of-flat-approvals-by-flat-type-quarterly.csv\", delimiter=',', dtype=None, encoding=None)\n",
"s = '*** Number of Renting out of Flat approvals by flat type quarterly results ***\\n\\n'\n",
"s += 'The fields in this dataset are: ' + ', '.join(data[0]) + '\\n'\n",
"data = data[1:, :]\n",
"s += 'There are ' + str(data.shape[0]) + ' rows in this dataset\\n'\n",
"s += 'There are ' + str(data.shape[1]) + ' columns in this dataset\\n'\n",
"s += 'There are ' + str(len(np.unique(data[:, 1]))) + ' unique flat types namely ' + ', '.join(np.unique(data[:, 1])) + '\\n'\n",
"c = 'no_of_approvals'\n",
"s += '\\n'\n",
"y = []\n",
"x1 = []\n",
"for t in np.unique(data[:, 1]):\n",
"    mask = (data[:, 1] == t)\n",
"    s += desc(data[mask, :], c, t) + '\\n'\n",
"    y.append(data[mask, -1].astype(int))\n",
"    x1.append(t)\n",
"x = np.unique(data[:, 0])\n",
"print(s)\n",
"boxPlot(x1, y)\n",
"print()\n",
"stackedBarplot(x, y, x1)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "7ekNxlDVp_Br",
"outputId": "3a74bd0d-1ae8-4068-f352-9009073fae3b"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"*** Number of Renting out of Flat approvals by flat type quarterly results ***\n",
"\n",
"The fields in this dataset are: quarter, flat_type, no_of_approvals\n",
"There are 354 rows in this dataset\n",
"There are 3 columns in this dataset\n",
"There are 6 unique flat types namely 1-room, 2-room, 3-room, 4-room, 5-room, Executive\n",
"\n",
"The lowest no_of_approvals of type \"1-room\" is 0 for quarter 2008-Q4\n",
"The highest no_of_approvals of type \"1-room\" is 6 for quarter 2010-Q3\n",
"\n",
"The lowest no_of_approvals of type \"2-room\" is 22 for quarter 2007-Q1\n",
"The highest no_of_approvals of type \"2-room\" is 182 for quarter 2019-Q4\n",
"\n",
"The lowest no_of_approvals of type \"3-room\" is 847 for quarter 2007-Q1\n",
"The highest no_of_approvals of type \"3-room\" is 4054 for quarter 2019-Q2\n",
"\n",
"The lowest no_of_approvals of type \"4-room\" is 735 for quarter 2007-Q1\n",
"The highest no_of_approvals of type \"4-room\" is 4303 for quarter 2019-Q3\n",
"\n",
"The lowest no_of_approvals of type \"5-room\" is 602 for quarter 2007-Q1\n",
"The highest no_of_approvals of type \"5-room\" is 3140 for quarter 2019-Q2\n",
"\n",
"The lowest no_of_approvals of type \"Executive\" is 239 for quarter 2007-Q1\n",
"The highest no_of_approvals of type \"Executive\" is 840 for quarter 2018-Q1\n",
"\n",
"\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 720x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1152x576 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"source": [
"## Q2"
],
"metadata": {
"id": "U5meYC14U38y"
}
},
{
"cell_type": "code",
"source": [
"def desc(data, c, t):\n",
"    # Report the lowest and highest demand (last column) for flat type t,\n",
"    # together with the start and end years of the period in which each occurred.\n",
"    values = data[:, -1].astype(int)\n",
"    M, Mi = np.max(values), np.argmax(values)\n",
"    m, mi = np.min(values), np.argmin(values)\n",
"    sM, eM = data[Mi, 0], data[Mi, 1]\n",
"    sm, em = data[mi, 0], data[mi, 1]\n",
"    S = 'The lowest ' + c + ' of type \"' + t + '\" is ' + str(m) + ' for start year = ' + sm + ' and end year = ' + em + '\\n'\n",
"    S += 'The highest ' + c + ' of type \"' + t + '\" is ' + str(M) + ' for start year = ' + sM + ' and end year = ' + eM + '\\n'\n",
"    return S"
],
"metadata": {
"id": "VLgjlLOjbx5x"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"def barplot(c1, c2, c3):\n",
"    # Grouped bars: rental (c1) and home ownership (c2) demand side by side for each period in c3.\n",
"    N = len(c1)\n",
"    ind = np.arange(N)\n",
"    plt.figure(figsize=(16,8))\n",
"    width = 0.3\n",
"    plt.bar(ind, c1, width, label='Rental Flats')\n",
"    plt.bar(ind + width, c2, width, label='Home Ownership Flats')\n",
"    plt.xlabel('Start Year - End Year')\n",
"    plt.ylabel('Demand for Flats')\n",
"    plt.title('Demand for Flats vs Year for different Flat types')\n",
"    plt.xticks(ind + width / 2, c3)\n",
"    plt.legend(loc='best')\n",
"    plt.show()\n",
"    print()\n",
"\n",
"def lineplot(c1, c2, c3):\n",
"    # The same two series drawn as lines, one point per period.\n",
"    fig, ax = plt.subplots(figsize=(12, 7))\n",
"    ax.plot(np.arange(len(c3)), c2, marker='*', label='Home Ownership Flats')\n",
"    ax.plot(np.arange(len(c3)), c1, marker='o', label='Rental Flats')\n",
"    ax.xaxis.set_ticks(np.arange(len(c3)))\n",
"    ax.xaxis.set_ticklabels(c3, rotation=90)\n",
"    plt.xlabel(\"Start Year - End Year\")\n",
"    plt.ylabel(\"Demand for Flats\")\n",
"    plt.title('Demand for Flats vs Year for different Flat types')\n",
"    plt.legend()\n",
"    plt.show()"
],
"metadata": {
"id": "P4KLN9QmlS8a"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"data = np.genfromtxt(\"demand-for-rental-and-sold-flats.csv\", delimiter=',', dtype=None, encoding=None)\n",
"s = '*** Demand for Rental and Sold flats since 1960 results ***\\n\\n'\n",
"s += 'The fields in this dataset are: ' + ', '.join(data[0]) + '\\n'\n",
"data = data[1:, :]\n",
"s += 'There are ' + str(data.shape[0]) + ' rows in this dataset\\n'\n",
"s += 'There are ' + str(data.shape[1]) + ' columns in this dataset\\n'\n",
"s += 'There are ' + str(len(np.unique(data[:, 2]))) + ' unique flat types namely ' + ', '.join(np.unique(data[:, 2])) + '\\n'\n",
"c = 'demand for flats'\n",
"s += '\\n' \n",
"for t in np.unique(data[:, 2]):\n",
"    mask = (data[:, 2] == t)\n",
"    s += desc(data[mask, :], c, t) + '\\n'\n",
"print(s)\n",
"mask = (data[:, 2] == 'rental_flats')\n",
"c1 = data[mask, 3].astype(int).tolist()\n",
"mask = (data[:, 2] == 'home_ownership_flats')\n",
"c2 = data[mask, 3].astype(int).tolist()\n",
"dm = data[mask, :]\n",
"c3 = list(map('-'.join, zip(dm[:, 0], dm[:, 1])))\n",
"barplot(c1, c2, c3)\n",
"lineplot(c1, c2, c3)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "anXD_vFxS6Wu",
"outputId": "039ef029-4534-4ae5-9110-a22266c5b097"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"*** Demand for Rental and Sold flats since 1960 results ***\n",
"\n",
"The fields in this dataset are: start_year, end_year, flat_type, demand_for_flats\n",
"There are 24 rows in this dataset\n",
"There are 4 columns in this dataset\n",
"There are 2 unique flat types namely home_ownership_flats, rental_flats\n",
"\n",
"The lowest demand for flats of type \"home_ownership_flats\" is 2967 for start year = 1960 and end year = 1965\n",
"The highest demand for flats of type \"home_ownership_flats\" is 308454 for start year = 1991 and end year = 1995\n",
"\n",
"The lowest demand for flats of type \"rental_flats\" is 15995 for start year = 1986 and end year = 1990\n",
"The highest demand for flats of type \"rental_flats\" is 66005 for start year = 1966 and end year = 1970\n",
"\n",
"\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 1152x576 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x504 with 1 Axes>"
],
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"source": [
"## Q3"
],
"metadata": {
"id": "EG8vb5lz6lQx"
}
},
{
"cell_type": "code",
"source": [
"def desc(data, c):\n",
"    # data holds integer (year, flats_constructed) rows; summarise the last column.\n",
"    M, Mi = np.max(data[:, -1]), np.argmax(data[:, -1])\n",
"    m, mi = np.min(data[:, -1]), np.argmin(data[:, -1])\n",
"    yM, ym = data[Mi, 0], data[mi, 0]\n",
"    q1 = np.quantile(data[:, -1], .25)\n",
"    q2 = np.quantile(data[:, -1], .50)\n",
"    q3 = np.quantile(data[:, -1], .75)\n",
"    S = 'The lowest ' + c + ' is ' + str(m) + ' for year ' + str(ym) + '\\n'\n",
"    S += 'The highest ' + c + ' is ' + str(M) + ' for year ' + str(yM) + '\\n'\n",
"    S += 'Q1 quantile of ' + c + ' is ' + str(int(q1)) + '\\n'\n",
"    S += 'Q2 quantile (Median) of ' + c + ' is ' + str(int(q2)) + '\\n'\n",
"    S += 'Q3 quantile of ' + c + ' is ' + str(int(q3)) + '\\n'\n",
"    return S\n",
"\n",
"def histPlot(data):\n",
"    # Distribution of the yearly number of flats constructed, in 20 bins\n",
"    num_bins = 20\n",
"    plt.figure(figsize=(12,7))\n",
"    plt.hist(data[:, 1], num_bins, color='red', alpha=0.7)\n",
"    plt.xlabel('Flats Constructed')\n",
"    plt.ylabel('Frequency')\n",
"    plt.title('Histogram of Flats Constructed')\n",
"    plt.show()\n",
"\n",
"def scatterPlot(data):\n",
"    # One point per year\n",
"    plt.figure(figsize=(12,7))\n",
"    plt.scatter(data[:, 0], data[:, 1], marker='*', color='r')\n",
"    plt.ylabel('Flats Constructed')\n",
"    plt.xlabel('Year')\n",
"    plt.title('Flats Constructed vs Year - Scatterplot')\n",
"    plt.show()"
],
"metadata": {
"id": "F7Eo4auU8qk5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"data = np.genfromtxt(\"flats-constructed-by-housing-and-development-board-annual.csv\", delimiter=',', dtype=None, encoding=None)\n",
"s = '*** Flats constructed by Housing and Development Board Annual results ***\\n\\n'\n",
"s += 'The fields in this dataset are: ' + ', '.join(data[0]) + '\\n'\n",
"data = data[1:, :].astype(int)\n",
"s += 'There are ' + str(data.shape[0]) + ' rows in this dataset\\n'\n",
"s += 'There are ' + str(data.shape[1]) + ' columns in this dataset\\n\\n'\n",
"c = 'number of Flats constructed'\n",
"s += desc(data, c)\n",
"print(s)\n",
"histPlot(data)\n",
"print()\n",
"scatterPlot(data)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "R_FfUOLv6lxG",
"outputId": "4bc1c32e-6758-4917-d8e2-aa7f553d56b6"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"*** Flats constructed by Housing and Development Board Annual results ***\n",
"\n",
"The fields in this dataset are: year, flats_constructed\n",
"There are 41 rows in this dataset\n",
"There are 2 columns in this dataset\n",
"\n",
"The lowest number of Flats constructed is 2733 for year 2006\n",
"The highest number of Flats constructed is 67017 for year 1984\n",
"Q1 quantile of number of Flats constructed is 11793\n",
"Q2 quantile (Median) of number of Flats constructed is 23913\n",
"Q3 quantile of number of Flats constructed is 29008\n",
"\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 864x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
}
}
]
}
]
}
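For the Q1 stacked bar chart, each flat type's bars have to start where the bars of the preceding flat types end; if a series is drawn without the right bottom, it is painted over the earlier ones instead of being stacked on them. A minimal NumPy-only sketch of the offset computation follows. The helper name stack_bottoms and the counts are hypothetical, and the sketch assumes, as the Q1 code does, that every flat type has a value for every quarter in the same order.

```python
import numpy as np


def stack_bottoms(series):
    """Return the `bottom` offsets for a stacked bar chart.

    series: 2-D array-like with one row per category (e.g. flat type) and
    one column per bar position (e.g. quarter). Row i should be drawn with
    bottom=offsets[i] so that it sits on top of rows 0..i-1.
    """
    values = np.asarray(series)
    # Running totals down the category axis ...
    totals = np.cumsum(values, axis=0)
    # ... shifted by one row, so the first category starts at zero.
    first = np.zeros((1, values.shape[1]), dtype=totals.dtype)
    return np.vstack([first, totals[:-1]])


# Toy counts: 3 categories x 4 bar positions (illustrative values only).
counts = np.array([[1, 2, 3, 4],
                   [5, 5, 5, 5],
                   [2, 0, 1, 3]])
offsets = stack_bottoms(counts)
print(offsets)

# With Matplotlib, each row would then be drawn as
#     plt.bar(x, counts[i], bottom=offsets[i])
# where x holds the bar positions or category labels.
```

Computing all offsets up front with np.cumsum is equivalent to keeping a running bottom array inside the plotting loop; the up-front version simply makes the offsets easy to inspect.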

This sample Python assignment solution was completed by our team of Python programmers and is provided exclusively for research and reference purposes.

About The Author - Jordan Smith

Jordan Smith is an experienced data analyst specializing in Python programming for data exploration and visualization. In this project, Jordan uses multiple interrelated datasets from the Housing and Development Board to conduct an in-depth analysis, employing NumPy for statistical summaries and Matplotlib for data visualizations to answer key questions and present the findings clearly.