Programming for Analytics - CC7182

Programming for Analytics - CC7182 - Assignment Solution

5th Jul 2024
17:22

In this assignment, our experts guide you through the entire data analysis process: understanding and preparing your data, performing statistical analysis, and creating insightful visualizations, so that your project meets all of its requirements.

Requirement specifications

1. Data understanding - Produce a meta data table to show the characteristics of each variable. Describe the missing or erroneous data of each variable.
2. Data preparation - Write Python programs to reduce variables, with justifications and comments (e.g. remove variables with no influence on the target). Write Python programs to clean the data, with justifications and comments (e.g. clean records with missing values or errors).
3. Write Python programs to transform the variables as follows:

Target variable Employer into binary: No Outcome = 0, has an outcome = 1
Gender into ordinal: Female = 0, Male = 1, Transgender = 2, Prefer not to say = 3, any others = 4
Ethnic_Origin into ordinal number based on their occurrence in the data set in descending order.
WARD_NAME into ordinal numbers based on their occurrence in the data set in ascending order.
Highest_Level_of_Education into ordinal numbers based on UK ISCED Level.
Claiming_Benefits into ordinal No=0, Yes=1, Blank=2
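
The occurrence-based encodings for Ethnic_Origin and WARD_NAME can be read in more than one way. The sketch below assumes that "occurrence" means how often each category appears, which is only one interpretation of the requirement. The helper name `encode_by_frequency` and the sample values are made up for illustration and are not part of the iWork data. Categories with equal counts may be ranked in either order.

```python
import pandas as pd


def encode_by_frequency(series: pd.Series, descending: bool = True) -> pd.Series:
    """Map each category to an integer rank based on how often it occurs."""
    # value_counts() ignores missing values by default, so NaN stays NaN after map()
    counts = series.value_counts(ascending=not descending)
    codes = {category: rank for rank, category in enumerate(counts.index)}
    return series.map(codes)


# Made-up sample values, for illustration only
sample = pd.DataFrame({
    "Ethnic_Origin": ["A", "B", "A", "C", "A", "B"],
    "WARD_NAME": ["X", "Y", "Y", "Z", "Y", "X"],
})

# Descending: the most frequent category gets code 0
sample["Ethnic_Origin_code"] = encode_by_frequency(sample["Ethnic_Origin"], descending=True)
# Ascending: the least frequent category gets code 0
sample["WARD_NAME_code"] = encode_by_frequency(sample["WARD_NAME"], descending=False)

print(sample)
```

On the real data the same helper would be called on `df["Ethnic_Origin"]` and `df["WARD_NAME"]` directly.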

4. Data analysis - Write a Python program to show summary statistics (sum, mean, standard deviation, skewness, and kurtosis) of the age variable. Write a Python program to calculate and show the correlation of each variable with the target variable.
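
As a rough illustration of the data analysis requirement, the sketch below computes the five statistics for one column and then correlates every numeric column with a target column. The function names `summarise` and `correlation_with_target` and the sample numbers are hypothetical. It assumes pandas 1.5 or later for the `numeric_only` argument. Note that pandas reports the sample standard deviation and bias-corrected skewness and excess kurtosis.

```python
import pandas as pd


def summarise(series: pd.Series) -> pd.Series:
    """Return sum, mean, standard deviation, skewness and kurtosis of a column."""
    return pd.Series({
        "sum": series.sum(),
        "mean": series.mean(),
        "std": series.std(),         # sample standard deviation (ddof=1)
        "skewness": series.skew(),
        "kurtosis": series.kurt(),   # excess kurtosis (a normal distribution gives 0)
    })


def correlation_with_target(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr.sort_values(key=abs, ascending=False)


# Made-up sample values, for illustration only
sample = pd.DataFrame({
    "Client_Current_Age": [22, 35, 41, 29, 53, 38],
    "Gender": [0, 1, 1, 0, 1, 0],
    "Employer": [0, 1, 1, 0, 1, 0],
})

age_summary = summarise(sample["Client_Current_Age"])
target_corr = correlation_with_target(sample, "Employer")
print(age_summary)
print(target_corr)
```

Sorting by absolute value puts strongly negative correlations next to strongly positive ones, which is usually what matters when judging influence on the target.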

5. Data exploration - Write a Python program to show a histogram plot of any user-chosen variable. The program should keep running until the user chooses to exit.
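
One possible shape for the "keep running until the user exits" requirement is sketched below. The names `plot_histogram` and `histogram_loop`, and the `read` and `plot` parameters, are assumptions introduced here so that the loop can be exercised with scripted answers instead of a keyboard and a display. The sample data is made up.

```python
import pandas as pd


def plot_histogram(series: pd.Series) -> None:
    """Draw a histogram of one column with matplotlib."""
    # Imported lazily so the loop itself can be exercised without a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.hist(series.dropna(), bins=20)
    ax.set_xlabel(series.name)
    ax.set_ylabel("Count")
    ax.set_title(f"Histogram of {series.name}")
    plt.show()


def histogram_loop(df: pd.DataFrame, read=input, plot=plot_histogram) -> list:
    """Prompt for column names until the user types 'q'; return the names plotted."""
    plotted = []
    while True:
        choice = read("Enter a variable name or 'q' to exit: ").strip()
        if choice.lower() == "q":
            break
        if choice not in df.columns:
            print(f"'{choice}' is not a column. Choose from: {list(df.columns)}")
            continue
        plot(df[choice])
        plotted.append(choice)
    return plotted


# Scripted run with made-up data: one invalid name, one valid name, then exit.
# The plot function is replaced by a no-op so nothing is drawn here.
sample = pd.DataFrame({"Client_Current_Age": [22, 35, 41, 29, 53, 38]})
answers = iter(["Age", "Client_Current_Age", "q"])
plotted = histogram_loop(sample, read=lambda prompt: next(answers), plot=lambda s: None)
print(plotted)
```

For interactive use, calling `histogram_loop(df)` with the defaults prompts at the keyboard and draws each chosen column.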

All Python programs should be written in Python 3.x, and the technical report should include screenshots of the code and test results with a brief discussion and justification. Python code should include adequate comments and be saved in a .py or .ipynb file.
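
For the data understanding requirement above, a meta data table can be assembled from a few pandas calls. This is a minimal sketch, assuming that type, missing counts and number of distinct values are the characteristics of interest. The helper name `build_metadata` and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd


def build_metadata(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per variable describing its type and missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(),  # distinct values, not counting missing ones
    })


# Made-up sample rows, for illustration only
sample = pd.DataFrame({
    "Client_Current_Age": [22, 35, np.nan, 29],
    "Gender": ["Female", "Male", "Male", None],
    "Disability_details": [np.nan] * 4,
})

meta = build_metadata(sample)
print(meta)
```

A column whose `missing_pct` is 100 carries no information, which gives a concrete justification for dropping it during data preparation.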

Programming for Analytics - CC7182 - Get Assignment Solution

Please note that this is a sample assignment solved by our Python Programmers. These solutions are intended to be used for research and reference purposes only. If you can learn any concepts by going through the reports and code, then our Python Tutors would be very happy.

To download the complete solution along with Code, Report and screenshots - Please visit our Python Assignment Sample Solution page
Reach out to our Python Tutors to get online tutoring related to this assignment and get your doubts cleared
You can check the partial solution for this assignment in this blog below

Free Assignment Solution - Programming for Analytics - CC7182

# Commented out IPython magic to ensure Python compatibility.
import pandas as pd
import numpy as np
import seaborn as sns
sns.set(style="white")
import matplotlib.pyplot as plt
# %matplotlib inline
import warnings
warnings.simplefilter("ignore")

df = pd.read_csv("1649341983356_Islington_iwork_anonymous_data.csv")

print("Our original data set has {} rows and {} columns.\n".format(df.shape[0], df.shape[1]))

df.head()

""">1. Data understanding"""

df.columns

"""Produce a meta data table to show characteristics of each variable."""
df.info()

""">2. Data Transformation:"""

"""Write Python programs to reduce variables with justifications and
comments (e.g remove variables with no influences on the target)"""

# Disability_details contains only NaN values
# Benefits has almost 50% NaN values
df = df.drop(["Disability_details", "Benefits"], axis=1)

"""Write Python programs to clean data with justifications and comments
(e.g clean records with missing values or errors)"""

df["WARD_NAME"] = df["WARD_NAME"].fillna(df["WARD_NAME"].mode()[0])

df.isnull().any()

# Target variable Employer into binary: No Outcome = 0, has an outcome = 1

df["Employer"] = np.where(df["Employer"]=="No Outcome",0,1)

# Gender into ordinal: Female = 0, Male = 1, Transgender = 2, Prefer not to say = 3, any others = 4

gender_codes = {"Female": 0, "Male": 1, "Transgender": 2, "Prefer not to say": 3}

# Any value not listed above (including missing values) falls into "any others" = 4
df["Gender"] = df["Gender"].map(gender_codes).fillna(4).astype(int)

# Ethnic_Origin into ordinal numbers based on their occurrence in the data set in descending order.
# Note: "occurrence" is taken here as the order of first appearance in the data
# (reversed for descending order), not as a frequency count.

eth_uni_ls = []

for val in df["Ethnic_Origin"].tolist():
    if val not in eth_uni_ls:
        eth_uni_ls.append(val)

eth_uni_ls.reverse()

eth_dict = {key: value for value, key in enumerate(eth_uni_ls)}

df["Ethnic_Origin"] = df["Ethnic_Origin"].replace(eth_dict)

# WARD_NAME into ordinal numbers based on their occurrence in the data set in ascending order.
# Note: "occurrence" is taken here as the order of first appearance in the data,
# not as a frequency count.

ward_uni_ls = []

for val in df["WARD_NAME"].tolist():
    if val not in ward_uni_ls:
        ward_uni_ls.append(val)

ward_dict = {key: value for value, key in enumerate(ward_uni_ls)}

df["WARD_NAME"] = df["WARD_NAME"].replace(ward_dict)

#Highest_Level_of_Education into ordinal numbers based on UK ISCED Level.

# df["Highest_Level_of_Education"].value_counts()

# Keys are the raw category labels, spelled as in the original listing; "Blanks" is coded as 9
isced_codes = {
    "ISCED Level 0 (Early childhood education)": 0,
    "ISCED Level 1 (Primary education)": 1,
    "ISCED Level 2 (Lower secondary education)": 2,
    "ISCED Level 3 (Upper secondary education)": 3,
    "ISCED Level 4 (Post secondary - tertiary and non-tertiary)": 4,
    "ISCED Level 5 (Short cycle tertiary education)": 5,
    "ISCED Level 6 (Bachelor's or equivalent level)": 6,
    "ISCED Level 7 (Master's or equivalent level)": 7,
    "ISCED Level 8 (Doctoral or equivalenmt level)": 8,
    "Blanks": 9,
}

df["Highest_Level_of_Education"] = df["Highest_Level_of_Education"].replace(isced_codes)

#Claiming_Benefits into ordinal No=0, Yes=1, Blank=2

df["Claiming_Benefits"] = df["Claiming_Benefits"].replace({"No":0,"Yes":1,"Blanks":2})

""">3. Data analysis"""

"""Write a Python program to show summary statistics of sum, mean,
standard deviation, skewness, and kurtosis of age variable."""

def age_statistics():
    age = df["Client_Current_Age"]
    print("Variable: Client_Current_Age")
    print("Sum:", age.sum())
    print("Mean:", age.mean())
    print("Standard Deviation:", age.std())
    print("Skewness:", age.skew())
    print("Kurtosis:", age.kurt())

age_statistics()

"""Write a Python program to calculate and show correlation of each
variable with the target variable"""
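# Correlate every numeric column with the target (Employer), strongest positive first.
# numeric_only=True (pandas 1.5+) skips text columns such as Registration_Date.
df.corr(numeric_only=True)["Employer"].sort_values(ascending=False)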

""">4 Data exploration:"""

"""
Write a Python program to show a histogram plot of any user-chosen
variable. The program should keep running until the user chooses to exit.
"""
while True:
    feature = input("Enter feature or press 'q' to exit: ")

    if feature.lower() == "q":
        break

    try:
        data = df[feature]  # raises KeyError for an unknown column name
        fig, ax = plt.subplots(figsize=(15, 5))
        # histplot replaces the deprecated distplot; kde=True keeps the density curve
        sns.histplot(data, kde=True, color="m", ax=ax)
        plt.show()
    except Exception:
        print("Enter a valid feature name.")

"""Write a Python program to show scatter plot for any two user chosen
variables. The program should keep running until the user chooses to exit."""

while True:
    feature1 = input("Enter 1st feature or press 'q' to exit: ")
    if feature1.lower() == "q":
        break

    feature2 = input("Enter 2nd feature or press 'q' to exit: ")
    if feature2.lower() == "q":
        break

    try:
        x_data, y_data = df[feature1], df[feature2]  # KeyError for unknown names
        fig, ax = plt.subplots(figsize=(15, 5))
        # x and y must be passed as keyword arguments in recent seaborn releases
        sns.scatterplot(x=x_data, y=y_data, ax=ax)
        plt.show()
    except Exception:
        print("Enter valid feature names.")

""">5. Data Mining:"""

"""Build any two Python Predictive Models to predict client employment using
prepared variables from the iWork data."""

X = df.drop(["Employer","Registration_Date","Parent_on_Enrolment","Has_Disability", "Religion", "Sexuality"], axis=1)
# X = df[["Client_Current_Age", "Gender", "Claiming_Benefits"]]
y = df["Employer"]

# Splitting data into train and test sample using 70% data for training and 30% data for testing

from sklearn.model_selection import train_test_split
# random_state fixes the split so that results are reproducible between runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, stratify=y, random_state=42)

# Treat Imbalance
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state = 42)
X_train, y_train = sm.fit_resample(X_train, y_train)

# Class counts in the (resampled) training set and in the test set
plt.figure(1, figsize=(25, 5))
for n, (z, j) in enumerate(zip([y_train, y_test], ["train data", "test data"]), start=1):
    plt.subplot(1, 2, n)
    sns.countplot(x=z, palette="Set3")
    plt.title(j)
plt.show()

"""* Logistic Regression:"""

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

clf1 = LogisticRegression()
clf1.fit(X_train, y_train)
predictions= clf1.predict(X_test)

train_acc1 = clf1.score(X_train, y_train)*100
test_acc1 = clf1.score(X_test, y_test)*100

print("Accuracy on training set: {:.3f}%. \n".format(train_acc1))
print("Accuracy on test set: {:.3f}%. \n".format(test_acc1))
print("Classification Report: \n",classification_report(y_test, predictions))
print()

# Class labels for the confusion matrix (named "labels" to avoid shadowing the built-in map)
labels = [0, 1]

cm = confusion_matrix(y_test, predictions)
print("Confusion Matrix: ")

confusion = pd.DataFrame(cm, columns=labels, index=labels)

confusion

"""* Random Forest:"""

from sklearn.ensemble import RandomForestClassifier

clf2 = RandomForestClassifier(n_estimators=2000, max_depth=4, random_state=0, oob_score=True, criterion='gini', min_samples_leaf=5)
clf2.fit(X_train, y_train)
predictions = clf2.predict(X_test)

train_acc2 = clf2.score(X_train, y_train)*100
test_acc2 = clf2.score(X_test, y_test)*100

print("Accuracy on training set: {:.3f}%. \n".format(train_acc2))
print("Accuracy on test set: {:.3f}%. \n".format(test_acc2))
print("Classification Report: \n", classification_report(y_test, predictions))
print()

cm = confusion_matrix(y_test, predictions)
print("Confusion Matrix: ")

confusion = pd.DataFrame(cm, columns=labels, index=labels)

confusion

Get the best Programming for Analytics - CC7182 assignment help and tutoring services from our experts now!

About The Author - Jordan Smith

Jordan Smith is a skilled data analyst and Python programmer with experience in data preparation, transformation, and analysis. Jordan excels in creating efficient Python programs for complex data sets, ensuring accurate data handling and insightful analysis.
