
- 6th Jul 2024
- 19:13
In this assignment, you'll load and preprocess data, perform a train-test split, and build a fake news classifier using MultinomialNB and a deep neural network. Finally, you'll create a high-performance model to classify news into different subjects and analyze its strengths and weaknesses.
1. Load Data and perform basic EDA
- As part of understanding how the columns are separated, read the file using the open function, store the lines in a list, and show the first 10 items in that list
- Based on your observation of how the data are separated, load the dataset into a pandas DataFrame and show the first 5 and last 5 rows
- Check whether there are any null values, remove every row that contains one, and then show again that no null values remain (a sketch of this step follows below)
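A minimal sketch of this first step, assuming a single delimited file named news.csv (a placeholder name; substitute the actual file supplied with the assignment):

import pandas as pd

# Peek at the raw file with open() to see how the columns are separated
# ('news.csv' is a placeholder path, not the actual assignment file)
with open('news.csv', encoding='utf-8') as f:
    raw_lines = f.readlines()
print(raw_lines[:10])

# Load with the separator observed above (',' assumed here)
df = pd.read_csv('news.csv', sep=',')
print(df.head())   # first 5 rows
print(df.tail())   # last 5 rows

# Count nulls, drop any rows containing them, and confirm none remain
print(df.isnull().sum())
df = df.dropna()
print(df.isnull().sum())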
2. Train Test Split
- Import the related libraries and perform a train-test split, keeping 20% of the data in the test set, as sketched below
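A sketch of the split, assuming the text lives in an 'AllText' column and the label in a 'target' column, as in the notebook further down:

from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; random_state pins the shuffle
X = df['AllText']
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)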
3. Training and Testing a Fake News Classifier using MultinomialNB
- Create a pipeline that uses CountVectorizer with the preprocessing function you created earlier, then TfidfTransformer, and finally the MultinomialNB naive Bayes classifier (see the sketch after this item)
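One way this pipeline can look; text_preprocess below is only a stand-in for the preprocessing function written earlier in the assignment:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

def text_preprocess(text):
    # Stand-in for the earlier preprocessing function:
    # lowercase and keep alphabetic tokens only
    return [w for w in text.lower().split() if w.isalpha()]

nb_pipeline = Pipeline([
    ('count_v', CountVectorizer(analyzer=text_preprocess)),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])
nb_pipeline.fit(X_train, y_train)
nb_predictions = nb_pipeline.predict(X_test)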
4. Training and Testing a Deep Neural Network
- Import the related library for using MLPClassifier from sklearn.neural_network.
- Create a pipeline like the one in step 3. For MLPClassifier you should use at least two hidden layers and set verbose=2 (you can use other parameters as you wish, or the ones you see in the uploaded Google Colab); a sketch follows this item
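A sketch of the MLP variant, reusing text_preprocess from the previous sketch; the hidden layer sizes are an assumption, since only "at least two layers" and verbose=2 come from the brief:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.neural_network import MLPClassifier

mlp_pipeline = Pipeline([
    ('count_v', CountVectorizer(analyzer=text_preprocess)),
    ('tfidf', TfidfTransformer()),
    ('clf', MLPClassifier(hidden_layer_sizes=(64, 32), verbose=2)),
])
mlp_pipeline.fit(X_train, y_train)
mlp_predictions = mlp_pipeline.predict(X_test)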
5. Build a high-performance model for classifying the news into different subjects; that is, your target column will be the subject column. During this process you have to use a neural network, properly preprocess the data frame, remove irrelevant columns, train and test the model properly, and finally show the classification report and confusion matrix. To get full credit, the model should be at least 79% accurate in predicting the subject of a news item. Finally, you should discuss the classification report: what are the weaknesses of the model, and what are its strengths? A sketch of this step follows below.
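A hedged sketch of the subject classifier. The 'text' and 'subject' column names are assumptions, and the sample notebook below lands at about 78% with a similar bag-of-words pipeline, so expect to tune layer sizes, vocabulary, or iterations to clear 79%:

from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# 'text' and 'subject' are assumed column names; drop everything else
# (e.g. dates) so only the article text feeds the model
X = df['text']
y = df['subject']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

subject_pipeline = Pipeline([
    ('count_v', CountVectorizer(stop_words='english')),
    ('tfidf', TfidfTransformer()),
    ('clf', MLPClassifier(hidden_layer_sizes=(64, 32), verbose=2)),
])
subject_pipeline.fit(X_train, y_train)
predictions = subject_pipeline.predict(X_test)

print("Classification Report: \n", classification_report(y_test, predictions))
print("Confusion Matrix: \n", confusion_matrix(y_test, predictions))
print("Accuracy:", accuracy_score(y_test, predictions))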
Neural Network - Get Assignment Solution
Please note that this is a sample solution created by our Python programmers for the Neural Network assignment. These solutions are for research and reference only.
- Visit our Python Assignment Sample Solution page to download the complete solution, including code, report, and screenshots.
- Connect with our Python Tutors for online tutoring to help you understand and complete this assignment.
- Check out the partial solution for this assignment in the blog post below.
Free Assignment Solution - Neural Network
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "NLP_ROMAN.ipynb",
"provenance": [],
"collapsed_sections": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "code",
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "x_GNJOIYD0rb",
"outputId": "d79ac472-22f3-418e-abb9-cead8f743cec"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /content/drive\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
">1. Load Data and perform basic EDA"
],
"metadata": {
"id": "IilW8_A0EeeU"
}
},
{
"cell_type": "markdown",
"source": [
"I. import libraries necessary libraries and perform necessariy nltk\n",
"download operations"
],
"metadata": {
"id": "bDUdL5E2EhGn"
}
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "FBJ1fbEzkNQ1",
"outputId": "8b73f5d6-e0cb-4d8e-a12c-29f63991ce9a"
},
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sns;sns.set(style=\"white\")\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"import re\n",
"from nltk.corpus import stopwords\n",
"import string\n",
"import nltk\n",
"nltk.download('stopwords')\n",
"nltk.download('punkt')\n",
"\n",
"import warnings\n",
"warnings.simplefilter(\"ignore\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[nltk_data] Downloading package stopwords to /root/nltk_data...\n",
"[nltk_data] Unzipping corpora/stopwords.zip.\n",
"[nltk_data] Downloading package punkt to /root/nltk_data...\n",
"[nltk_data] Unzipping tokenizers/punkt.zip.\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"II. As part of understanding how the columns are separated, read the\n",
"file using the open function and create a list and show the first 10\n",
"items in the list"
],
"metadata": {
"id": "afuHONdhFuHl"
}
},
{
"cell_type": "code",
"source": [
"text/html": [
"\n"
"cell_type": "code",
"source": [
"len_ls=[]\n",
"for txt in df[\"AllText\"]:\n",
" len_ls.append(len(txt))\n",
"\n",
"df[\"length\"] = len_ls\n",
"\n",
"df.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "v75njE_4NILK",
"outputId": "d95de20b-9181-46c3-a75e-565a9c605e19"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" AllText target length\n",
"0 politicsNews As U.S. budget fight looms, Repub... 1 4737\n",
"1 politicsNews U.S. military to accept transgend... 1 4155\n",
"2 politicsNews Senior U.S. Republican senator: '... 1 2863\n",
"3 politicsNews FBI Russia probe helped by Austra... 1 2534\n",
"4 politicsNews Trump wants Postal Service to cha... 1 5287"
]
},
"metadata": {},
"execution_count": null
}
]
},
{
"cell_type": "markdown",
"source": [
"IV. Generate classification report and confusion matrix"
],
"metadata": {
"id": "gKMQ3uULWPZd"
}
},
{
"cell_type": "code",
"metadata": {
"scrolled": true,
"outputId": "047c17a2-f98e-437f-852e-43fa8f30a040",
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "SZwuYda2u71i"
},
"source": [
"# Feed the training data through the pipeline\n",
"pipeline.fit(X_train, y_train) "
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Pipeline(steps=[('count_v',\n",
" CountVectorizer(stop_words={'a', 'about', 'above', 'after',\n",
" 'again', 'against', 'all', 'also',\n",
" 'am', 'an', 'and', 'any', 'are',\n",
" \"aren't\", 'as', 'at', 'be',\n",
" 'because', 'been', 'before',\n",
" 'being', 'below', 'between',\n",
" 'both', 'but', 'by', 'can',\n",
" \"can't\", 'cannot', 'com', ...})),\n",
" ('clf', GradientBoostingClassifier())])"
]
},
"metadata": {},
"execution_count": 39
}
]
},
{
"cell_type": "code",
"source": [
"predictions = pipeline.predict(X_test)"
],
"metadata": {
"id": "aRUtyrhqu71j"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(\"Classification Report: \\n\",classification_report(y_test, predictions)) "
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "a7440352-ef5c-4380-8ac7-3e39267c31e4",
"id": "eHH8bN8_u71k"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Classification Report: \n",
" precision recall f1-score support\n",
"\n",
"Government News 0.29 0.03 0.06 471\n",
" Middle-east 0.14 0.14 0.14 233\n",
" News 0.98 0.98 0.98 2715\n",
" US_News 0.18 0.19 0.18 235\n",
" left-news 0.31 0.10 0.15 1338\n",
" politics 0.52 0.84 0.64 2052\n",
" politicsNews 0.93 0.92 0.93 3382\n",
" worldnews 0.91 0.93 0.92 3044\n",
"\n",
" accuracy 0.78 13470\n",
" macro avg 0.53 0.52 0.50 13470\n",
" weighted avg 0.76 0.78 0.76 13470\n",
"\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"print(\"Confusion Matrix: \\n\",confusion_matrix(y_test, predictions) )\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "9835da2e-47e6-46ca-bfd9-7b3bc6916cf3",
"id": "g1lxoABru71k"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Confusion Matrix: \n",
" [[ 16 1 4 1 33 404 10 2]\n",
" [ 0 32 0 199 0 2 0 0]\n",
" [ 0 0 2674 0 4 37 0 0]\n",
" [ 0 189 0 44 0 2 0 0]\n",
" [ 13 1 26 1 135 1157 4 1]\n",
" [ 26 1 24 2 258 1718 15 8]\n",
" [ 0 0 3 0 0 7 3117 255]\n",
" [ 0 0 0 0 0 1 209 2834]]\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"print(\"Accuracy:\", accuracy_score(y_test,predictions))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "bad0874a-8fa5-4719-cadb-cba1cc334bc9",
"id": "MG6AXnzhu71l"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Accuracy: 0.7847067557535263\n"
]
}
]
},
{
"cell_type": "code",
"source": [
""
],
"metadata": {
"id": "8YdCT8Ri1b-V"
},
"execution_count": null,
"outputs": []
}
]
}
Get the best Neural Network assignment and tutoring services from our experts now!
About The Author - Jake Stark
Jake Stark is a seasoned data scientist with extensive experience in machine learning and data analysis. Specializing in natural language processing and neural networks, Jake has a strong track record of building and optimizing predictive models. With a background in software engineering and a passion for developing innovative solutions, Jake excels in transforming complex datasets into actionable insights.