- 24th Jul 2024
- 17:37 pm
In this assignment, you are a business analyst at an event management company that specializes in concerts. In an effort to identify potential performers, you’ve been tasked to analyze Spotify data on songs that have appeared in their top 10 list from 2010-2015. For each year in the dataset, you will need to complete three analysis problems. Rather than manually reap each analysis for each individual year, you will write three distinct functions that can be reused for each year. You will begin by loading your CSV, converting it to a dictionary spreadsheet, and performing simple data cleansing. Then for each year, from 2010 through 2015, you will need to find: How many songs have appeared in the top 10 The average popularity of songs that have appeared in the top 10 How many songs in each of the following genres—dance pop, hip hop—have appeared in the top 10 Instructions to begin: Download the provided files, launch Jupyter Notebook, and open the Notebook file. Follow the instructions in the Notebook file to complete and submit the assignment.
Free Assignment Solution - Analyzing Spotify's Top 10 Songs For Event Management
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Analyzing Spotify Music Data\n",
"In this assignment, you are a business analyst at an event management company that specializes in concerts. In an effort to identify potential performers, you’ve been tasked to analyze Spotify data on songs that have appeared in their top 10 list from 2010-2015. For each year in the dataset, you will need to complete three analysis problems. Rather than manually repeating each analysis for each individual year, you will write three distinct functions that can be reused for each year.\n",
"\n",
"The CSV contains the following columns: \n",
"- Song title\n",
"- Artist\n",
"- Year \n",
"- Popularity; i.e. the number of times the song was played \n",
"- Subgenre\n",
"\n",
"You will begin by loading your CSV, converting it to a dictionary spreadsheet, and performing simple data cleansing. Then for each year, from 2010 through 2015, you will need to find:\n",
"- How many songs appear in the top 10 \n",
"- The average popularity of songs that appear in the top 10 \n",
"- How many songs in each of the following genres - dance pop, hip hop - appear in the top 10 "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deliverables\n",
"To receive credit for this assignment, you must submit the following files:\n",
"- Your completed Jupyter Notebook\n",
"\n",
"Your completed Jupyter Notebook will be this file, but with all of the problems solved.\n",
"\n",
"When you're done with the assignment, run all cells to verify that your code executes as expected. Then, save and submit this notebook.\n",
"\n",
"Good luck!\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 1: Loading & Exploring the Data\n",
"In Part 1, you will:\n",
"- Load the CSV\n",
"- Organize the data with a dictionary (_new_)\n",
"- Convert improperly imported data types"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 1: Loading the CSV\n",
"You have been given a variable, called `filename`, containing the path to the Spotify data set. Use it to load the lines of the CSV file into a variable, called `contents`. Then, follow the steps below:\n",
"- Split the first element of `contents` on the pipe (`|`) character, and store the result in a variable called `headers`\n",
"- Extract the remaining elements of `contents` into a variable, called `data`\n",
"\n",
"Then, print out the `headers`, and the first element of `data`.\n",
"\n",
"---\n",
"\n",
"Your code should print the following:\n",
"\n",
"```\n",
"['Title', 'Artist', 'Year', 'Popularity', 'Subgenre']\n",
"Hey, Soul Sister|Train|2010|83|neo mellow\n",
"```\n",
"\n",
"---\n",
"\n",
"**Hints**\n",
"- Unlike other CSV files you've used, this one uses the pipe symbol (`|`) as a separator. Make sure your code accounts for this.\n",
"- Be sure to `close` your file before proceeding.\n",
"- Recall that `splitlines` breaks a file into its constituent lines."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Provided Code -- Do NOT Edit!\n",
"filename = 'SpotifyTop10.csv'"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# TODO: `open` the CSV, then `read` it using `splitlines` into a variable called `contents`\n",
"f = open(filename, \"r\")\n",
"data = f.read()\n",
"contents = data.splitlines()\n",
"f.close()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Extract first row of `contents` into a variable called `headers`\n",
"headers = contents[0]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Extract the rest of the rows from `contents` into a variable called `data`\n",
"data = contents[1:]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Title|Artist|Year|Popularity|Subgenre\n",
"Hey, Soul Sister|Train|2010|83|neo mellow\n"
]
}
],
"source": [
"# TODO: Print headers and first element of `data`\n",
"print(headers)\n",
"print(data[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 2a: Splitting Data\n",
"Now, you will split and organize information in each data row according to its appropriate column. You have been provided with a set of empty list variables, corresponding to the columns of a spreadsheet.\n",
"\n",
"To solve this problem, you must write a for loop that iterates over each row of data, splits each line Into its constituent elements, and append each element of the split to the appropriate column variable.\n",
"\n",
"---\n",
"\n",
"**Hints**\n",
"- Recall that you can \"unpack\" lists, e.g., after `first_name, last_name = [\"John\", \"Doe\"]`, `first_name` contains `\"John\"`, and `last_name` contains `\"Doe\"`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Provided for your convenience -- DO NOT MODIFY\n",
"titles = []\n",
"artists = []\n",
"years = []\n",
"popularities = []\n",
"subgenres = []"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Iterate over each line in `data`\n",
"\n",
"for i in data:\n",
" # TODO: Split each line on `separator` into a variable called `row`\n",
" row = i.split(\"|\")\n",
" # TODO: Append each value to appropriate list\n",
" titles.append(row[0])\n",
" artists.append(row[1])\n",
" years.append(row[2])\n",
" popularities.append(row[3])\n",
" subgenres.append(row[4])"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'spreadsheet' is not defined",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m
1\u001b[0m \u001b[1;31m# Provided code --
Do NOT Edit!\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b
[0m\n\u001b[1;32m---->2\u001b[1;33m \u001b[1;32mfor\u001b[0m \u001b[0mcolumn_name\u001b
[0m\u001b[1;33m,\u001b[0m\u001b[0mcolumn_data\u001b[0m\u001b[1;32min\u001b[0m\u001b
[0mspreadsheet\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[1;33m(\u001b
[0m\u001b[1;33m)\u001b[0m\u001b
[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b
[0m\u001b[0;32m3\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mcolumn_name\u001b
[0m \u001b[1;33m+\u001b[0m\u001b[1;34m\":\"\u001b[0m\u001b[1;33m+\u001b[0m\u001b[0mstr\u001b[0m\u001b
[1;33m(\u001b[0m\u001b[0mcolumn_data\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;33m:\u001b[0m\u001b
[1;36m3\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b
[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mNameError\u001b[0m: name 'spreadsheet' is not defined"
]
}
],
"source": [
"# Provided code -- Do NOT Edit!\n",
"for column_name, column_data in spreadsheet.items():\n",
" print(column_name + \": \" + str(column_data[:3]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 2b: Creating a `spreadsheet` Dictionary\n",
"Next, you will create a dictionary, called `spreadsheet`, whose keys are each of the column names in `headers`, and whose values are the corresponding lists you populated in the previous problem, viz., `titles`, `artists`, `years`, `popularities`, and `subgenres`."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Create `spreadsheet` dictionary\n",
"spreadsheet = {'Title' : titles ,'Artist' : artists,'Year': years,'Popularity':popularities,'Subgenre':subgenres}\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 3: Converting Data Types\n",
"As usual, Python has imported all of your data as strings -- including the numeric columns, `Popularity` and `Year`. In this problem, you will convert the strings in these columns into integers.\n",
"\n",
"You have been given a variable called `numerical_columns`. Use this list to implement the logic below:\n",
"- Iterate over each `column` in `numerical_columns`\n",
"- Within the loop:\n",
" - Convert each `value` in `spreadsheet[column]` to an integer\n",
"\n",
"---\n",
"\n",
"**Hints**\n",
"\n",
"- Use a list comprehension."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Provided Code -- Do NOT Edit!\n",
"numerical_columns = ['Popularity', 'Year']\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Iterate over `numerical_columns`\n",
"for i in numerical_columns:\n",
" # TODO: Convert each value in `column` to an integer\n",
" for j in range(len(spreadsheet[i])):\n",
" spreadsheet[i][j] = int(spreadsheet[i][j])\n",
" \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 2\n",
"In Part 2, you will:\n",
"- How many songs appear in the top 10 for each year, from 2010 through 2015\n",
"- The average popularity of songs that appear in the top 10 for each year, from 2010 through 2015\n",
"- How many songs in each of the following genres - dance pop and hip hop - appear in the top 10 for each year, from 2010 through 2015"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 1: Number of Songs in the Top 10 Lists Each Year\n",
"Write a function, called `count_songs_in_year`, that accepts a single numerical argument, called `year`, and returns the number of songs released in the given `year`.\n",
"\n",
"Your function should behave as follows:\n",
"```\n",
">>> count_songs_in_year(2010)\n",
"51\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# TODO: Declare `count_songs_in_year` function, accepting a single `year` argument\n",
"def count_songs_in_year(year):\n",
" # TODO: Initialize a `counter` to `0`\n",
" counter = 0\n",
" # TODO: Iterate over each `release_year` in `spreadsheet['Year']`\n",
" for release_year in spreadsheet['Year']:\n",
" # TOOO: If the `release_year` matches the `year` argument, increment your counter\n",
" if release_year == year:\n",
" counter += 1\n",
" # TODO: Return your `counter`\n",
" return counter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After implementing `count_songs_in_year`, invoke it for each year from 2010 to 2015, to print a message like the following:\n",
"\n",
"Your function should produce the following output:\n",
"```\n",
"\n",
"51\n",
"53\n",
"35\n",
"71\n",
"58\n",
"95\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"51\n",
"53\n",
"35\n",
"71\n",
"58\n",
"95\n"
]
}
],
"source": [
"# TODO: Call `count_songs_in_year` on the years: 2010, 2011, 2012, 2013, 2014, 2015 and print the results\n",
"print(count_songs_in_year(2010))\n",
"print(count_songs_in_year(2011))\n",
"print(count_songs_in_year(2012))\n",
"print(count_songs_in_year(2013))\n",
"print(count_songs_in_year(2014))\n",
"print(count_songs_in_year(2015))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 2: Average Popularity of Top 10 by Year\n",
"Write a function, called `average_popularity_in_year`, that accepts a single numerical `year` argument, and returns the average popularity of Top 10 songs in the given `year`. This indicates how many thousands of times someone listened to a Top 10 song each year.\n",
"\n",
"Your function should behave as follows:\n",
"\n",
"```\n",
">>> average_popularity_in_year(2010)\n",
"64.25490196078431\n",
"```\n",
"\n",
"**Hint**\n",
"- You can do this with a `for` loop _or_ a list comprehension.\n",
"- You must both count the number of songs released in the given `year` _and_ sum up their popularity values, because you must divide the sum by the count to return the average."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Declare `average_popularity_in_year`, accepting a single `year` argument\n",
"def average_popularity_in_year(year):\n",
" # TODO: Initialize `total_popularity` and `year_counter` variables to 0\n",
" total_popularity = 0\n",
" year_counter = 0\n",
" # TODO: `enumerate` over each `index` and `release_year` of `spreadsheet['Year']`\n",
" for index,release_year in enumerate(spreadsheet['Year']):\n",
" # TODO: Check if `release_year` matches `year` argument\n",
" if release_year == year:\n",
" # TODO: If so, add current item's popularity to `total_popularity`, and increment `year_counter`\n",
" total_popularity += spreadsheet['Popularity'][index]\n",
" year_counter += 1\n",
" # TODO: Use `total_popularity` and `year_count` to return the average popularity \n",
" return total_popularity/year_counter\n"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"64.25490196078431"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# TODO: Print the average popularity of Top 10 songs in 2010\n",
"average_popularity_in_year(2010)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After defining `average_popularity_in_year`, invoke it on every year from 2010 to 2015, inclusive, to print a message like the following:\n",
"```\n",
"64.25490196078431\n",
"61.867924528301884\n",
"67.77142857142857\n",
"63.985915492957744\n",
"62.706896551724135\n",
"64.56842105263158\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"64.25490196078431\n",
"61.867924528301884\n",
"67.77142857142857\n",
"63.985915492957744\n",
"62.706896551724135\n",
"64.56842105263158\n"
]
}
],
"source": [
"# TODO: Call `average_popularity_in_year` on: 2010, 2011, 2012, 2013, 2014, 2015 and print the results\n",
"print(average_popularity_in_year(2010))\n",
"print(average_popularity_in_year(2011))\n",
"print(average_popularity_in_year(2012))\n",
"print(average_popularity_in_year(2013))\n",
"print(average_popularity_in_year(2014))\n",
"print(average_popularity_in_year(2015))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Problem 3: Number of Songs in Interesting Genres by Year\n",
"Write a function, called `count_songs_in_genre_in_year`, that accepts two arguments: `year`, a number, and `genre`, a string. It should return the number of songs in `genre` that were released in the given `year`.\n",
"\n",
"Your function should behave as follows:\n",
"\n",
"```\n",
">>> count_songs_in_genre_in_year(2010, 'hip hop')\n",
"4\n",
"```\n",
"\n",
"**Hints**\n",
"- Be sure to lowercase the `genre` argument before you use it in your function.\n",
"- Use `in` to check if the given `genre` matches a value in `spreadsheet['Subgenre']`, instead of an `==`."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"# TODO: Declare `count_songs_in_genre_in_year`, accepting two arguments: `year` and `genre`\n",
"def count_songs_in_genre_in_year(year,genre):\n",
" # TODO: Initialize `counter` to 0\n",
" counter = 0\n",
" # TODO: `enumerate` each `index` and `release_year` in `spreadsheet['Year']`\n",
" for index,release_year in enumerate(spreadsheet['Year']):\n",
" # TODO: Check if current element's `release_year` matches `year` argument\n",
" if release_year == year:\n",
" # TODO: Check if current element's subgenre matches `genre` argument \n",
" if genre in spreadsheet['Subgenre'][index]:\n",
" # TODO: If so, increment your `counter`\n",
" counter += 1\n",
"\n",
" # TODO: Return `counter`\n",
" return counter"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# TODO: Call `count_songs_in_genre_in_year` with: `2010, 'hip hop'` \n",
"count_songs_in_genre_in_year(2010,'hip hop')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You have been given a list, called `genres`, containing the following values:\n",
"- `'dance pop'`\n",
"- `'hip hop'`\n",
"- `country'`\n",
"\n",
"Invoke `count_songs_in_genre_in_year` on each genre in `genres`, for each year from 2010 to 2015, inclusive. Your code should produce output that starts with the following:\n",
"\n",
"```\n",
"31\n",
"38\n",
"15\n",
"42\n",
"27\n",
"52\n",
"\n",
"4\n",
"1\n",
"0\n",
"2\n",
"1\n",
"1\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"31\n",
"38\n",
"15\n",
"42\n",
"27\n",
"52\n",
"\n",
"4\n",
"1\n",
"0\n",
"2\n",
"1\n",
"1\n"
]
}
],
"source": [
"# TODO: Call `count_songs_in_genre` call 2010 - 2015, with `'dance pop'` as the `genre` argument for every year, and print the results\n",
"print(count_songs_in_genre_in_year(2010,'dance pop'))\n",
"print(count_songs_in_genre_in_year(2011,'dance pop'))\n",
"print(count_songs_in_genre_in_year(2012,'dance pop'))\n",
"print(count_songs_in_genre_in_year(2013,'dance pop'))\n",
"print(count_songs_in_genre_in_year(2014,'dance pop'))\n",
"print(count_songs_in_genre_in_year(2015,'dance pop'))\n",
"print()\n",
"\n",
"# TODO: Call `count_songs_in_genre` call 2010 - 2015, with `'hip hop'` as the `genre` argument for every year, and print the results\n",
"print(count_songs_in_genre_in_year(2010,'hip hop'))\n",
"print(count_songs_in_genre_in_year(2011,'hip hop'))\n",
"print(count_songs_in_genre_in_year(2012,'hip hop'))\n",
"print(count_songs_in_genre_in_year(2013,'hip hop'))\n",
"print(count_songs_in_genre_in_year(2014,'hip hop'))\n",
"print(count_songs_in_genre_in_year(2015,'hip hop'))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Get the best Analyzing Spotify's Top 10 Songs For Event Management assignment help and tutoring services from our experts now!
This sample Python assignment solution has been successfully completed by our team of Python programmers. The solutions provided are designed exclusively for research and reference purposes. If you find value in reviewing the reports and code, our Python tutors would be delighted.
-
For a comprehensive solution package including code, reports, and screenshots, please visit our Python Assignment Sample Solution page.
-
Contact our Python experts for personalized online tutoring sessions focused on clarifying any doubts related to this assignment.
-
Explore the partial solution for this assignment available in the blog above for further insights.
About The Author - Jamie Lee
Jamie Lee is a data analyst specializing in music industry analytics. For this assignment, Jamie utilized Spotify data from 2010-2015 to identify potential performers for an event management company. Jamie's approach involved analyzing top 10 songs from each year, developing reusable functions for data analysis, and performing data cleansing. Key tasks included determining the number of top 10 songs, calculating average popularity, and analyzing genre distribution for dance pop and hip hop.