Find a data set (use one from another class, search for free data sets online, try Kaggle, etc.). You must include the data set in your submission. Load it from your Python working directory so that your code contains only the file name, not a full path. Do not use a data set that I used in lecture or that is in our Canvas Files; doing so will cost 10 points off your grade.
- Use pandas to create a data frame and analyze the data
- Use the entire data frame (it is OK for the printed output to contain ...) to do analysis (number of rows, column names)
- Pick 2 columns to do data analysis (average, min, max, and unique values)
- Attach a screenshot of the pandas analysis (shell results)
- Create one graph of whatever type you want. Attach a screenshot of the graph.
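The steps in the list above could be sketched roughly as follows. Everything specific here is an assumption made for illustration: the file name `sample_data.csv`, the columns `city`, `temp_c`, and `rain_mm`, and the four made-up rows, which are written to the working directory only so the example runs on its own. A real submission would ship its own data file instead.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data; in a real submission the CSV already sits next to the script.
CSV_NAME = "sample_data.csv"
Path(CSV_NAME).write_text(
    "city,temp_c,rain_mm\n"
    "Oslo,4.0,55\n"
    "Lima,19.5,1\n"
    "Cairo,22.0,0\n"
    "Oslo,6.0,60\n"
)

# Load by file name only (no absolute path), as the assignment asks.
df = pd.read_csv(CSV_NAME)

# Whole-frame analysis: print the frame, the number of rows, and the column names.
n_rows = len(df)
column_names = list(df.columns)
print(df)
print("rows:", n_rows)
print("columns:", column_names)

# Two-column analysis: average, min, max, and unique values for each chosen column.
summary = {}
for col in ["temp_c", "rain_mm"]:
    summary[col] = {
        "average": df[col].mean(),
        "min": df[col].min(),
        "max": df[col].max(),
        "unique": sorted(df[col].unique().tolist()),
    }
    print(col, summary[col])

# One graph, saved to an image file that can be attached to the submission.
ax = df.plot(kind="bar", x="city", y="temp_c", legend=False)
ax.set_ylabel("temp_c")
plt.tight_layout()
plt.savefig("temp_by_city.png")
```

Because `read_csv` receives a bare file name, the script works from any machine as long as it is run from the directory that holds the data file.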
Free Assignment Solution - Data Summary And Visualization Using Python
__Descriptive Statistics:__
| | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm |
|---|---|---|---|---|
| count | 150.000000 | 150.000000 | 150.000000 | 150.000000 |
| mean | 5.843333 | 3.054000 | 3.758667 | 1.198667 |
| std | 0.828066 | 0.433594 | 1.764420 | 0.763161 |
| min | 4.300000 | 2.000000 | 1.000000 | 0.100000 |
| 25% | 5.100000 | 2.800000 | 1.600000 | 0.300000 |
| 50% | 5.800000 | 3.000000 | 4.350000 | 1.300000 |
| 75% | 6.400000 | 3.300000 | 5.100000 | 1.800000 |
| max | 7.900000 | 4.400000 | 6.900000 | 2.500000 |
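The summary above has the row layout that `DataFrame.describe()` produces, although the source of the cell that generated it is not shown, so that call is an assumption. As a small sketch of what each row means, the following uses a hypothetical one-column frame (the column name `a` and its values are made up) and recomputes every statistic separately.

```python
import pandas as pd

# Hypothetical toy column, small enough to check each statistic by hand.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0]})

# describe() returns count, mean, std, min, quartiles, and max for numeric columns.
summary = df.describe()
print(summary)

# The same rows computed one at a time.
col = df["a"]
by_hand = {
    "count": col.count(),
    "mean": col.mean(),
    "std": col.std(),            # sample standard deviation (ddof=1)
    "min": col.min(),
    "25%": col.quantile(0.25),   # linear interpolation between data points
    "50%": col.median(),
    "75%": col.quantile(0.75),
    "max": col.max(),
}
print(by_hand)
```

Note that `std` is the sample standard deviation (divisor n - 1), which is why it differs slightly from a population standard deviation computed on the same column.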
__Correlation Heatmap:__
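# Heatmap showing the correlation between the numeric variables
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
corr = df.corr(numeric_only=True)  # numeric_only needs pandas >= 1.5; it skips text columns
fig, ax = plt.subplots(figsize=(10, 10))
plt.title("Correlation Plot")
# The all-False mask hides nothing; built-in bool replaces the deprecated np.bool alias
sns.heatmap(corr, mask=np.zeros_like(corr, dtype=bool), cmap=sns.diverging_palette(220, 10, as_cmap=True),
            square=True, ax=ax, annot=True, linewidths=5)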
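The mask passed to `sns.heatmap` in the cell above is all `False`, so every cell of the correlation matrix is drawn. Since a correlation matrix is symmetric, a common variation is to hide the duplicated upper triangle. The sketch below shows that variation; it is not what the notebook itself does. The random data, the column names `w` to `z`, and the output file name are assumptions made so the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical numeric data standing in for the notebook's data frame.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 4)), columns=["w", "x", "y", "z"])

corr = df.corr()

# In seaborn, True in the mask means "do not draw this cell".
# k=1 masks only the cells strictly above the diagonal, so the diagonal stays visible.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(6, 6))
ax.set_title("Correlation Plot (lower triangle)")
sns.heatmap(
    corr,
    mask=mask,
    cmap=sns.diverging_palette(220, 10, as_cmap=True),
    square=True,
    annot=True,
    linewidths=0.5,
    ax=ax,
)
fig.savefig("correlation_lower_triangle.png")
```

Masking the upper triangle removes redundant cells without losing information, which can make a larger matrix easier to read.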
