- 20th Jul 2024
- 21:00 pm
In this assignment export and download the data as csv or any data format, and then load it into SQL Server or any relational database. Then can you write a few queries to tell us:
- The longest and shortest trip duration of any trip
- The count of total passenger of all the trips
- The farthest and shortest distance of any trip
- Looking only at the fare_amount, tip_amount, and tolls_amount, what is the average profit of the
2019 (entire filtered dataset)
- For each month, display the entire row with the highest fare amount.
(There should be one row for each month, a total of 7 to 8 rows, because the dataset is missing data for some months). Please send us the python(or other) script, SQL Queries, screenshots of the query results, and the screen shot of the data that is loaded into the database.
Free Assignment Solution - Exporting, Loading, and Analyzing Trip Data in SQL Server
import pandas as pd
df = pd.read_csv('2018_Yellow_Taxi_Trip_Data.csv')
#ongestDuration = df['tpep_dropoff_datetime'] - df['tpep_pickup_datetime'].max()
df['tpep_pickup_datetime'] = pd.to_datetime(df['tpep_pickup_datetime'])
df['tpep_dropoff_datetime'] = pd.to_datetime(df['tpep_dropoff_datetime'])
duration = df['tpep_dropoff_datetime'] - df['tpep_pickup_datetime']
longest = duration[0].total_seconds() / 60
shortest = duration[0].total_seconds() / 60
for i in duration:
i = i.total_seconds() / 60
if longest < i:
longest = i
if shortest > i:
shortest = i
totalAmount = sum(df['fare_amount'] + df['tip_amount'] + df['tolls_amount'])
average = totalAmount / df['fare_amount'].count()
print("Longest Duration : %.2f seconds"%(longest))
print("Shortest Duration : %.2f seconds"%(shortest))
print("Total Passenger : %d"%(sum(df['passenger_count'])))
print("Farthest Distance: %.2f"%(max(df['trip_distance'])))
print("Shortest Distance: %.2f"%(min(df['trip_distance'])))
print("Average of amount: %.2f"%(average))
highestIndex = 0
highest = -999999
for col in df.columns:
print("%-25s"%str(col),end="")
print()
for i in range(0,180):
if(df['total_amount'][i] > highest):
highestIndex = i
highest = df['total_amount'][i]
if(df['tpep_pickup_datetime'][i].month != df['tpep_pickup_datetime'][i+1].month):
for col in df.columns:
print("%-25s"%str(df[col][highestIndex]),end="")
print()
highest = -999999
highestIndex = i + 1
if highest!= -999999:
for col in df.columns:
print("%-25s"%str(df[col][highestIndex]),end="")
Get the best Exporting, Loading, and Analyzing Trip Data in SQL Server assignment help and tutoring services from our experts now!
Please note that this is a sample assignment solved by our Python Programmers. These solutions are intended to be used for research and reference purposes only. If you can learn any concepts by going through the reports and code, then our Python Tutors would be very happy.
- Option 1 - To download the complete solution along with Code, Report and screenshots - Please visit our Python Assignment Sample Solution page
- Option 2 - Reach out to our Python Tutors to get online tutoring related to this assignment and get your doubts cleared
- Option 3 - You can check the partial solution for this assignment in this blog above.
About The Author - Alex Johnson
Alex Johnson is a seasoned data analyst specializing in data extraction, transformation, and loading (ETL) processes. With a strong background in SQL and Python, Alex excels in importing and managing data within relational databases such as SQL Server. His expertise extends to writing complex queries to derive insightful analytics, including trip duration, passenger count, distance calculations, and profitability analysis.