Let me share the scripts from our submission to the DataSeer Grab Challenge 2017.
First, let us import the needed libraries.
Then we process the data set using pandas. First, we call read_csv to open the file. We then used the column "created_at_local" to derive the "date", "time" and "day of week" columns using the datetime library.
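Here is a minimal sketch of that step. Only "created_at_local", "date", "time" and "day of week" come from the post; the "booking_id" column and the two sample rows are made up, and the sketch uses pandas' own `.dt` accessor rather than the datetime library calls of the original script, which are not shown here.

```python
import io

import pandas as pd

# Tiny stand-in for the challenge file. "booking_id" and the values are
# invented for illustration; only "created_at_local" is named in the post.
csv_text = """booking_id,created_at_local
1,2013-10-04 17:45:00
2,2013-10-05 08:10:00
"""

# Parse the timestamp column while reading the file.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["created_at_local"])

# Derive the three helper columns from the timestamp.
df["date"] = df["created_at_local"].dt.date
df["time"] = df["created_at_local"].dt.time
df["day of week"] = df["created_at_local"].dt.day_name()

print(df)
```

With the real file, `io.StringIO(csv_text)` would simply be replaced by the path to the CSV.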
Here is a sample of the extracted dataframe. The dataframe has 265073 rows and 13 columns.
The "city" and "city_only" dataframes were created. We'll keep "city" for later (see Fig 3).
Findings on Pickup Distance:
Fig 1: This is a summary of 2013 trips showing the average pickup distance (in km) per city. The data suggests that trips tend to be cancelled when the average pickup distance is about 1.8 km or more, and completed when it is below the 1.8 km mark.
[Fig 1: plot ax1]
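A summary like the one in Fig 1 can be sketched with a groupby. The column names "city", "trip_status" and "pickup_distance_km" are hypothetical, as are all the values; the real data set may name and encode these differently.

```python
import pandas as pd

# Made-up rows with hypothetical column names, for illustration only.
trips = pd.DataFrame({
    "city": ["Cebu", "Cebu", "Cebu", "Davao", "Davao"],
    "trip_status": ["completed", "completed", "cancelled",
                    "completed", "cancelled"],
    "pickup_distance_km": [1.0, 1.5, 2.0, 1.5, 2.5],
})

# Average pickup distance per city, one column per trip status.
avg_distance = (
    trips.groupby(["city", "trip_status"])["pickup_distance_km"]
    .mean()
    .unstack("trip_status")
)

print(avg_distance)
```

Calling `avg_distance.plot(kind="bar")` on the result would give a grouped bar chart, one group per city.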
Fig 2: This graph shows the Actual Allocation Rate (AAR) per city for 2013. AAR is below 50% for all cities.
[Fig 2: plot ax2]
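The post does not spell out how AAR is calculated, so the sketch below assumes it is the share of bookings that ended as completed trips, expressed as a percentage. That definition, the column names and the sample rows are all assumptions for illustration.

```python
import pandas as pd

# Made-up bookings with hypothetical column names.
bookings = pd.DataFrame({
    "city": ["Cebu", "Cebu", "Cebu", "Cebu", "Davao", "Davao"],
    "trip_status": ["completed", "unallocated", "cancelled", "unallocated",
                    "completed", "unallocated"],
})

# Assumed definition: AAR = completed bookings / all bookings, in percent.
is_completed = bookings["trip_status"].eq("completed")
aar = is_completed.groupby(bookings["city"]).mean() * 100

print(aar)
```

Grouping by both city and a source column in the same way would give the per-source breakdown.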
Fig 3: Breaking the AAR down by source shows that VNU in Metro Manila has the highest AAR, at above 50%. We used the "city" dataframe here. It might be a good idea to promote VNU in other cities.
[Fig 3: plot ax3]
Fig 4: This graph shows the number of trips per status per hour in Cebu. Cebu had a high unallocated rate from 5pm to 7pm.
A new dataframe called "time" was created for this. It uses the time of day (in hours) instead of dates.
[Fig 4: plot ax4]
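The hourly view can be sketched by grouping on the hour of "created_at_local" and the trip status. The "city" and "trip_status" column names and the sample rows are hypothetical; this is not the original "time" dataframe code.

```python
import pandas as pd

# Made-up bookings; only "created_at_local" is a column named in the post.
bookings = pd.DataFrame({
    "created_at_local": pd.to_datetime([
        "2013-10-04 17:05:00",
        "2013-10-04 17:40:00",
        "2013-10-04 18:10:00",
        "2013-10-04 09:00:00",
        "2013-10-04 17:20:00",
    ]),
    "city": ["Cebu", "Cebu", "Cebu", "Cebu", "Davao"],
    "trip_status": ["unallocated", "unallocated", "completed",
                    "completed", "unallocated"],
})

cebu = bookings[bookings["city"] == "Cebu"]

# Count bookings per hour of day and status; hours are rows, statuses columns.
hour = cebu["created_at_local"].dt.hour.rename("hour")
hourly = cebu.groupby([hour, "trip_status"]).size().unstack(fill_value=0)

# Hour of day with the most unallocated bookings.
peak_unallocated_hour = hourly["unallocated"].idxmax()

print(hourly)
print(peak_unallocated_hour)
```

Filtering on "Davao" instead of "Cebu" gives the equivalent table for the Davao graph.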
Fig 5: This graph shows the number of trips per status per hour in Davao. Davao had a high unallocated rate at around 5pm, which started to go down by 8pm.
[Fig 5: plot ax5]
[Fig 6: plot ax6]
Fig 7: Cebu had a high travel count on Mondays, Fridays and Saturdays. The daily dataframe shows the trip status and fare per day, sorted per city. It also shows the daily pickup distance and the corresponding trip status.
[Fig 7: plot ax7]
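A day-of-week count per city can be sketched with a cross-tabulation. The rows below are invented and the "city" column name is an assumption, so the resulting peaks only illustrate the mechanics and are not the challenge results.

```python
import pandas as pd

# Made-up trips; only "created_at_local" is a column named in the post.
trips = pd.DataFrame({
    "city": ["Cebu", "Cebu", "Cebu", "Davao", "Davao", "Davao"],
    "created_at_local": pd.to_datetime([
        "2013-10-04 08:00:00",  # Friday
        "2013-10-11 09:00:00",  # Friday
        "2013-10-07 10:00:00",  # Monday
        "2013-10-03 11:00:00",  # Thursday
        "2013-10-10 12:00:00",  # Thursday
        "2013-10-05 13:00:00",  # Saturday
    ]),
})
trips["day of week"] = trips["created_at_local"].dt.day_name()

# Keep the weekdays in calendar order instead of alphabetical order.
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Rows are cities, columns are weekdays, cells are trip counts.
daily_counts = (
    pd.crosstab(trips["city"], trips["day of week"])
    .reindex(columns=day_order, fill_value=0)
)

# Busiest weekday per city.
busiest_day = daily_counts.idxmax(axis=1)

print(daily_counts)
print(busiest_day)
```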
Fig 8: Davao's travel count peaked every Thursday.
[Fig 8: plot ax8]
Fig 9: Metro Manila's peak day is Friday.
[Fig 9: plot ax9]
...pardon on the html boxes...lol