Friday, March 31, 2017

Code used for the Grab Challenge 2017

Yay! We did not win!

Let me share the scripts we used for our submission to the DataSeer Grab Challenge 2017.





First, let us import the needed libraries.


Then process the data set using pandas. First, use read_csv to open the file. We used the column "created_at_local" to derive the "date", "time" and "day of week" columns using the datetime library.
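The original code cell did not survive on this page, so here is a minimal sketch of that step. Only the column name "created_at_local" comes from our data; the two sample rows, the "city" values and the new column names are made up for illustration. The sketch also uses the pandas .dt accessor instead of calling the datetime library directly, which may differ from what we actually ran.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for the challenge CSV. Only the
# "created_at_local" column name is taken from the post.
raw = io.StringIO(
    "created_at_local,city\n"
    "2013-07-01 17:45:10,Cebu\n"
    "2013-07-05 06:05:00,Metro Manila\n"
)

# parse_dates turns the text column into real timestamps on load.
df = pd.read_csv(raw, parse_dates=["created_at_local"])

# Derive the three helper columns from the timestamp.
df["date"] = df["created_at_local"].dt.date
df["time"] = df["created_at_local"].dt.time
df["day_of_week"] = df["created_at_local"].dt.day_name()

print(df[["date", "time", "day_of_week"]])
```

With the real file, the io.StringIO object would simply be replaced by the path to the CSV.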

Here is a sample of the extracted dataframe. The dataframe has 265073 rows and 13 columns.


The "city" and "city_only" dataframes were created. We'll keep "city" for later (see Fig 3).

Findings on Pickup Distance:
Fig 1: This is a summary of 2013 trips showing the average pickup distance (in km) per city. The data suggests that trips tend to get cancelled when the pickup point is about 1.8 km away on average, and tend to be completed when it is below the 1.8 km mark.


Fig 1: ax1
Findings on AAR:
Fig 2: This graph shows the Actual Allocation Rate (AAR) per city for 2013. AAR is below 50% for all cities.
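For readers who want to reproduce a number like this, here is a rough sketch of one way to compute an allocation rate per city. It assumes AAR means the share of bookings that were allocated a driver, and that a booking counts as allocated unless its state is "UNALLOCATED". The column names, state labels and sample rows are hypothetical, not the challenge data.

```python
import pandas as pd

# Hypothetical sample; column names and state labels are assumptions.
trips = pd.DataFrame({
    "city": ["Cebu", "Cebu", "Cebu", "Cebu",
             "Davao", "Davao", "Davao", "Davao"],
    "state": ["COMPLETED", "CANCELLED", "UNALLOCATED", "UNALLOCATED",
              "COMPLETED", "UNALLOCATED", "UNALLOCATED", "UNALLOCATED"],
})

# Assumed definition: a booking is allocated unless it is UNALLOCATED.
trips["allocated"] = trips["state"] != "UNALLOCATED"

# The mean of a boolean column is the share of True values, so
# multiplying by 100 gives the rate in percent.
aar = trips.groupby("city")["allocated"].mean() * 100

print(aar.round(1))
```

Grouping by both city and a source column in the same way would give the per-source breakdown.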



Fig 2: ax2
Recommendation on AAR:
Fig 3: When the AAR data is broken down by source, VNU in Metro Manila has the highest AAR, at above 50%. We used the "city" dataframe here.
*It might be a good idea to promote VNU in other cities.

Fig 3: ax3
Findings on the Relationship of Trip State with Time:
Fig 4: This graph shows the number of trips per status per hour in Cebu. Cebu had a high unallocated rate from 5pm to 7pm.


A new dataframe called "time" is created. This dataframe uses time (in hours) instead of dates.
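Here is a small sketch of how an hourly table like that could be built. It is an illustration under assumptions: the "city" and "state" column names, the state labels and the four sample rows are hypothetical, and the table is named time_df here (rather than "time") just to keep it distinct from Python's time module.

```python
import pandas as pd

# Hypothetical sample rows for one city; only "created_at_local"
# is a column name taken from the post.
trips = pd.DataFrame({
    "city": ["Cebu"] * 4,
    "created_at_local": pd.to_datetime([
        "2013-07-01 17:10:00", "2013-07-01 17:40:00",
        "2013-07-01 18:05:00", "2013-07-01 09:30:00",
    ]),
    "state": ["UNALLOCATED", "COMPLETED", "UNALLOCATED", "COMPLETED"],
})

# Keep only the hour of day (0 to 23) and drop the date part.
trips["hour"] = trips["created_at_local"].dt.hour

# Count trips per hour and per state for a single city.
cebu = trips[trips["city"] == "Cebu"]
time_df = pd.crosstab(cebu["hour"], cebu["state"])

print(time_df)
```

Calling time_df.plot() on a table shaped like this draws one line per trip state, with the hour on the x-axis.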



Fig 4: ax4


Fig 5: This graph shows the number of trips per status per hour in Davao. Davao had a high unallocated rate at around 5pm, which started to go down by 8pm.


Fig 5: ax5
Fig 6: This graph shows the number of trips per status per hour in Metro Manila. Unallocated trips first spiked during the early commute, starting at 5am and dropping at around 10am. A second spike started to rise at 3pm and went down by 9pm. The highest peak was at around 6pm.


Fig 6: ax6


Fig 7: Cebu had a high travel count on Mondays, Fridays and Saturdays. The daily dataframe shows the trip status and fare per day, sorted per city. It also shows the daily pickup distance and corresponding trip status.

Fig 7: ax7


Fig 8: Davao's travel count peaked every Thursday.


Fig 8: ax8


Fig 9: Metro Manila's peak is Friday.

Fig 9: ax9



We hope you have picked up a thing or two...Your comments are very welcome! 
...pardon the html boxes...lol

Tuesday, March 21, 2017

We have declared war on Pie Graphs!

I signed up at meetup.com to check on upcoming IT conferences. Initially, I was interested in topics such as DevOps, AWS and Cloud Operations. I scanned thru the topics and well...who am I kidding? I won't be able to catch up with the folks there. I am way behind in Operations Technology! Ha!

I did mention before that I started a "journey" (ugh) in Python which led to studying Data Science. So...I searched for topics related to both and found a couple...ok...just 1. DataSeer hosted an event on January 19, 2017 entitled "The Art of Data Story Telling" and this is where we declared war on Pie Graphs! Argh!

During the first few minutes of the talk, I got my big takeaway...it says, "Your data presentation should communicate the Big Take Away clearly." Ha!


Edward Tufte's concepts were also discussed. They mainly revolved around a minimalist approach to data presentation. Always mind the data-ink ratio. Avoid bad visualization, resist 3D! Additional ink only distracts. Above all else, show the data.




The minimalist approach really made sense...until I came across the work of Nigel Holmes. "Nigel Holmes, whose work regularly incorporates strong visual imagery into the fabric of the chart" (I got this somewhere and I guess it would be safe to quote it...lol)... OK...I'm sleepy now, you can read more of Nigel Holmes here.