Showing posts with label matplotlib. Show all posts

Friday, June 16, 2017

Seismic Events for Q1 2017

Plot it here, there and everywhere...


A friend showed me Phivolcs' data set on seismic activity and asked if I could interpret it. I reviewed it and figured it would be a great addition to my portfolio. At the same time, the deadline for my Coursera assignment was approaching. We were tasked to pull data from the web and provide an interpretation and a data visualization. Sort of putting everything we had learned together...being an independent data scientist. I decided to use the Phivolcs data for the assignment. Hitting two birds with one stone...heh!

I started with the 2014 data since it showed everything on one page. I used BeautifulSoup to scrape it and almost went insane after a week of working on the HTML file. I could not get the data into the format I needed.

So... To make things easier, I just used the 2017 data since the HTML tables were well formatted. I extracted a 3-month sample of seismic activity from Phivolcs and converted the data to dataframes using Pandas. I used Matplotlib to plot the earthquake magnitudes and leveraged the Basemap toolkit to draw the map and plot each magnitude at its longitude and latitude coordinates. Ha!


The complete Python code may be found here. If you are interested in the source file, I have extracted the raw logs and saved them as a .csv here.

Here is a breakdown of my source code:


Import the essential libraries needed. I am running Python 3.5 on Windows 8.1. As far as I can remember, I updated matplotlib to get the basemap library, running a pip command along the lines of pip install --upgrade matplotlib under the administrator account. You may refer here for the documentation on basemap.


The Q1 data was extracted from Phivolcs' site. Each month's data sits on its own page, so the three months were extracted separately. The read_html function returns a list of dataframes. After inspection, the dataframe I needed was at the 3rd index of each list, so I pulled those out. The 3 dataframes were then concatenated into one, named df, and its columns were renamed for proper labeling. Saving the file is optional; I used to_csv to export it.
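The steps above can be sketched roughly as follows. This is not the original code: the helper names load_month and combine_months, the column labels, and the sample rows are all made up for illustration, and table_index=3 is only one reading of "the 3rd index". The stand-in frames let the sketch run without hitting the Phivolcs site.

```python
import pandas as pd


def load_month(url, table_index=3):
    """Fetch one monthly page and return a single table from it.

    pd.read_html returns a list of DataFrames, one per <table> found.
    table_index=3 is an assumption; inspect the list to find the right one.
    """
    tables = pd.read_html(url)
    return tables[table_index]


def combine_months(frames, column_names):
    """Stack the monthly frames into one DataFrame and relabel the columns."""
    combined = pd.concat(frames, ignore_index=True)
    combined.columns = column_names
    return combined


# Stand-in frames with made-up placeholder values (not real Phivolcs records)
jan = pd.DataFrame([["01 Jan 2017 - 01:10 AM", 9.50, 126.30, 10, 3.1, "Placeholder A"]])
feb = pd.DataFrame([["01 Feb 2017 - 03:45 PM", 13.80, 120.70, 25, 2.4, "Placeholder B"]])

# Hypothetical column labels
columns = ["DateTime", "Latitude", "Longitude", "Depth_km", "Magnitude", "Location"]

df = combine_months([jan, feb], columns)
print(df)

# Optional export, as described above:
# df.to_csv("phivolcs_q1_2017.csv", index=False)
```

With real pages, each frame would come from load_month(url) instead of being typed in by hand.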


Finally! Time to draw! Basemap is pretty easy to follow. (Not!) I used the Transverse Mercator projection to get a realistic map. I also added a colorbar to help read the magnitudes.
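To show just the colorbar idea on its own, here is a minimal sketch that uses a plain matplotlib scatter plot instead of Basemap, so there is no map background or Transverse Mercator projection. The coordinates and magnitudes are made-up placeholder values, and the colormap and marker sizing are my own choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up placeholder events (not real Phivolcs records)
lons = [121.0, 125.5, 123.9, 126.2]
lats = [14.6, 9.8, 10.3, 7.1]
mags = [2.1, 6.7, 3.4, 4.8]

fig, ax = plt.subplots(figsize=(6, 8))

# Color each point by magnitude; scale the marker area so stronger events stand out
points = ax.scatter(
    lons,
    lats,
    c=mags,
    cmap="YlOrRd",
    s=[10 * m ** 2 for m in mags],
    edgecolors="black",
)

# The colorbar maps the point colors back to magnitude values
cbar = fig.colorbar(points, ax=ax)
cbar.set_label("Magnitude")

ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Seismic events (placeholder data)")

# fig.savefig("seismic_scatter.png", dpi=100)  # uncomment to write the image
```

With Basemap, the same magnitudes and colormap would be drawn on top of the map axes instead of bare longitude/latitude axes.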



This tutorial helped me a lot in the visualization using Basemap. I hope it can help you too.



And....the output!


Thursday, May 4, 2017

Plotting Temperatures

I recently completed the 2nd assignment in the Coursera course. I had so much fun and had to post it on my blog! Ha!

We were tasked to analyze an 11-year data set containing the daily maximum and minimum temperatures from 2005 to 2015. The 2005 to 2014 data was sorted and the 2015 data was set aside. The maximum and minimum temperatures per day were plotted. February 29 was removed from the data to keep it clean. Finally, the 2015 data was processed and temperature outliers were highlighted.

The following libraries were used.


The data set comes from a subset of the National Centers for Environmental Information (NCEI) Daily Global Historical Climatology Network (GHCN-Daily). GHCN-Daily comprises daily climate records from thousands of land surface stations across the globe.
  • The original dataframe has 165085 rows and 4 columns and was assigned to the variable df.
  • The 'Data_Value' column is in tenths of degrees C.


Additional columns were added to df for further processing.
  • February 29 entries from leap years were also removed at this stage.
  • 'temp_in_C' was used to convert 'Data_Value' to Celsius. Spyder was able to process the /10 operation but Jupyter kept crashing. One of the mentors advised that the division operation consumes a lot of memory, causing the Jupyter notebook to crash. I opted to use *.10 instead.
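A minimal sketch of this step, on a tiny made-up frame. Only 'Data_Value' and 'temp_in_C' are named in the post; the 'Date' column and the helper columns 'Year' and 'Month_Day' are assumptions added for illustration.

```python
import pandas as pd

# Made-up placeholder rows; 'Data_Value' is in tenths of degrees C
df = pd.DataFrame({
    "Date": ["2008-02-28", "2008-02-29", "2008-03-01", "2015-07-04"],
    "Data_Value": [-56, 12, 101, 283],
})

# Helper columns (hypothetical names) for later grouping by calendar day
df["Date"] = pd.to_datetime(df["Date"])
df["Year"] = df["Date"].dt.year
df["Month_Day"] = df["Date"].dt.strftime("%m-%d")

# Drop leap days so every year has the same set of calendar days
df = df[df["Month_Day"] != "02-29"].copy()

# Convert tenths of degrees C to degrees C by multiplying by 0.1
df["temp_in_C"] = df["Data_Value"] * 0.1

print(df)
```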


Created more dataframes! More fun! At this stage, the maximum and minimum temperatures per day were identified. For example, all temps for November 24 of 2005 to 2014 were collected and the highest (or lowest) temperature was selected and placed in a new dataframe.
  • I felt comfortable using pivot_table, but the groupby function may be used as well.
  • 2015 was excluded from the dataframe, since we needed to identify which 2015 temperatures exceeded the 2005 to 2014 records.
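Here is a rough sketch of the idea using pivot_table. The values are made up, the column names 'Year' and 'Month_Day' are assumptions, and a single temperature column stands in for the separate maximum and minimum readings to keep the example short.

```python
import pandas as pd

# Made-up placeholder temperatures for two calendar days
df = pd.DataFrame({
    "Year": [2005, 2010, 2014, 2015, 2005, 2010, 2014, 2015],
    "Month_Day": ["11-24"] * 4 + ["11-25"] * 4,
    "temp_in_C": [12.0, 15.5, 9.0, 17.2, 3.0, -1.5, 4.0, -3.0],
})

# Baseline years only; 2015 is held out for comparison
baseline = df[df["Year"] < 2015]

# Highest and lowest temperature per calendar day across 2005 to 2014
record_high = baseline.pivot_table(
    index="Month_Day", values="temp_in_C", aggfunc="max"
)["temp_in_C"]
record_low = baseline.pivot_table(
    index="Month_Day", values="temp_in_C", aggfunc="min"
)["temp_in_C"]

# 2015 readings, indexed by calendar day so they line up with the records
y2015 = df[df["Year"] == 2015].set_index("Month_Day")["temp_in_C"]

# Keep only the 2015 days that went beyond the 2005 to 2014 records
broke_high = y2015[y2015 > record_high.reindex(y2015.index)]
broke_low = y2015[y2015 < record_low.reindex(y2015.index)]

print(broke_high)
print(broke_low)
```

The groupby equivalent would be baseline.groupby("Month_Day")["temp_in_C"].max() and .min().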


Plotting! I am no matplotlib expert (yet) but thanks to Stack Overflow, I survived. Yey! I tried to minimize the visual noise to put more emphasis on the required data.
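A small sketch of what cutting the noise can look like in matplotlib: shade the band between the record lines, overlay the 2015 points, and drop the spines and tick marks that carry no information. The numbers are made up and the styling choices are assumptions, not the exact plot from the assignment.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up placeholder values for one week
days = list(range(1, 8))
record_high = [8.0, 9.5, 7.0, 10.0, 11.5, 9.0, 8.5]
record_low = [-4.0, -6.5, -5.0, -3.0, -7.5, -6.0, -4.5]
broken_days = [2, 5]
broken_temps = [10.5, -8.0]

fig, ax = plt.subplots(figsize=(8, 4))

# Record lines for the baseline years, with the range between them shaded
ax.plot(days, record_high, color="firebrick", linewidth=1, label="Record high 2005-2014")
ax.plot(days, record_low, color="steelblue", linewidth=1, label="Record low 2005-2014")
ax.fill_between(days, record_low, record_high, color="grey", alpha=0.2)

# 2015 readings that went beyond the records
ax.scatter(broken_days, broken_temps, color="black", s=15, zorder=3, label="2015 record broken")

# Remove chart junk: top/right spines and tick marks
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.tick_params(length=0)

ax.set_ylabel("Temperature (°C)")
ax.legend(frameon=False)
```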


Finally, here is the output.





The complete code may be found here.
