Monday 5 September 2016

Airline Flight Data Analysis – Part 1 – Data Preparation by Michael Kamprath via @DIYBigData

This is the start of a PySpark data analysis project on airline on-time performance. The post explains why the Apache Parquet data format is useful while loading and cleaning the data.

This is a great post; it includes code and a link to the author's GitHub repository, which contains the full source.
