This is the first post in a PySpark data analysis project on airline on-time performance. It explains why the Apache Parquet data format is useful while the data is loaded and cleaned.
It is a great post that includes code, along with a link to the author's GitHub repository containing it.