The incremental data load approach in ETL (Extract, Transform and Load) is often the preferred design pattern. Rather than reloading the full data set, each run identifies and processes only the rows that were added or modified since the last ETL run.
Code is available on GitHub.

I can see that it picks up only the changes, but I wonder how efficient that is for large data volumes, and whether the comparison should be done at the source or somewhere else in the cloud, where it cannot affect the source system's performance. Something to consider.
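One option for moving the comparison off the source, sketched here as an assumption rather than as what the original code does, is to land a snapshot of the source in a staging area and detect changes there by comparing per-row hashes with the previous snapshot. The source then only serves a plain read, and the comparison cost is paid elsewhere. The function names `row_hash` and `diff_snapshots` are hypothetical, and snapshots are modelled as plain dictionaries keyed by primary key.

```python
import hashlib


def row_hash(row):
    """Digest of a row's non-key columns; any column change changes the hash."""
    # repr() keeps None and '' distinct and escapes the separator inside strings.
    payload = "\x1f".join(repr(v) for v in row)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_snapshots(previous, current):
    """Compare two snapshots of the form {key: row_tuple}, e.g. copies in staging.

    Returns sorted lists of (inserted_keys, updated_keys, deleted_keys).
    """
    prev_hashes = {k: row_hash(r) for k, r in previous.items()}
    curr_hashes = {k: row_hash(r) for k, r in current.items()}
    inserted = sorted(curr_hashes.keys() - prev_hashes.keys())
    deleted = sorted(prev_hashes.keys() - curr_hashes.keys())
    updated = sorted(
        k for k in curr_hashes.keys() & prev_hashes.keys()
        if curr_hashes[k] != prev_hashes[k]
    )
    return inserted, updated, deleted
```

For example, comparing `{1: ("alice",), 2: ("bob",)}` with `{2: ("bobby",), 3: ("carol",)}` reports key 3 as inserted, key 2 as updated and key 1 as deleted. The trade-off is that the whole table must be copied out of the source on each run, so for very large tables a timestamp or log-based approach may still be cheaper overall.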