Partition overwriting using parquet and Databricks

Note about parquet and updating tables

Nowadays, many companies that use Databricks store data lake tables that need to be updated in the delta format.

This notebook shows what you had to do before the delta format was available, that is, how you had to manage your update strategy yourself, for example by:

  • Rewriting the full table
  • Rewriting selected partitions manually
  • Rewriting partitions dynamically
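The three strategies differ mainly in which existing partitions are deleted before the new files are written. The sketch below models a partitioned table as a dict from partition value to rows, so the difference can be checked without a cluster. It is a hypothetical plain-Python illustration, not Spark code: the function names and the `date` partition column are assumptions. In Spark, the rough equivalents are `df.write.mode("overwrite").partitionBy("date").parquet(path)` under the default static mode (every existing partition is removed), writing with `mode("overwrite")` directly to a single partition directory such as `.../date=2020-01-01`, and the same partitioned write after setting `spark.sql.sources.partitionOverwriteMode` to `dynamic` (available since Spark 2.3).

```python
# Plain-Python model of a partitioned table: partition value -> list of rows.
# Hypothetical helper names; this mimics the semantics of the three update
# strategies, it is not Spark's API.
from typing import Dict, List

Row = dict
Table = Dict[str, List[Row]]


def group_by_partition(rows: List[Row], key: str) -> Table:
    """Split rows into partitions, as partitionBy(key) would on disk."""
    parts: Table = {}
    for row in rows:
        parts.setdefault(str(row[key]), []).append(row)
    return parts


def rewrite_full_table(table: Table, new_rows: List[Row], key: str) -> Table:
    """Rewrite the full table: every existing partition is discarded."""
    # `table` is deliberately ignored; only the new rows survive.
    return group_by_partition(new_rows, key)


def rewrite_selected_partitions(table: Table, new_rows: List[Row], key: str,
                                partitions: List[str]) -> Table:
    """Rewrite partitions chosen by the caller, e.g. one directory per write."""
    incoming = group_by_partition(new_rows, key)
    unexpected = set(incoming) - set(partitions)
    if unexpected:
        # The caller has to know up front which partitions the data touches.
        raise ValueError(f"rows for unlisted partitions: {sorted(unexpected)}")
    result = dict(table)
    for p in partitions:
        result.pop(p, None)  # clear each listed partition...
    result.update(incoming)  # ...then write the new rows into it
    return result


def rewrite_partitions_dynamically(table: Table, new_rows: List[Row],
                                   key: str) -> Table:
    """Replace only the partitions present in the new rows (dynamic mode)."""
    result = dict(table)
    result.update(group_by_partition(new_rows, key))
    return result


existing: Table = {
    "2020-01-01": [{"date": "2020-01-01", "id": 1, "v": "a"}],
    "2020-01-02": [{"date": "2020-01-02", "id": 2, "v": "b"}],
}
updates = [{"date": "2020-01-02", "id": 2, "v": "B"}]

print(sorted(rewrite_full_table(existing, updates, "date")))
# ['2020-01-02']  (the 2020-01-01 partition is gone)
print(sorted(rewrite_partitions_dynamically(existing, updates, "date")))
# ['2020-01-01', '2020-01-02']
```

A full rewrite is only safe when the new data contains the whole table; the manual strategy works but leaves the partition bookkeeping to the caller; the dynamic strategy replaces exactly the partitions that the incoming batch touches.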

When a new technology like delta arrives, it is good to understand the problems or challenges it solves.

How to update a table backed by partitioned parquet in your data lake

The example notebook can be seen below.
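Without delta's MERGE, a common way to update rows in a partitioned parquet table is to read only the partitions the update touches, merge the changes in, and overwrite just those partitions. The sketch below models that read-merge-overwrite cycle in plain Python. It is a hypothetical illustration under stated assumptions, not the code from the notebook and not Spark's API: `upsert_partitions`, `part_key`, and `id_key` are made-up names, and each row is assumed to have a unique id whose partition value never changes.

```python
# Hypothetical read-merge-overwrite update on a plain-Python model of a
# partitioned table (partition value -> list of rows); not Spark's API.
from typing import Dict, List

Row = dict
Table = Dict[str, List[Row]]


def upsert_partitions(table: Table, updates: List[Row],
                      part_key: str, id_key: str) -> Table:
    """Merge `updates` into `table`, rewriting only the touched partitions.

    Assumes ids are unique and a row never moves to another partition.
    """
    # 1. Which partitions does this batch touch?
    touched = {str(row[part_key]) for row in updates}
    result = dict(table)  # untouched partitions are carried over as-is
    for p in touched:
        # 2. Read the current contents of the partition, keyed by id.
        merged = {row[id_key]: row for row in table.get(p, [])}
        # 3. Apply updates: a new id is an insert, an existing id is replaced.
        for row in updates:
            if str(row[part_key]) == p:
                merged[row[id_key]] = row
        # 4. Overwrite the whole partition with the merged rows.
        result[p] = list(merged.values())
    return result


existing: Table = {
    "2020-01-01": [{"date": "2020-01-01", "id": 1, "v": "a"}],
    "2020-01-02": [{"date": "2020-01-02", "id": 2, "v": "b"},
                   {"date": "2020-01-02", "id": 3, "v": "c"}],
}
updates = [{"date": "2020-01-02", "id": 2, "v": "B"},
           {"date": "2020-01-03", "id": 4, "v": "d"}]

updated = upsert_partitions(existing, updates, "date", "id")
print(sorted(updated))                        # ['2020-01-01', '2020-01-02', '2020-01-03']
print([r["v"] for r in updated["2020-01-02"]])  # ['B', 'c']
```

In Spark, step 4 would map naturally onto a dynamic partition overwrite. If a row's partition value could change, the old copy would survive in its previous partition, which is one of the bookkeeping problems a format with row-level MERGE takes off your hands.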

