Partition overwriting using parquet and Databricks
Note about parquet and updating tables
Nowadays many companies that use Databricks store their data in the delta format when data in the lake needs to be updated.
This notebook shows what you had to do before the delta format was available, when you had to manage the update strategy yourself, for example by:
- Rewriting the full table
- Rewriting selected partitions manually
- Rewriting partitions dynamically
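To make the difference between these three strategies concrete, here is a minimal pure-Python sketch that models a partitioned table as a dictionary from partition value to rows. It is not Spark code and not taken from the notebook; the function names and the `date` partition column are hypothetical and only illustrate which partitions each strategy touches.

```python
# Toy model (assumption): a table is {partition value: [rows]}, like one
# directory per partition value in a partitioned parquet table.

def group_by_partition(rows, partition_col):
    """Group incoming rows by the value of their partition column."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[partition_col], []).append(row)
    return grouped


def rewrite_full_table(table, new_rows, partition_col):
    """Replace everything: partitions missing from new_rows disappear."""
    return group_by_partition(new_rows, partition_col)


def rewrite_selected_partitions(table, new_rows, partition_col, partitions):
    """Replace only the partitions the caller lists by hand."""
    grouped = group_by_partition(new_rows, partition_col)
    result = dict(table)
    for value in partitions:
        # A listed partition with no new rows ends up empty.
        result[value] = grouped.get(value, [])
    return result


def rewrite_partitions_dynamically(table, new_rows, partition_col):
    """Replace exactly the partitions that occur in new_rows."""
    result = dict(table)
    result.update(group_by_partition(new_rows, partition_col))
    return result


existing = {
    "2020-01-01": [{"date": "2020-01-01", "value": 1}],
    "2020-01-02": [{"date": "2020-01-02", "value": 2}],
}
incoming = [
    {"date": "2020-01-02", "value": 20},
    {"date": "2020-01-03", "value": 30},
]

print(sorted(rewrite_full_table(existing, incoming, "date")))
# ['2020-01-02', '2020-01-03']
print(sorted(rewrite_selected_partitions(existing, incoming, "date", ["2020-01-02"])))
# ['2020-01-01', '2020-01-02']
print(sorted(rewrite_partitions_dynamically(existing, incoming, "date")))
# ['2020-01-01', '2020-01-02', '2020-01-03']
```

Only the full rewrite loses the untouched `2020-01-01` partition. The manual variant ignores the incoming `2020-01-03` rows because that partition was not listed.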
When new technology arrives, like delta, it is good to understand some of the problems or challenges that the new technology solves.
How to update a table backed by partitioned parquet in your data lake
An example notebook can be seen below.
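As a rough idea of what such an update can look like in PySpark, here is a minimal sketch of the three strategies. It is not the notebook itself. It assumes `spark` is an active SparkSession, `df` is a DataFrame that contains the partition column, and `path` points at the root of the partitioned parquet table; the function names are hypothetical. The dynamic variant relies on the `spark.sql.sources.partitionOverwriteMode` setting, which is available from Spark 2.3.

```python
# Sketch under assumptions: `spark`, `df`, `path` and the partition column
# are supplied by the caller; none of these names come from the notebook.

def overwrite_full_table(df, path, partition_col):
    """Rewrite the whole table with the contents of df.

    With the default (static) overwrite mode, everything under `path`
    is replaced, including partitions that df has no rows for.
    """
    df.write.mode("overwrite").partitionBy(partition_col).parquet(path)


def overwrite_selected_partition(df, path, partition_col, value):
    """Rewrite one partition by writing straight into its directory."""
    (
        df.filter(df[partition_col] == value)
        .drop(partition_col)  # the value is already encoded in the directory name
        .write.mode("overwrite")
        .parquet(f"{path}/{partition_col}={value}")
    )


def overwrite_partitions_dynamically(spark, df, path, partition_col):
    """Rewrite only the partitions that actually occur in df."""
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
    df.write.mode("overwrite").partitionBy(partition_col).parquet(path)
```

With the manual variant you have to work out yourself which partition values need rewriting and call the function once per value. With the dynamic variant Spark derives that list from the data being written.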