Inserts/Upserts
Difference between Inserts (Append) and Upserts
Inserts and upserts refer to how data is loaded into the destination. Real-time streaming typically uses inserts because they allow low-latency, low-cost data ingestion.
Inserts
With inserts, each change is written as a new record in the destination.
For example, if the status of an e-commerce order changes, the destination will contain 2 records for that order: one with the status before the change and one with the status after it.
This can be useful for tracking how data changes over time, and for some destinations it is also a more efficient way of loading data. If you go down this route, you will need to decide whether and when to clean up older records, since a growing number of them will affect query speed. Users may also not want to add extra WHERE clauses to filter out the older records, so you can create views on top of the raw data that return only the final state of each record, making the data easier to work with.
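To make the insert (append) behavior concrete, here is a minimal in-memory Python sketch; it is not how any particular destination implements appends. The `ChangeRecord` type, its fields, and the `append` and `latest_state` helpers are hypothetical, and `updated_at` is assumed to increase with every change. `latest_state` plays the role of the "final state" view described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    # Hypothetical change event; field names are illustrative only.
    order_id: str
    status: str
    updated_at: int  # assumed to increase with each change to an order


def append(table: list[ChangeRecord], change: ChangeRecord) -> None:
    """Insert (append) semantics: every change becomes a new row."""
    table.append(change)


def latest_state(table: list[ChangeRecord]) -> list[ChangeRecord]:
    """Mimic a 'final state' view: keep only the newest row per order_id."""
    latest: dict[str, ChangeRecord] = {}
    for row in table:
        current = latest.get(row.order_id)
        if current is None or row.updated_at > current.updated_at:
            latest[row.order_id] = row
    return list(latest.values())


if __name__ == "__main__":
    raw: list[ChangeRecord] = []
    append(raw, ChangeRecord("order-1", "pending", 1))
    append(raw, ChangeRecord("order-1", "shipped", 2))
    print(len(raw))  # 2 rows: before and after the status change
    print(latest_state(raw))  # only the 'shipped' row is returned
```

The raw table keeps the full history, while the view-like function hides older rows from anyone who only needs the current state.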
Upserts
Upserts overwrite a matching record (based on a primary key), so older versions of the record are not kept in the database. This is the behavior you may be familiar with from batch loading.
For example, if the status of an e-commerce order changes, rather than having 2 records you have only 1, containing the latest change.
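For comparison, here is an equally minimal Python sketch of upsert behavior, again not tied to any specific destination. The `OrderRow` type and `upsert` helper are hypothetical, and `order_id` is assumed to be the primary key; a dictionary keyed by that key stands in for the destination table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderRow:
    # Hypothetical row shape; order_id is assumed to be the primary key.
    order_id: str
    status: str


def upsert(table: dict[str, OrderRow], row: OrderRow) -> None:
    """Upsert semantics: insert if the key is new, otherwise overwrite."""
    table[row.order_id] = row


if __name__ == "__main__":
    orders: dict[str, OrderRow] = {}
    upsert(orders, OrderRow("order-1", "pending"))
    upsert(orders, OrderRow("order-1", "shipped"))
    print(len(orders))  # 1 row: the older version was replaced
    print(orders["order-1"].status)  # shipped
```

Because the primary key decides which row is replaced, the table never holds more than one row per order, so no cleanup or filtering view is needed.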