Recently, I published an article where I briefly explained how to start and schedule a Delta Live Tables pipeline. I promised to explore the features of Delta Live Tables one by one and explain what each one can actually do for you.
Yesterday, I had some "nice" experiences in the office regarding data quality, so I thought it was worth writing about. Data quality is becoming increasingly important. Nearly every mid-sized company owns its own data, but the quality of that data is crucial. Whether you are a data scientist or a data analyst, working with accurate data is fundamental to effective analytics and insights. After all, we base our decisions on data by asking the right questions of it. Yet even if you are asking the right questions, you are implicitly assuming that the underlying data is accurate. If the data is wrong, you will most likely make the wrong decisions for your company. I hope that has convinced you just how crucial this boring work is. Now let's get to it.
In Delta Live Tables, you can use expectations (not exceptions) to set rules for your data quality. These expectations ensure that the data in your tables meets the standards you set. You can apply them to your queries using Python decorators or SQL constraint clauses. Last time, I showed you how to start a pipeline; as you may have noticed, I did not use expectations there, since they are completely optional.
An expectation basically consists of two things: a named condition that each record must satisfy, and an action that determines what happens to records that violate it.
Now, let's start with a simple expectation that checks the data against our standard. In this case we don't define an action, so it falls back to the default warn action, which neither interrupts the pipeline nor drops any data.
CREATE OR REFRESH [STREAMING] LIVE TABLE table_name (
  CONSTRAINT expectation_name EXPECT (orderCount > 0)
) AS
SELECT * FROM tbl_product
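The same check can be expressed in Python with a decorator. Below is a minimal sketch, assuming a DLT pipeline notebook and reusing the placeholder names table_name, tbl_product, and orderCount from the SQL above:

import dlt

@dlt.table(name="table_name")
@dlt.expect("expectation_name", "orderCount > 0")  # no action given, so the default warn applies
def table_name():
    # spark is provided by the Databricks runtime inside a DLT pipeline;
    # tbl_product and orderCount are placeholder names from the example above.
    # Rows failing the expectation are still written to the target; they only
    # show up as failed-record counts in the pipeline's quality metrics.
    return spark.read.table("tbl_product")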
Invalid records are written to the target, and failures are reported as a metric for the dataset. If you don't specify an action, warn is the default. This is useful for monitoring data quality issues without blocking the pipeline.
@dlt.expect("expectation_name", "condition")
Invalid records are removed before writing to the target, and failures are reported as metrics for the dataset. This ensures only clean data reaches your target tables while maintaining pipeline execution.
@dlt.expect_or_drop("expectation_name", "condition")
Invalid records stop the update from succeeding, requiring manual intervention before re-processing. This is the strictest option, ensuring absolutely no invalid data enters your system.
@dlt.expect_or_fail("expectation_name", "condition")
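The stricter variants are used in exactly the same way. Here is a minimal sketch, again reading the placeholder source tbl_product; the table and expectation names below are purely illustrative:

import dlt

@dlt.table(name="orders_clean")
@dlt.expect_or_drop("positive_order_count", "orderCount > 0")  # violating rows are dropped before the write
def orders_clean():
    # illustrative names; spark is available in the DLT runtime
    return spark.read.table("tbl_product")

@dlt.table(name="orders_strict")
@dlt.expect_or_fail("positive_order_count", "orderCount > 0")  # any violation aborts the update
def orders_strict():
    return spark.read.table("tbl_product")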
Choosing the right action depends on your data quality requirements and business context: use warn when you want visibility into quality issues without blocking anything, drop when only clean records should reach the target, and fail when invalid data is unacceptable and the update must stop until someone has investigated.
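When a table has to satisfy several rules at once, you can also bundle conditions with the expect_all family of decorators (expect_all, expect_all_or_drop, expect_all_or_fail) and mix them with a single strict rule. A sketch with made-up column names, following the idea of staying lenient on most checks and strict only where it matters:

import dlt

# Soft rules: violations are only reported as metrics (warn)
soft_rules = {
    "valid_order_count": "orderCount > 0",
    "valid_price": "price >= 0",
}

@dlt.table(name="products_quality")
@dlt.expect_all(soft_rules)                               # warn on soft-rule violations
@dlt.expect_or_fail("valid_id", "productId IS NOT NULL")  # hard rule: stop the update
def products_quality():
    # products_quality, price, and productId are hypothetical names for illustration
    return spark.read.table("tbl_product")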
Delta Live Tables expectations provide a powerful framework for ensuring data quality at scale. By choosing the appropriate action type – warn, drop, or fail – you can implement data quality checks that align with your business requirements. Start with warn to understand your data patterns, then gradually implement stricter rules as needed. Remember: clean data is the foundation of reliable analytics and informed decision-making.
Ehsan Meisami is a data engineering expert specializing in Databricks, Delta Lake, and modern data pipeline architectures. With extensive experience in building scalable data solutions, Ehsan helps organizations implement robust data quality frameworks and optimize their analytics infrastructure.