Recently, I published an article where I briefly explained how to start and schedule a Delta Live Tables pipeline. I promised to explore the features of Delta Live Tables one by one and explain what each one can actually do for you.
Yesterday, I had some "nice" experiences in the office regarding data quality, so I thought it was worth writing about. Data quality is becoming increasingly important. Nearly every mid-sized company owns its own data, but the quality of that data is crucial. Whether you are a data scientist or a data analyst, working with accurate data is fundamental to effective analytics and insights. After all, we base our decisions on data by asking the right questions of it. Yet even if you are asking the right questions, you are implicitly assuming that the underlying data is accurate. If the data is wrong, you will most likely make the wrong decisions for your company. I hope that has convinced you just how crucial this boring work is. Now let's get to it.
In Delta Live Tables, you can use expectations (not exceptions) to set rules for your data quality. These expectations ensure that the data in your tables meets the standards you set. You can apply them to your queries using Python decorators or SQL constraint clauses. Last time, I showed you how to start a pipeline; as you may have noticed, I did not use expectations there, since they are completely optional.
An expectation basically consists of two things: a named condition that each record must satisfy, and an action that determines what happens to records that violate it.
Now, let's start with a simple expectation that checks the data against our standard. In this case we don't define an action, so it falls back to the default warn action, which neither interrupts the pipeline nor drops any data.
CREATE OR REFRESH [STREAMING] LIVE TABLE table_name (
  CONSTRAINT expectation_name EXPECT (orderCount > 0)
) AS
SELECT * FROM tbl_product
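The same check can be expressed in Python with a decorator. Below is a minimal sketch, assuming a DLT pipeline notebook and reusing the placeholder names table_name, tbl_product, and orderCount from the SQL above:

import dlt

@dlt.table(name="table_name")
@dlt.expect("expectation_name", "orderCount > 0")  # no action given, so the default warn applies
def table_name():
    # spark is provided by the Databricks runtime inside a DLT pipeline;
    # tbl_product and orderCount are placeholder names from the example above.
    # Rows failing the expectation are still written to the target; they only
    # show up as failed-record counts in the pipeline's quality metrics.
    return spark.read.table("tbl_product")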
Invalid records are written to the target, and failures are reported as a metric for the dataset. If you don't specify an action, warn is the default. This is useful for monitoring data quality issues without blocking the pipeline.
@dlt.expect("expectation_name", "condition")
Invalid records are removed before writing to the target, and failures are reported as metrics for the dataset. This ensures only clean data reaches your target tables while maintaining pipeline execution.
@dlt.expect_or_drop("expectation_name", "condition")
Invalid records stop the update from succeeding, requiring manual intervention before re-processing. This is the strictest option, ensuring absolutely no invalid data enters your system.
@dlt.expect_or_fail("expectation_name", "condition")
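The stricter variants are used in exactly the same way. Here is a minimal sketch, again reading the placeholder source tbl_product; the table and expectation names below are purely illustrative:

import dlt

@dlt.table(name="orders_clean")
@dlt.expect_or_drop("positive_order_count", "orderCount > 0")  # violating rows are dropped before the write
def orders_clean():
    # illustrative names; spark is available in the DLT runtime
    return spark.read.table("tbl_product")

@dlt.table(name="orders_strict")
@dlt.expect_or_fail("positive_order_count", "orderCount > 0")  # any violation aborts the update
def orders_strict():
    return spark.read.table("tbl_product")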
Choosing the right action depends on your data quality requirements and business context: use warn when you want visibility into quality issues without blocking anything, drop when only clean records should reach the target, and fail when invalid data is unacceptable and the update must stop until someone has investigated.
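When a table has to satisfy several rules at once, you can also bundle conditions with the expect_all family of decorators (expect_all, expect_all_or_drop, expect_all_or_fail) and mix them with a single strict rule. A sketch with made-up column names, following the idea of staying lenient on most checks and strict only where it matters:

import dlt

# Soft rules: violations are only reported as metrics (warn)
soft_rules = {
    "valid_order_count": "orderCount > 0",
    "valid_price": "price >= 0",
}

@dlt.table(name="products_quality")
@dlt.expect_all(soft_rules)                               # warn on soft-rule violations
@dlt.expect_or_fail("valid_id", "productId IS NOT NULL")  # hard rule: stop the update
def products_quality():
    # products_quality, price, and productId are hypothetical names for illustration
    return spark.read.table("tbl_product")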
Delta Live Tables expectations provide a powerful framework for ensuring data quality at scale. By choosing the appropriate action type – warn, drop, or fail – you can implement data quality checks that align with your business requirements. Start with warn to understand your data patterns, then gradually implement stricter rules as needed. Remember: clean data is the foundation of reliable analytics and informed decision-making.
Ehsan Meisami is a data engineering expert specializing in Databricks, Delta Lake, and modern data pipeline architectures. With extensive experience in building scalable data solutions, Ehsan helps organizations implement robust data quality frameworks and optimize their analytics infrastructure.