Category: Data Quality / dqops

dqops Data Quality Rules (Part 2) – CFD, Machine Learning

The previous blog post introduced a list of basic data quality rules developed for my R&D data quality improvement initiative. Those rules are fundamental and essential for detecting data quality problems. However, they have existed for a long, long time and are neither innovative nor exciting. More importantly, those … Continue reading dqops Data Quality Rules (Part 2) – CFD, Machine Learning

dqops Data Quality Rules (Part 1) – Basic Rules

In my previous blog post, I discussed the requirements and core elements of rule-based data quality assessment. In this and the next blog post, I am going to walk through the data quality rules designed in the dqops DQ Studio app (one of my R&D initiatives for data quality improvement), ranging from the basic data … Continue reading dqops Data Quality Rules (Part 1) – Basic Rules

Data Quality Improvement – Rule-Based Data Quality Assessment

As discussed in the previous blog posts in my Data Quality Improvement series, the key to successful data quality management is continuous awareness of, and insight into, how fit your data is for use in your business. Data quality assessment is the core and possibly the most challenging activity in the data quality management process. … Continue reading Data Quality Improvement – Rule-Based Data Quality Assessment

What is Data Management, actually? – DAMA-DMBOK Framework

"What is data management?" I guess many people will (at least I think I will) answer "Em... data management is managing data, right?" while swearing in their heads, "What a stupid question!" However, if I were asked this question in a job interview, I guess I'd better provide a bit … Continue reading What is Data Management, actually? – DAMA-DMBOK Framework

Data Quality Improvement – DQ Dimensions = Confusions

DQ Dimensions are Confusing: Data quality dimensions are great inventions from our data quality thought leaders and experts. Since the concept of quality dimensions was originally proposed in the course of the Total Data Quality Management (TDQM) program of MIT in the 1980s [5], a large number of data quality dimensions have been defined by … Continue reading Data Quality Improvement – DQ Dimensions = Confusions

Data Quality Improvement – Conditional Functional Dependency (CFD)

To fulfil the promise I made before, I dedicate this blog post to the topic of Conditional Functional Dependency (CFD). The reason I dedicate a whole blog post to this topic is that CFD is one of the most promising constraints for detecting and repairing inconsistencies in a dataset. The use of CFD … Continue reading Data Quality Improvement – Conditional Functional Dependency (CFD)

Data Quality Improvement – Data Profiling

This is the second post of my Data Quality Improvement blog series. This blog post discusses the data profiling tasks that I think are relevant to data quality improvement use cases. Anyone who has ever worked with data must have already done some sort of data profiling, either using a commercial … Continue reading Data Quality Improvement – Data Profiling

Data Quality – 80:20 Rule and 1:10:100 Rule

I came across two data quality rules on Martin Doyle's blog today. Martin Doyle is a data quality improvement evangelist and an industry expert on CRM. I found that, to a certain extent, those data quality rules provide some theoretical support for some of my ideas on data quality improvement. 80:20 Rule: The 80:20 … Continue reading Data Quality – 80:20 Rule and 1:10:100 Rule

Data Quality Improvement – Set the Scene Up

In this blog series I plan to write about data quality improvement from a data engineer's perspective. I plan for this series to cover not only data quality concepts, methodologies, and procedures but also case studies of the architectural designs of some data quality management platforms and deep dives into technical details for implementing a data … Continue reading Data Quality Improvement – Set the Scene Up

Our Data Quality is Good, Nothing Breakdown

Boss: Our data quality is good, nothing breakdown
IT: Our data quality is good, there are some unimportant known issues, but all under control
BI Developer: All data is from source systems, the quality should be good. Hey, look, how cool is the dashboard I built
Business Users: Once again, those reports don't make sense … Continue reading Our Data Quality is Good, Nothing Breakdown