How To Spot Bad Quality Data And Manage It – 6 Doable Tips


Quality data is necessary for sound analysis, which in turn leads to sound decisions. So if your data is wrong, your analyses will be wrong, and the decisions you make based on them will be wrong too.

This is why you need to make sure you feed good-quality data into your database. But what counts as good or bad quality data? And once you know that, do you know how to spot bad-quality data and how to manage it?

In this article, we’ll be discussing all of that so let’s get right into it.

How to spot bad-quality data

What is Bad Quality Data?

You may think it’s obvious what bad-quality data looks like. But if you want to assess the quality of your data and find ways to improve it, it helps to know what constitutes “bad” in this context.

Bad quality data can be any data that doesn’t meet your business requirements, whether due to accuracy, completeness, or timeliness. You might be surprised by just how many different types of bad-quality data are out there!

A note on terminology: data quality is usually described in terms of dimensions such as accuracy, consistency, timeliness, and validity.

Which Factors Contribute To Identifying The Quality Of Data?

The following factors determine the quality of data:

1. Data completeness

This refers to whether all the records and fields you expect are actually present. If a record is missing from a file, or a required field in a record is empty, the data is incomplete. You can assess completeness by comparing your records against a reference source, such as another database that covers the same entities.
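As a rough illustration of comparing records against a reference source, the sketch below checks a small set of records against a list of expected IDs and a list of required fields. The function name `completeness_report`, the field names, and the sample records are all hypothetical.

```python
def completeness_report(records, expected_ids, required_fields):
    """Compare records against a reference list of IDs and required fields."""
    present_ids = {r["id"] for r in records}
    expected = set(expected_ids)
    # IDs that the reference source has but our file does not.
    missing_ids = sorted(expected - present_ids)
    # A record counts as complete only if every required field has a value.
    complete = [
        r for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    ]
    return {
        "missing_ids": missing_ids,
        "record_coverage": len(present_ids & expected) / len(expected),
        "complete_records": len(complete),
    }

# Hypothetical sample data: record 3 is absent and record 2 has no e-mail.
records = [
    {"id": 1, "name": "Ana", "email": "ana@example.com"},
    {"id": 2, "name": "Ben", "email": ""},
]
report = completeness_report(
    records, expected_ids=[1, 2, 3], required_fields=["name", "email"]
)
```

Here `report["missing_ids"]` lists the records to chase up, while `report["complete_records"]` shows how many of the records you do have are fully filled in.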


2. Data consistency

This refers to whether the same fact is recorded the same way everywhere it appears. For example, there could be two entries for the same person with different addresses or dates of birth. Such inconsistencies are serious flaws in data quality, because you cannot tell which version to trust.
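One way to look for this kind of inconsistency is to group entries by an identifier and report any field that holds more than one distinct value. This is only a sketch: the helper `find_conflicts`, the `person_id` key, and the sample entries are hypothetical, and it assumes you already have a reliable identifier for each person.

```python
from collections import defaultdict

def find_conflicts(records, key, fields):
    """Return {key_value: {field: set_of_values}} for keys with conflicting values."""
    seen = defaultdict(lambda: defaultdict(set))
    for r in records:
        for f in fields:
            seen[r[key]][f].add(r[f])
    conflicts = {}
    for k, by_field in seen.items():
        # Keep only the fields where the same key maps to several values.
        clashing = {f: vals for f, vals in by_field.items() if len(vals) > 1}
        if clashing:
            conflicts[k] = clashing
    return conflicts

# Hypothetical entries: P1 appears twice with two different addresses.
people = [
    {"person_id": "P1", "address": "12 Oak St", "dob": "1990-05-30"},
    {"person_id": "P1", "address": "98 Elm Ave", "dob": "1990-05-30"},
    {"person_id": "P2", "address": "5 Pine Rd", "dob": "1985-01-12"},
]
conflicts = find_conflicts(people, key="person_id", fields=["address", "dob"])
```

The result only tells you where the entries disagree. Deciding which value is correct still needs a trusted source or a manual review.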

3. Data accuracy

This refers to whether the data is correct or incorrect. For example, if you have information about your employee’s birthday but have recorded it wrongly as 1st June instead of 30th May, this would be an inaccurate piece of data that needs correction immediately.


How To Spot Bad Quality Data?

These are the 6 signs of bad-quality data.

1.    Missing Data

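This is a common problem, and it can have many different causes. If data is missing because your survey respondents didn’t answer a question, for example, it can cause trouble when you analyze the results.

Check every column and row for empty cells before you start, and decide how to handle each gap: collect the missing value, fill it in from a reliable source, or exclude the record from the analysis.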

2.    Unexpected missing values

You can often tell whether missing values are due to data entry errors by looking at the other variables in the dataset. If every variable has a high percentage of missing values, the likely cause is not that the data was never collected but that it was entered incorrectly or incompletely. In this case, you should focus on improving your internal processes so that your team members enter information correctly and consistently.

In contrast, if only one or two variables have a high amount of missing values and all others are properly filled out, it could be that those particular questions were never asked or were removed from the questionnaire during edits (either intentionally or unintentionally).

Consider whether these questions may be important for future analyses by exploring how respondents answered them when they were present (using another file). Look at how many people skipped the questions and whether skipping them is related to answers elsewhere in the survey. If there is no apparent pattern, then perhaps those items weren’t necessary after all.
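To tell the two situations apart (missing values everywhere versus missing values in just one or two variables), you can compute the share of empty cells per column. The sketch below does this with hypothetical survey rows. The 50% `threshold` for calling a column "highly missing" is an arbitrary assumption that you would tune for your own data.

```python
def missing_rates(rows, columns):
    """Fraction of rows in which each column is empty (None or blank string)."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    rates = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        rates[col] = missing / total
    return rates

def describe_missingness(rates, threshold=0.5):
    """Label the pattern as 'none', 'localized', or 'widespread'."""
    high = [c for c, r in rates.items() if r >= threshold]
    if not high:
        return "none", high
    if len(high) == len(rates):
        return "widespread", high
    return "localized", high

# Hypothetical survey rows: only the income question is mostly unanswered.
survey = [
    {"age": 34, "income": None, "city": "Pune"},
    {"age": 29, "income": None, "city": "Delhi"},
    {"age": 41, "income": 52000, "city": ""},
    {"age": 25, "income": None, "city": "Mumbai"},
]
rates = missing_rates(survey, ["age", "income", "city"])
pattern, columns_affected = describe_missingness(rates)
```

A "widespread" result points toward a data entry or process problem, while a "localized" result suggests looking at the specific questions involved.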

3.    Duplicate rows

If you find a row that’s been duplicated, you can remove the extra copy. Check for duplicate columns too, not just duplicate rows.
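A minimal sketch of removing exact duplicate rows is shown below. It keeps the first occurrence of each row and returns the extra copies separately so they can be reviewed. The function name and the sample orders are hypothetical, and the approach assumes that rows are dictionaries with string keys and hashable values.

```python
def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row; return (unique_rows, duplicates)."""
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        # Sort the items so that key order does not affect the comparison.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            duplicates.append(row)
        else:
            seen.add(fingerprint)
            unique.append(row)
    return unique, duplicates

# Hypothetical orders: the third row repeats the first with keys reordered.
orders = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": 35.5},
    {"amount": 20.0, "order_id": 1},
]
unique, duplicates = drop_duplicate_rows(orders)
```

This only catches exact copies. Near-duplicates, such as the same order with a trailing space in one field, need the data to be normalized first.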

4.    Extremely high or low values in continuous features

When you have a continuous feature, you should make sure that its values fall within the range you expect for that feature. For example, a feature that has been scaled to lie between -1 and 1 should contain no values outside that range. If values fall outside the expected range, then there’s likely something wrong with them.

This can happen when one or two outliers are entered into your database or an automated process is used to enter data without checking it first. You can deal with this by dropping these outliers from your dataset or by transforming the feature so that all of the values fall within a certain range (e.g., -1 to 1).

Another problem arises when there are many outliers relative to the other values in the set, for example when a noticeable share of values lie more than five standard deviations away from the mean. This will skew your results significantly and might indicate that there’s something wrong with your sampling procedure, with how people entered their answers into their response sheets, or both.
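As one possible way to flag extreme values, the sketch below uses the interquartile range (IQR) rule rather than standard deviations, because a single huge value inflates the standard deviation and can hide itself. This is a swapped-in technique chosen for illustration. The function name, the multiplier `k=1.5`, and the sample readings are assumptions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical readings; 250 looks like a data entry slip.
readings = [12, 14, 15, 15, 16, 17, 18, 19, 250]
outliers = iqr_outliers(readings)
# Dropping is one option; review flagged values before deleting them.
cleaned = [v for v in readings if v not in outliers]
```

Whether a flagged value should be dropped, corrected, or kept depends on whether it is an error or a genuine extreme, so treat the output as a review list and not as an automatic delete list.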

5.    Extremely high or low unique values in categorical features

The number of unique values in a categorical feature should be plausible for what the feature represents. If you see far more or far fewer unique values than you expect, or a single value with an extremely high or low count compared to other values in its feature, it’s probably an indication that something is wrong.

  • Too few unique values: In some cases, there might be missing data for this feature or it might even be completely empty. Either way, if you have no data for this column then you can’t use it. The same goes for a feature where one value dominates implausibly (for example, 75% of your users are named John Smith), which could mean that some users’ records were repeated when they submitted their feedback through your app.
  • Too many unique values: On the other hand, if there are nearly as many unique values as rows in a feature that should only have a handful of categories, the same category has probably been entered in several different spellings or formats.
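The checks above can be sketched as a small summary of one categorical column. The function name `cardinality_report`, the 75% `dominant_share` cutoff, and the sample names are hypothetical. What counts as "too many" or "too few" unique values depends on the feature.

```python
from collections import Counter

def cardinality_report(values, dominant_share=0.75):
    """Summarize unique-value counts for one categorical column."""
    filled = [v for v in values if v not in (None, "")]
    counts = Counter(filled)
    top_value, top_count = counts.most_common(1)[0] if counts else (None, 0)
    top_share = top_count / len(filled) if filled else 0.0
    return {
        "unique_values": len(counts),
        # Close to 1.0 means almost every row has its own category.
        "unique_ratio": len(counts) / len(filled) if filled else 0.0,
        "top_value": top_value,
        "top_share": top_share,
        "is_empty": not filled,
        "is_dominated": bool(filled) and top_share >= dominant_share,
    }

# Hypothetical user names: one name accounts for 75% of the rows.
names = ["John Smith", "John Smith", "John Smith", "Priya Rao"]
name_report = cardinality_report(names)
```

A column flagged as `is_empty` cannot be used at all, while `is_dominated` or a `unique_ratio` near 1.0 are prompts to look at the raw values more closely.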

6.    Dataset contains only a single class

If you find yourself with a dataset that only has one class, this is not good. In fact, the presence of a single class in your dataset is usually an indication that something has gone wrong and the data may be unusable for training a classifier. A milder version of the same problem is imbalanced data, where one class heavily outnumbers the others. Data can become imbalanced for a variety of reasons.

One reason is that there might be a large difference between the sizes of the classes to begin with (for example, if 90% of people are male). If this is true, a model can score well simply by predicting the majority class every time, and extra training or tuning alone is unlikely to fix that. The problem here is that your algorithm sees too few samples of the minority class to learn it properly, so it won’t generalize well to unseen data from that class.

Another reason why your data might end up imbalanced is that it was collected by hand rather than automatically, which can make it easier for some groups to be over-represented in the sample.
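Before training anything, it is cheap to check how the labels are distributed. The sketch below reports each class's share and flags the single-class case. The function name, the 20% `min_share` cutoff for calling a class "rare", and the sample labels are assumptions for illustration.

```python
from collections import Counter

def class_balance(labels, min_share=0.2):
    """Report each class's share of the labels and flag problem cases."""
    counts = Counter(labels)
    total = len(labels)
    shares = {label: n / total for label, n in counts.items()}
    return {
        "shares": shares,
        "single_class": len(counts) == 1,
        # Classes that fall below the chosen minimum share.
        "rare_classes": sorted(l for l, s in shares.items() if s < min_share),
    }

# Hypothetical labels matching the 90% example in the text.
labels = ["male"] * 9 + ["female"]
balance = class_balance(labels)
```

If `single_class` is true, go back to how the data was collected or filtered. If some classes are merely rare, collecting more examples of them or resampling are common next steps.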

How To Manage Bad Data?

Now you know how to spot low-quality data. Let’s see how you can manage it.

How to manage bad quality data

1.    Identify the source of the bad data.

The first step to managing bad data is identifying its source. A bad data source could be any number of things:

  • A human error–someone inputted incorrect data, or transposed numbers on a spreadsheet
  • A technical problem–a database goes down, or there’s a system glitch that causes bad data to be entered into your system

In order to manage bad data effectively, it’s important to know which of these two kinds of source you’re dealing with, human or technical, so that you can take the right steps to correct it.

2.    Create consistency checks and regularize your data.

You can avoid bad data by creating consistency checks and regularizing your data.

  • Regularize the data. Use a consistent naming convention, format, structure, method of entry, and method of storage for all information you collect.
  • Create consistency checks that will alert you to any inconsistencies in the data being entered into your system. For example, if an employee’s e-mail address changes because of an update to their company directory information or an HR change affecting their login credentials, it’s important for your system to flag this so you can review these records before they are propagated further downstream.
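The two ideas above can be sketched together: first bring values into one consistent format, then compare snapshots and flag real changes for review. The helper names, the employee IDs, and the example.com addresses are hypothetical. Lowercasing the whole address is a simplifying assumption that suits most directories but not every mail system.

```python
def normalize_email(value):
    """Trim whitespace and lowercase so the same address always compares equal."""
    return value.strip().lower()

def flag_email_changes(current, incoming):
    """Return employee IDs whose normalized e-mail differs between two snapshots."""
    flagged = []
    for emp_id, new_email in incoming.items():
        old_email = current.get(emp_id)
        # New employees (no old value) are not treated as changes here.
        if old_email is not None and normalize_email(old_email) != normalize_email(new_email):
            flagged.append(emp_id)
    return sorted(flagged)

# Hypothetical snapshots: E1 differs only in formatting, E2 really changed.
current = {"E1": "Ana.Lee@example.com", "E2": "ben@example.com"}
incoming = {
    "E1": " ana.lee@example.com ",
    "E2": "ben.k@example.com",
    "E3": "cy@example.com",
}
flagged = flag_email_changes(current, incoming)
```

Normalizing before comparing keeps formatting noise out of the review queue, so only genuine changes such as E2 are flagged.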

3.    Check for outliers or errors in your dataset.

Outliers and errors are two types of bad data that can be found in your dataset. Outliers are data points that are significantly different from the rest of the data. They may be caused by an error during collection, or they could be valid outliers that should be kept in your dataset. Errors are any incorrect information within your dataset, such as misspellings or transposed numbers.

Identifying outliers and errors is often done with descriptive statistics like mean, median, mode and range for numeric data.
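As a small example of these descriptive statistics, the sketch below summarizes a numeric column with Python's standard `statistics` module. The `describe` helper and the sample ages are hypothetical. The value 310 stands in for a typo such as an extra digit.

```python
import statistics

def describe(values):
    """Basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }

# Hypothetical ages; 310 is an implausible entry.
ages = [31, 29, 35, 31, 40, 310]
summary = describe(ages)
```

A mean far above the median, together with a range that is implausible for the variable, is a quick hint that at least one value deserves a closer look.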

4.    Fix all of your problems as soon as possible.

Fixing bad data as soon as possible is the best way to minimize its damage. Bad data will continue to hurt your business if you don’t take action, so make sure that you have a plan in place for dealing with it.

If your team has been following this guide, they should already be aware of the problem areas and know how to fix them. But if they haven’t, start by educating them on what bad data looks like and why it could harm the business. The sooner they learn this, the easier it will be for them to avoid future mistakes.


Data is an asset, a competitive advantage, and the key to future growth. The problem is that many companies don’t realize how much of a priority data quality should be, and as a result, they’re not taking the proper steps to gather the right kind of data.

Published by Rahul Bhattacharya

Rahul is a journalist with expertise in researching a variety of topics and writing engaging contents. He is also a data analyst and an expert in visualizing business scenarios using data science. Rahul is skilled in a number of programming languages and data analysis tools. When he is not busy writing, Rahul can be found somewhere in the Appalachian trails or in an ethnic restaurant in Chicago.
