Fixing data issues / Corrected Data Values
Last updated
Last updated
Often, data is not entirely clean. That is, it has problems that need to be corrected before meaningful analysis can be accomplished. For example, dates may be incorrectly formatted, or fields may contain a mix of numeric values and character codes that need to be separated into multiple fields. Calculated fields can often be used to fix these kinds of issues.
We’ll consider using Tableau Prep—a tool designed to shape and cleanse data—in Chapter 15, Structuring Messy Data to Work Well in Tableau. Tableau Prep’s calculation syntax is nearly identical, so many of the examples in this chapter will also be applicable in that context. Knowing how to address data issues in either Tableau Desktop or Tableau Prep will help you greatly.
We’ll continue working with the Vacation Rentals
data. You’ll recall that the start and end dates looked something like this:
Start | End |
Dec 2 | Dec 9 |
Dec 9 | Dec 15 |
Dec 16 | Dec 23 |
Without the year, Tableau does not recognize the Start
or End
fields as dates. Instead, Tableau recognizes them as strings. You might try using the drop-down menu on the fields in the data pane to change the data type to date. But without the year, Tableau will almost certainly parse them incorrectly, or at least incompletely. This is a case where we’ll need to use a calculation to fix the issue.
Assuming, in this case, that you are confident the year should always be 2020, you might create calculated fields named Start Date
and End Date
.
Here is the code for getting the start date:
And here is the code for getting the end date:
What these calculated fields do is concatenate the month and day with the year and then use the DATE()
function to convert the string into a date value. Indeed, Tableau recognizes the resulting fields as dates (with all the features of a date field, such as built-in hierarchies). A quick check in Tableau reveals the expected results:
Figure 4.15: The corrected dates appear next to the string versions. All fields are discrete dimensions on Rows (the dates are exact dates)
Not only are we able to fix problems in the data, but we can also extend the data and our analysis using calculations. We’ll consider this next.