Excel Hack 73: Clean data in MS Excel (remove extra spaces)

While performing data analysis in Excel, you may have encountered data in your set that includes extra spaces. Like the example below:

These spaces can either be:

  • leading spaces (spaces just before the start of the actual data text),
  • double spaces (extra spaces in between text, at a point in the data text that should traditionally have just one space),
  • trailing spaces (spaces right at the end of the actual data text).

To the regular Excel user, this does not look problematic. But to a data analyst, this is clearly an issue as this creates problems with data analysis. Excel recognizes data with extra spaces as a different unique item from others. So if you wanted to count items, the data points with extra spaces will not be included in the analysis. An example is the Kaduna countif operation in the snapshot below:

This can result in frustration in the data analysis process, as you are using the right Excel functions but getting the wrong results.

Of course, if you have a small data set, you can manually clear leading and double spaces by editing each faulty data point, one by one. But what if you have trailing spaces (that are hard to detect with the human eye)? Or what if you have several data points to update?

This is where the data cleaning functions – TRIM() and SUBSTITUTE() – come in.

(CLEAN() and REPLACE() can also be used as they perform almost similar functions to the two above. CLEAN() will remove characters that cannot be printed as well)

The TRIM() (or CLEAN()) function will clear up double spaces. In our example below, the FCT Abuja data point now counts correctly after applying the =TRIM() function on the original data

(You can also use Find and Replace (Ctrl+F) to remove double spaces from text. Place two spaces in the “Find” field and one space in the “Replace” field and hit “Replace All”)

However, the Kaduna count is still incorrect because the TRIM function only cleans up breaking spaces i.e. spaces that have text on both sides and is not able to clean up non-breaking spaces. The leading and trailing spaces are non-breaking spaces as they only have text on one side of the space and not on both.

To clean up the non-breaking spaces, we will require the SUBSTITUTE function.

First, let’s understand how the SUBSTITUTE function works.

The SUBSTITUTE function will replace a given character or text in a cell’s content with a specified string. The syntax for the SUBSTITUTE function is

=SUBSTITUTE(text, old text, new text, [instance number])  

Below are two examples of the SUBSTITUTE function in action:

To substitute a non-breaking space, we have to refer to it by its character code based on the Excel character code system. There are 255 character codes, and the non-breaking space is the 160th character. So it is referred to as CHAR(160).

We will now combine the SUBSTITUTE and CHAR functions to run an operation to clear leading and trailing spaces. The combined function is below:

This operation has normalized the Kaduna data in A4 and A27, and when we run a COUNTIF operation on column C, we see 2 counts against Kaduna

However, we want to clean up all the 3 types of extra spaces – leading, trailing and double – at the same time. To do this, we will use a super formula that nests the three functions – SUBSTITUTE, CHAR and TRIM – to perform the clean up operation in one go.

When we extend this formula to the rest of the table and run the count again, the data reflects a cleaned data, evidenced by an accurate count of items in the dataset.

One thought on “Excel Hack 73: Clean data in MS Excel (remove extra spaces)

Leave a Reply

Your email address will not be published. Required fields are marked *