Skip to main content

How to remove spaces from extracted text


I received a question to our support desk from a user recently, and I thought that it would be helpful to others to share my approach!

The user wanted to extract some clean items from a text property in their transaction list so that they could find the corresponding ‘Country’ dimension items by using the ITEM() function. Generally, the values were simply an exact match to the unique ‘Name’ property for the items that they wanted to map to. However, in their source data, some items were followed by a whitespace and some additional information contained within parentheses, i.e. “Country (City Name)”.

The formula that they were using was not returning results for the transactions where the source text property contained parentheses.

 

It looked something like this:

ITEM(
IF(
ISBLANK(LEFT(SourceTransactionList.CountryText, FIND(" (", SourceTransactionList.CountryText))),
SourceTransactionList.CountryText,
LEFT(SourceTransactionList.CountryText, FIND(" (",SourceTransactionList.CountryText))
),
Country.Name
)

However, it produced this kind of result, which is not quite what the user expected:

A transaction list showing a property with the incorrect output.

 

There were a couple of observations that I made here where I identified that I could help:

The formula follows an IF(ISBLANK(ExpressionA), ExpressionB, ExpressionA) structure which means that ExpressionA needs to be calculated twice. This can be optimised with a fairly small change; the IFBLANK() function can be used instead of the IF(ISBLANK()…) function structure to improve the efficiency of the calculation. This improvement comes from removing the unnecessary extra calculation that is used to determine which result should be used. Instead IFBLANK(ExpressionA, ExpressionB) will return the result from ExpressionB only when ExpressionA is blank, requiring ExpressionA and ExpressionB to be calculated only once each.

 

Making this change leaves us with a more readable formula that looks like this:

ITEM(
IFBLANK(
LEFT(SourceTransactionList.CountryText, FIND(" (", SourceTransactionList.CountryText)),
SourceTransactionList.CountryText
),
Country.Name
)

 

However, there are still missing values in the result.

The issue here came from how the FIND() function is being used in combination with the LEFT() function.

 

The FIND() function has the following parameters (only the first two are used here, and the last two are optional):

FIND(Text to Find, Text to Search n, Starting Position to Search] o, Is Case Sensitive])

The function returns the character position number for the start of the string that is being searched for. In this case, the search string is two characters, with the first character being a whitespace.

 

The LEFT() function has the following parameters:

LEFT(Text to Extract, Number of Characters)

The function returns a string that contains the number of characters specified, beginning at the first character of the string used in the first parameter.

 

If we check the results for each of these functions as they are used, you can see the following results:

A transaction list showing the ‘FIND Result’, alongside the ‘FIND LEFT Result’.

For the example CountryText = France (Paris), notice the result of the FIND() is 7 when France has only 6 characters. This is because the character where the ‘Text to Search’ string that has been matched begins.

This means that the LEFT() function is leaving a trailing whitespace in the result, which means that no Item is returned from the ITEM() function because there is no exact match to a unique property value in the Country dimension. This is sometimes difficult to detect due to the nature of whitespaces being transparent characters - you can see in the above example that the ‘FIND LEFT Result’ column looks correct at a glance.

To solve this, we only ever want to return up to the character before the start of the searched string, so we must subtract 1 from the result of the FIND() function.

This has been done in the below formula:

ITEM(
IFBLANK(
LEFT(SourceTransactionList.CountryText, FIND(" (", SourceTransactionList.CountryText) - 1),
SourceTransactionList.CountryText
),
Country.Name
)

An alternative approach is to TRIM() the output of the LEFT() function. The TRIM() function removes all leading and trailing whitespace characters. This is useful if the number of whitespaces between the value and the parentheses is inconsistent, and will produce the same result.

ITEM(
IFBLANK(
TRIM(LEFT(SourceTransactionList.CountryText, FIND(" (", SourceTransactionList.CountryText))),
SourceTransactionList.CountryText
),
Country.Name
)

If the number of whitespaces is always going to be consistent, it is a better practice to use the first approach, and subtract the number of whitespaces expected from the result of the FIND() function. In this case, I recommended the first approach.

With that solution in place, the result now looks correct:

A transaction list showing a property with the correctly calculated output.

You can find the full documentation of the LEFT() and FIND() functions here:

 

0 replies

Be the first to reply!

Reply