Skip to main content
Skip table of contents

Text functions: Part 1

Overview

Text functions manipulate Text values and return other Text values.

Matching patterns

Matching patterns are used in certain text functions that match and/or extract portions of a string. The matching patterns are similar to SQL, and may contain the following specially-interpreted characters:

  • A matching pattern containing "_" will match any character in that position. For example, "_on" will match both "Jon" and "Don".

  • A matching pattern containing "%" will match any sequence of zero or more characters in that position. For example, "%cost%" will match any string containing the substring "cost".

  • A matching pattern containing "@" will cause a following "%" or "_" to be interpreted as a literal character instead of a matching pattern. Use this to match special characters. For example "%@%%" will match any string containing a "%" character.

The Substitute function does not use matching patterns; it replaces text on a character-by-character basis.

Asc

Returns an integer value that is a valid Unicode code point.

Syntax

Asc( text )

The required argument text may be longer than a single character, but only the value of the first character is returned, (or 0 if text is empty).

Example

  • Asc("€") returns 8364.

  • Asc("A") returns 65.

  • Asc("10") returns 49.

  • Asc("13") returns 49.

CODE
if (ASC(substring(X,1,1)) = 13) then "CR" 
elseif (ASC(substring(X,1,1)) = 10) then "LF"
else substring(x,1,1) 
endif

The sample above detects non-printing carriage return and linefeed characters.

AsText

Returns value as a text string.

Syntax

AsText( value )

The required argument value may be of any type except Binary. The length of the resulting string is dependent on the data type of value.

If value is of type Document, the returned text is a JSON Unicode representation of value.

Example

  • AsText(100) returns "100" as a text string.

  • AsText(CurrentDate()) returns (for example) "22 Mar 2007".

Chr (integer constant)

Returns a single-character string containing the character associated with the specified Unicode point value.

Syntax

Chr( integer_constant )

The required argument integer_constant must be a literal Integer representing a valid Unicode code point of 9, 10, 13, and anything within the range 32 – 1,114,111.

Remarks

This function will only create characters that are valid in XML. Characters outside this range will result in an error value.

Examples

  • Chr( 65 ) returns "A".

  • Chr( 65 + 32 ) returns "a".

ChrEx

Returns a single-character string containing the character associated with the specified Unicode point value.

Syntax

Chr( integer_constant )

The required argument integer_constant must be a literal Integer representing a valid Unicode code point.

Remarks

The argument integer_constant may be any integer, but 0 will result in a null value and 1 will result in an error code. All other values will be interpreted as code points, which may not work correctly in a given context if not within the range of valid XML characters.

Example

Chr( 12 ) returns a form-feed character.

ASCII table

The following table shows decimal code values for the standard printable ASCII character set and commonly used control characters.

Decimal

Character

Decimal

Character

Decimal

Character

9

horizontal tab

60

<

94

^

10

line feed, new line

61

=

95

_

11

vertical tab

62

>

96

`

12

form feed, new page

63

?

97

a

13

carriage return

64

96

98

b

27

escape

65

A

99

c

32

space

66

B

100

d

33

!

67

C

101

e

34

 

68

D

102

f

35

#

69

E

103

g

36

$

70

F

104

h

37

%

71

G

105

i

38

&

72

H

106

j

39

 

73

I

107

k

40

(

74

J

108

l

41

)

75

K

109

m

42

*

76

L

110

n

43

+

77

M

111

o

44

,

78

N

112

p

45

-

79

O

113

q

46

.

80

P

114

r

47

/

81

Q

115

s

48

0

82

R

116

t

49

1

83

S

117

u

50

2

84

T

118

v

51

3

85

U

119

w

52

4

86

V

120

x

53

5

87

W

121

y

54

6

88

X

122

z

55

7

89

Y

123

{

56

8

90

Z

124

|

57

9

91

[

125

}

58

:

92

\

126

~

59

;

93

]

127

DEL

ConcatenateRecord

Formats all fields of a record into a single text field, with input fields delimited by the specified character.

Syntax

ConcatenateRecord( text )

The required argument text must be a single character of type text.

Examples

ConcatenateRecord( "," ) might return "LESTER VANDERGRIFT,HERITAGE WOODS,642 MAIN ST,AGAWAM,MA,01001.".

CountSymbols

Counts the number of "symbols" in value, where symbols are defined as consecutive sequences of letters or numbers, or single punctuation marks. The result returned is an Integer.

Syntax

CountSymbols( value )

The required argument value may be of any type.

Examples

  • CountSymbols( "The quick brown fox!" ) returns 5.

  • CountSymbols( "fox, cat, dog, cow" ) returns 7.

CountWords

Counts the number of "words" in value, where words are defined as consecutive sequences of letters or numbers. The result returned is an Integer.

Syntax

CountWords( value )

The required argument value may be of any type.

Example

CountWords( "The quick, brown fox." ) returns 4.

DoubleMetaphone1

Returns the first result of the "double metaphone" phonetic matching algorithm.

Syntax

DoubleMetaphone1( value )

The required argument value may be of any type.

Remarks

The Double Metaphone algorithm is used to generate match keys or lookup keys from input text. It is often used in searching and matching algorithms as a "fuzzy index" key. See http://en.wikipedia.org/wiki/Double_Metaphone for details.

Both DoubleMetaphone1 and DoubleMetaphone2 return long keys, while some metaphone implementations restrict the key to four characters. If you require a short key, use the SubString function: Substring(DoubleMetaphone1(VALUE), 1, 4)

Alternatively, assign the result to a length-4 field.

We do not recommend using Metaphone for record matching. Use our matching macros and tools instead.

Example

DoubleMetaphone1( "YANKELOVICH" ) returns ANKLFX.

DoubleMetaphone2

Returns the second result of the "double metaphone" phonetic matching algorithm.

Syntax

DoubleMetaphone2( value )

The required argument value may be of any type.

Remarks

The Double Metaphone algorithm is used to generate match keys or lookup keys from input text. It is often used in searching and matching algorithms as a "fuzzy index" key. See http://en.wikipedia.org/wiki/Double_Metaphone for details.

Both DoubleMetaphone1 and DoubleMetaphone2 return long keys, while some metaphone implementations restrict the key to four characters. If you require a short key, use the SubString function: Substring(DoubleMetaphone1(VALUE), 1, 4)

Alternatively, assign the result to a length-4 field.

We do not recommend using Metaphone for record matching. Use our matching macros and tools instead.

Example

DoubleMetaphone1( "YANKELOVICH" ) returns ANKLFK.

Metaphone limitations

We do not recommend the use of Metaphone (or similar phonetic codes) for record matching. Though some legacy systems use Metaphone or other match keys for record matching, we do not recommend this approach due to its poor quality. For example, consider matching the following names:

CODE
JON FRANK
JON RANK
J BOB FRANK
JONATHON "BUZZ LIGHTYEAR" FRANK
MR JON E FRANK JR ESQ, VP of SALES
FRANK JON

These names are likely to be the same entity. But phonetic match keys often fail when confronted with these kinds of data issues, which may include:

  • Misspellings

  • Initials vs. full names

  • Missing name components

  • Diminutive vs. formal names 

  • Nicknames and other garbage present

  • Honorifics and titles present

  • Reversed name components

Data Management's (DMs) Name Parser and other cleanup logic can help, but it is very difficult to counter many of these issues. Nickname and diminutive name standardization helps, but also fails on easy cases like these:

SAM SMITH (is this SAMANTHA or SAMUEL?)
SAMUEL SMITH
SAMANTHA SMITH 
PAT JONES (is this PATRICIA or PATRICK?)
PATRICIA JONES
PATRICK JONES

Similar issues occur with company names, for example:

INTERNATIONAL BUSINESS MACHINES
IBM
I.B.M.
I B M
INTL BUS MACH
INTL BUSINESS MACH, MEMPHIS SALES OFFICE

And similar problems are also found in addresses, especially those that fail to standardize for various reasons:

CODE
1200 RFK PARKWAY
1200 RFK PARKWAY #100
120 RFK PARKWAY 
1200 ROBERT F KENNEDY PARKWAY
1200 STATE HIGHWAY 46

These issues include:

  • Presence/absence of unit designator

  • Typo on address number

  • Acronym vs. long form of street name

  • Street name alias

This is why Data Management uses record comparison within a one or more match segments, instead of match keys.

EditDistance

Computes a score a text value measuring the similarity of two text values.

Syntax

EditDistance( value1, value2 )

The required arguments value1 and value 2 may be of any type.

Remarks

Returns a score in the range 0 to 100 measuring the similarity of value1 and value2. A score of zero means the two values are unrelated. A score of 100 means the two values are identical. Case is ignored in the comparison.

This function first compares the two strings and counts the number of errors in the comparison. Errors include:

  • Substitutions ("John" vs. "Johx")

  • Transpositions ("John" vs. "Jonh")

  • Deletions ("John" vs. "Jon") which count as two errors

The final score is then calculated as follows:

100 – error count * 100 / length of longest value

Example

EditDistance( "John Smith", "John Smyth" ) returns an EditDistance of 90.

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.