Str.tokenizeAndFold

Tokenises str and performs folding on each token.

A token is a non-empty sequence of alphanumeric characters in the source string, separated by non-alphanumeric characters. An "alphanumeric" character for this purpose is one that matches g_unichar_isalnum() or g_unichar_ismark().
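The token rule above can be approximated outside GLib. The following is a minimal Python sketch, not the GLib implementation: the helper names `is_token_char` and `split_tokens` are hypothetical, and Python's `str.isalnum()` together with the Unicode "M" (mark) categories is assumed to be a close stand-in for g_unichar_isalnum() and g_unichar_ismark(). The two may disagree on some code points.

```python
import unicodedata
from typing import List


def is_token_char(ch: str) -> bool:
    # Assumption: str.isalnum() approximates g_unichar_isalnum(), and the
    # Unicode general categories Mn/Mc/Me approximate g_unichar_ismark().
    return ch.isalnum() or unicodedata.category(ch).startswith("M")


def split_tokens(text: str) -> List[str]:
    # Collect maximal runs of token characters; every other character
    # acts as a separator and is dropped.
    tokens: List[str] = []
    current: List[str] = []
    for ch in text:
        if is_token_char(ch):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens
```

Because combining marks count as token characters, a decomposed sequence such as "e" followed by U+0301 stays inside one token instead of splitting it.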

Each token is then (Unicode) normalised and case-folded. If some of the returned tokens contain non-ASCII characters, ASCII alternatives will be generated and returned in asciiAlternates.
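As a rough illustration of what normalising and case-folding a single token means, here is a minimal Python sketch. It is not the GLib code path: the helper name `fold_token` is hypothetical, and the choice of NFKC as the normalisation form is an assumption, since the text above only says the token is "(Unicode) normalised".

```python
import unicodedata


def fold_token(token: str) -> str:
    # Assumption: NFKC is used here as the normalisation form; the
    # documentation does not state which form GLib applies.
    normalised = unicodedata.normalize("NFKC", token)
    # casefold() is a more aggressive lower() intended for caseless
    # matching; for example it maps the German sharp s to "ss".
    return normalised.casefold()
```

Folding lets differently cased or differently composed spellings of the same word compare equal, which is the usual reason to tokenise and fold text before building a search index.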

The number of ASCII alternatives that are generated and the method for generating them are unspecified, but translitLocale (if specified) may improve the transliteration if the language of the source string is known.

struct Str

static string[] tokenizeAndFold(string str, string translitLocale, out string[] asciiAlternates)

Parameters

str string: a string
translitLocale string: the language code (like 'de' or 'en_GB') of the language in which str is written
asciiAlternates string[]: a return location for ASCII alternates

Return Value

Type: string[]

the folded tokens
