Str.tokenizeAndFold

Tokenises str and performs folding on each token.

A token is a non-empty sequence of alphanumeric characters in the source string, separated by non-alphanumeric characters. An "alphanumeric" character for this purpose is one that matches g_unichar_isalnum() or g_unichar_ismark().

Each token is then (Unicode) normalised and case-folded. If asciiAlternates is requested and some of the returned tokens contain non-ASCII characters, ASCII alternatives will be generated.
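The tokenise-then-fold behaviour described above can be approximated in a few lines of Python. This is only a sketch for illustration, not the library's implementation: the helper names `is_token_char` and `tokenize_and_fold` are hypothetical, Python's `str.isalnum()` and the Unicode "M" categories are assumed to be close stand-ins for g_unichar_isalnum() and g_unichar_ismark(), and the choice of NFKC as the normalisation form is an assumption.

```python
import unicodedata


def is_token_char(ch: str) -> bool:
    # Assumed stand-in for g_unichar_isalnum() || g_unichar_ismark():
    # alphanumeric characters plus combining marks (categories Mn, Mc, Me).
    return ch.isalnum() or unicodedata.category(ch).startswith("M")


def tokenize_and_fold(text: str) -> list:
    """Split text into alphanumeric runs, then normalise and case-fold each."""
    tokens = []
    current = []
    for ch in text:
        if is_token_char(ch):
            current.append(ch)
        elif current:
            # A non-alphanumeric character ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    # NFKC is an assumption; the documentation only says "normalised".
    return [unicodedata.normalize("NFKC", t).casefold() for t in tokens]


print(tokenize_and_fold("Hello, World! Straße-42"))
```

Under these assumptions, punctuation and whitespace act purely as separators, and case folding (rather than simple lowercasing) is what turns "Straße" into "strasse".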

The number of ASCII alternatives that are generated and the method for doing so are unspecified, but translitLocale (if specified) may improve the transliteration if the language of the source string is known.
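Because the method is unspecified, the following Python sketch shows only one plausible way ASCII alternatives could be derived: decompose each non-ASCII token, drop the combining marks, and keep the result if it is pure ASCII. The function name `ascii_alternates` is hypothetical, the approach is an assumption rather than the library's actual behaviour, and the locale hint is not modelled at all.

```python
import unicodedata


def ascii_alternates(tokens):
    """Return ASCII stand-ins for tokens that contain non-ASCII characters."""
    alternates = []
    for token in tokens:
        if token.isascii():
            continue  # already ASCII, so no alternate is needed
        # Decompose accented characters into base character + combining marks.
        decomposed = unicodedata.normalize("NFKD", token)
        stripped = "".join(
            ch for ch in decomposed
            if not unicodedata.category(ch).startswith("M")
        )
        # Keep only results that really are ASCII, without duplicates.
        if stripped and stripped.isascii() and stripped not in alternates:
            alternates.append(stripped)
    return alternates


print(ascii_alternates(["café", "hello", "naïve"]))
```

A mark-stripping approach like this one produces nothing for scripts without an ASCII decomposition, which is where a real transliteration step, possibly guided by a locale, would be needed.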

struct Str
static string[] tokenizeAndFold(string str, string translitLocale, out string[] asciiAlternates)

Parameters

str string

the string to tokenise

translitLocale string

the language code (like 'de' or 'en_GB') from which str originates

asciiAlternates string[]

a return location for ASCII alternates

Return Value

Type: string[]

the folded tokens

Meta

Since

2.40