.. _tokenize:

========
tokenize
========

Splits a string based on a regular expression

Description
~~~~~~~~~~~

The ``fn:tokenize`` function splits a string based on a regular expression. The regular expression syntax is the one defined by XML Schema, with a few modifications and additions in XQuery/XPath/XSLT 2.0.

The ``$pattern`` argument is a regular expression that represents the separator. The simplest patterns are a single space or a string containing the separator character, such as ``,``. However, certain characters must be escaped in regular expressions, namely ``.\?*+|^${}()[]``.

The separators are not included in the result strings. If two adjacent separators appear, a zero-length string is included in the result sequence. If the string starts with a separator, a zero-length string is the first value returned; likewise, if the string ends with a separator, a zero-length string is the last value in the result sequence.

The ``$flags`` parameter allows for additional options in the interpretation of the regular expression: ``s`` (dot-all mode), ``m`` (multi-line mode), ``i`` (case-insensitive mode), and ``x`` (whitespace in the pattern is ignored).

If a particular point in the string could match more than one alternative, the first alternative is chosen. This is exhibited in the last example below, where the function treats the comma alone as the separator, even though ``,x`` (a comma followed by ``x``) also matches at that point.

.. list-table::
   :widths: 40 60
   :header-rows: 1

   * - **Parameters**
     - **Description**
   * - input:string()
     - the string to tokenize
   * - pattern:string()
     - regular expression that matches the separators
   * - flags:string()
     - optional flags that control dot-all, multi-line, case-insensitive, and whitespace-ignoring modes

Examples
~~~~~~~~

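
The splitting rules described above can be approximated in Python with the standard ``re`` module. This is a minimal sketch, not an implementation of ``fn:tokenize``: ``xpath_tokenize`` and ``_FLAG_MAP`` are hypothetical names, the empty sequence is modelled as ``None``, errors are modelled as ``ValueError`` messages carrying the XPath error code, and Python's regular expression dialect only approximates the XML Schema dialect (for example, it has no character-class subtraction).

.. code-block:: python

   import re

   # Hypothetical mapping from the XPath 2.0 flag letters to Python re flags.
   # Assumption: re.VERBOSE only approximates the XPath "x" flag (it also
   # treats "#" as the start of a comment).
   _FLAG_MAP = {"s": re.DOTALL, "m": re.MULTILINE, "i": re.IGNORECASE, "x": re.VERBOSE}


   def xpath_tokenize(input_str, pattern, flags=""):
       """Approximate fn:tokenize; None stands in for the empty sequence."""
       re_flags = 0
       for letter in flags:
           if letter not in _FLAG_MAP:
               raise ValueError("FORX0001: invalid regular expression flags")
           re_flags |= _FLAG_MAP[letter]
       regex = re.compile(pattern, re_flags)

       # FORX0003: the pattern must not match a zero-length string,
       # i.e. it must not match anywhere in "".
       if regex.search("") is not None:
           raise ValueError("FORX0003: pattern matches a zero-length string")

       # An empty sequence or a zero-length input yields an empty result.
       if not input_str:
           return []

       # Collect the text between separator matches; the separators themselves
       # are dropped, so adjacent, leading, or trailing separators produce
       # zero-length strings. finditer is used instead of re.split so that
       # capturing groups in the pattern are not added to the result.
       tokens = []
       start = 0
       for match in regex.finditer(input_str):
           tokens.append(input_str[start:match.start()])
           start = match.end()
       tokens.append(input_str[start:])
       return tokens

For example, ``xpath_tokenize('a,xb,xc', ',|,x')`` returns ``['a', 'xb', 'xc']``, because Python's ``re`` module, like XPath, tries the alternatives of ``|`` from left to right and accepts the first one that matches.
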

.. list-table::
   :widths: 50 50
   :header-rows: 1

   * - **XPath**
     - **Results**
   * - ``tokenize('a b c', '\s')``
     - ``('a', 'b', 'c')``
   * - ``tokenize('a   b c', '\s')``
     - ``('a', '', '', 'b', 'c')``
   * - ``tokenize('a   b c', '\s+')``
     - ``('a', 'b', 'c')``
   * - ``tokenize(' b c', '\s')``
     - ``('', 'b', 'c')``
   * - ``tokenize('a,b,c', ',')``
     - ``('a', 'b', 'c')``
   * - ``tokenize('a,b,,c', ',')``
     - ``('a', 'b', '', 'c')``
   * - ``tokenize('a, b, c', '[,\s]+')``
     - ``('a', 'b', 'c')``
   * - ``tokenize('2006-12-25T12:15:00', '[\-T:]')``
     - ``('2006', '12', '25', '12', '15', '00')``
   * - ``tokenize('Hello, there.', '\W+')``
     - ``('Hello', 'there', '')``
   * - ``tokenize((), '\s+')``
     - ``()``
   * - ``tokenize('abc', '\s')``
     - ``'abc'``
   * - ``tokenize('abcd', 'b?')``
     - Error ``FORX0003``
   * - ``tokenize('a,xb,xc', ',|,x')``
     - ``('a', 'xb', 'xc')``

See Also
~~~~~~~~

.. toctree::
   :titlesonly:
   :glob:

* :ref:`functx_chars`