Как найти символ postgresql

There are three separate approaches to pattern matching provided by PostgreSQL: the traditional SQL LIKE operator, the more recent SIMILAR TO operator (added in SQL:1999), and POSIX-style regular expressions. Aside from the basic does this string match this pattern? operators, functions are available to extract or replace matching substrings and to split a string at matching locations.

The pattern matching operators of all three kinds do not support nondeterministic collations. If required, apply a different collation to the expression to work around this limitation.

9.7.1. LIKE

string LIKE pattern [ESCAPE escape-character]
string NOT LIKE pattern [ESCAPE escape-character]

The LIKE expression returns true if the string matches the supplied pattern. (As expected, the NOT LIKE expression returns false if LIKE returns true, and vice versa. An equivalent expression is NOT (string LIKE pattern).)

If pattern does not contain percent signs or underscores, then the pattern only represents the string itself; in that case LIKE acts like the equals operator. An underscore (_) in pattern stands for (matches) any single character; a percent sign (%) matches any sequence of zero or more characters.

Some examples:

'abc' LIKE 'abc'    true
'abc' LIKE 'a%'     true
'abc' LIKE '_b_'    true
'abc' LIKE 'c'      false

LIKE pattern matching always covers the entire string. Therefore, if it’s desired to match a sequence anywhere within a string, the pattern must start and end with a percent sign.

To match a literal underscore or percent sign without matching other characters, the respective character in pattern must be preceded by the escape character. The default escape character is the backslash but a different one can be selected by using the ESCAPE clause. To match the escape character itself, write two escape characters.

It’s also possible to select no escape character by writing ESCAPE ''. This effectively disables the escape mechanism, which makes it impossible to turn off the special meaning of underscore and percent signs in the pattern.

According to the SQL standard, omitting ESCAPE means there is no escape character (rather than defaulting to a backslash), and a zero-length ESCAPE value is disallowed. PostgreSQL‘s behavior in this regard is therefore slightly nonstandard.

The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale. This is not in the SQL standard but is a PostgreSQL extension.

The operator ~~ is equivalent to LIKE, and ~~* corresponds to ILIKE. There are also !~~ and !~~* operators that represent NOT LIKE and NOT ILIKE, respectively. All of these operators are PostgreSQL-specific. You may see these operator names in EXPLAIN output and similar places, since the parser actually translates LIKE et al. to these operators.

The phrases LIKE, ILIKE, NOT LIKE, and NOT ILIKE are generally treated as operators in PostgreSQL syntax; for example they can be used in expression operator ANY (subquery) constructs, although an ESCAPE clause cannot be included there. In some obscure cases it may be necessary to use the underlying operator names instead.

Also see the starts-with operator ^@ and the corresponding starts_with() function, which are useful in cases where simply matching the beginning of a string is needed.

9.7.2. SIMILAR TO Regular Expressions

string SIMILAR TO pattern [ESCAPE escape-character]
string NOT SIMILAR TO pattern [ESCAPE escape-character]

The SIMILAR TO operator returns true or false depending on whether its pattern matches the given string. It is similar to LIKE, except that it interprets the pattern using the SQL standard’s definition of a regular expression. SQL regular expressions are a curious cross between LIKE notation and common (POSIX) regular expression notation.

Like LIKE, the SIMILAR TO operator succeeds only if its pattern matches the entire string; this is unlike common regular expression behavior where the pattern can match any part of the string. Also like LIKE, SIMILAR TO uses _ and % as wildcard characters denoting any single character and any string, respectively (these are comparable to . and .* in POSIX regular expressions).

In addition to these facilities borrowed from LIKE, SIMILAR TO supports these pattern-matching metacharacters borrowed from POSIX regular expressions:

  • | denotes alternation (either of two alternatives).

  • * denotes repetition of the previous item zero or more times.

  • + denotes repetition of the previous item one or more times.

  • ? denotes repetition of the previous item zero or one time.

  • {m} denotes repetition of the previous item exactly m times.

  • {m,} denotes repetition of the previous item m or more times.

  • {m,n} denotes repetition of the previous item at least m and not more than n times.

  • Parentheses () can be used to group items into a single logical item.

  • A bracket expression [...] specifies a character class, just as in POSIX regular expressions.

Notice that the period (.) is not a metacharacter for SIMILAR TO.

As with LIKE, a backslash disables the special meaning of any of these metacharacters. A different escape character can be specified with ESCAPE, or the escape capability can be disabled by writing ESCAPE ''.

According to the SQL standard, omitting ESCAPE means there is no escape character (rather than defaulting to a backslash), and a zero-length ESCAPE value is disallowed. PostgreSQL‘s behavior in this regard is therefore slightly nonstandard.

Another nonstandard extension is that following the escape character with a letter or digit provides access to the escape sequences defined for POSIX regular expressions; see Table 9.20, Table 9.21, and Table 9.22 below.

Some examples:

'abc' SIMILAR TO 'abc'          true
'abc' SIMILAR TO 'a'            false
'abc' SIMILAR TO '%(b|d)%'      true
'abc' SIMILAR TO '(b|c)%'       false
'-abc-' SIMILAR TO '%mabcM%'  true
'xabcy' SIMILAR TO '%mabcM%'  false

The substring function with three parameters provides extraction of a substring that matches an SQL regular expression pattern. The function can be written according to standard SQL syntax:

substring(string similar pattern escape escape-character)

or using the now obsolete SQL:1999 syntax:

substring(string from pattern for escape-character)

or as a plain three-argument function:

substring(string, pattern, escape-character)

As with SIMILAR TO, the specified pattern must match the entire data string, or else the function fails and returns null. To indicate the part of the pattern for which the matching data sub-string is of interest, the pattern should contain two occurrences of the escape character followed by a double quote ("). The text matching the portion of the pattern between these separators is returned when the match is successful.

The escape-double-quote separators actually divide substring‘s pattern into three independent regular expressions; for example, a vertical bar (|) in any of the three sections affects only that section. Also, the first and third of these regular expressions are defined to match the smallest possible amount of text, not the largest, when there is any ambiguity about how much of the data string matches which pattern. (In POSIX parlance, the first and third regular expressions are forced to be non-greedy.)

As an extension to the SQL standard, PostgreSQL allows there to be just one escape-double-quote separator, in which case the third regular expression is taken as empty; or no separators, in which case the first and third regular expressions are taken as empty.

Some examples, with #" delimiting the return string:

substring('foobar' similar '%#"o_b#"%' escape '#')   oob
substring('foobar' similar '#"o_b#"%' escape '#')    NULL

9.7.3. POSIX Regular Expressions

Table 9.16 lists the available operators for pattern matching using POSIX regular expressions.

Table 9.16. Regular Expression Match Operators

Operator

Description

Example(s)

text ~ textboolean

String matches regular expression, case sensitively

'thomas' ~ 't.*ma't

text ~* textboolean

String matches regular expression, case insensitively

'thomas' ~* 'T.*ma't

text !~ textboolean

String does not match regular expression, case sensitively

'thomas' !~ 't.*max't

text !~* textboolean

String does not match regular expression, case insensitively

'thomas' !~* 'T.*ma'f

POSIX regular expressions provide a more powerful means for pattern matching than the LIKE and SIMILAR TO operators. Many Unix tools such as egrep, sed, or awk use a pattern matching language that is similar to the one described here.

A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set). A string is said to match a regular expression if it is a member of the regular set described by the regular expression. As with LIKE, pattern characters match string characters exactly unless they are special characters in the regular expression language — but regular expressions use different special characters than LIKE does. Unlike LIKE patterns, a regular expression is allowed to match anywhere within a string, unless the regular expression is explicitly anchored to the beginning or end of the string.

Some examples:

'abcd' ~ 'bc'     true
'abcd' ~ 'a.c'    true — dot matches any character
'abcd' ~ 'a.*d'   true — * repeats the preceding pattern item
'abcd' ~ '(b|x)'  true — | means OR, parentheses group
'abcd' ~ '^a'     true — ^ anchors to start of string
'abcd' ~ '^(b|c)' false — would match except for anchoring

The POSIX pattern language is described in much greater detail below.

The substring function with two parameters, substring(string from pattern), provides extraction of a substring that matches a POSIX regular expression pattern. It returns null if there is no match, otherwise the first portion of the text that matched the pattern. But if the pattern contains any parentheses, the portion of the text that matched the first parenthesized subexpression (the one whose left parenthesis comes first) is returned. You can put parentheses around the whole expression if you want to use parentheses within it without triggering this exception. If you need parentheses in the pattern before the subexpression you want to extract, see the non-capturing parentheses described below.

Some examples:

substring('foobar' from 'o.b')     oob
substring('foobar' from 'o(.)b')   o

The regexp_count function counts the number of places where a POSIX regular expression pattern matches a string. It has the syntax regexp_count(string, pattern [, start [, flags ]]). pattern is searched for in string, normally from the beginning of the string, but if the start parameter is provided then beginning from that character index. The flags parameter is an optional text string containing zero or more single-letter flags that change the function’s behavior. For example, including i in flags specifies case-insensitive matching. Supported flags are described in Table 9.24.

Some examples:

regexp_count('ABCABCAXYaxy', 'A.')          3
regexp_count('ABCABCAXYaxy', 'A.', 1, 'i')  4

The regexp_instr function returns the starting or ending position of the N‘th match of a POSIX regular expression pattern to a string, or zero if there is no such match. It has the syntax regexp_instr(string, pattern [, start [, N [, endoption [, flags [, subexpr ]]]]]). pattern is searched for in string, normally from the beginning of the string, but if the start parameter is provided then beginning from that character index. If N is specified then the N‘th match of the pattern is located, otherwise the first match is located. If the endoption parameter is omitted or specified as zero, the function returns the position of the first character of the match. Otherwise, endoption must be one, and the function returns the position of the character following the match. The flags parameter is an optional text string containing zero or more single-letter flags that change the function’s behavior. Supported flags are described in Table 9.24. For a pattern containing parenthesized subexpressions, subexpr is an integer indicating which subexpression is of interest: the result identifies the position of the substring matching that subexpression. Subexpressions are numbered in the order of their leading parentheses. When subexpr is omitted or zero, the result identifies the position of the whole match regardless of parenthesized subexpressions.

Some examples:

regexp_instr('number of your street, town zip, FR', '[^,]+', 1, 2)
                                   23
regexp_instr('ABCDEFGHI', '(c..)(...)', 1, 1, 0, 'i', 2)
                                   6

The regexp_like function checks whether a match of a POSIX regular expression pattern occurs within a string, returning boolean true or false. It has the syntax regexp_like(string, pattern [, flags ]). The flags parameter is an optional text string containing zero or more single-letter flags that change the function’s behavior. Supported flags are described in Table 9.24. This function has the same results as the ~ operator if no flags are specified. If only the i flag is specified, it has the same results as the ~* operator.

Some examples:

regexp_like('Hello World', 'world')       false
regexp_like('Hello World', 'world', 'i')  true

The regexp_match function returns a text array of matching substring(s) within the first match of a POSIX regular expression pattern to a string. It has the syntax regexp_match(string, pattern [, flags ]). If there is no match, the result is NULL. If a match is found, and the pattern contains no parenthesized subexpressions, then the result is a single-element text array containing the substring matching the whole pattern. If a match is found, and the pattern contains parenthesized subexpressions, then the result is a text array whose n‘th element is the substring matching the n‘th parenthesized subexpression of the pattern (not counting non-capturing parentheses; see below for details). The flags parameter is an optional text string containing zero or more single-letter flags that change the function’s behavior. Supported flags are described in Table 9.24.

Some examples:

SELECT regexp_match('foobarbequebaz', 'bar.*que');
 regexp_match
--------------
 {barbeque}
(1 row)

SELECT regexp_match('foobarbequebaz', '(bar)(beque)');
 regexp_match
--------------
 {bar,beque}
(1 row)

Tip

In the common case where you just want the whole matching substring or NULL for no match, the best solution is to use regexp_substr(). However, regexp_substr() only exists in PostgreSQL version 15 and up. When working in older versions, you can extract the first element of regexp_match()‘s result, for example:

SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
 regexp_match
--------------
 barbeque
(1 row)

The regexp_matches function returns a set of text arrays of matching substring(s) within matches of a POSIX regular expression pattern to a string. It has the same syntax as regexp_match. This function returns no rows if there is no match, one row if there is a match and the g flag is not given, or N rows if there are N matches and the g flag is given. Each returned row is a text array containing the whole matched substring or the substrings matching parenthesized subexpressions of the pattern, just as described above for regexp_match. regexp_matches accepts all the flags shown in Table 9.24, plus the g flag which commands it to return all matches, not just the first one.

Some examples:

SELECT regexp_matches('foo', 'not there');
 regexp_matches
----------------
(0 rows)

SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');
 regexp_matches
----------------
 {bar,beque}
 {bazil,barf}
(2 rows)

Tip

In most cases regexp_matches() should be used with the g flag, since if you only want the first match, it’s easier and more efficient to use regexp_match(). However, regexp_match() only exists in PostgreSQL version 10 and up. When working in older versions, a common trick is to place a regexp_matches() call in a sub-select, for example:

SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab;

This produces a text array if there’s a match, or NULL if not, the same as regexp_match() would do. Without the sub-select, this query would produce no output at all for table rows without a match, which is typically not the desired behavior.

The regexp_replace function provides substitution of new text for substrings that match POSIX regular expression patterns. It has the syntax regexp_replace(source, pattern, replacement [, start [, N ]] [, flags ]). (Notice that N cannot be specified unless start is, but flags can be given in any case.) The source string is returned unchanged if there is no match to the pattern. If there is a match, the source string is returned with the replacement string substituted for the matching substring. The replacement string can contain n, where n is 1 through 9, to indicate that the source substring matching the n‘th parenthesized subexpression of the pattern should be inserted, and it can contain & to indicate that the substring matching the entire pattern should be inserted. Write \ if you need to put a literal backslash in the replacement text. pattern is searched for in string, normally from the beginning of the string, but if the start parameter is provided then beginning from that character index. By default, only the first match of the pattern is replaced. If N is specified and is greater than zero, then the N‘th match of the pattern is replaced. If the g flag is given, or if N is specified and is zero, then all matches at or after the start position are replaced. (The g flag is ignored when N is specified.) The flags parameter is an optional text string containing zero or more single-letter flags that change the function’s behavior. Supported flags (though not g) are described in Table 9.24.

Some examples:

regexp_replace('foobarbaz', 'b..', 'X')
                                   fooXbaz
regexp_replace('foobarbaz', 'b..', 'X', 'g')
                                   fooXX
regexp_replace('foobarbaz', 'b(..)', 'X1Y', 'g')
                                   fooXarYXazY
regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 0, 'i')
                                   X PXstgrXSQL fXnctXXn
regexp_replace('A PostgreSQL function', 'a|e|i|o|u', 'X', 1, 3, 'i')
                                   A PostgrXSQL function

The regexp_split_to_table function splits a string using a POSIX regular expression pattern as a delimiter. It has the syntax regexp_split_to_table(string, pattern [, flags ]). If there is no match to the pattern, the function returns the string. If there is at least one match, for each match it returns the text from the end of the last match (or the beginning of the string) to the beginning of the match. When there are no more matches, it returns the text from the end of the last match to the end of the string. The flags parameter is an optional text string containing zero or more single-letter flags that change the function’s behavior. regexp_split_to_table supports the flags described in Table 9.24.

The regexp_split_to_array function behaves the same as regexp_split_to_table, except that regexp_split_to_array returns its result as an array of text. It has the syntax regexp_split_to_array(string, pattern [, flags ]). The parameters are the same as for regexp_split_to_table.

Some examples:

SELECT foo FROM regexp_split_to_table('the quick brown fox jumps over the lazy dog', 's+') AS foo;
  foo
-------
 the
 quick
 brown
 fox
 jumps
 over
 the
 lazy
 dog
(9 rows)

SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', 's+');
              regexp_split_to_array
-----------------------------------------------
 {the,quick,brown,fox,jumps,over,the,lazy,dog}
(1 row)

SELECT foo FROM regexp_split_to_table('the quick brown fox', 's*') AS foo;
 foo
-----
 t
 h
 e
 q
 u
 i
 c
 k
 b
 r
 o
 w
 n
 f
 o
 x
(16 rows)

As the last example demonstrates, the regexp split functions ignore zero-length matches that occur at the start or end of the string or immediately after a previous match. This is contrary to the strict definition of regexp matching that is implemented by the other regexp functions, but is usually the most convenient behavior in practice. Other software systems such as Perl use similar definitions.

The regexp_substr function returns the substring that matches a POSIX regular expression pattern, or NULL if there is no match. It has the syntax regexp_substr(string, pattern [, start [, N [, flags [, subexpr ]]]]). pattern is searched for in string, normally from the beginning of the string, but if the start parameter is provided then beginning from that character index. If N is specified then the N‘th match of the pattern is returned, otherwise the first match is returned. The flags parameter is an optional text string containing zero or more single-letter flags that change the function’s behavior. Supported flags are described in Table 9.24. For a pattern containing parenthesized subexpressions, subexpr is an integer indicating which subexpression is of interest: the result is the substring matching that subexpression. Subexpressions are numbered in the order of their leading parentheses. When subexpr is omitted or zero, the result is the whole match regardless of parenthesized subexpressions.

Some examples:

regexp_substr('number of your street, town zip, FR', '[^,]+', 1, 2)
                                    town zip
regexp_substr('ABCDEFGHI', '(c..)(...)', 1, 1, 'i', 2)
                                   FGH

9.7.3.1. Regular Expression Details

PostgreSQL‘s regular expressions are implemented using a software package written by Henry Spencer. Much of the description of regular expressions below is copied verbatim from his manual.

Regular expressions (REs), as defined in POSIX 1003.2, come in two forms: extended REs or EREs (roughly those of egrep), and basic REs or BREs (roughly those of ed). PostgreSQL supports both forms, and also implements some extensions that are not in the POSIX standard, but have become widely used due to their availability in programming languages such as Perl and Tcl. REs using these non-POSIX extensions are called advanced REs or AREs in this documentation. AREs are almost an exact superset of EREs, but BREs have several notational incompatibilities (as well as being much more limited). We first describe the ARE and ERE forms, noting features that apply only to AREs, and then describe how BREs differ.

Note

PostgreSQL always initially presumes that a regular expression follows the ARE rules. However, the more limited ERE or BRE rules can be chosen by prepending an embedded option to the RE pattern, as described in Section 9.7.3.4. This can be useful for compatibility with applications that expect exactly the POSIX 1003.2 rules.

A regular expression is defined as one or more branches, separated by |. It matches anything that matches one of the branches.

A branch is zero or more quantified atoms or constraints, concatenated. It matches a match for the first, followed by a match for the second, etc.; an empty branch matches the empty string.

A quantified atom is an atom possibly followed by a single quantifier. Without a quantifier, it matches a match for the atom. With a quantifier, it can match some number of matches of the atom. An atom can be any of the possibilities shown in Table 9.17. The possible quantifiers and their meanings are shown in Table 9.18.

A constraint matches an empty string, but matches only when specific conditions are met. A constraint can be used where an atom could be used, except it cannot be followed by a quantifier. The simple constraints are shown in Table 9.19; some more constraints are described later.

Table 9.17. Regular Expression Atoms

Atom Description
(re) (where re is any regular expression) matches a match for re, with the match noted for possible reporting
(?:re) as above, but the match is not noted for reporting (a non-capturing set of parentheses) (AREs only)
. matches any single character
[chars] a bracket expression, matching any one of the chars (see Section 9.7.3.2 for more detail)
k (where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g., \ matches a backslash character
c where c is alphanumeric (possibly followed by other characters) is an escape, see Section 9.7.3.3 (AREs only; in EREs and BREs, this matches c)
{ when followed by a character other than a digit, matches the left-brace character {; when followed by a digit, it is the beginning of a bound (see below)
x where x is a single character with no other significance, matches that character

An RE cannot end with a backslash ().

Table 9.18. Regular Expression Quantifiers

Quantifier Matches
* a sequence of 0 or more matches of the atom
+ a sequence of 1 or more matches of the atom
? a sequence of 0 or 1 matches of the atom
{m} a sequence of exactly m matches of the atom
{m,} a sequence of m or more matches of the atom
{m,n} a sequence of m through n (inclusive) matches of the atom; m cannot exceed n
*? non-greedy version of *
+? non-greedy version of +
?? non-greedy version of ?
{m}? non-greedy version of {m}
{m,}? non-greedy version of {m,}
{m,n}? non-greedy version of {m,n}

The forms using {...} are known as bounds. The numbers m and n within a bound are unsigned decimal integers with permissible values from 0 to 255 inclusive.

Non-greedy quantifiers (available in AREs only) match the same possibilities as their corresponding normal (greedy) counterparts, but prefer the smallest number rather than the largest number of matches. See Section 9.7.3.5 for more detail.

Note

A quantifier cannot immediately follow another quantifier, e.g., ** is invalid. A quantifier cannot begin an expression or subexpression or follow ^ or |.

Table 9.19. Regular Expression Constraints

Constraint Description
^ matches at the beginning of the string
$ matches at the end of the string
(?=re) positive lookahead matches at any point where a substring matching re begins (AREs only)
(?!re) negative lookahead matches at any point where no substring matching re begins (AREs only)
(?<=re) positive lookbehind matches at any point where a substring matching re ends (AREs only)
(?<!re) negative lookbehind matches at any point where no substring matching re ends (AREs only)

Lookahead and lookbehind constraints cannot contain back references (see Section 9.7.3.3), and all parentheses within them are considered non-capturing.

9.7.3.2. Bracket Expressions

A bracket expression is a list of characters enclosed in []. It normally matches any single character from the list (but see below). If the list begins with ^, it matches any single character not from the rest of the list. If two characters in the list are separated by -, this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, e.g., [0-9] in ASCII matches any decimal digit. It is illegal for two ranges to share an endpoint, e.g., a-c-e. Ranges are very collating-sequence-dependent, so portable programs should avoid relying on them.

To include a literal ] in the list, make it the first character (after ^, if that is used). To include a literal -, make it the first or last character, or the second endpoint of a range. To use a literal - as the first endpoint of a range, enclose it in [. and .] to make it a collating element (see below). With the exception of these characters, some combinations using [ (see next paragraphs), and escapes (AREs only), all other special characters lose their special significance within a bracket expression. In particular, is not special when following ERE or BRE rules, though it is special (as introducing an escape) in AREs.

Within a bracket expression, a collating element (a character, a multiple-character sequence that collates as if it were a single character, or a collating-sequence name for either) enclosed in [. and .] stands for the sequence of characters of that collating element. The sequence is treated as a single element of the bracket expression’s list. This allows a bracket expression containing a multiple-character collating element to match more than one character, e.g., if the collating sequence includes a ch collating element, then the RE [[.ch.]]*c matches the first five characters of chchcc.

Note

PostgreSQL currently does not support multi-character collating elements. This information describes possible future behavior.

Within a bracket expression, a collating element enclosed in [= and =] is an equivalence class, standing for the sequences of characters of all collating elements equivalent to that one, including itself. (If there are no other equivalent collating elements, the treatment is as if the enclosing delimiters were [. and .].) For example, if o and ^ are the members of an equivalence class, then [[=o=]], [[=^=]], and [o^] are all synonymous. An equivalence class cannot be an endpoint of a range.

Within a bracket expression, the name of a character class enclosed in [: and :] stands for the list of all characters belonging to that class. A character class cannot be used as an endpoint of a range. The POSIX standard defines these character class names: alnum (letters and numeric digits), alpha (letters), blank (space and tab), cntrl (control characters), digit (numeric digits), graph (printable characters except space), lower (lower-case letters), print (printable characters including space), punct (punctuation), space (any white space), upper (upper-case letters), and xdigit (hexadecimal digits). The behavior of these standard character classes is generally consistent across platforms for characters in the 7-bit ASCII set. Whether a given non-ASCII character is considered to belong to one of these classes depends on the collation that is used for the regular-expression function or operator (see Section 24.2), or by default on the database’s LC_CTYPE locale setting (see Section 24.1). The classification of non-ASCII characters can vary across platforms even in similarly-named locales. (But the C locale never considers any non-ASCII characters to belong to any of these classes.) In addition to these standard character classes, PostgreSQL defines the word character class, which is the same as alnum plus the underscore (_) character, and the ascii character class, which contains exactly the 7-bit ASCII set.

There are two special cases of bracket expressions: the bracket expressions [[:<:]] and [[:>:]] are constraints, matching empty strings at the beginning and end of a word respectively. A word is defined as a sequence of word characters that is neither preceded nor followed by word characters. A word character is any character belonging to the word character class, that is, any letter, digit, or underscore. This is an extension, compatible with but not specified by POSIX 1003.2, and should be used with caution in software intended to be portable to other systems. The constraint escapes described below are usually preferable; they are no more standard, but are easier to type.

9.7.3.3. Regular Expression Escapes

Escapes are special sequences beginning with followed by an alphanumeric character. Escapes come in several varieties: character entry, class shorthands, constraint escapes, and back references. A followed by an alphanumeric character but not constituting a valid escape is illegal in AREs. In EREs, there are no escapes: outside a bracket expression, a followed by an alphanumeric character merely stands for that character as an ordinary character, and inside a bracket expression, is an ordinary character. (The latter is the one actual incompatibility between EREs and AREs.)

Character-entry escapes exist to make it easier to specify non-printing and other inconvenient characters in REs. They are shown in Table 9.20.

Class-shorthand escapes provide shorthands for certain commonly-used character classes. They are shown in Table 9.21.

A constraint escape is a constraint, matching the empty string if specific conditions are met, written as an escape. They are shown in Table 9.22.

A back reference (n) matches the same string matched by the previous parenthesized subexpression specified by the number n (see Table 9.23). For example, ([bc])1 matches bb or cc but not bc or cb. The subexpression must entirely precede the back reference in the RE. Subexpressions are numbered in the order of their leading parentheses. Non-capturing parentheses do not define subexpressions. The back reference considers only the string characters matched by the referenced subexpression, not any constraints contained in it. For example, (^d)1 will match 22.

Table 9.20. Regular Expression Character-Entry Escapes

Escape Description
a alert (bell) character, as in C
b backspace, as in C
B synonym for backslash () to help reduce the need for backslash doubling
cX (where X is any character) the character whose low-order 5 bits are the same as those of X, and whose other bits are all zero
e the character whose collating-sequence name is ESC, or failing that, the character with octal value 033
f form feed, as in C
n newline, as in C
r carriage return, as in C
t horizontal tab, as in C
uwxyz (where wxyz is exactly four hexadecimal digits) the character whose hexadecimal value is 0xwxyz
Ustuvwxyz (where stuvwxyz is exactly eight hexadecimal digits) the character whose hexadecimal value is 0xstuvwxyz
v vertical tab, as in C
xhhh (where hhh is any sequence of hexadecimal digits) the character whose hexadecimal value is 0xhhh (a single character no matter how many hexadecimal digits are used)
the character whose value is 0 (the null byte)
xy (where xy is exactly two octal digits, and is not a back reference) the character whose octal value is 0xy
xyz (where xyz is exactly three octal digits, and is not a back reference) the character whose octal value is 0xyz

Hexadecimal digits are 09, af, and AF. Octal digits are 07.

Numeric character-entry escapes specifying values outside the ASCII range (0–127) have meanings dependent on the database encoding. When the encoding is UTF-8, escape values are equivalent to Unicode code points, for example u1234 means the character U+1234. For other multibyte encodings, character-entry escapes usually just specify the concatenation of the byte values for the character. If the escape value does not correspond to any legal character in the database encoding, no error will be raised, but it will never match any data.

The character-entry escapes are always taken as ordinary characters. For example, 135 is ] in ASCII, but 135 does not terminate a bracket expression.

Table 9.21. Regular Expression Class-Shorthand Escapes

Escape Description
d matches any digit, like [[:digit:]]
s matches any whitespace character, like [[:space:]]
w matches any word character, like [[:word:]]
D matches any non-digit, like [^[:digit:]]
S matches any non-whitespace character, like [^[:space:]]
W matches any non-word character, like [^[:word:]]

The class-shorthand escapes also work within bracket expressions, although the definitions shown above are not quite syntactically valid in that context. For example, [a-cd] is equivalent to [a-c[:digit:]].

Table 9.22. Regular Expression Constraint Escapes

Escape Description
A matches only at the beginning of the string (see Section 9.7.3.5 for how this differs from ^)
m matches only at the beginning of a word
M matches only at the end of a word
y matches only at the beginning or end of a word
Y matches only at a point that is not the beginning or end of a word
Z matches only at the end of the string (see Section 9.7.3.5 for how this differs from $)

A word is defined as in the specification of [[:<:]] and [[:>:]] above. Constraint escapes are illegal within bracket expressions.

Table 9.23. Regular Expression Back References

Escape Description
m (where m is a nonzero digit) a back reference to the m‘th subexpression
mnn (where m is a nonzero digit, and nn is some more digits, and the decimal value mnn is not greater than the number of closing capturing parentheses seen so far) a back reference to the mnn‘th subexpression

Note

There is an inherent ambiguity between octal character-entry escapes and back references, which is resolved by the following heuristics, as hinted at above. A leading zero always indicates an octal escape. A single non-zero digit, not followed by another digit, is always taken as a back reference. A multi-digit sequence not starting with a zero is taken as a back reference if it comes after a suitable subexpression (i.e., the number is in the legal range for a back reference), and otherwise is taken as octal.

9.7.3.5. Regular Expression Matching Rules

In the event that an RE could match more than one substring of a given string, the RE matches the one starting earliest in the string. If the RE could match more than one substring starting at that point, either the longest possible match or the shortest possible match will be taken, depending on whether the RE is greedy or non-greedy.

Whether an RE is greedy or not is determined by the following rules:

  • Most atoms, and all constraints, have no greediness attribute (because they cannot match variable amounts of text anyway).

  • Adding parentheses around an RE does not change its greediness.

  • A quantified atom with a fixed-repetition quantifier ({m} or {m}?) has the same greediness (possibly none) as the atom itself.

  • A quantified atom with other normal quantifiers (including {m,n} with m equal to n) is greedy (prefers longest match).

  • A quantified atom with a non-greedy quantifier (including {m,n}? with m equal to n) is non-greedy (prefers shortest match).

  • A branch — that is, an RE that has no top-level | operator — has the same greediness as the first quantified atom in it that has a greediness attribute.

  • An RE consisting of two or more branches connected by the | operator is always greedy.

The above rules associate greediness attributes not only with individual quantified atoms, but with branches and entire REs that contain quantified atoms. What that means is that the matching is done in such a way that the branch, or whole RE, matches the longest or shortest possible substring as a whole. Once the length of the entire match is determined, the part of it that matches any particular subexpression is determined on the basis of the greediness attribute of that subexpression, with subexpressions starting earlier in the RE taking priority over ones starting later.

An example of what this means:

SELECT SUBSTRING('XY1234Z', 'Y*([0-9]{1,3})');
Result: 123
SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
Result: 1

In the first case, the RE as a whole is greedy because Y* is greedy. It can match beginning at the Y, and it matches the longest possible string starting there, i.e., Y123. The output is the parenthesized part of that, or 123. In the second case, the RE as a whole is non-greedy because Y*? is non-greedy. It can match beginning at the Y, and it matches the shortest possible string starting there, i.e., Y1. The subexpression [0-9]{1,3} is greedy but it cannot change the decision as to the overall match length; so it is forced to match just 1.

In short, when an RE contains both greedy and non-greedy subexpressions, the total match length is either as long as possible or as short as possible, according to the attribute assigned to the whole RE. The attributes assigned to the subexpressions only affect how much of that match they are allowed to eat relative to each other.

The quantifiers {1,1} and {1,1}? can be used to force greediness or non-greediness, respectively, on a subexpression or a whole RE. This is useful when you need the whole RE to have a greediness attribute different from what’s deduced from its elements. As an example, suppose that we are trying to separate a string containing some digits into the digits and the parts before and after them. We might try to do that like this:

SELECT regexp_match('abc01234xyz', '(.*)(d+)(.*)');
Result: {abc0123,4,xyz}

That didn’t work: the first .* is greedy so it eats as much as it can, leaving the d+ to match at the last possible place, the last digit. We might try to fix that by making it non-greedy:

SELECT regexp_match('abc01234xyz', '(.*?)(d+)(.*)');
Result: {abc,0,""}

That didn’t work either, because now the RE as a whole is non-greedy and so it ends the overall match as soon as possible. We can get what we want by forcing the RE as a whole to be greedy:

SELECT regexp_match('abc01234xyz', '(?:(.*?)(d+)(.*)){1,1}');
Result: {abc,01234,xyz}

Controlling the RE’s overall greediness separately from its components’ greediness allows great flexibility in handling variable-length patterns.

When deciding what is a longer or shorter match, match lengths are measured in characters, not collating elements. An empty string is considered longer than no match at all. For example: bb* matches the three middle characters of abbbc; (week|wee)(night|knights) matches all ten characters of weeknights; when (.*).* is matched against abc the parenthesized subexpression matches all three characters; and when (a*)* is matched against bc both the whole RE and the parenthesized subexpression match an empty string.

If case-independent matching is specified, the effect is much as if all case distinctions had vanished from the alphabet. When an alphabetic that exists in multiple cases appears as an ordinary character outside a bracket expression, it is effectively transformed into a bracket expression containing both cases, e.g., x becomes [xX]. When it appears inside a bracket expression, all case counterparts of it are added to the bracket expression, e.g., [x] becomes [xX] and [^x] becomes [^xX].

If newline-sensitive matching is specified, . and bracket expressions using ^ will never match the newline character (so that matches will not cross lines unless the RE explicitly includes a newline) and ^ and $ will match the empty string after and before a newline respectively, in addition to matching at beginning and end of string respectively. But the ARE escapes A and Z continue to match beginning or end of string only. Also, the character class shorthands D and W will match a newline regardless of this mode. (Before PostgreSQL 14, they did not match newlines when in newline-sensitive mode. Write [^[:digit:]] or [^[:word:]] to get the old behavior.)

If partial newline-sensitive matching is specified, this affects . and bracket expressions as with newline-sensitive matching, but not ^ and $.

If inverse partial newline-sensitive matching is specified, this affects ^ and $ as with newline-sensitive matching, but not . and bracket expressions. This isn’t very useful but is provided for symmetry.

9.7.3.6. Limits and Compatibility

No particular limit is imposed on the length of REs in this implementation. However, programs intended to be highly portable should not employ REs longer than 256 bytes, as a POSIX-compliant implementation can refuse to accept such REs.

The only feature of AREs that is actually incompatible with POSIX EREs is that does not lose its special significance inside bracket expressions. All other ARE features use syntax which is illegal or has undefined or unspecified effects in POSIX EREs; the *** syntax of directors likewise is outside the POSIX syntax for both BREs and EREs.

Many of the ARE extensions are borrowed from Perl, but some have been changed to clean them up, and a few Perl extensions are not present. Incompatibilities of note include b, B, the lack of special treatment for a trailing newline, the addition of complemented bracket expressions to the things affected by newline-sensitive matching, the restrictions on parentheses and back references in lookahead/lookbehind constraints, and the longest/shortest-match (rather than first-match) matching semantics.

9.7.3.7. Basic Regular Expressions

BREs differ from EREs in several respects. In BREs, |, +, and ? are ordinary characters and there is no equivalent for their functionality. The delimiters for bounds are { and }, with { and } by themselves ordinary characters. The parentheses for nested subexpressions are ( and ), with ( and ) by themselves ordinary characters. ^ is an ordinary character except at the beginning of the RE or the beginning of a parenthesized subexpression, $ is an ordinary character except at the end of the RE or the end of a parenthesized subexpression, and * is an ordinary character if it appears at the beginning of the RE or the beginning of a parenthesized subexpression (after a possible leading ^). Finally, single-digit back references are available, and < and > are synonyms for [[:<:]] and [[:>:]] respectively; no other escapes are available in BREs.

9.7.3.8. Differences from SQL Standard and XQuery

Since SQL:2008, the SQL standard includes regular expression operators and functions that performs pattern matching according to the XQuery regular expression standard:

  • LIKE_REGEX

  • OCCURRENCES_REGEX

  • POSITION_REGEX

  • SUBSTRING_REGEX

  • TRANSLATE_REGEX

PostgreSQL does not currently implement these operators and functions. You can get approximately equivalent functionality in each case as shown in Table 9.25. (Various optional clauses on both sides have been omitted in this table.)

Table 9.25. Regular Expression Functions Equivalencies

SQL standard PostgreSQL
string LIKE_REGEX pattern regexp_like(string, pattern) or string ~ pattern
OCCURRENCES_REGEX(pattern IN string) regexp_count(string, pattern)
POSITION_REGEX(pattern IN string) regexp_instr(string, pattern)
SUBSTRING_REGEX(pattern IN string) regexp_substr(string, pattern)
TRANSLATE_REGEX(pattern IN string WITH replacement) regexp_replace(string, pattern, replacement)

Regular expression functions similar to those provided by PostgreSQL are also available in a number of other SQL implementations, whereas the SQL-standard functions are not as widely implemented. Some of the details of the regular expression syntax will likely differ in each implementation.

The SQL-standard operators and functions use XQuery regular expressions, which are quite close to the ARE syntax described above. Notable differences between the existing POSIX-based regular-expression feature and XQuery regular expressions include:

  • XQuery character class subtraction is not supported. An example of this feature is using the following to match only English consonants: [a-z-[aeiou]].

  • XQuery character class shorthands c, C, i, and I are not supported.

  • XQuery character class elements using p{UnicodeProperty} or the inverse P{UnicodeProperty} are not supported.

  • POSIX interprets character classes such as w (see Table 9.21) according to the prevailing locale (which you can control by attaching a COLLATE clause to the operator or function). XQuery specifies these classes by reference to Unicode character properties, so equivalent behavior is obtained only with a locale that follows the Unicode rules.

  • The SQL standard (not XQuery itself) attempts to cater for more variants of newline than POSIX does. The newline-sensitive matching options described above consider only ASCII NL (n) to be a newline, but SQL would have us treat CR (r), CRLF (rn) (a Windows-style newline), and some Unicode-only characters like LINE SEPARATOR (U+2028) as newlines as well. Notably, . and s should count rn as one character not two according to SQL.

  • Of the character-entry escapes described in Table 9.20, XQuery supports only n, r, and t.

  • XQuery does not support the [:name:] syntax for character classes within bracket expressions.

  • XQuery does not have lookahead or lookbehind constraints, nor any of the constraint escapes described in Table 9.22.

  • The metasyntax forms described in Section 9.7.3.4 do not exist in XQuery.

  • The regular expression flag letters defined by XQuery are related to but not the same as the option letters for POSIX (Table 9.24). While the i and q options behave the same, others do not:

    • XQuery’s s (allow dot to match newline) and m (allow ^ and $ to match at newlines) flags provide access to the same behaviors as POSIX’s n, p and w flags, but they do not match the behavior of POSIX’s s and m flags. Note in particular that dot-matches-newline is the default behavior in POSIX but not XQuery.

    • XQuery’s x (ignore whitespace in pattern) flag is noticeably different from POSIX’s expanded-mode flag. POSIX’s x flag also allows # to begin a comment in the pattern, and POSIX will not ignore a whitespace character after a backslash.

text ^@ textboolean

Returns true if the first string starts with the second string (equivalent to the starts_with() function).

'alphabet' ^@ 'alph't

ascii ( text ) → integer

Returns the numeric code of the first character of the argument. In UTF8 encoding, returns the Unicode code point of the character. In other multibyte encodings, the argument must be an ASCII character.

ascii('x')120

btrim ( string text [, characters text ] ) → text

Removes the longest string containing only characters in characters (a space by default) from the start and end of string.

btrim('xyxtrimyyx', 'xyz')trim

chr ( integer ) → text

Returns the character with the given code. In UTF8 encoding the argument is treated as a Unicode code point. In other multibyte encodings the argument must designate an ASCII character. chr(0) is disallowed because text data types cannot store that character.

chr(65)A

concat ( val1 "any" [, val2 "any" [, …] ] ) → text

Concatenates the text representations of all the arguments. NULL arguments are ignored.

concat('abcde', 2, NULL, 22)abcde222

concat_ws ( sep text, val1 "any" [, val2 "any" [, …] ] ) → text

Concatenates all but the first argument, with separators. The first argument is used as the separator string, and should not be NULL. Other NULL arguments are ignored.

concat_ws(',', 'abcde', 2, NULL, 22)abcde,2,22

format ( formatstr text [, formatarg "any" [, …] ] ) → text

Formats arguments according to a format string; see Section 9.4.1. This function is similar to the C function sprintf.

format('Hello %s, %1$s', 'World')Hello World, World

initcap ( text ) → text

Converts the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.

initcap('hi THOMAS')Hi Thomas

left ( string text, n integer ) → text

Returns first n characters in the string, or when n is negative, returns all but last |n| characters.

left('abcde', 2)ab

length ( text ) → integer

Returns the number of characters in the string.

length('jose')4

lpad ( string text, length integer [, fill text ] ) → text

Extends the string to length length by prepending the characters fill (a space by default). If the string is already longer than length then it is truncated (on the right).

lpad('hi', 5, 'xy')xyxhi

ltrim ( string text [, characters text ] ) → text

Removes the longest string containing only characters in characters (a space by default) from the start of string.

ltrim('zzzytest', 'xyz')test

md5 ( text ) → text

Computes the MD5 hash of the argument, with the result written in hexadecimal.

md5('abc')900150983cd24fb0​d6963f7d28e17f72

parse_ident ( qualified_identifier text [, strict_mode boolean DEFAULT true ] ) → text[]

Splits qualified_identifier into an array of identifiers, removing any quoting of individual identifiers. By default, extra characters after the last identifier are considered an error; but if the second parameter is false, then such extra characters are ignored. (This behavior is useful for parsing names for objects like functions.) Note that this function does not truncate over-length identifiers. If you want truncation you can cast the result to name[].

parse_ident('"SomeSchema".someTable'){SomeSchema,sometable}

pg_client_encoding ( ) → name

Returns current client encoding name.

pg_client_encoding()UTF8

quote_ident ( text ) → text

Returns the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are properly doubled. See also Example 43.1.

quote_ident('Foo bar')"Foo bar"

quote_literal ( text ) → text

Returns the given string suitably quoted to be used as a string literal in an SQL statement string. Embedded single-quotes and backslashes are properly doubled. Note that quote_literal returns null on null input; if the argument might be null, quote_nullable is often more suitable. See also Example 43.1.

quote_literal(E'O'Reilly')'O''Reilly'

quote_literal ( anyelement ) → text

Converts the given value to text and then quotes it as a literal. Embedded single-quotes and backslashes are properly doubled.

quote_literal(42.5)'42.5'

quote_nullable ( text ) → text

Returns the given string suitably quoted to be used as a string literal in an SQL statement string; or, if the argument is null, returns NULL. Embedded single-quotes and backslashes are properly doubled. See also Example 43.1.

quote_nullable(NULL)NULL

quote_nullable ( anyelement ) → text

Converts the given value to text and then quotes it as a literal; or, if the argument is null, returns NULL. Embedded single-quotes and backslashes are properly doubled.

quote_nullable(42.5)'42.5'

regexp_count ( string text, pattern text [, start integer [, flags text ] ] ) → integer

Returns the number of times the POSIX regular expression pattern matches in the string; see Section 9.7.3.

regexp_count('123456789012', 'ddd', 2)3

regexp_instr ( string text, pattern text [, start integer [, N integer [, endoption integer [, flags text [, subexpr integer ] ] ] ] ] ) → integer

Returns the position within string where the N‘th match of the POSIX regular expression pattern occurs, or zero if there is no such match; see Section 9.7.3.

regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i')3

regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i', 2)5

regexp_like ( string text, pattern text [, flags text ] ) → boolean

Checks whether a match of the POSIX regular expression pattern occurs within string; see Section 9.7.3.

regexp_like('Hello World', 'world$', 'i')t

regexp_match ( string text, pattern text [, flags text ] ) → text[]

Returns substrings within the first match of the POSIX regular expression pattern to the string; see Section 9.7.3.

regexp_match('foobarbequebaz', '(bar)(beque)'){bar,beque}

regexp_matches ( string text, pattern text [, flags text ] ) → setof text[]

Returns substrings within the first match of the POSIX regular expression pattern to the string, or substrings within all such matches if the g flag is used; see Section 9.7.3.

regexp_matches('foobarbequebaz', 'ba.', 'g')

 {bar}
 {baz}

regexp_replace ( string text, pattern text, replacement text [, start integer ] [, flags text ] ) → text

Replaces the substring that is the first match to the POSIX regular expression pattern, or all such matches if the g flag is used; see Section 9.7.3.

regexp_replace('Thomas', '.[mN]a.', 'M')ThM

regexp_replace ( string text, pattern text, replacement text, start integer, N integer [, flags text ] ) → text

Replaces the substring that is the N‘th match to the POSIX regular expression pattern, or all such matches if N is zero; see Section 9.7.3.

regexp_replace('Thomas', '.', 'X', 3, 2)ThoXas

regexp_split_to_array ( string text, pattern text [, flags text ] ) → text[]

Splits string using a POSIX regular expression as the delimiter, producing an array of results; see Section 9.7.3.

regexp_split_to_array('hello world', 's+'){hello,world}

regexp_split_to_table ( string text, pattern text [, flags text ] ) → setof text

Splits string using a POSIX regular expression as the delimiter, producing a set of results; see Section 9.7.3.

regexp_split_to_table('hello world', 's+')

 hello
 world

regexp_substr ( string text, pattern text [, start integer [, N integer [, flags text [, subexpr integer ] ] ] ] ) → text

Returns the substring within string that matches the N‘th occurrence of the POSIX regular expression pattern, or NULL if there is no such match; see Section 9.7.3.

regexp_substr('ABCDEF', 'c(.)(..)', 1, 1, 'i')CDEF

regexp_substr('ABCDEF', 'c(.)(..)', 1, 1, 'i', 2)EF

repeat ( string text, number integer ) → text

Repeats string the specified number of times.

repeat('Pg', 4)PgPgPgPg

replace ( string text, from text, to text ) → text

Replaces all occurrences in string of substring from with substring to.

replace('abcdefabcdef', 'cd', 'XX')abXXefabXXef

reverse ( text ) → text

Reverses the order of the characters in the string.

reverse('abcde')edcba

right ( string text, n integer ) → text

Returns last n characters in the string, or when n is negative, returns all but first |n| characters.

right('abcde', 2)de

rpad ( string text, length integer [, fill text ] ) → text

Extends the string to length length by appending the characters fill (a space by default). If the string is already longer than length then it is truncated.

rpad('hi', 5, 'xy')hixyx

rtrim ( string text [, characters text ] ) → text

Removes the longest string containing only characters in characters (a space by default) from the end of string.

rtrim('testxxzx', 'xyz')test

split_part ( string text, delimiter text, n integer ) → text

Splits string at occurrences of delimiter and returns the n‘th field (counting from one), or when n is negative, returns the |n|’th-from-last field.

split_part('abc~@~def~@~ghi', '~@~', 2)def

split_part('abc,def,ghi,jkl', ',', -2)ghi

starts_with ( string text, prefix text ) → boolean

Returns true if string starts with prefix.

starts_with('alphabet', 'alph')t

string_to_array ( string text, delimiter text [, null_string text ] ) → text[]

Splits the string at occurrences of delimiter and forms the resulting fields into a text array. If delimiter is NULL, each character in the string will become a separate element in the array. If delimiter is an empty string, then the string is treated as a single field. If null_string is supplied and is not NULL, fields matching that string are replaced by NULL. See also array_to_string.

string_to_array('xx~~yy~~zz', '~~', 'yy'){xx,NULL,zz}

string_to_table ( string text, delimiter text [, null_string text ] ) → setof text

Splits the string at occurrences of delimiter and returns the resulting fields as a set of text rows. If delimiter is NULL, each character in the string will become a separate row of the result. If delimiter is an empty string, then the string is treated as a single field. If null_string is supplied and is not NULL, fields matching that string are replaced by NULL.

string_to_table('xx~^~yy~^~zz', '~^~', 'yy')

 xx
 NULL
 zz

strpos ( string text, substring text ) → integer

Returns first starting index of the specified substring within string, or zero if it’s not present. (Same as position(substring in string), but note the reversed argument order.)

strpos('high', 'ig')2

substr ( string text, start integer [, count integer ] ) → text

Extracts the substring of string starting at the start‘th character, and extending for count characters if that is specified. (Same as substring(string from start for count).)

substr('alphabet', 3)phabet

substr('alphabet', 3, 2)ph

to_ascii ( string text ) → text

to_ascii ( string text, encoding name ) → text

to_ascii ( string text, encoding integer ) → text

Converts string to ASCII from another encoding, which may be identified by name or number. If encoding is omitted the database encoding is assumed (which in practice is the only useful case). The conversion consists primarily of dropping accents. Conversion is only supported from LATIN1, LATIN2, LATIN9, and WIN1250 encodings. (See the unaccent module for another, more flexible solution.)

to_ascii('Karél')Karel

to_hex ( integer ) → text

to_hex ( bigint ) → text

Converts the number to its equivalent hexadecimal representation.

to_hex(2147483647)7fffffff

translate ( string text, from text, to text ) → text

Replaces each character in string that matches a character in the from set with the corresponding character in the to set. If from is longer than to, occurrences of the extra characters in from are deleted.

translate('12345', '143', 'ax')a2x5

unistr ( text ) → text

Evaluate escaped Unicode characters in the argument. Unicode characters can be specified as XXXX (4 hexadecimal digits), +XXXXXX (6 hexadecimal digits), uXXXX (4 hexadecimal digits), or UXXXXXXXX (8 hexadecimal digits). To specify a backslash, write two backslashes. All other characters are taken literally.

If the server encoding is not UTF-8, the Unicode code point identified by one of these escape sequences is converted to the actual server encoding; an error is reported if that’s not possible.

This function provides a (non-standard) alternative to string constants with Unicode escapes (see Section 4.1.2.3).

unistr('d061t+000061')data

unistr('du0061tU00000061')data

В этом учебном пособии вы узнаете, как использовать PostgreSQL условие LIKE с синтаксисом и примерами.

Описание

PostgreSQL условие LIKE позволяет использовать подстановочные символы (метасимволы) в предложении WHERE оператора SELECT, INSERT, UPDATE или DELETE. Это позволяет выполнять сопоставление с pattern.

Синтаксис

Синтаксис для условия LIKE в PostgreSQL:

expression LIKE pattern [ ESCAPE ‘escape_character’ ]

Параметры или аргументы

expression
Символьное выражение, такое как поле или столбец.
pattern
Символьное выражение, которое содержит сопоставляемый шаблон. Шаблоны, которые вы можете выбрать из:

Подстановочный символ Пояснение
% Соответствует любой строке любой длины (в том числе нулевой длины)
_ Соответствует одному символу
escape_character
Необязательный. Это позволяет вам проверять наличие буквенных символов, таких как % или _. Если вы не предоставите escape_character, PostgreSQL предполагает, что является escape_character.

Пример — использование подстановочного символа % (символ знак процента)

Первый пример PostgreSQL LIKE, который мы рассмотрим, включает использование подстановочного символа % (символ знак процента).
Давайте разберем, как подстановочный символ % работает в PostgreSQL условии LIKE. Мы хотим найти всех employees, last_name начинается с ‘Jo’.

SELECT *

FROM employees

WHERE first_name LIKE ‘Jo%’;

Вы также можете использовать подстановочный символ % несколько раз в одной строке. Например,

SELECT *

FROM employees

WHERE first_name LIKE ‘%od%’;

В этом PostgreSQL примере условия LIKE мы ищем всех employees, чье first_name содержит символы ‘od’.

Пример — использование подстановочного символа _ (символ подчеркивание)

Далее, давайте рассмотрим, как подстановочный символ _ (символ подчеркивания) работает в PostgreSQL условии LIKE. Помните, что подстановочный символ _ ищет только один символ.
Например:

SELECT first_name, last_name

FROM employees

WHERE first_name LIKE ‘Yoh_n’;

Этот пример PostgreSQL условия LIKE вернул бы всех suppliers, чье supplier_name имеет длину 5 символов, где первые три символа — «Yoh», а последний — «n». Например, он может возвращать записи таблицы employees, у которых first_name — «Yohan», «Yohen», «Yohin», «Yohon» и т.д.

Вот еще один пример:

SELECT *

FROM employees

WHERE employee_number LIKE ‘98765_’;

Вы можете обнаружить, что ищете номер счета, но у вас есть только 5 из 6 цифр. В приведенном выше примере можно было бы получить обратно 10 записей (где отсутствующее значение может быть равно 0–9). Например, он может вернуть записи таблицы employees с employee_number:
987650, 987651, 987652, 987653, 987654, 987655, 987656, 987657, 987658, 987659

Пример — использование оператора NOT

Теперь давайте посмотрим, как вы можете использовать оператор NOT с подстановочными символами.
Давайте использовать подстановочный символ % с оператором NOT. Вы также можете использовать PostgreSQL условие LIKE, чтобы найти записи таблицы employees, чье last_name не начинается с ‘J’.
Например:

SELECT first_name, last_name

FROM employees

WHERE last_name NOT LIKE ‘J%’;

Поместив оператор NOT перед PostgreSQL условием LIKE, вы сможете получить все записи employees, чье last_name не начинается с ‘J’.

Пример — использование Escape-символов

Важно понимать, как «экранировать символы» когда pattern соответствует. Эти примеры имеют дело с экранированием символов в PostgreSQL.
Допустим, вы хотели найти символ % или _ в условии PostgreSQL LIKE. Вы можете сделать это с помощью символа Escape.
Обратите внимание, что вы можете определить только escape-символ как один символ (длина 1).
Например:

SELECT *

FROM employees

WHERE last_name LIKE ‘G%’;

Поскольку мы не указали escape-символ, PostgreSQL предполагает, что это escape-символ. Затем PostgreSQL предполагает, что escape-символ является в результате PostgreSQL рассматривает символ % как литерал вместо подстановочного символа. Этот оператор затем возвращает все записи из employees, чье last_name равно G%.
Мы можем переопределить escape-символ по умолчанию в PostgreSQL, предоставив модификатор ESCAPE следующим образом:

SELECT *

FROM employees

WHERE last_name LIKE ‘G!%’ ESCAPE ‘!’;

Этот пример PostgreSQL условия LIKE определяет ! как escape-символ. ! escape-символ приведет к тому, что PostgreSQL будет рассматривать символ % как литерал. В результате этот оператор также вернет все записи из таблицы employees, last_name которых равно G%.
Вот еще один более сложный пример использования escape-символов в PostgreSQL условии LIKE.

SELECT *

FROM employees

WHERE last_name LIKE ‘M%%’;

Этот пример PostgreSQL условия LIKE возвращает всех employees, чье last_name начинается с ‘M’ и заканчивается на ‘%’. Например, он вернул бы значение, такое как «Mathison%». Поскольку в условии LIKE мы не указывали escape-символ, PostgreSQL предполагает, что escape-символ является что приводит к тому, что PostgreSQL рассматривает второй символ % как литерал вместо подстановочного символа.
Мы могли бы изменить это условие LIKE, указав escape-символ следующим образом:

SELECT *

FROM employees

WHERE last_name LIKE ‘M%!%’ ESCAPE ‘!’;

Этот пример PostgreSQL условия LIKE возвращает все записи из employees, чье last_name начинается с ‘M’ и заканчивается литеральным ‘%’. Например, он будет возвращать значение, такое как «Mathison%».
Вы также можете использовать escape-символ с символом _ в условии PostgreSQL LIKE.
Например:

SELECT *

FROM employees

WHERE last_name LIKE ‘M%_’;

Опять же, поскольку модификатор ESCAPE не предоставлен, PostgreSQL использует как escape-символ, приводящий к символу _, который будет трактоваться как литерал вместо подстановочного символа. В этом примере будут возвращаться все поля из employees, чье last_name начинается с ‘M’ и заканчивается на ‘_’. Например, он вернул бы значение, такое как ‘Mathison_’.

Function Return Type Description Example Result
ascii(string) int ASCII code of the first character of the argument. For UTF8 returns the Unicode code point of the character. For other multibyte encodings, the argument must be an ASCII character. ascii('x') 120
btrim(string text [, characters text]) text Remove the longest string consisting only of characters in characters (a space by default) from the start and end of string btrim('xyxtrimyyx', 'xyz') trim
chr(int) text Character with the given code. For UTF8 the argument is treated as a Unicode code point. For other multibyte encodings the argument must designate an ASCII character. The NULL (0) character is not allowed because text data types cannot store such bytes. chr(65) A
concat(str "any" [, str "any" [, ...] ]) text Concatenate the text representations of all the arguments. NULL arguments are ignored. concat('abcde', 2, NULL, 22) abcde222
concat_ws(sep text, str "any" [, str "any" [, ...] ]) text Concatenate all but the first argument with separators. The first argument is used as the separator string. NULL arguments are ignored. concat_ws(',', 'abcde', 2, NULL, 22) abcde,2,22
convert(string bytea, src_encoding name, dest_encoding name) bytea Convert string to dest_encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding. Conversions can be defined by CREATE CONVERSION. Also there are some predefined conversions. See Table 9.10 for available conversions. convert('text_in_utf8', 'UTF8', 'LATIN1') text_in_utf8 represented in Latin-1 encoding (ISO 8859-1)
convert_from(string bytea, src_encoding name) text Convert string to the database encoding. The original encoding is specified by src_encoding. The string must be valid in this encoding. convert_from('text_in_utf8', 'UTF8') text_in_utf8 represented in the current database encoding
convert_to(string text, dest_encoding name) bytea Convert string to dest_encoding. convert_to('some text', 'UTF8') some text represented in the UTF8 encoding
decode(string text, format text) bytea Decode binary data from textual representation in string. Options for format are same as in encode. decode('MTIzAAE=', 'base64') x3132330001
encode(data bytea, format text) text Encode binary data into a textual representation. Supported formats are: base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (nnn) and doubles backslashes. encode('1230001', 'base64') MTIzAAE=
format(formatstr text [, formatarg "any" [, ...] ]) text Format arguments according to a format string. This function is similar to the C function sprintf. See Section 9.4.1. format('Hello %s, %1$s', 'World') Hello World, World
initcap(string) text Convert the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters. initcap('hi THOMAS') Hi Thomas
left(str text, n int) text Return first n characters in the string. When n is negative, return all but last |n| characters. left('abcde', 2) ab
length(string) int Number of characters in string length('jose') 4
length(string bytea, encoding name ) int Number of characters in string in the given encoding. The string must be valid in this encoding. length('jose', 'UTF8') 4
lpad(string text, length int [, fill text]) text Fill up the string to length length by prepending the characters fill (a space by default). If the string is already longer than length then it is truncated (on the right). lpad('hi', 5, 'xy') xyxhi
ltrim(string text [, characters text]) text Remove the longest string containing only characters from characters (a space by default) from the start of string ltrim('zzzytest', 'xyz') test
md5(string) text Calculates the MD5 hash of string, returning the result in hexadecimal md5('abc') 900150983cd24fb0 d6963f7d28e17f72
parse_ident(qualified_identifier text [, strictmode boolean DEFAULT true ] ) text[] Split qualified_identifier into an array of identifiers, removing any quoting of individual identifiers. By default, extra characters after the last identifier are considered an error; but if the second parameter is false, then such extra characters are ignored. (This behavior is useful for parsing names for objects like functions.) Note that this function does not truncate over-length identifiers. If you want truncation you can cast the result to name[]. parse_ident('"SomeSchema".someTable') {SomeSchema,sometable}
pg_client_encoding() name Current client encoding name pg_client_encoding() SQL_ASCII
quote_ident(string text) text Return the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary (i.e., if the string contains non-identifier characters or would be case-folded). Embedded quotes are properly doubled. See also Example 43.1. quote_ident('Foo bar') "Foo bar"
quote_literal(string text) text Return the given string suitably quoted to be used as a string literal in an SQL statement string. Embedded single-quotes and backslashes are properly doubled. Note that quote_literal returns null on null input; if the argument might be null, quote_nullable is often more suitable. See also Example 43.1. quote_literal(E'O'Reilly') 'O''Reilly'
quote_literal(value anyelement) text Coerce the given value to text and then quote it as a literal. Embedded single-quotes and backslashes are properly doubled. quote_literal(42.5) '42.5'
quote_nullable(string text) text Return the given string suitably quoted to be used as a string literal in an SQL statement string; or, if the argument is null, return NULL. Embedded single-quotes and backslashes are properly doubled. See also Example 43.1. quote_nullable(NULL) NULL
quote_nullable(value anyelement) text Coerce the given value to text and then quote it as a literal; or, if the argument is null, return NULL. Embedded single-quotes and backslashes are properly doubled. quote_nullable(42.5) '42.5'
regexp_match(string text, pattern text [, flags text]) text[] Return captured substring(s) resulting from the first match of a POSIX regular expression to the string. See Section 9.7.3 for more information. regexp_match('foobarbequebaz', '(bar)(beque)') {bar,beque}
regexp_matches(string text, pattern text [, flags text]) setof text[] Return captured substring(s) resulting from matching a POSIX regular expression to the string. See Section 9.7.3 for more information. regexp_matches('foobarbequebaz', 'ba.', 'g') {bar}

{baz}

(2 rows)

regexp_replace(string text, pattern text, replacement text [, flags text]) text Replace substring(s) matching a POSIX regular expression. See Section 9.7.3 for more information. regexp_replace('Thomas', '.[mN]a.', 'M') ThM
regexp_split_to_array(string text, pattern text [, flags text ]) text[] Split string using a POSIX regular expression as the delimiter. See Section 9.7.3 for more information. regexp_split_to_array('hello world', 's+') {hello,world}
regexp_split_to_table(string text, pattern text [, flags text]) setof text Split string using a POSIX regular expression as the delimiter. See Section 9.7.3 for more information. regexp_split_to_table('hello world', 's+') hello

world

(2 rows)

repeat(string text, number int) text Repeat string the specified number of times repeat('Pg', 4) PgPgPgPg
replace(string text, from text, to text) text Replace all occurrences in string of substring from with substring to replace('abcdefabcdef', 'cd', 'XX') abXXefabXXef
reverse(str) text Return reversed string. reverse('abcde') edcba
right(str text, n int) text Return last n characters in the string. When n is negative, return all but first |n| characters. right('abcde', 2) de
rpad(string text, length int [, fill text]) text Fill up the string to length length by appending the characters fill (a space by default). If the string is already longer than length then it is truncated. rpad('hi', 5, 'xy') hixyx
rtrim(string text [, characters text]) text Remove the longest string containing only characters from characters (a space by default) from the end of string rtrim('testxxzx', 'xyz') test
split_part(string text, delimiter text, field int) text Split string on delimiter and return the given field (counting from one) split_part('abc~@~def~@~ghi', '~@~', 2) def
strpos(string, substring) int Location of specified substring (same as position(substring in string), but note the reversed argument order) strpos('high', 'ig') 2
substr(string, from [, count]) text Extract substring (same as substring(string from from for count)) substr('alphabet', 3, 2) ph
starts_with(string, prefix) bool Returns true if string starts with prefix. starts_with('alphabet', 'alph') t
to_ascii(string text [, encoding text]) text Convert string to ASCII from another encoding (only supports conversion from LATIN1, LATIN2, LATIN9, and WIN1250 encodings) to_ascii('Karel') Karel
to_hex(number int or bigint) text Convert number to its equivalent hexadecimal representation to_hex(2147483647) 7fffffff
translate(string text, from text, to text) text Any character in string that matches a character in the from set is replaced by the corresponding character in the to set. If from is longer than to, occurrences of the extra characters in from are removed. translate('12345', '143', 'ax') a2x5
  1. Use the like Operator to SELECT if String Contains a Substring Match in PostgreSQL
  2. Use the position() Function to SELECT if String Contains a Substring Match in PostgreSQL
  3. Use the similar to Regular Expression to SELECT if String Contains a Substring Match in PostgreSQL
  4. Use the substring() Function to SELECT if String Contains a Substring Match in PostgreSQL
  5. Use the Posix Regular Operators to SELECT if String Contains a Substring Match in PostgreSQL

SELECT if String Contains a Substring Match in PostgreSQL

Today, we will learn how to find a value in the PostgreSQL string and select it if it matches a specific condition.

Suppose you have a string carabc and want to see if it contains the value car or not. Thus, you would try to use a function that would tell you if such a substring exists in a string or not.

PostgreSQL provides three methods: like, similar to, and the Posix operator. We will discuss these in detail.

Use the like Operator to SELECT if String Contains a Substring Match in PostgreSQL

The like expression will return True if the string contains the specific sub-string; else, False. First, we create a table and insert some values into it:

create table strings(
    str TEXT PRIMARY KEY
);

insert into strings values ('abcd'), ('abgh'), ('carabc');

Now, let’s select the abc values from this table.

select * from strings where str like 'abc';

When we run this, nothing happens. An empty table is displayed.

Why didn’t it return abcd and carabc from the table? Take a look at the statement below and try to see what happens:

select * from strings where str like 'abcd';

Output:

Use Like Operator To Find String Match

Thus, now you see that calling like tends to return results that are the exact copy of the string we are trying to match.

To match it to a substring, you need to add a % to the substring you are trying to match.

The following provides a better example of how to do that:

select * from strings where str like '%abc%';

Output:

Use Like Operator With % To Find String Match

The %x tells the query that we need to find the x in the string after some other values are located behind this x. If we wanted to find carabc, we would say %abc, and it would return carabc as car is appended before the abc.

But if we wanted to return abcd, we would use x% as d is appended after the x. Try this out yourself and see the results to get a better understanding!

Because we wanted to check if a string contains abc that could be either at the left, right or the middle, we used %abc% to return all such strings.

The like operator can also be written as ~~ and if not like, then use !~~.

Another expression is known as Ilike helps match strings without any case sensitivity. Meaning that if we use the statement:

select * from strings where str Ilike '%Abc%';

It would return the previous results even though abc is not equal to Abc as the a is capitalized in the substring.

Like the like operator, you can write Ilike as ~~* and not Ilike as !~~*.

Suppose you want to run a query to check the records in a table that match a particular string. If you want to compare abcde to the records in the table and return the row to which it matches, you can use:

select * from strings where 'abcd' like '%' || str || '%'

This would append the rows inside the str column to the syntax of %_str_%, and then we can compare each value inside and see which ones match.

Output:

Use Like Operator to Compare String and Return Row

To match a sub-string with some rogue character such as escape, %, or backslash /, , you can use,

select * from strings where 'abcd' like '%/abc'

This would return nothing as the function fails to eliminate the value %. This takes us to our next point, using the position function.

Use the position() Function to SELECT if String Contains a Substring Match in PostgreSQL

The position() function is a better alternative for checking if a substring exists within a string. This is defined under the operations for the string in the PostgreSQL documentation.

It returns the index of the found substring inside the main string. So if we were to find car in carabc, it would return 1.

In this way, we can see if any sub-string is present inside a string or not by checking the value of the returned index. If it is greater than 0, the sub-string exists; else, it doesn’t.

select * from strings where position(str in 'abcde') > 0

The above statement would return two values, a and abcd, as they both are present in abcde. You can manipulate this as needed.

Also, if you have any other character inside your sub-string such as %, it would skip that in the check and return accurate results, making it far better than the like expression.

select * from strings where position(str in '% abcde') > 0

Running the above will give the same results.

A possible replacement for the position expression can be strpos, which similarly works.

Use the similar to Regular Expression to SELECT if String Contains a Substring Match in PostgreSQL

The only difference between like and similar to is that the latter is a standard SQL definition used in various other DMBS.

select * from strings where position(str in 'abcde') > 0

This will return the same result as the like expression.

To use alternative matches, you can use:

select * from strings where str similar to '%(abc|a)%'

This will return all strings matching abc or a. When we run this query, we are returned with all strings in the table as each contains an a.

In case you want to disable any meta-characters in your matching substring, you can use a backslash to disable what we tend to call rogue characters inside a string.

Use the substring() Function to SELECT if String Contains a Substring Match in PostgreSQL

Another manipulation can be done to the substring() function as follows:

select * from strings where str ~~ substring(str similar '%abc%' escape '#')

The substring() returns the strings similar to abc in our case or contains abc. We then match the returned results to the str using the ~~ operator, short for like, and if they match, we select the results from the table.

This straightforward function even helps in dividing the string into separate parts can be seen in the syntax provided:

substring(string similar pattern escape escape-character)

or

substring(string from a pattern for escape-character)

or

substring(string, pattern, escape-character)

The escape-character tends to divide our string to be matched into different parts if it contains the escape-character at various points.

So if we run the statement:

select * from strings where str ~~ substring(str similar '#"abcd#"%' escape '#')

The '#"abcd#"%' will be divided into abcd, enclosed between two # characters. Thus, we can also find the matching string abcd.

Use the Posix Regular Operators to SELECT if String Contains a Substring Match in PostgreSQL

Posix

The above has been taken from a table described in the PostgreSQL documentation for Posix operators that perform the same match functions.

You can use the statement as follows to see if a string contains a sub-string or not:

select * from strings where str ~ 'abc'

This will return the values carabc and abcd.

You can even use the regexp_match(string, pattern [, flags]), which returns a null if no match is found. If a match is found, it will return an array with all the substrings matching a pattern.

To understand this, look at the following query:

select regexp_match('abdfabc', 'abd')

Output:

Use regexp_match to Find Match 1

Using the another query,

select regexp_match('abdfabc', 'abf')

Output:

Use regexp_match to Find Match 2

Now, you see how this expression finds a pattern and returns it. You can use this expression in a function, call the function for the select operation and then return all matching strings.

Понравилась статья? Поделить с друзьями:
  • Как легко найти общий язык с людьми
  • Электротехника как найти rэкв
  • Как найти банк псб
  • Как составить резюме электрик
  • Как найти длину ломаной как сложить