Regular Expression

Metacharacters

Character What does it do?
$ Matches the end of the input. If in multiline mode, it also matches before a line break character, hence every end of line.
(?:x) Matches ‘x’ but does NOT remember the match. Also known as non-capturing parentheses, or a non-capturing group.
(x) Matches ‘x’ and remembers the match. Also known as capturing parentheses, or a capturing group.
* Matches the preceding item (a character, set or group) 0 or more times.
+ Matches the preceding item (a character, set or group) 1 or more times.
. Matches any single character except the newline character.
?
  • Matches the preceding item (a character, set or group) 0 or 1 time.
  • When used after the quantifiers *, +, ? or {}, makes the quantifier non-greedy; it will match the minimum number of times as opposed to matching the maximum number of times.
[\b] Matches a backspace.
[^abc] Matches anything NOT enclosed by the brackets. Also known as a negative character set.
[abc] Matches any of the enclosed characters. Also known as a character set. You can specify a range of characters using a hyphen, such as A-Z (A to Z). Note that in character sets, special characters (., *, +) do not have any special meaning.
\
  • Used to indicate that the next character should NOT be interpreted literally. For example, the character ‘w’ by itself will be interpreted as ‘match the character w’, but using ‘\w’ signifies ‘match an alpha-numeric character including underscore’.
  • Used to indicate that a metacharacter is to be interpreted literally. For example, the ‘.’ metacharacter means ‘match any single character but a new line’, but if we would rather match a dot character instead, we would use ‘\.’.
\0 Matches a NULL character.
\b Matches a word boundary. A word boundary is a position where a word character is NOT followed or NOT preceded by another word character.
\B Matches a NON-word boundary. A non-word boundary is a position where the two adjacent characters are both word characters OR both non-word characters.
\cX Matches a control character. X must be a letter from A to Z.
\d Matches a digit character. Same as [0-9] or [0123456789].
\D Matches a NON-digit character. Same as [^0-9] or [^0123456789].
\f Matches a form feed.
\n Matches a line feed.
\r Matches a carriage return.
\s Matches a single white space character. This includes space, tab, form feed, line feed, carriage return and vertical tab.
\S Matches any single character OTHER than a white space character, that is, anything other than space, tab, form feed, line feed, carriage return and vertical tab.
\t Matches a tab.
\uhhhh Matches the character with the 4-digit hexadecimal code hhhh.
\v Matches a vertical tab.
\w Matches any alphanumeric character including underscore. Equivalent to [A-Za-z0-9_].
\W Matches anything OTHER than an alphanumeric character including underscore. Equivalent to [^A-Za-z0-9_].
\x A back reference to the substring matched by the x-th capturing parenthesis, counting opening parentheses from the left. Here x stands for a positive integer, for example \1 or \2.
\xhh Matches the character with the 2-digit hexadecimal code hh.
^
  • Matches the beginning of the input. If in multiline mode, it also matches after a line break character, hence every new line.
  • When used at the start of a set pattern ([^abc]), it negates the set; the set then matches anything not enclosed in the brackets.
x(?!y) Matches ‘x’ only if ‘x’ is NOT followed by ‘y’. Also known as a negative lookahead.
x(?=y) Matches ‘x’ only if ‘x’ is followed by ‘y’. Also known as a positive lookahead.
x|y Matches ‘x’ OR ‘y’.
{n,m} Matches the preceding item at least n times and at most m times. m can be omitted to remove the upper limit: {n,} matches at least n times. Whether n can be omitted depends on the regex flavor.
{n} Matches the preceding item exactly n times.
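
A few of the entries above are easier to follow with concrete input. The sketch below is only an illustration: the table does not name a regex flavor, so the use of Python's `re` module is an assumption, and the sample strings and variable names are made up for the example. The constructs shown (groups, non-greedy quantifiers, escaping, word boundaries, lookaheads, back references, multiline anchors and {n,m}) behave as the table describes in that module.

```python
import re

# (x) remembers its match, (?:x) does not: only "cd" is captured.
captured = re.search(r"(?:ab)+(cd)", "ababcd").groups()

# A quantifier is greedy by default; a trailing ? makes it non-greedy.
greedy = re.search(r"<.+>", "<a><b>").group()   # as much as possible
lazy = re.search(r"<.+?>", "<a><b>").group()    # as little as possible

# "." is a metacharacter, "\." is a literal dot, and inside a character
# set the characters ., * and + lose their special meaning.
any_char = re.findall(r".", "a.b")
literal_dot = re.findall(r"\.", "a.b")
in_set = re.findall(r"[.*+]", "a.b*c+d")

# \b matches only at word boundaries, so the "cat" inside "concat" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

# Lookaheads test what follows without consuming it.
text = "foobar foobaz"
followed_by_bar = [m.start() for m in re.finditer(r"foo(?=bar)", text)]
not_followed_by_bar = [m.start() for m in re.finditer(r"foo(?!bar)", text)]

# \1 refers back to whatever the first capturing group matched.
doubled = re.search(r"(\w)\1", "hello").group()

# ^ and $ match at every line start and end only in multiline mode.
single_line = re.findall(r"^\w+$", "one\ntwo")
multi_line = re.findall(r"^\w+$", "one\ntwo", flags=re.MULTILINE)

# {n,m} bounds the repetition count; {n,} has no upper limit.
too_many = re.fullmatch(r"a{2,3}", "aaaa")       # None: four is more than m
at_least_two = re.fullmatch(r"a{2,}", "aaaa")    # matches all four

print(captured, greedy, lazy, whole_words, doubled, multi_line)
```

Other engines accept the same patterns for these constructs, but flags and edge cases (for example, exactly which characters "." excludes) vary between flavors, so it is worth checking the documentation of the engine in use.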
Do not repost without permission: Mr.Zhang » Regular Expression
