Regex in C#

Regular expressions (regex) provide a powerful way to search, match, and manipulate text based on patterns. This article walks through how they work in C#, including common tokens and examples:

1. The Regex Class:

Regex support in C# lives in the System.Text.RegularExpressions.Regex class. You can create an instance of this class, providing the regular expression pattern as a string, or call its static methods and pass the pattern with each call.

using System.Text.RegularExpressions;

string pattern = @"\d+"; // Matches one or more digits
Regex regex = new Regex(pattern);
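
As a complement to the instance created above, the same check can be written with the static Regex.IsMatch overload, which takes the pattern as an argument. This is a minimal sketch; the sample text and variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Instance form: the pattern is parsed once and the object can be reused.
Regex digits = new Regex(@"\d+");
bool viaInstance = digits.IsMatch("Order 66");

// Static form: the pattern is passed on every call.
// .NET keeps an internal cache of recently used static patterns.
bool viaStatic = Regex.IsMatch("Order 66", @"\d+");

Console.WriteLine(viaInstance); // True
Console.WriteLine(viaStatic);   // True
```

An instance tends to be convenient when one pattern is applied many times, while the static form keeps one-off checks short.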

2. Basic Matching:

  • Regex.IsMatch(): Checks if the input string contains a match for the pattern.
string input = "There are 123 apples.";
bool isMatch = regex.IsMatch(input); // isMatch will be true
  • Regex.Match(): Returns the first match found in the input string.
Match match = regex.Match(input);
if (match.Success)
{
    Console.WriteLine(match.Value); // Output: 123
    Console.WriteLine(match.Index); // Output: 10 (starting position)
}
  • Regex.Matches(): Returns a collection of all matches.
MatchCollection matches = regex.Matches(input);
foreach (Match m in matches)
{
    Console.WriteLine(m.Value); // Output: 123
}
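
The input above contains only one number, so Matches() yields a single item. With several numbers in the text, it returns one Match per occurrence, each with its own Index. A small sketch with an invented input string:

```csharp
using System;
using System.Text.RegularExpressions;

string fruitText = "7 apples, 12 pears, 340 plums"; // illustrative input
MatchCollection numberMatches = Regex.Matches(fruitText, @"\d+");

Console.WriteLine(numberMatches.Count); // 3
foreach (Match m in numberMatches)
{
    // Prints "7 at 0", "12 at 10", "340 at 20" on separate lines
    Console.WriteLine($"{m.Value} at {m.Index}");
}
```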

3. Regex Tokens and Examples:

Here’s a breakdown of essential regex tokens with C# examples:

  • Character Classes:
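    • \d: Matches any decimal digit. In .NET this includes Unicode digits from other scripts, not only 0-9; use [0-9] or RegexOptions.ECMAScript to restrict it to ASCII digits. Example: \d+ matches one or more digits.
    • \D: Matches any non-digit character.
    • \w: Matches any word character (letters, decimal digits, and the underscore; Unicode-aware by default).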
    • \W: Matches any non-word character.
    • \s: Matches any whitespace character (space, tab, newline).
    • \S: Matches any non-whitespace character.
    • .: Matches any character except a newline.
    • [abc]: Matches any one of the characters a, b, or c.
    • [^abc]: Matches any character not in the set a, b, or c.
    • [a-z]: Matches any lowercase ASCII letter.
    • [A-Z]: Matches any uppercase ASCII letter.
    • [0-9a-zA-Z]: Matches any ASCII alphanumeric character.
string pattern = @"\b\w+\b"; // Matches whole words
string input = "Hello World!";
foreach (Match match in Regex.Matches(input, pattern))
{
    Console.WriteLine(match.Value); // Prints Hello, then World (one per line)
}
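
The difference between \d and an explicit [0-9] range is easy to miss, so here is a short sketch of it. The test character (U+0663, the Arabic-Indic digit three) is chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string arabicIndicThree = "\u0663"; // a Unicode decimal digit that is not in 0-9

bool unicodeDigit = Regex.IsMatch(arabicIndicThree, @"\d");                          // True
bool asciiRange   = Regex.IsMatch(arabicIndicThree, @"[0-9]");                       // False
bool ecmaDigit    = Regex.IsMatch(arabicIndicThree, @"\d", RegexOptions.ECMAScript); // False

Console.WriteLine($"{unicodeDigit} {asciiRange} {ecmaDigit}"); // True False False
```

When validating input that must be plain ASCII digits (IDs, port numbers and the like), [0-9] is usually the safer choice.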
  • Quantifiers:
    • *: Matches zero or more occurrences.
    • +: Matches one or more occurrences.
    • ?: Matches zero or one occurrence.
    • {n}: Matches exactly n occurrences.  
    • {n,}: Matches n or more occurrences.
    • {n,m}: Matches between n and m occurrences.  
string pattern = @"\b\d{2,4}\b"; // Matches standalone numbers with 2 to 4 digits
string input = "12 123 1234 12345";
foreach (Match match in Regex.Matches(input, pattern))
{
    Console.WriteLine(match.Value); // Output: 12, 123, 1234 (without the \b anchors, the "1234" inside "12345" would match too)
}
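
The remaining quantifiers from the list can be tried out the same way. The following sketch uses a small helper (AllMatches, a hypothetical name introduced here) to join every match into one string; the inputs are invented.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns all match values joined with ", ".
static string AllMatches(string text, string pattern) =>
    string.Join(", ", Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value));

string optionalU   = AllMatches("color colour colr", @"colou?r"); // ? makes the "u" optional
string zeroOrMoreB = AllMatches("ac abc abbbc", @"ab*c");          // * allows any number of "b"
bool exactCounts   = Regex.IsMatch("555-0199", @"^\d{3}-\d{4}$");  // {n} requires exact counts

Console.WriteLine(optionalU);   // color, colour
Console.WriteLine(zeroOrMoreB); // ac, abc, abbbc
Console.WriteLine(exactCounts); // True
```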
  • Anchors:
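    • ^: Matches the beginning of the string (or of each line when RegexOptions.Multiline is set).
    • $: Matches the end of the string, or the position just before a final newline. Use \z to match only at the very end of the string.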
    • \b: Matches a word boundary.
string pattern = @"^\d+$"; // Matches a string that contains only digits from beginning to end
string input1 = "12345"; // Match
string input2 = "12345a"; // No Match
Console.WriteLine(Regex.IsMatch(input1, pattern)); // True
Console.WriteLine(Regex.IsMatch(input2, pattern)); // False
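
One detail worth checking when using anchors for validation: in .NET, $ also matches just before a trailing newline, whereas \z matches only at the very end. A brief sketch with made-up inputs:

```csharp
using System;
using System.Text.RegularExpressions;

string withTrailingNewline = "12345\n";

bool dollarAnchor = Regex.IsMatch(withTrailingNewline, @"^\d+$");  // True: $ tolerates one final \n
bool strictAnchor = Regex.IsMatch(withTrailingNewline, @"^\d+\z"); // False: \z is the absolute end
bool strictClean  = Regex.IsMatch("12345", @"^\d+\z");             // True

Console.WriteLine($"{dollarAnchor} {strictAnchor} {strictClean}"); // True False True
```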
  • Grouping and Capturing:
    • ( ): Groups parts of the pattern and captures the matched text. You can refer back to captured groups using backreferences (\1, \2, etc.).
string pattern = @"(\w+)\s(\w+)"; // Matches two words separated by space and captures each word
string input = "First Second";
Match match = Regex.Match(input, pattern);
if (match.Success)
{
    Console.WriteLine(match.Groups[1].Value); // Output: First
    Console.WriteLine(match.Groups[2].Value); // Output: Second
}
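
The backreferences mentioned above can be sketched as follows: \1 must repeat whatever group 1 captured, which makes it handy for spotting doubled words. The second half goes slightly beyond the text and shows .NET's named-group syntax, (?<name>...), as an alternative to numeric indexes. Inputs and group names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// \1 refers back to the text captured by the first group.
Match repeated = Regex.Match("This is is a test", @"\b(\w+)\s+\1\b");
Console.WriteLine(repeated.Value);           // is is
Console.WriteLine(repeated.Groups[1].Value); // is

// Named groups are read by name instead of by position.
Match date = Regex.Match("Released 2024-05", @"(?<year>\d{4})-(?<month>\d{2})");
Console.WriteLine(date.Groups["year"].Value);  // 2024
Console.WriteLine(date.Groups["month"].Value); // 05
```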
  • Alternation:
    • |: Matches either the expression before or the expression after the pipe.
string pattern = @"cat|dog"; // Matches either "cat" or "dog"
string input = "I have a cat and a dog.";
foreach (Match match in Regex.Matches(input, pattern))
{
    Console.WriteLine(match.Value); // Prints cat, then dog (one per line)
}
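
Alternation has the lowest precedence of the operators covered here, so combining it with anchors can surprise. The sketch below, with an invented input, shows how grouping limits what the pipe applies to.

```csharp
using System;
using System.Text.RegularExpressions;

// Without grouping, the pattern reads as (^cat) OR (dog$).
bool loose = Regex.IsMatch("catalog", @"^cat|dog$");        // True: "catalog" starts with "cat"

// With grouping, the anchors apply to both alternatives.
bool strict    = Regex.IsMatch("catalog", @"^(cat|dog)$");  // False
bool strictDog = Regex.IsMatch("dog", @"^(cat|dog)$");      // True

Console.WriteLine($"{loose} {strict} {strictDog}"); // True False True
```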
  • Escaping Special Characters:
    • To match a character that has special meaning in regex (such as ., *, or [), escape it with a backslash (\). For example, the pattern \. matches a literal dot.
string pattern = @"\d+\.\d+"; // Matches a decimal number such as 3.14
string input = "3.14";
Console.WriteLine(Regex.IsMatch(input, pattern)); // True
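
When the text to match comes from a variable rather than being typed into the pattern by hand, Regex.Escape can add the backslashes for you. A minimal sketch; the literal and the test strings are made up.

```csharp
using System;
using System.Text.RegularExpressions;

string literal = "1.5*2";
string escaped = Regex.Escape(literal);
Console.WriteLine(escaped); // 1\.5\*2

// Unescaped, "." means any character and "5*" means zero or more 5s.
bool unescapedFalsePositive = Regex.IsMatch("1x2", literal);     // True
bool escapedRejects = Regex.IsMatch("1x2", escaped);             // False
bool escapedAccepts = Regex.IsMatch("total = 1.5*2", escaped);   // True
```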

4. Regex Options:

You can pass additional options to the Regex constructor (or to the static methods) to modify the matching behavior. Common options include:

  • RegexOptions.IgnoreCase: Performs case-insensitive matching.
  • RegexOptions.Multiline: Makes ^ and $ match at the beginning/end of each line instead of only at the beginning/end of the entire string.
  • RegexOptions.Singleline: Makes . match any character (including newline).
string pattern = "hello";
string input = "Hello World!";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
Console.WriteLine(regex.IsMatch(input)); // True (case-insensitive)
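
The other two options listed above can be sketched in the same way. The inputs are invented and use \n line endings only.

```csharp
using System;
using System.Text.RegularExpressions;

string twoLines = "first\nsecond";

// Multiline: ^ and $ match at every line, not just the ends of the string.
int withoutMultiline = Regex.Matches(twoLines, @"^\w+$").Count;                          // 0
int withMultiline    = Regex.Matches(twoLines, @"^\w+$", RegexOptions.Multiline).Count;  // 2

// Singleline: "." also matches the newline character.
bool dotDefault    = Regex.IsMatch("a\nb", @"a.b");                          // False
bool dotSingleline = Regex.IsMatch("a\nb", @"a.b", RegexOptions.Singleline); // True

Console.WriteLine($"{withoutMultiline} {withMultiline} {dotDefault} {dotSingleline}"); // 0 2 False True
```

Options can be combined with the | operator, for example RegexOptions.IgnoreCase | RegexOptions.Multiline.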

5. Using Regex for Replacement:

The Regex.Replace() method allows you to replace matched text with a new string.

string input = "I have 123 apples and 456 oranges.";
string newText = Regex.Replace(input, @"\d+", "XXX");
Console.WriteLine(newText); // Output: I have XXX apples and XXX oranges.
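
Replacement strings are not limited to fixed text. As a sketch under illustrative inputs, the first call below reuses captured groups via $1 and $2, and the second passes a MatchEvaluator lambda that computes each replacement from the match.

```csharp
using System;
using System.Text.RegularExpressions;

// $1 and $2 refer to the first and second captured groups.
string swapped = Regex.Replace("Doe John", @"(\w+)\s(\w+)", "$2 $1");
Console.WriteLine(swapped); // John Doe

// A MatchEvaluator receives each Match and returns its replacement text.
string doubled = Regex.Replace(
    "I have 123 apples and 456 oranges.",
    @"\d+",
    m => (int.Parse(m.Value) * 2).ToString());
Console.WriteLine(doubled); // I have 246 apples and 912 oranges.
```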

These are the core pieces for working with regular expressions in C#. Test your regex patterns thoroughly to ensure they behave as expected; online regex testers that support the .NET flavor are useful for this purpose.
