Advanced Regular Expressions
Regular expression quantifiers are greedy by default: each quantifier tries to match as many repetitions as possible. Because the pattern is evaluated from left to right, earlier quantifiers tend to consume more of the input than later ones. When that is not desirable, use non-greedy (also called reluctant) quantifiers, which match as few repetitions as possible while still allowing the pattern as a whole to match.
To make a quantifier non-greedy, append a "?" to it, e.g. write "*?" instead of "*". The same works for the other quantifiers: "+?", "??" and "{n,m}?".
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String a = "a,b,c";
Matcher m1 = Pattern.compile("(.*),(.*)").matcher(a); // greedy quantifiers
if (m1.matches())
    System.out.println(m1.group(1) + " " + m1.group(2)); // prints "a,b c"
Matcher m2 = Pattern.compile("(.*?),(.*?)").matcher(a); // non-greedy quantifiers
if (m2.matches())
    System.out.println(m2.group(1) + " " + m2.group(2)); // prints "a b,c"
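The difference becomes more visible when a pattern is applied repeatedly with find() instead of once with matches(). The following is a minimal sketch; the class name QuotedStrings and the helper findAll are made up for this illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for this illustration.
class QuotedStrings {
    // Returns every substring matched by the given regex, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "say \"hello\" and \"bye\"";
        // Greedy: .* runs up to the last quote, so there is only one match.
        System.out.println(findAll("\".*\"", input));  // prints ["hello" and "bye"]
        // Non-greedy: .*? stops at the first closing quote, giving two matches.
        System.out.println(findAll("\".*?\"", input)); // prints ["hello", "bye"]
    }
}
```

With the greedy version, everything between the first and the last quote ends up in a single match, which is rarely what you want when extracting delimited pieces of text.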
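Lookahead assertions let you state that a pattern matches only if it is followed by another sub-pattern, written "(?=...)". A negative lookahead, written "(?!...)", requires that the sub-pattern does not follow. In both cases the sub-pattern is only checked, not consumed, so it is not part of the actual match.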
String a = "ab ac ad";
String b = "pineapple";
assert a.replaceAll("a(?=c)", "X").equals("ab Xc ad"); // Match "a" followed by "c"
assert a.replaceAll("a(?!c)", "X").equals("Xb ac Xd"); // Match "a" not followed by "c"
assert b.replaceAll("[a-z](?=[eip])", "X").equals("XiXeXXpXe");
assert b.replaceAll("[a-z](?![eip])", "X").equals("pXnXapXlX");
assert "5".replaceFirst("^(?=\\d$)", "0").equals("05"); // prepend 0 to single-digit strings
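// A lookahead can appear anywhere in the pattern, not just at the end.
// Here the lookahead and [a-e] both apply to the same character, so only "e" qualifies:
assert b.replaceAll("[a-z](?=[eip])[a-e]", "X").equals("piXappX");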
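Because a lookahead does not consume any input, several of them can be placed at the same position to check independent conditions. The sketch below uses this for a simple password rule. The rule itself (at least 8 characters, one digit, one lowercase letter) and the names PasswordCheck and isAcceptable are assumptions chosen for this example.

```java
import java.util.regex.Pattern;

// Hypothetical example class; the rule below is an assumption for illustration.
class PasswordCheck {
    // (?=.*\d)    somewhere ahead there is a digit
    // (?=.*[a-z]) somewhere ahead there is a lowercase letter
    // .{8,}       the whole string has at least 8 characters
    private static final Pattern RULE =
            Pattern.compile("^(?=.*\\d)(?=.*[a-z]).{8,}$");

    static boolean isAcceptable(String candidate) {
        return RULE.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("secret123"));    // prints true
        System.out.println(isAcceptable("secretsecret")); // prints false (no digit)
        System.out.println(isAcceptable("abc1"));         // prints false (too short)
    }
}
```

Both lookaheads start at the beginning of the string and each one scans ahead on its own, so the order in which the digit and the letter appear does not matter.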
Similar to lookahead assertions, lookbehind assertions let you state a sub-pattern that must immediately precede the main pattern, written "(?<=...)", or must not precede it, written "(?<!...)". Again, the sub-pattern itself is not part of the match.
String a = "ba ca da";
assert a.replaceAll("(?<=c)a", "X").equals("ba cX da"); // Match "a" following "c"
assert a.replaceAll("(?<!c)a", "X").equals("bX ca dX"); // Match "a" not following "c"
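Lookbehind and lookahead can also be combined into a pattern that matches a position rather than any characters. As a sketch, the following inserts thousands separators into a string of digits. The class name Thousands and the method group are made up for this example, and the input is assumed to consist of digits only.

```java
// Hypothetical example class; assumes the input contains only digits.
class Thousands {
    // Matches every empty position that
    //   (?<=\d)        has a digit directly before it, and
    //   (?=(\d{3})+$)  has a multiple of three digits between it and the end.
    static String group(String digits) {
        return digits.replaceAll("(?<=\\d)(?=(\\d{3})+$)", ",");
    }

    public static void main(String[] args) {
        System.out.println(group("1234567")); // prints 1,234,567
        System.out.println(group("1234"));    // prints 1,234
        System.out.println(group("123"));     // prints 123
    }
}
```

The lookbehind is what keeps a comma from being inserted at the very start of inputs such as "123", where the lookahead alone would succeed.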
You can find more information on lookahead and lookbehind in the Regex Tutorial.