Thursday, April 8, 2010

I love you, word boundary.

A word boundary is, other than my savior, a zero-width regular expression assertion: it matches the tiny, invisible position between a word character (a-zA-Z0-9_) and a non-word character, or between a word character and the start or end of the string.

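You use it with \b and \B. \B matches a non-word boundary, that is, any position where \b does not match. (Careful: [^\b] is not the same thing. Inside a character class, \b means the backspace character, so [^\b] matches any character that is not a backspace.)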

Example: "this is some text" with the regular expression \bs. This would only match the "s" in "some", because there is a non-word character before that s. If you used s\b instead, it would match the "s" in "this" and the "s" in "is", because each one comes right before a non-word character. Finally, \Bs would match the "s" in "this" and in "is", and s\B would match the "s" in "some". I hope that all made sense.
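To double-check the four cases above, here is a small sketch that records where each pattern matches in the example string. The helper name matchIndexes is made up for this post, not a built-in.

```javascript
// Hypothetical helper: return the start index of every match of `re` in `text`.
// `re` must have the g flag so that exec() advances through the string.
function matchIndexes(re, text) {
  var indexes = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    indexes.push(m.index);
  }
  return indexes;
}

var text = "this is some text";
// The three "s" characters sit at index 3 ("this"), 6 ("is") and 8 ("some").

var startOfWord = matchIndexes(/\bs/g, text); // [8]
var endOfWord = matchIndexes(/s\b/g, text);   // [3, 6]
var notAtStart = matchIndexes(/\Bs/g, text);  // [3, 6]
var notAtEnd = matchIndexes(/s\B/g, text);    // [8]
```

Note that \b and \B never show up in the matched text itself. Every match here is just the single character "s".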

I needed this for a particular problem where I had to parse some code and catch specific words, globally. Previously, I checked for a non-word character (\W), then my matching word, then another non-word character. So if I was looking for all instances of "scott", I would not catch "scotty", which in this case would be a different variable. That gave me /(\W)(scott)(\W)/g. The problem is that it "eats" the leading and trailing characters: they become part of the match, so the next match cannot reuse them. Two occurrences separated by a single character, as in "scott,scott", would only match once. It also misses a word sitting at the very start or end of the string, where there is no character for \W to match.

By using \b, I was able to check what I needed without "eating" the leading and trailing characters, because \b matches a position rather than a character.
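Here is a quick side-by-side of the two approaches. The input string is invented for illustration; it is not from the code I was actually parsing.

```javascript
// Made-up input: two real uses of "scott" separated by a single comma,
// followed by a different variable, "scotty".
var code = "foo(scott,scott,scotty)";

// Consuming version: the "(" and the "," become part of the first match,
// so the second "scott" no longer has a leading \W available to it.
var consuming = code.match(/(\W)(scott)(\W)/g);
// consuming is ["(scott,"]

// Boundary version: \b takes up no characters, so both uses are found
// and "scotty" is still skipped.
var bounded = code.match(/\bscott\b/g);
// bounded is ["scott", "scott"]
```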

If anyone is curious, the final product looks like so:

// add this. to public variables used inside member functions, and constructors
if (publicVars) {
    // Search functions for public variables
    for (var i = 0; i < methodsArray.length; i++) {
        // [\s\S] matches any character, including newlines
        methods += methodsArray[i].replace(/(addMethod.*?\{)([\s\S]*\}\);)/g, function(all, header, body) {
            // The optional (\.) group captures a leading dot, so existing property accesses are left alone
            return header + body.replace(new RegExp("(\\.)?\\b(" + publicVars.substr(0, publicVars.length-1) + ")\\b", "g"), function (all, first, variable) {
                if (first === ".") {
                    return all;
                } else {
                    return "this." + variable;
                }
            });
        });
    }
    // Search constructors for public variables
    constructors = constructors.replace(new RegExp("(var\\s*?|\\.)?\\b(" + publicVars.substr(0, publicVars.length-1) + ")\\b", "g"), function (all, first, variable) {
        // Skip declarations (var name) and existing property accesses (.name)
        if (/var\s*?$/.test(first) || first === ".") {
            return all;
        } else {
            return "this." + variable;
        }
    });
}

Obviously, there is code before and after this, with declared variables, but just use your imagination ;) I'm still looking for bugs, so feel free to scrutinize my code.
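If you want to try the idea without the surrounding code, here is a stripped-down sketch of the same technique. It is not the code from my project: the function name addThis and the array of names are assumptions made for this example, and it assumes the names are plain identifiers with no regular expression metacharacters in them.

```javascript
// Hypothetical helper: prefix bare uses of the given names with "this.",
// leaving alone any name that already follows a dot.
function addThis(source, names) {
  // Assumes every entry in `names` is a plain identifier (letters, digits, _).
  var pattern = new RegExp("(\\.)?\\b(" + names.join("|") + ")\\b", "g");
  return source.replace(pattern, function (all, dot, name) {
    // `dot` is undefined when the optional (\.) group did not take part.
    return dot === "." ? all : "this." + name;
  });
}

var result = addThis("count = count + obj.count + counter;", ["count"]);
// result is "this.count = this.count + obj.count + counter;"
```

The trailing \b is what keeps "counter" from being rewritten, and the optional dot group is what keeps "obj.count" from turning into "obj.this.count".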

1 comment:

  1. Wow!! Good find. That appears to be a phenomenally helpful regular expression you have found, my friend.
