Wednesday, February 10, 2010

knee deep now

Yesterday was the most frustrating day yet in my immersion in open source, but it might have been the best. A lot of good things came from it.

One, I got a new direction on my http://processingjs.org/ bug, bug #226. I'm still going with the approach of masking all strings and replacing them later, like I decided in this post, but the difference is in the implementation. The way I fill the array and mask the strings has not changed. For replacing the masks with the strings, I was previously going through a for loop, matching <STRING n> with n as the loop index, and replacing each match with the string at that index of the array. Like this.

// replaces each masked string with the appropriate string from the array
if (strings != null) {
  // cache the length in l instead of assigning the comparison result to it
  for (var i = 0, l = strings.length; i < l; i++) {
    var ex = new RegExp("\<STRING " + i + "\>");
    aCode = aCode.replace(ex, strings[i]);
  }
}
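A quick way to probe the limits of a loop like this is to feed it a string that itself contains mask-like text. The sketch below is only an illustration: the contents of `strings` and `aCode` are made-up sample input, not taken from the actual bug report.

```javascript
// Hypothetical input: the first masked string happens to contain "<STRING 1>".
var strings = ['"<STRING 1>"', '"b"'];
var aCode = "text(<STRING 0>); text(<STRING 1>);";

for (var i = 0, l = strings.length; i < l; i++) {
  var ex = new RegExp("<STRING " + i + ">");
  aCode = aCode.replace(ex, strings[i]);
}

// The second pass finds the "<STRING 1>" that sits inside the restored
// string first, so the real mask is left behind:
// aCode is now: text(""b""); text(<STRING 1>);
```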

This worked in the most basic sense, but if someone had a string containing <STRING n>, as I mentioned in that same post, it would break. The solution, I thought, was regular expressions. I was trying to do it all in one expression, and it looked like this.

((((\\\"|\\')|[^'\"])*['\"]((\\\"|\\')|[^'\"])*['\"])*[^'\"]*)

What it does is count all the single or double quotes that do not follow a backslash. An even number of quotes means the position is NOT in a string; an odd number means it is inside a string. Escaped quotes are not counted.

It technically worked, but I overlooked something. It didn't care which kind of quotes were used, so a string like "'", which is valid, would have to be written with the inner quote escaped. It would be expecting "\'". That last one would work, but isn't valid code, as the only quote that can be escaped is the one used to start the string.
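The blind spot can be shown without the regex itself. The helper below is a hypothetical stand-in (the name `oddQuoteCount` is mine) that applies the same counting rule in plain JavaScript: count every unescaped quote of either kind and treat an odd total as "inside a string".

```javascript
// Hypothetical helper mimicking the counting idea: an odd number of
// unescaped quotes (single or double) before a position means "inside".
function oddQuoteCount(prefix) {
  var count = 0;
  for (var j = 0; j < prefix.length; j++) {
    var ch = prefix.charAt(j);
    if (ch === "\\") {
      j++; // skip the escaped character
    } else if (ch === '"' || ch === "'") {
      count++;
    }
  }
  return count % 2 === 1;
}

// Text before the position is: s = "a" +
var okCase = oddQuoteCount('s = "a" + ');   // two quotes: outside, correct

// Text before the position is: s = "'" +
var badCase = oddQuoteCount('s = "\'" + '); // three quotes: reported as inside
```

In the second case the position is really outside any string, but the single quote inside the double-quoted string throws the count off.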

My solution was to look deeper into regular expressions. I looked into some advanced regular expressions and learned a lot. It was productive, but no code was written. I looked at concepts like atomic grouping, which is not available in JavaScript, although this blog post explains how to emulate it. It was interesting. I also found this link on some advanced regular expression techniques. It was a great read. But none of these had the answer I was looking for. I asked around, obviously asking my professor, David Humphrey, first. He gave me some advice and direction, or redirection: a link and some encouragement.

I'm knee deep in a new solution now. It's a lot more code, but it already looks very promising. This is the idea.

aCode = aCode.replace(new RegExp("(.*)(\<STRING " + i + "\>)(.*)", "g"), function(all, quoteStart, match, quoteEnd){

});

Inside this function, which is pretty cool by the way, I'll just parse "quoteStart" and/or "quoteEnd", without regex, and see whether "match" is inside a string or not. If it's not, I return the new string; if it is, I return "all".
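Here is a rough sketch of how the inside of that function could go. It is only one possible reading of the idea: the helper names `endsInsideString` and `restoreMask` and the sample input are assumptions of mine, and only "quoteStart" is inspected.

```javascript
// Hypothetical helper: walks the text before the mask and remembers which
// quote character opened the current string, so "'" is handled correctly.
function endsInsideString(prefix) {
  var open = null; // the quote that opened the current string, or null
  for (var j = 0; j < prefix.length; j++) {
    var ch = prefix.charAt(j);
    if (ch === "\\") {
      j++; // skip the escaped character
    } else if (open === null && (ch === '"' || ch === "'")) {
      open = ch;
    } else if (ch === open) {
      open = null;
    }
  }
  return open !== null;
}

// Hypothetical wrapper around the replace call shown above.
function restoreMask(aCode, strings, i) {
  var ex = new RegExp("(.*)(<STRING " + i + ">)(.*)", "g");
  return aCode.replace(ex, function (all, quoteStart, match, quoteEnd) {
    if (endsInsideString(quoteStart)) {
      return all; // the mask text is part of a string literal, leave it
    }
    return quoteStart + strings[i] + quoteEnd;
  });
}

// Made-up sample input: the first string contains mask-like text.
var strings = ['"<STRING 1>"', '"b"'];
var aCode = "text(<STRING 0>); text(<STRING 1>);";
for (var i = 0; i < strings.length; i++) {
  aCode = restoreMask(aCode, strings, i);
}
// aCode is now: text("<STRING 1>"); text("b");
```

One thing to keep in mind: the first `(.*)` is greedy, so on each line only the last occurrence of a given mask is captured as "match". A line where the real mask comes before a look-alike inside a string would need extra handling.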

The inner function's parameters are filled based on the order of matches in the regex. So the first parameter passed is $0, the whole matched string. The second is $1, the first captured group, inside the first set of parentheses, and so on.
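A tiny experiment makes the parameter order visible. The input string here is just a made-up example, and `seen` is a scratch variable for recording what the callback receives.

```javascript
var seen = [];
"fill(<STRING 0>);".replace(/(.*)(<STRING 0>)(.*)/, function (all, quoteStart, match, quoteEnd) {
  seen = [all, quoteStart, match, quoteEnd];
  return all; // hand the text back unchanged
});
// seen: ["fill(<STRING 0>);", "fill(", "<STRING 0>", ");"]
```

After the captured groups, JavaScript also passes the match offset and the whole input string, which this callback simply ignores.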

I prefer it this way. It's easier to read and understand than a complex regex, and it's modular, because it's easier to edit what happens inside the inner function without breaking the regular expression.
