Monday, February 8, 2010

There and back again, and regular expressions are hardcore

So, I've been thinking about my solution to bug #226, which is to mask all strings and later replace them based on <STRING n>, and at one point I changed my mind about it. I thought of a few possible bugs, mostly the odd chance that someone enters <STRING 7> (for example) as a string. The parser would put "<STRING 7>" into its array and mask it like any other string. Later, when that string is restored, the restored text contains what looks like the mask of a string that has yet to be replaced, so the parser would find it and replace it too. Hmm, hard to explain, but I decided the only solution would be to replace <STRING n> at the end, once all the code has been processed, and only if the <STRING n> is NOT inside a string, and that was kinda what I wanted to avoid in the first place.
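Here is a rough sketch of that collision. The helper names (maskStrings, naiveRestore) and the sample source line are made up for illustration and are not the actual parser code; the only thing taken from this post is the string regex shown further down.

```javascript
// Matches single- or double-quoted literals (the pattern quoted later in this post).
var STRING_RE = /(["'])([^"'\\]|\\.)*(\1)/g;

// Hypothetical helper: swap every string literal for a <STRING n> mask
// and remember the literals in an array.
function maskStrings(code) {
  var strings = [];
  var masked = code.replace(STRING_RE, function (literal) {
    strings.push(literal);
    return "<STRING " + (strings.length - 1) + ">";
  });
  return { code: masked, strings: strings };
}

// Hypothetical naive restore: one global replacement pass per stored string.
// Text restored by an earlier pass gets rescanned by the later passes.
function naiveRestore(masked, strings) {
  var out = masked;
  for (var n = 0; n < strings.length; n++) {
    out = out.split("<STRING " + n + ">").join(strings[n]);
  }
  return out;
}

// The first string literal happens to look like the mask of the second one.
var source = 'text("<STRING 1>", 10, 10); text("b", 10, 20);';
var m = maskStrings(source);
// m.code is: text(<STRING 0>, 10, 10); text(<STRING 1>, 10, 20);

var restored = naiveRestore(m.code, m.strings);
// restored is: text(""b"", 10, 10); text("b", 10, 20);
// The user's "<STRING 1>" text was replaced as if it were a mask.
```

The round trip does not give back the original source, which is exactly the case that made me hesitate.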

So, I figured I should reconsider checking all the parser regexes so they don't parse while inside a string, like I have to do while replacing <STRING n> with the appropriate strings. The advantage of that approach is that I don't have to worry about keeping an array of all the strings, which might get quite large. That array only has to be built once each time the code is parsed, I believe, which isn't too bad, but not optimal.
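To make the alternative concrete, this is roughly what "don't parse while in a string" could look like. It is only a sketch: replaceOutsideStrings is a hypothetical helper, and the int-to-var rule is a stand-in for whatever a real parser rule does.

```javascript
// Matches single- or double-quoted literals (the pattern quoted later in this post).
var STRING_RE = /(["'])([^"'\\]|\\.)*(\1)/g;

// Hypothetical helper: apply one parser rule to the code,
// but copy string literals through untouched.
function replaceOutsideStrings(code, rule, replacement) {
  var out = "";
  var last = 0;
  var match;
  STRING_RE.lastIndex = 0; // reset, since the regex is global and shared
  while ((match = STRING_RE.exec(code)) !== null) {
    // Code between the previous string and this one: the rule applies.
    out += code.slice(last, match.index).replace(rule, replacement);
    // The string literal itself: copied as-is.
    out += match[0];
    last = match.index + match[0].length;
  }
  // Whatever code follows the last string literal.
  return out + code.slice(last).replace(rule, replacement);
}

var result = replaceOutsideStrings(
  'int x = 1; text("int y", 0, 0);',
  /\bint\b/g, // assumed example rule, not a real parser rule
  "var"
);
// result is: var x = 1; text("int y", 0, 0);
```

Every rule in the parser would have to go through something like this, and a rule applied this way can never match across a string literal, since each chunk of code is processed separately.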

I thought more about it (this was during my 2-hour bus ride home) and decided against it, so I'm back to my original solution. My reasoning: KISS. The original method might not be the fastest, but it's simple: you do it once, and it's done. The other way might be marginally faster, depending on the situation, but it's a debugging nightmare. Each time someone adds something to the parser, they'd have to remember to check that it's not in a string. With masking, I only need to check for strings once, and no one else ever has to worry about my code, which also makes it more modular.

The other thing I've been working on: really tweaking the regex for checking for strings. It's not apparent at first, but there were a lot of things I was not considering. I've been changing and testing various solutions, and right now this is how it looks:

/(["'])([^"'\\]|\\.)*(\1)/g

Before, I was being greeeedy, and things like this test would muck it all up:

text("Hello World!" + (4 + 5) + "Hello World",50,i+=15);

Now, it was being printed fine in this case, but any code between the two strings would NOT get parsed. So I changed it to a non-greedy match and added a few more touches, like using ["'] instead of ('|"). They're pretty much the same, but to me the first one is simpler. I also use a back reference on the quote type, just because I think it's cleaner that way. Anyway, that's all for today.
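For comparison, here is a small sketch of the difference on that test line. The greedy pattern is an assumed stand-in, since the earlier version of the regex isn't shown in this post; the current pattern is the one quoted above.

```javascript
var line = 'text("Hello World!" + (4 + 5) + "Hello World",50,i+=15);';

// Assumed stand-in for the earlier greedy pattern (not the actual old regex).
var greedy = /(["']).*\1/g;
// The pattern quoted above.
var current = /(["'])([^"'\\]|\\.)*(\1)/g;

var greedyMatches = line.match(greedy);
// One match that swallows everything from the first quote to the last:
// "Hello World!" + (4 + 5) + "Hello World"

var currentMatches = line.match(current);
// Two matches, one per literal: "Hello World!" and "Hello World"

// Escaped quotes are consumed by the \\. branch, so they don't end the string.
var escapedMatches = 'text("say \\"hi\\"", 0, 0);'.match(current);
// One match: "say \"hi\""

// Caveat: the character class excludes both quote types, so a literal that
// contains the other kind of quote is not matched by the current pattern.
var mixedMatches = 'text("it\'s", 0, 0);'.match(current);
// mixedMatches is null
```

With the greedy stand-in, the (4 + 5) in the middle ends up inside the "string", which is the unparsed-code problem described above. The last case suggests the current pattern still needs some work for strings like "it's".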
