Pages, some stolen, some original

Friday, March 12, 2010

stristr

When you process text with a computer program, sometimes you need to search for a string, and when you search for that string, sometimes you care whether the letters match exactly. That is, are they all the same case? All UPPERCASE, or all lowercase? Or MaYbE tHeY aRe AlL mixED up.

There is a set of standard string functions that come with all C compilers. strstr(s1, s2) will perform an exact match. But what if you don't need an exact match, and you just need the sequence of letters? Then you need stristr (case insensitive string search)! And where, pray tell, do you find stristr? Well, you can write your own, or you can download one from Code Snippets, or you can borrow the one I wrote. My first one used a somewhat crude approach. I copied the two strings into two arrays, forced all the characters in both arrays to be uppercase, and then used the standard strstr function to perform the search.
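Here is a rough sketch of what that copy-and-uppercase approach might look like. This is not the original routine. The name stristr_copy, the helper copy_upper, and the 256 byte STRISTR_MAX limit are all made up for illustration, and the sketch simply gives up and returns NULL when either string doesn't fit in its array.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define STRISTR_MAX 256   /* hypothetical buffer size, chosen for this sketch */

/* Copy src into dst (capacity cap bytes), uppercasing each character.
   Returns 1 on success, 0 if src does not fit. */
static int copy_upper(char *dst, size_t cap, const char *src)
{
    size_t i;

    for (i = 0; src[i] != '\0'; i++) {
        if (i + 1 >= cap)
            return 0;                 /* no room left for the terminator */
        dst[i] = (char)toupper((unsigned char)src[i]);
    }
    dst[i] = '\0';
    return 1;
}

/* Case insensitive search done on uppercased copies of both strings.
   Returns a pointer into the original haystack, or NULL if there is
   no match or either string is too long for the buffers. */
char *stristr_copy(const char *haystack, const char *needle)
{
    char h[STRISTR_MAX];
    char n[STRISTR_MAX];
    char *hit;

    if (!copy_upper(h, sizeof h, haystack) || !copy_upper(n, sizeof n, needle))
        return NULL;

    hit = strstr(h, n);               /* the exact match does the real work */

    /* Translate the offset in the copy back to the original string. */
    return hit ? (char *)haystack + (hit - h) : NULL;
}
```

Calling stristr_copy("Hello World", "WORLD") returns a pointer to the "World" part of the original string, since the match offset in the uppercased copy is the same as in the original.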

But then I got to thinking about it and realized it had a fairly serious limitation. It would only work on strings up to a certain length, because the arrays holding the copies have a fixed size. What if I needed to look for a piece of text in a book? Or maybe even an encyclopedia? We don't want to have to make a complete copy of a book just to search for a string. I mean, computers are fast, and space is cheap, but we should exercise a little discipline here. So I rewrote it to only use the original strings. I use strchr to locate the first character, if it is there at all, and then do a character by character comparison to see if we have a match.
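A minimal sketch of the no-copy idea follows. Again, this is not the routine from the post: the name stristr_inplace is invented, and instead of using strchr to find candidate starting points it just tries every position in turn, which keeps the sketch short. The character by character comparison folds both sides to uppercase as it goes, so neither string is copied or changed.

```c
#include <ctype.h>
#include <stddef.h>

/* Case insensitive search that works directly on the original strings.
   Returns a pointer to the first match in haystack, or NULL if none. */
char *stristr_inplace(const char *haystack, const char *needle)
{
    const char *start;

    if (*needle == '\0')
        return (char *)haystack;      /* same convention as strstr */

    for (start = haystack; *start != '\0'; start++) {
        const char *h = start;
        const char *n = needle;

        /* Compare character by character, ignoring case. Running off the
           end of haystack stops the loop because '\0' never matches a
           needle character. */
        while (*n != '\0' &&
               toupper((unsigned char)*h) == toupper((unsigned char)*n)) {
            h++;
            n++;
        }

        if (*n == '\0')
            return (char *)start;     /* the whole needle matched here */
    }
    return NULL;
}
```

There is no length limit here, so the same call works whether the text is a sentence or an encyclopedia.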

If I were using Microsoft's Visual C, I could have used stricmp (case insensitive string compare), but it is not part of the C standard either, so there's a fine line there. Are we going to stick with standard library functions, or are we going to allow common extensions? Since the character by character comparison does not get the hardware assisted scanning that a library strstr may use, it is going to run a little slower, but since we don't have to make the two copies, we are also saving a little time.

This is the kind of thing programmers do for fun when they aren't working. Both the old and new routines are stored as a Google Doc.
