Solution to challenge 2 (compare similar lists)

Sketch of my code:

  1. Make mutable arrays containing each list (“words” and “proper names”). Mutable was more important when I tried a different solution than the one I ended up going with.

  2. Check each word in list #1 against every word in list #2. There are a few ways I test this so to cut the list down to the size I want:
    a) Is a word from the words list found in the proper names list?
    b) Is that word in the words list the same length as the word in the proper names list?
    c) Is the first letter of that word in the words list NOT uppercase? I check for this so to remove the proper names that also are in the words list.

  3. If a, b and c all return true for the word in the words list, then I add that word to a mutable array I made.

  4. I then print out the results. My tally comes to 293 items.

Code below:


#import <Foundation/Foundation.h>

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        
        NSString *nameString = [NSString stringWithContentsOfFile:@"/usr/share/dict/propernames"
                                                         encoding:NSUTF8StringEncoding
                                                            error:NULL];
        
        NSString *wordString = [NSString stringWithContentsOfFile:@"/usr/share/dict/words"
                                                         encoding:NSUTF8StringEncoding
                                                            error:NULL];
        
        NSArray *names = [nameString componentsSeparatedByString:@"\n"];
        NSMutableArray *namesMorph = [[NSMutableArray alloc] initWithArray:names];
        

        
        NSArray *words = [wordString componentsSeparatedByString:@"\n"];
        NSMutableArray *wordsMorph = [[NSMutableArray alloc] initWithArray:words];
        
        NSMutableArray *myResults = [NSMutableArray array];
        
        for (int i=0; i < [wordsMorph count]-1; i++){
            BOOL wordIsUppercase = [[NSCharacterSet uppercaseLetterCharacterSet] characterIsMember:[wordsMorph[i] characterAtIndex:0] ];
            for (int j = 0; j < [namesMorph count]-1; j++) {
                NSRange r = [namesMorph[j] rangeOfString:wordsMorph[i] options:NSCaseInsensitiveSearch];
                if ((r.location != NSNotFound) && ([wordsMorph[i] length] == [namesMorph[j] length]) && (!wordIsUppercase)) {
                    [myResults addObject: wordsMorph[i]];
                    NSLog(@"WORD: %@ ==> %@\n", wordsMorph[i], namesMorph[j]);
                };
            };
        };
        
        NSLog(@"myResults: %lu\n", [myResults count]);
        for (NSString *resultItem in myResults) {
            NSLog(@"Result: %@\n", resultItem);
        };
    }
    return 0;
}

I added a timer for the loop to see how long it was taking to complete the comparison of the two lists. I would be curious to see the times clocked for other peoples loops such that we can really get an idea of which method is the fastest.

Main.m

[code]// Read in files as large strings
NSString *nameList = [NSString stringWithContentsOfFile:@"/usr/share/dict/propernames"
encoding:NSUTF8StringEncoding
error:NULL];
NSString *wordList = [NSString stringWithContentsOfFile:@"/usr/share/dict/words"
encoding:NSUTF8StringEncoding
error:NULL];

// Break up lists into array of strings
NSArray *names = [nameList componentsSeparatedByString:@"\n"];
NSArray *words = [wordList componentsSeparatedByString:@"\n"];

int i = 0;
NSDate *startTime = [NSDate date];

// Go through array, 1 string at a time
for (NSString *name in names) {
if ([words containsObject:name.lowercaseString] && ![name isEqualToString:@""]) {
i++;
}
}

NSDate *endTime = [NSDate date];
double timeTaken = [endTime timeIntervalSinceDate:startTime];
NSLog(@“Total Number of Matches: %d.\nComparison completed in %.2f seconds.”, i, timeTaken);[/code]

Output

[quote]Total Number of Matches: 293.
Comparison completed in 11.90 seconds.[/quote]

I could not resist trying your code, but by using sets for comparison.

#import <Foundation/Foundation.h>

int main (int argc, const char * argv[]) {
    @autoreleasepool {
        
        //  Read in files as large strings
        NSString *nameList = [NSString stringWithContentsOfFile:@"/usr/share/dict/propernames"
                                                       encoding:NSUTF8StringEncoding
                                                          error:NULL];
        NSString *wordList = [NSString stringWithContentsOfFile:@"/usr/share/dict/words"
                                                       encoding:NSUTF8StringEncoding
                                                          error:NULL];
        
        //  Break up lists into array of strings
        NSArray *names = [nameList componentsSeparatedByString:@"\n"];
        NSArray *words = [wordList componentsSeparatedByString:@"\n"];
        
        int i = 0;
        NSDate * startTime = [NSDate date];
        
        NSSet * wordsSet = [NSSet setWithArray:words];
        NSSet * namesSet = [NSSet setWithArray:names];

        for (NSString * name in namesSet) {
            if ([wordsSet containsObject:name.lowercaseString] && ![name isEqualToString:@""]) {
                i += 1;
            }
        }

        NSDate * endTime = [NSDate date];
        double timeTaken = [endTime timeIntervalSinceDate:startTime];
        NSLog(@"Total Number of Matches: %d.\nComparison completed in %.2f seconds.", i, timeTaken);
    }
    return 0;
}

The output:

2015-05-31 14:58:08.836 WordsCount[65342] Total Number of Matches: 293.
Comparison completed in 0.02 seconds.

Orders of magnitude increase in performance!

WOW. I really didn’t see that happening… Do you have any idea why NSSet would speed things up? Or more importantly what made you think to use that? I am still trying to wrap my head around how that would speed things up so drastically.

EDIT: I think I may have answered my own question…

According to apple’s documentation via developer tools, it states that for NSSet, “You can use sets as an alternative to arrays when the order of elements isn’t important and performance in testing whether an object is contained in the set is a consideration—[color=#FF0000]while arrays are ordered, testing for membership is slower than with sets.[/color]”.

That is absolutely fascinating though. I just recreated your solution and am still in disbelief that there can be such a considerable difference in duration of the comparison! Thanks for enlightening me :stuck_out_tongue: