The fastest solution for the second challenge


#1

[code] // Read in a file as a huge string (ignoring the possibility of an error)
NSString *nameString =
[NSString stringWithContentsOfFile:@"/usr/share/dict/propernames"
encoding:NSUTF8StringEncoding
error:NULL];
// Break it into an array of strings
NSArray *names = [nameString componentsSeparatedByString:@"\n"];

    NSString *wordString =
    [NSString stringWithContentsOfFile:@"/usr/share/dict/words"
                              encoding:NSUTF8StringEncoding
                                 error:NULL];
    // Break it into an array of strings
    NSArray *words = [wordString componentsSeparatedByString:@"\n"];
    
    NSMutableSet *intersectionDict = [NSMutableSet setWithArray:words];
    [intersectionDict intersectSet:[NSSet setWithArray:names]];
    NSArray *intersectionArray = [intersectionDict allObjects];
    
    NSLog(@"%lu == %lu", [intersectionArray count], [names count]); // if equals then everything is correct[/code]

It runs 0.30s


#2

You are correct that NSSet is more efficient, but I don’t believe your approach works.

You are creating an NSMutableSet containing all the words and then finding the intersect with names. So you start with all words and remove everything that is not in the names list. However the words list is a superset of the names list containing all the names uppercased. Names that are also words have an additional entry lowercased eg:

Woody
woody

So when I run your solution, I get 1309 matches which is just the total number of lines in the names list.

However you can get a very fast solution without doing any set math simply because NSSet’s version of containsObject appears to be much more efficient than NSArray’s.

My solution is as follows:

[code]#import <Foundation/Foundation.h>

int main(int argc, const char * argv[]) {
@autoreleasepool {
NSLog(@“Starting…”);
// Read in a file as a huge string (ignoring the possibility of an error)
NSString *nameString = [NSString stringWithContentsOfFile:@"/usr/share/dict/propernames"
encoding:NSUTF8StringEncoding
error:NULL];
NSString *wordString = [NSString stringWithContentsOfFile:@"/usr/share/dict/words"
encoding:NSUTF8StringEncoding
error:NULL];
NSLog(@“Strings loaded.”);

    // Break the names string into an array.
    NSArray *names = [nameString componentsSeparatedByString:@"\n"];
    NSLog(@"Array created from names.");
    // Break the words string into an array, then load it into a set.
    NSMutableSet *words = [NSMutableSet setWithArray:[wordString componentsSeparatedByString:@"\n"]];
    // Remove the empty string to avoid an extra match.
    [words removeObject:@""];
    NSLog(@"Set created from words.");
    
    // Iterate through the names array via fast enumeration and ask if the lowercased name is a member of the set of words. (Remember that every uppercased name is a member of the set of words. You're only interested in ones that match if you lowercase them.)
    unsigned int total = 0;
    for (NSString *n in names) {
        if ([words containsObject:n.lowercaseString]) { // I used the dot syntax which is covered in a later chapter, but you could do the same thing as [n lowercaseString]
            NSLog(@"%@", n);
            total++;
        }
    }
    NSLog(@"There were %d total matches", total);
}
return 0;

}
[/code]

On my computer this takes 0.147 seconds to run, 0.119 seconds of which are creating the set of words from the array.

You can also do set math to remove all the uppercased names from the words set (using minusSet to remove names from words), then convert the names to lowercase and make a new set of lowercased names and use set math again to remove anything not in the lowercased names set from the words set (using intersectSet):

[code]#import <Foundation/Foundation.h>

int main(int argc, const char * argv[]) {
@autoreleasepool {
NSLog(@“Starting…”);
// Read in a file as a huge string (ignoring the possibility of an error)
NSString *nameString = [NSString stringWithContentsOfFile:@"/usr/share/dict/propernames"
encoding:NSUTF8StringEncoding
error:NULL];
NSString *wordString = [NSString stringWithContentsOfFile:@"/usr/share/dict/words"
encoding:NSUTF8StringEncoding
error:NULL];
NSLog(@“Strings loaded.”);

    // Break the names string into an array.
    NSSet *names = [NSSet setWithArray:[nameString componentsSeparatedByString:@"\n"]];
    NSLog(@"Set created from names.");
    // Break the words string into an array, then load it into a set.
    NSMutableSet *words = [NSMutableSet setWithArray:[wordString componentsSeparatedByString:@"\n"]];
    NSLog(@"Set created from words.");
    // Remove names (capitalized) from the set of words.
    [words minusSet:names];
    NSLog(@"Names removed from words.");
    
    // Generate a new set the same as names, but all lowercase.
    NSMutableSet *lowercaseNames = [NSMutableSet set];
    for (NSString *n in names) {
        [lowercaseNames addObject:n.lowercaseString];
    }
    
    // Remove any objects not in lowercaseNames from the words set.
    [words intersectSet:lowercaseNames];
    
    // Print all the words that are also names. Note that because this is a set and not an array it doesn't appear in order.
    for (NSString *w in words) {
        NSLog(@"%@", w);
    }
    
    NSLog(@"Done.");
}
return 0;

}[/code]

However I find this to be slower (0.291 seconds) as you have to iterate over all the names twice, once to convert each name to lowercase and once to print each name. Even if you skip the step of printing each name and are content with simply having a set containing words that are names, this is still slower (0.254 seconds).

Of course with the method above, you don’t leave this with a set of names that are also words. If you want to build that set, you can simply make a new NSMutableSet and add each match to the set as you go through the existing for loop. This does not seem to add a noticeable amount to the runtime.

Edit: By the way, my understanding is that the reason NSSet is faster is that NSSet creates a hash table for the set so when you compare two items in different sets you don’t actually have to compare the items themselves but can ask whether they have the same hash which is much quicker than comparing strings. The surprising part, I suppose, is that it is so efficient to compute the hashes for everything, but I guess this is a very common and very optimized operation and can potentially be done in parallel for each item rather than one at a time as you do when you compare the strings from one array to another one by one.