Benchmarking different collections


#1

Hi all,

After reading this chapter I took the opportunity to re-visit the 2nd array challenge (The one with the names.) to see how it can be made to run faster.

Here was my original solution with what I knew at the time.

[code]#import <Foundation/Foundation.h>

int main (int argc, const char * argv[])
{

@autoreleasepool {
    
// Read in the strings. Forget error handling for now
    NSString *nameString = [NSString stringWithContentsOfFile:@"/usr/share/dict/propernames" 
                                                     encoding:NSUTF8StringEncoding
                                                        error:NULL];
    
    NSString *wordsString = [NSString stringWithContentsOfFile:@"/usr/share/dict/words"
                                                      encoding:NSUTF8StringEncoding
                                                         error:NULL];
    
    // Break the strings into arrays
    NSArray *names = [nameString componentsSeparatedByString:@"\n"];
    NSArray *words = [wordsString componentsSeparatedByString:@"\n"];
    
    // Compare our arrays of strings
    for (NSString *n in names) {
        
        for (NSString *w in words) {
            if ([n caseInsensitiveCompare:w] == NSOrderedSame) {
                NSLog(@"Found %@ and %@", n, w);
            }
        }
    
    }
}
return 0;

}[/code]

When this runs it takes around 35 seconds.

real 0m35.169s
user 0m31.380s
sys 0m0.380s

The next thing I tried was using NSSet objects in place of the arrays. Simply changing the NSArray to NSSet generates a compiler warning, but we will deal with that in a second. The new code is simpler.

[code]#import <Foundation/Foundation.h>

int main (int argc, const char * argv[])
{

@autoreleasepool {
    
// Read in the strings. Forget error handling for now
    NSString *nameString = [NSString stringWithContentsOfFile:@"/usr/share/dict/propernames" 
                                                     encoding:NSUTF8StringEncoding
                                                        error:NULL];
    
    NSString *wordsString = [NSString stringWithContentsOfFile:@"/usr/share/dict/words"
                                                      encoding:NSUTF8StringEncoding
                                                         error:NULL];
    
    // Break the strings into arrays
    NSSet *names = [nameString componentsSeparatedByString:@"\n"];
    NSSet *words = [wordsString componentsSeparatedByString:@"\n"];
    
    // Compare our arrays of strings
    for (NSString *n in names) {
        if ([words containsObject:n]) {
            NSLog(@"Found %@ and %@",n, n);
        }

// for (NSString *w in words) {
//
//
// // Comment out the array compare
//// if ([n caseInsensitiveCompare:w] == NSOrderedSame) {
//// NSLog(@“Found %@ and %@”, n, w);
//// }
// }

    }
}
return 0;

}[/code]

Even with the pointer mismatches this was substantially faster!
real 0m18.894s
user 0m15.997s
sys 0m0.281s

I adjusted names back to an NSArray and it got a bit faster again.
real 0m18.672s
user 0m15.957s
sys 0m0.272s

Now was time to cleanup the code though and hit the gas! I read about how to properly create a set from an array. This has two effects, one it makes the set cleanly, but it also uniques that data so that there are no dupes in it.

[code]#import <Foundation/Foundation.h>

int main (int argc, const char * argv[])
{

@autoreleasepool {
    
// Read in the strings. Forget error handling for now
    NSString *nameString = [NSString stringWithContentsOfFile:@"/usr/share/dict/propernames" 
                                                     encoding:NSUTF8StringEncoding
                                                        error:NULL];
    
    NSString *wordsString = [NSString stringWithContentsOfFile:@"/usr/share/dict/words"
                                                      encoding:NSUTF8StringEncoding
                                                         error:NULL];
    
    // Break the strings into arrays
    NSArray *names = [nameString componentsSeparatedByString:@"\n"];
    NSArray *wordsArray = [wordsString componentsSeparatedByString:@"\n"];
    NSSet *words = [NSSet setWithArray:wordsArray];
    
    // Compare our arrays of strings
    for (NSString *n in names) {
        if ([words containsObject:n]) {
            NSLog(@"Found %@ and %@",n, n);
        }

// for (NSString *w in words) {
//
//
// // Comment out the array compare
//// if ([n caseInsensitiveCompare:w] == NSOrderedSame) {
//// NSLog(@“Found %@ and %@”, n, w);
//// }
// }

    }
}
return 0;

}

[/code]

And boy howdy!
real 0m1.525s
user 0m0.531s
sys 0m0.156s

Next I’m going to look at specifying a predicate to filter the arrays by the first letter of the word. For that though I need to read up on NSRange it seems a bit.


#2

After reading the Collections Programming Guide in bed last night I decided to give another approach a try. Rather than enumerating the collection testing for membership or searching for strings NSMutableSet has a method intersectSet:. Using this you can quickly remove all the objects in a given mutable set that are already in another set.

Also for fun I skipped the enumeration all together and just printed out the description of the set.

[code]#import <Foundation/Foundation.h>

int main (int argc, const char * argv[])
{

@autoreleasepool {
    
// Read in the strings. Forget error handling for now
    NSString *nameString = [NSString stringWithContentsOfFile:@"/usr/share/dict/propernames" 
                                                     encoding:NSUTF8StringEncoding
                                                        error:NULL];
    
    NSString *wordsString = [NSString stringWithContentsOfFile:@"/usr/share/dict/words"
                                                      encoding:NSUTF8StringEncoding
                                                         error:NULL];
    
    // Break the strings into arrays and sets
    NSArray *namesArray = [nameString componentsSeparatedByString:@"\n"];
    NSArray *wordsArray = [wordsString componentsSeparatedByString:@"\n"];
    NSSet *names = [NSSet setWithArray:namesArray];
    NSMutableSet *words = [NSMutableSet setWithArray:wordsArray];
    
    // Create an intersect set
    [words intersectSet:names];
    
    NSLog(@"Names that are in the words file %@", words);
    
}

return 0;

}
[/code]

This runs really fast, although sets aren’t ordered.
real 0m0.695s
user 0m0.623s
sys 0m0.045s

If I change back to enumerating and printing the collection we move back to our slower speed.

[code]#import <Foundation/Foundation.h>

int main (int argc, const char * argv[])
{

@autoreleasepool {
    
// Read in the strings. Forget error handling for now
    NSString *nameString = [NSString stringWithContentsOfFile:@"/usr/share/dict/propernames" 
                                                     encoding:NSUTF8StringEncoding
                                                        error:NULL];
    
    NSString *wordsString = [NSString stringWithContentsOfFile:@"/usr/share/dict/words"
                                                      encoding:NSUTF8StringEncoding
                                                         error:NULL];
    
    // Break the strings into arrays and sets
    NSArray *namesArray = [nameString componentsSeparatedByString:@"\n"];
    NSArray *wordsArray = [wordsString componentsSeparatedByString:@"\n"];
    NSSet *names = [NSSet setWithArray:namesArray];
    NSMutableSet *words = [NSMutableSet setWithArray:wordsArray];
    
    // Create an intersect set
    [words intersectSet:names];

// NSLog(@“Names that are in the words file %@”, words);

    // Print line by line
    for (NSString *w in words) {
        NSLog(@" Found %@ and %@", w, w);
    }
    
}
return 0;

}[/code]

real 0m1.962s
user 0m0.790s
sys 0m0.173s

The lesson I get from this is clear. If you don’t need to enumerate through the sets then don’t do it!


#3

Very cool bit of benchmarking there!

Thanks for comparing those, and pointing us toward the Programming Guide; lots of great stuff in there. I am curious how your predicate experiment will turn out.

Cheers.


#4

Wow macshome… I really learned a lot by reading (and reconstructing) these tests.
Thanks for this post!