, , , , , , , ,

Course: Harvard’s CS50

Problem Set 6: Mispellings

pset6 deals with a dictionary implementation. (Note: Yes, the title is misspelled.)

For this problem set you get to:

  • Take an introductory look at Version control systems.
  • Design and implement a Data Structure and use it as a dictionary.
    • You can use something like a Hash table or a Trie, the choice is yours.
  • Optimize your code as much as you can. Here’s some links to look into: [1][2][3][4].
    The highlights:

    • Identify operations that are ‘heavy’ and try to optimize them. ie. division and multiplication are CPU-expensive, you might be able to work around them.
    • Identify logical code structures that can be optimized. ie: if/else inside a loop is heavier than if/else that each contain a loop (but, uglier since you get duplicate code.)
    • Change the implementation of a data structure to use less memory.
    • Also keep in mind compiler optimization. Use code the compiler can optimize and compile with appropriate arguments.
    • The brave should look at Bit Twiddling Hacks.
    • You must find a good balance between optimal code and readable code. Readability must sometimes win.

I used the djb2 hash function from here, adapted for case insensitive subjects. And my data structure of choice was a Chained Hash Table.

dictionary.c (view | download)
dictionary.h (view | download)

Comments from dictionary.h:

/* Change ITEMSPERINDEX at will.
 * Ideally N ~= indexsize * ITEMSPERINDEX.
 *  Where: N is the number of dictionary words,
 *  indexsize is the array length (elements in hash table),
 *  ITEMSPERINDEX is the 'desired' length of the linked list at each hash table entry.
 * Tradeoff logic: (My instinct points to this, but is it actually valid? I dunno.)
 * If N is small (ie. in this case 143091 dictionary words) use a small ITEMSPERINDEX,
 * If N is big (millions, billions...) you _have_ to use a much larger ITEMSPERINDEX,
 *  meaning longer list length, in order to get a reasonable array length.

Build with:

gcc -ggdb -std=c99 -Wall -Werror -Wformat=0   -c -o speller.o speller.c
gcc -ggdb -std=c99 -Wall -Werror -Wformat=0   -c -o dictionary.o dictionary.c
gcc -ggdb -std=c99 -Wall -Werror -Wformat=0 -o speller speller.o dictionary.o

P.S. I have to confess I probably could do a lot more optimizations, but I stopped after a while; I didn’t have access to the big board to compete anyways, so…

P.S.2 If you’d like some additional insight in hash tables/functions look at CS61B lecture 22.