CS50 Week 5: Data Structures

Arrays, linked lists, hash tables and tries.

Written by Eva Dee on 13th of December 2021 (about a 7 minute read).

🔗 Course Website

Data structures are more complex ways to organize data in memory, storing information in different layouts. We'll look at:

arrays
linked lists
hash tables
tries

Arrays permalink

Arrays are contiguous blocks of memory
Arrays aren't easily resizable

🤔 Garbage values aren't bad - you just don't know what they are.

Summary

insertion is bad: a)in the case when the array is not big enough (O(n)); b)if the array is big enough and we have the index Omega of 1
deletion is bad
lookup is great (random access constant time)
relatively easy to sort
relatively small size-wise
stuck with a fixed size, no flexibility

Linked lists permalink

C doesn't provide dynamic arrays
With linked lists, we store the value and some metadata to go along with it (to help keep track of where the next element is)
meaning that they take up more space than arrays
values in linked lists also store a pointer to the next value in the list (the last element in the list will have a NULL pointer)
the . for accessing a struct property and the * (go to the address of) operator are shorthanded into ->

n->next = NULL;

//instead of
(*n).next = 1;

The nodes can be anywhere in the computer memory - meaning the values are no longer next to one another.

// because this is a self-referential data type
// we need to give it a temp name to use it immediately
// and then return the name we actually want to call it
typedef struct tempNode
{
    int number;
    struct tempNode *next;
}
node;

Your newly initialized variable will contain garbage values unless if you set it to NULL.

How do you create a linked list?

// an empty linked list
node *list = NULL

// We use sizeof(node) to get the right amount of memory to
// allocate, and malloc returns a pointer that we save as n
node *n = malloc(sizeof(node))

// We want to make sure malloc succeeded in getting memory for us
if (n != NULL)
{

// This is equivalent to (*n).number, where we first go to the
// node pointed to by n, and then set the number property.
// In C, we can also use this arrow notation
    n->number = 1;

// Then we need to make sure the pointer to the next node in our // list isn't a garbage value, but the new node won't point to
// anything (for now)
    n->next = NULL;
}

But steps involved are:

dynamically allocate spare memory for a new node
check to make sure we didn't run out of memory
initialize the node's val field
initialize the node's next field
return a pointer to the newly created node

❗Always insert to the beginning of the list - you already have the pointer to the list's HEAD

One of the trickiest things with linked lists is figuring out the order of doing this - you certainly don't want to end up with an orphaned list! When inserting items, always make sure to point to the next item (i.e., previous head) first, before changing the HEAD of the list.

🤔 Every time you allocate memory with malloc, you need to check whether that value != NULL before doing stuff to it. Every time you stuff with pointers, you need to check for NULL as well

Tradeoff: we need to allocate twice as much memory for each element in order to spend less time adding values. We can't use binary search.

Summary

insertion is easy (O(1))
deletion is easy
lookup is bad (O(n))
relatively difficult to sort
relatively small-size (not as small as arrays)

Trees permalink

Trees are recursive data structures where each node points to two other nodes, one to the left (with a smaller value) and one to the right (with a larger value). Meaning each node has at most two children, or nodes it is pointing to.

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
}
node;

Tradeoff: incurred even a larger memory cost - since each node now needs space for a value and two pointers. It can be used as a binary search tree (as long as the data is sorted, of course).

For insertion into a binary search tree, the running time is O(log n)
that's because log of n is also the height of the tree.
You might also need to put some work to keep the tree balanced and logarithmic in height

Hash table permalink

Is an array of linked lists
A hash function is a function that takes as input some string, and it returns an output that is a non-negative integer
it combines the random access ability of an array with the dynamism of a linked list
hash tables are bad for ordering or sorting data
a collision occurs when two pieces of data produce the same hash code (with linear probing, you just add number 1 to the hash code until you find a free space)
instead, you could use chaining, which uses linked lists (instead of a single value)

Summary

insertion is a two-step process (hash then add)
deletion is easy
lookup on average better than linked lists (O(n), in case of playing cards, dividing into 4 buckets n/4, still results in n)
not great for sorting
size-wise: bigger than linked list but smaller than a trie

Tries permalink

A trie is a tree made up of arrays (of pointers to other nodes).
Shortname for retrieval 🤷‍♀️, also known as a prefix tree
Tries combine structures and pointers together to store data in an interesting way
Unlike with a hash table, there are no collisions, and no two pieces of data (unless they are identical) have the same path
The data to be searched for in the tie is a roadmap - if you can follow the roadmap until the end, the data exists, otherwise it doesn't.

Summary

insertion: complex (lots of dynamic memory allocation)
deletion: easy (just free node)
lookup: fast (almost as fast as arrays, O(1) - constant time)
already sorted
size-wise: huge

Abstract data structures permalink

Are higher-level constructs that use arrays, linked links, hash tables as building blocks.
queue: works on FIFO (first-in-first-out) and uses enqueue and dequeue for adding and removing values (can be implemented as an array or a linked list)
stack: work as LIFO (last-in-first-out), with pop and push functions
dictionary: key-value pair (has a table or an array)
a dictionary, where we can map keys to values, such as words, to their definitions. Implemented with a hash table or an array, as always, there's tradeoff between time and space.