CS50 Week 5: Data Structures
Arrays, linked lists, hash tables and tries.
Data structures are more complex ways to organize data in memory, storing information in different layouts. We'll look at:
- arrays
- linked lists
- hash tables
- tries
Arrays permalink
- Arrays are contiguous blocks of memory
- Arrays aren't easily resizable
🤔 Garbage values aren't bad - you just don't know what they are.
Summary
- insertion is bad: a)in the case when the array is not big enough (O(n)); b)if the array is big enough and we have the index Omega of 1
- deletion is bad
- lookup is great (random access constant time)
- relatively easy to sort
- relatively small size-wise
- stuck with a fixed size, no flexibility
Linked lists permalink
- C doesn't provide dynamic arrays
- With linked lists, we store the value and some metadata to go along with it (to help keep track of where the next element is)
- meaning that they take up more space than arrays
- values in linked lists also store a pointer to the next value in the list (the last element in the list will have a
NULL
pointer) - the
.
for accessing a struct property and the*
(go to the address of) operator are shorthanded into->
n->next = NULL;
//instead of
(*n).next = 1;
- The nodes can be anywhere in the computer memory - meaning the values are no longer next to one another.
// because this is a self-referential data type
// we need to give it a temp name to use it immediately
// and then return the name we actually want to call it
typedef struct tempNode
{
int number;
struct tempNode *next;
}
node;
Your newly initialized variable will contain garbage values unless if you set it to NULL
.
How do you create a linked list?
// an empty linked list
node *list = NULL
// We use sizeof(node) to get the right amount of memory to
// allocate, and malloc returns a pointer that we save as n
node *n = malloc(sizeof(node))
// We want to make sure malloc succeeded in getting memory for us
if (n != NULL)
{
// This is equivalent to (*n).number, where we first go to the
// node pointed to by n, and then set the number property.
// In C, we can also use this arrow notation
n->number = 1;
// Then we need to make sure the pointer to the next node in our // list isn't a garbage value, but the new node won't point to
// anything (for now)
n->next = NULL;
}
But steps involved are:
- dynamically allocate spare memory for a new node
- check to make sure we didn't run out of memory
- initialize the node's val field
- initialize the node's next field
- return a pointer to the newly created node
❗Always insert to the beginning of the list - you already have the pointer to the list's HEAD
One of the trickiest things with linked lists is figuring out the order of doing this - you certainly don't want to end up with an orphaned list! When inserting items, always make sure to point to the next item (i.e., previous head) first, before changing the HEAD
of the list.
🤔 Every time you allocate memory with malloc, you need to check whether that value != NULL before doing stuff to it. Every time you stuff with pointers, you need to check for NULL
as well
Tradeoff: we need to allocate twice as much memory for each element in order to spend less time adding values. We can't use binary search.
Summary
- insertion is easy (O(1))
- deletion is easy
- lookup is bad (O(n))
- relatively difficult to sort
- relatively small-size (not as small as arrays)
Trees permalink
Trees are recursive data structures where each node points to two other nodes, one to the left (with a smaller value) and one to the right (with a larger value). Meaning each node has at most two children, or nodes it is pointing to.
typedef struct node
{
int number;
struct node *left;
struct node *right;
}
node;
Tradeoff: incurred even a larger memory cost - since each node now needs space for a value and two pointers. It can be used as a binary search tree (as long as the data is sorted, of course).
- For insertion into a binary search tree, the running time is O(log n)
- that's because log of n is also the height of the tree.
- You might also need to put some work to keep the tree balanced and logarithmic in height
Hash table permalink
- Is an array of linked lists
- A hash function is a function that takes as input some string, and it returns an output that is a non-negative integer
- it combines the random access ability of an array with the dynamism of a linked list
- hash tables are bad for ordering or sorting data
- a collision occurs when two pieces of data produce the same hash code (with linear probing, you just add number 1 to the hash code until you find a free space)
- instead, you could use chaining, which uses linked lists (instead of a single value)
Summary
- insertion is a two-step process (hash then add)
- deletion is easy
- lookup on average better than linked lists (O(n), in case of playing cards, dividing into 4 buckets n/4, still results in n)
- not great for sorting
- size-wise: bigger than linked list but smaller than a trie
Tries permalink
- A trie is a tree made up of arrays (of pointers to other nodes).
- Shortname for retrieval 🤷♀️, also known as a prefix tree
- Tries combine structures and pointers together to store data in an interesting way
- Unlike with a hash table, there are no collisions, and no two pieces of data (unless they are identical) have the same path
- The data to be searched for in the tie is a roadmap - if you can follow the roadmap until the end, the data exists, otherwise it doesn't.
Summary
- insertion: complex (lots of dynamic memory allocation)
- deletion: easy (just free node)
- lookup: fast (almost as fast as arrays, O(1) - constant time)
- already sorted
- size-wise: huge
Abstract data structures permalink
- Are higher-level constructs that use arrays, linked links, hash tables as building blocks.
- queue: works on FIFO (first-in-first-out) and uses enqueue and dequeue for adding and removing values (can be implemented as an array or a linked list)
- stack: work as LIFO (last-in-first-out), with pop and push functions
- dictionary: key-value pair (has a table or an array)
- a dictionary, where we can map keys to values, such as words, to their definitions. Implemented with a hash table or an array, as always, there's tradeoff between time and space.