# CS50 Week 5: Data Structures

Arrays, linked lists, hash tables and tries.

Written by Eva Dee on (about a 7 minute read).

Data structures are more complex ways to organize data in memory, storing information in different layouts. We'll look at:

• arrays
• hash tables
• tries

• Arrays are contiguous blocks of memory
• Arrays aren't easily resizable

🤔 Garbage values aren't bad - you just don't know what they are.

Summary

• insertion is bad: a)in the case when the array is not big enough (O(n)); b)if the array is big enough and we have the index Omega of 1
• deletion is bad
• lookup is great (random access constant time)
• relatively easy to sort
• relatively small size-wise
• stuck with a fixed size, no flexibility

• C doesn't provide dynamic arrays
• With linked lists, we store the value and some metadata to go along with it (to help keep track of where the next element is)
• meaning that they take up more space than arrays
• values in linked lists also store a pointer to the next value in the list (the last element in the list will have a `NULL` pointer)
• the `.` for accessing a struct property and the `*` (go to the address of) operator are shorthanded into `->`
``n->next = NULL;//instead of(*n).next = 1;``
• The nodes can be anywhere in the computer memory - meaning the values are no longer next to one another.
``// because this is a self-referential data type// we need to give it a temp name to use it immediately// and then return the name we actually want to call ittypedef struct tempNode{    int number;    struct tempNode *next;}node;``

Your newly initialized variable will contain garbage values unless if you set it to `NULL`.

How do you create a linked list?

``// an empty linked listnode *list = NULL// We use sizeof(node) to get the right amount of memory to// allocate, and malloc returns a pointer that we save as nnode *n = malloc(sizeof(node))// We want to make sure malloc succeeded in getting memory for usif (n != NULL){// This is equivalent to (*n).number, where we first go to the// node pointed to by n, and then set the number property.// In C, we can also use this arrow notation    n->number = 1;// Then we need to make sure the pointer to the next node in our // list isn't a garbage value, but the new node won't point to// anything (for now)    n->next = NULL;}``

But steps involved are:

• dynamically allocate spare memory for a new node
• check to make sure we didn't run out of memory
• initialize the node's val field
• initialize the node's next field
• return a pointer to the newly created node

❗Always insert to the beginning of the list - you already have the pointer to the list's `HEAD`

One of the trickiest things with linked lists is figuring out the order of doing this - you certainly don't want to end up with an orphaned list! When inserting items, always make sure to point to the next item (i.e., previous head) first, before changing the `HEAD` of the list.

🤔 Every time you allocate memory with malloc, you need to check whether that value != NULL before doing stuff to it. Every time you stuff with pointers, you need to check for `NULL` as well

Tradeoff: we need to allocate twice as much memory for each element in order to spend less time adding values. We can't use binary search.

Summary

• insertion is easy (O(1))
• deletion is easy
• lookup is bad (O(n))
• relatively difficult to sort
• relatively small-size (not as small as arrays)

Trees are recursive data structures where each node points to two other nodes, one to the left (with a smaller value) and one to the right (with a larger value). Meaning each node has at most two children, or nodes it is pointing to.

``typedef struct node{    int number;    struct node *left;    struct node *right;}node;``

Tradeoff: incurred even a larger memory cost - since each node now needs space for a value and two pointers. It can be used as a binary search tree (as long as the data is sorted, of course).

• For insertion into a binary search tree, the running time is O(log n)
• that's because log of n is also the height of the tree.
• You might also need to put some work to keep the tree balanced and logarithmic in height

## Hash table permalink

• Is an array of linked lists
• A hash function is a function that takes as input some string, and it returns an output that is a non-negative integer
• it combines the random access ability of an array with the dynamism of a linked list
• hash tables are bad for ordering or sorting data
• a collision occurs when two pieces of data produce the same hash code (with linear probing, you just add number 1 to the hash code until you find a free space)
• instead, you could use chaining, which uses linked lists (instead of a single value)

Summary

• insertion is a two-step process (hash then add)
• deletion is easy
• lookup on average better than linked lists (O(n), in case of playing cards, dividing into 4 buckets n/4, still results in n)
• not great for sorting
• size-wise: bigger than linked list but smaller than a trie

• A trie is a tree made up of arrays (of pointers to other nodes).
• Shortname for retrieval 🤷‍♀️, also known as a prefix tree
• Tries combine structures and pointers together to store data in an interesting way
• Unlike with a hash table, there are no collisions, and no two pieces of data (unless they are identical) have the same path
• The data to be searched for in the tie is a roadmap - if you can follow the roadmap until the end, the data exists, otherwise it doesn't.

Summary

• insertion: complex (lots of dynamic memory allocation)
• deletion: easy (just free node)
• lookup: fast (almost as fast as arrays, O(1) - constant time)