CS50 Week 0: Scratch

On binary, encoding, algorithms and more.

Written by Eva Dee on 2nd of December 2021 (about a 5 minute read).

Note: I've published this post previously, but I'm rewatching the CS50 lectures and updating the old notes as I go along.

On binary

🤔 What does binary mean?

Binary is the number system that computers use to represent information with just two states: on and off. On is 1, and off is 0. It's a base-2 system.
Computers take in electricity as input and use it to keep track of information. Store some electricity to represent 1, and let go of it to represent 0.
Modern computers use millions, even billions, of tiny switches called transistors that can be turned on and off.
A single digit (either a 0 or a 1) in binary is known as a binary digit. Or, simply, a bit.
So 110010 in binary is 50 in decimal (32 + 16 + 2).
In the ASCII standard, the letter A is represented by the number 65, or 01000001 in binary.
8 bits (8 digits) make up a byte, which can represent 256 different values (0 to 255).
But to write accented letters, Asian characters, or emojis, you'll need 16, 24 or 32 bits instead (that's where Unicode steps in).
Unary is a system where each digit represents a single value of one.
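
To check the numbers above for myself, here is a small Python sketch. The function name binary_to_decimal is my own, not something from the lecture, and Python's built-in int("110010", 2) would do the same job.

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a string of 0s and 1s to a decimal number."""
    total = 0
    for digit in bits:
        # Shifting left by one place doubles the running total,
        # then we add the new digit (0 or 1).
        total = total * 2 + int(digit)
    return total


print(binary_to_decimal("110010"))   # 50
print(ord("A"))                      # 65, the number behind the letter A
print(format(ord("A"), "08b"))       # 01000001, the same number as 8 bits
print(2 ** 8)                        # 256 values fit in one byte
print("1" * 3)                       # 111, which is 3 written in unary
```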

🤔 How about 8-bit vs 16-bit machines?

Different computers can process a different number of bits at a time.
An 8-bit machine breaks up and processes 8 bits at a time.
A 16-bit machine would break up and process 16 bits at a time. The number of bits that are processed at a time is known as a computer word, so we can think of bits as the “letters” that make up a computer word.
Most computers now have a word length of 32 or 64 bits.
This means that your machine passes around and processes 32 or 64 bits at a time. In other words, your computer processes binary strings that are 32 or 64 digits long!
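
A loose way to picture word sizes is to chop one long bit string into chunks. This is only an illustration of the idea, not how a CPU is actually built, and split_into_words is a name I made up.

```python
def split_into_words(bits: str, word_size: int) -> list[str]:
    """Cut a bit string into chunks of word_size bits each."""
    return [bits[i:i + word_size] for i in range(0, len(bits), word_size)]


# 16 bits: the ASCII codes for "H" (72) and "i" (105), back to back.
bits = "0100100001101001"

print(split_into_words(bits, 8))    # ['01001000', '01101001']
print(split_into_words(bits, 16))   # ['0100100001101001']

# How many different values fit in one word of each size.
for word_size in (8, 16, 32, 64):
    print(word_size, 2 ** word_size)
```

The "8-bit machine" needs two steps to get through these 16 bits, while the "16-bit machine" takes them in one go.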

Pixels are the dots, the squares on our screen. An image is a grid of pixels, and every pixel uses 3 bytes (or 24 bits): one each for red, green and blue (RGB).
The resolution is the number of pixels there are, horizontally and vertically, so a high-resolution image will have more pixels and require more bytes to be stored.
Videos are made up of many images, changing multiple times a second to give us the appearance of motion (flipbooks).
Music can be represented with bits, too, with mappings of numbers to notes and durations, or more complex mappings of bits to sound frequencies at each moment of time.
The different file extensions (.png, .jpg, .gif) are just people agreeing on how to store patterns of zeros and ones in a file.
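
To get a feel for why resolution matters, here is a rough back-of-the-envelope calculation. It assumes raw, uncompressed pixels at 3 bytes each; real formats like .png or .jpg compress the data, so actual files are usually much smaller. The function names and the 1920 x 1080 example are my own.

```python
BYTES_PER_PIXEL = 3  # one byte each for red, green and blue


def raw_image_bytes(width: int, height: int) -> int:
    """Bytes needed for an uncompressed RGB image."""
    return width * height * BYTES_PER_PIXEL


def raw_video_bytes(width: int, height: int, fps: int, seconds: int) -> int:
    """Bytes needed for uncompressed video: one full image per frame."""
    return raw_image_bytes(width, height) * fps * seconds


red_pixel = bytes([255, 0, 0])          # full red, no green, no blue
print(len(red_pixel))                   # 3

print(raw_image_bytes(1920, 1080))      # 6220800 (about 6 MB)
print(raw_video_bytes(1920, 1080, 30, 1))  # 186624000 for one second
```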

An algorithm is a set of step-by-step instructions for solving a problem.
Input -> algorithm -> output.
Computer algorithms have to be both correct and precise.
Pseudocode is a representation of an algorithm in a human language.
Programming elements: functions, conditions, Boolean expressions, loops, variables, threads, events.
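
Here is one small algorithm written out as code, so the elements above have something concrete to point at. I picked binary search over a sorted list of names as the example; the list and the function name find_name are hypothetical and not taken from the notes.

```python
def find_name(names: list[str], target: str) -> int:
    """Return the index of target in a sorted list, or -1 if it is missing."""
    low, high = 0, len(names) - 1          # variables
    while low <= high:                     # loop with a Boolean expression
        middle = (low + high) // 2
        if names[middle] == target:        # condition
            return middle
        elif names[middle] < target:
            low = middle + 1               # target must be in the right half
        else:
            high = middle - 1              # target must be in the left half
    return -1


names = ["Alice", "Bob", "Carol", "David", "Eve"]  # input (must be sorted)
print(find_name(names, "David"))   # 3
print(find_name(names, "Zed"))     # -1
```

Each pass throws away half of the remaining names, which is why the list has to be sorted first.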

🔗 Extra stuff researched by me mostly based on this article.

🤔 And what is encoding?

Encoding is a standardized way of translating between two things.
ASCII encoding is a set of rules that allows us to translate certain characters into decimal numbers.
ASCII can represent every character using a number between 32 and 127. Space is 32, the letter “A” is 65... These can be stored in 7 bits, which meant that on 8-bit bytes the numbers 128 to 255 were up for grabs. Codes below 32 were for control characters.
In the ANSI standard, everybody agreed on what to do below 128, which was pretty much the same as ASCII, but there were lots of different ways to handle the characters from 128 and on up, depending on where you lived. These different systems were called code pages.
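
A quick sketch of the code page problem, using two code pages that ship with Python (cp1252, the Western European Windows one, and cp437, the original IBM PC one). Picking these two and the byte 0xE9 is my own choice for the demo.

```python
# Below 128 everybody agrees: these are plain ASCII.
print(ord(" "), ord("A"))            # 32 65
print("A".encode("ascii"))           # b'A'

# One byte above 127 means different things in different code pages.
mystery = bytes([0xE9])              # the number 233
as_cp1252 = mystery.decode("cp1252")
as_cp437 = mystery.decode("cp437")
print(as_cp1252)                     # é
print(as_cp437)                      # Θ

# The same two code pages still agree on the ASCII range.
print(b"A".decode("cp1252") == b"A".decode("cp437"))  # True
```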

🤔 What is Unicode?

Unicode is a superset of ASCII: its first 128 code points are the same as ASCII.
Unicode was an effort to create a single character set that included every reasonable writing system on the planet and some make-believe ones like Klingon, too.
In Unicode, a letter maps to something called a code point.
Every platonic letter in every alphabet is assigned a magic number by the Unicode Consortium, which is written like this: U+0639. This magic number is called a code point, and the digits after U+ are hexadecimal.
Hello is the equivalent of U+0048 U+0065 U+006C U+006C U+006F.
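
Python's built-in ord and chr make it easy to look up code points. The helper code_points below is my own name; it just formats each number in the U+XXXX style.

```python
def code_points(text: str) -> list[str]:
    """List the Unicode code point of each character, in U+XXXX form."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(" ".join(code_points("Hello")))
# U+0048 U+0065 U+006C U+006C U+006F

# Going the other way: from a code point back to a character.
print(chr(0x0639))   # ع (the Arabic letter ain)
```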

🤔 What is UTF-8?

UTF-8 is a system for storing your string of Unicode code points in memory using 8-bit bytes.
In UTF-8, every code point from 0-127 is stored in a single byte.
Only code points 128 and above are stored using 2, 3 or more bytes: up to 4 in the current standard (the original design allowed up to 6).
This has the neat side effect that English text looks exactly the same in UTF-8 as it did in ASCII.
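
To see the variable length in action, here is a short sketch that encodes a few characters and counts the bytes. The four sample characters and the helper name utf8_length are my own picks.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses to store one character."""
    return len(ch.encode("utf-8"))


for ch in ["A", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")
    # Show the code point, the byte count and the raw bytes in hex.
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s): {encoded.hex(' ')}")

# English text comes out byte-for-byte identical to ASCII.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))   # True
```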