Data Structures Review

Array
Stack/Queue
Linked List
Binary Search Tree and variations (AVL, …)
Priority Queue
Union find/Disjoint Set
Trie
Hash Table
Graph

Array

연속된 주소, 동일한 type

0	1	2	3	4
5	3	7	6	4

빠름!
주소가 정해지기 때문에 바로 찾아갈 수 있다.
Sorting이 되어있으면 binary search도 할 수 있음. O(log N)
개수를 잡아줘야 한다는 단점이 있음
in place에서 늘릴 수 없음. 크기를 늘리려면 다른 곳에서 다시 할당해줘야 하기 때문에 느려짐

Stack

Insert/Delete만 제공. Search는 안됨
LIFO: Last In, First Out
push와 pop 연산 제공. 둘 다 O(1)
수식 계산 등에 활용됨
스택 때문에 어려울 일들을 할 수 있다!

Queue

Insert/Delete만 제공 Search는 안됨
FIFO: First In, First Out
자료를 저장하는 기능을 할 수 있음
enqueue와 dequeue 연산 제공. 둘 다 O(1)
배열과 스택/큐의 차이점
- 배열은 실제 구현이 연속된 주소에 있음
- 배열: 실제 구현이 자료구조의 정의
- 스택/큐: 기능이 자료구조의 정의. 보통은 배열로 구현하는 경우가 많음

Linked List

Node들로 구성
Nodes are linked with pointers (what does that mean?)
Nodes are linked with pointers: data & pointer to the next node
Pointer simply stores an address: 나는 첫 번째 노드의 주소만 알면 됨!
search할 수 있지만 젤 처음부터 하나씩 봐야함 –> 느림
insert

Binary Search Tree

binary search를 할 수 있는 트리
linked list는 포인터가 하나지만 BST는 포인터를 두 개 가짐 (left, right)
부모 노드보다 왼쪽은 작고 오른쪽은 큼
Node contains key (key: sortable)
There is one root note (unless empty BST)
Root node may hvave two (left&right) child(ren) node(s)
Each child is the root of its own subtree
Each subtree is itself a BST (we call them left and right subtree(s))
The key values in left subtree are all smaller than the key in the root node (and vice versa)
걸리는 시간이 tree의 height에 비례함. 트리의 height는 log N

Heap Insert

BST처럼 왼/오가 있는게 아니라 양쪽이 다 크거나 작음
log N: log N 주엥서도 굉장히 빠른 log N

AVL Tree

Each node has two labels: L and R
L: height of left subtree, R: height of right subtree
Condition: L-R < 2; L-R = 0, -1, 1
Height bound?
- What we want to prove
- If there are n nodes in an AVL tree the height is O(log N)
balanced tree: O(log N)

2-3 Tree

Each node has 1 or 2 keys
Thus each node has 2 or 3 children
NOT allowed to have a single child

Heap을 쓰는 이유

AVL tree, 2-3 tree, RB tree 이런 것들은 insert/delete를 log N 시간에 할 수 있다. Heap도 log N 시간에 insert/delete할 수 있는데 heap은 delete를 정해진 수만 할 수 있다. 나머지 트리들도 무조건 왼쪽으로 내려가기만 하면 가장 작은 값을 찾을 수 있다. 즉, heap이 할 수 있는 일을 다 할 수 있는데도 왜 heap을 쓸까?

–> 실제로 돌려보면 heap이 10배 빠르다. heap은 그냥 배열에 넣어놓고 인덱스 하기도 하고.. O notation은 같지만 heap이 훨씬 훨씬 빨라 더 실용적이다.

Trie

Binary Trie (Radix idea 게통) 내부 구현을 사용하는 것

10진수 2진수 (I: insert, D: delete)

10진수	이진수
I 5	0101
I 7	0111
I 9	1001
D 8	1000
D 7	0111

내부 구조를 이해하고 있으면 직관하고 잘 맞아서 편하다. 왼쪽을 0, 오른쪽을 1이라 하면 왼오왼오 이렇게 내려갔다 하면 5인거임!
내부 구조를 모르면 안된다는 단점이 있음

Graph

node와 edge로 구성
Vertices are labelled to associate with particular computation.
Each edge can be viewed as the set of its two endpoints.

Simple Graphs

A simple graph G=(V, E) consists of a nonempty set V of vertices (or nodes) and a set E(possibly empty) of edges where each edge is a subset of V with cardinality 2 (an unordered pair).
Q: for a set V with n elements, how many possible edges there?
- nC2
Adjacenty list 또는 adjacency matrix로 표현할 수 있다.

adjacency list

	1	2	3
1	0	0	1
2	0
3	1
4

edge가 많을 때는 matrix를, 적을 때는 list를 많이 쓴다.
보통은 list를 많이 쓴다. matrix는 제곱에 해당하는 만큼의 공간이 필요하니까