Data Structure
Data Structure#
Table of Contents
 Data Structure
 Introduction
 Array
 Linked List
 Stack
 Queue
 Hash
 Tree
 Desc
 Terminologies
 Why
 Application
 Hand Shaking Lemma
 Binary Tree
 Views
 Full Binary Tree
 Complete Binary Tree
 Perfect Binary Tree
 Degenerated / Pathological Tree
 Skew (Binary) Tree
 Left Skew (Binary) Tree
 Right Skew (Binary) Tree
 Binary Search Tree (BST)
 Self Balancing Binary Tree
 AVL Tree
 BTree
 RedBlack Tree
 B+ Tree
 Kary Tree
 Mway Search Tree
 Heap
 Binary Heap
 Min Heap
 Max Heap
 Trie
 Graph
 References
 TODO
Introduction#
 Data structure (is a way) or (are the special format/structures in computer memory) to store & organize the data.
 To suit a specific purpose.
 So, that we can perform some operations on them like search, add, delete, insert
Data Structure Vs Data Type#
Data Type#
 It is class of objects sharing certain properties.
 It is a premitive element of a particular language like C, C++, Java, Python, Go
 Have predefined characteristics according to the language
 Contains data/information without any semantic
 Can't be reduced anymore
 e.g. integer, float, character, string, boolean
Data Structure#
 It is an abstract description of organization of data & operations on them
 It uses data types
 Can be decomposed to smaller
 DS are data type as well, but not a premitive type
Abstract Data Type (ADT)#
 Its visualization, thought, logical description, or a picture of a new data type which we are going to create
 while DS is a real & concrete thing, we can directly use them
DS is a super set and Data type is sub set or building block for DS
Types#
Linear Data Structure#
Traverses the data elements sequentially.
Only one data element can directly be reached.
 Array
 Linked List
 Singly Linked List
 Circular Linked List
 Doubly Linked List
 Stack
 Queue
 Hash
 Matrix
Non Linear Data Structure#
Every data item is attached to several other data items with a relationships.
The data items are not arranged in a sequential structure.
 Tree
 Binary Tree
 Full Binary Tree
 Complete Binary Tree
 Perfect Binary Tree
 Binary Search Tree  BST
 Degenerated / Pathological Tree
 Skew Binary Tree
 Left Skew Tree
 Right Skew Tree
 Binary Search Tree  BST
 Self Balancing / Balanced / Height Balanced BST
 AVL Tree
 Btree
 RedBlack Tree
 B+ tree
 Splay Tree
 Treap
 Self Balancing / Balanced / Height Balanced BST
 Kary Tree
 Heap
 Binary Heap
 Bionomial Heap
 Fibbonacci Heap
 Leftist Heap
 Skew Heap
 Trie
 Misc
 Indexing with Trees
 Segment Tree
 Fenwick Tree
 Binary Indexed Tree (BIT)
 Binomial Queues
 Open Hash Tables (Closed Addressing)
 Closed Hash Tables (Open Addressing)
 Closed Hash Tables, using buckets
 Binary Tree
 Graph
 Undirected Graph
 Directed Graph (Digraph)
 Directed Acyclic Graph (DAG)
 Directed Cyclic Graph [having cycle(s)] (DCG)
 Weighted
 Weighted Undirected Graph
 Weighted Directed Graph (WITHOUT negative weights)
 Weighted DAG
 Weighted DCG
 Weighted Directed Graph (WITH negative weights)
 Weighted DAG
 Weighted DCG
 WITH Negative Weight Cycle
 WITHOUT Negative Weight Cycle
 Misc
 Finite Graph
 Infinite Graph
 Trivial Graph
 Simple Graph
 Multi Graph
 Null Graph (Empty Graph)
 Complete Graph
 Pseudo Graph
 Regular Graph
 Bipartite Graph
 Labelled Graph
 Connected & Disconnected Graph
 Cyclic Graph
 Vertex Labelled Graph
 Subgraph
 Rooted Graph
 Mixed Graph
 Path Graph
 Planar Graph
 Cycle Graph
 Petersen Graph
 Perfect Graph
 Cograph
 Chordal Graph
 Misc
 Disjointset Data Structures
 Suffix Arrays
Language Specific#
 Python
 list
 tuple
 set
 dict
 builtin dict
 collections.OrderedDict
 remember insertion order of keys
 collections.defaultdict
 return default values for missing keys
 collections.ChainMap
 search multiple dicts as a single mapping
 types.MappingProxyType
 A wrapper for making readonly dictionaries
 Java
 C#
 Go
Array#
Linked List#
Doubly Linked List#
Circular Linked List#
Stack#
LIFO
Queue#
FIFO
Simple Queue#
Circular Queue#
Dequeue#
(aka Doubly eneded queue)
Priority Queue#
Desc#
Application#
 job scheduling
 eventdriven simulation
Properties#
Operations#
 INSERT
 MAXIMUM
 EXTRACTMAX
 REPLACE
Implementation#
 Array/Set/List(LinkedList)
 Binary Heap
 Fibbonacci Heap
Hash#
Desc#
 Linear data structure (ADT) to make lookup fast
 uses a hashing functions, a slot list and, a data list
 time complexity (lookup):
 best: O(1)
 worst: O(n)
Terminologies#
 hash function
 folding method
 midsquared method
 perfect hash functions
 slot/bucket
 load factor
 collision
 resolution techniques (rehashing)
 open addressing
 linear probing
 quadratic probing
 disadvantages
 clustered data
 chaining
 open addressing
 resolution techniques (rehashing)
Why#
 to optimize the lookup speed after binary search and other varients
Dictionary#
 a general concept/data structure that maps keys to values
 many ways to implement dictionary
 hashtable; O(1)  O(N)
 redblack tree; always O(longN)
Hash Table#
 synchronized
 thread safe
 can be shared with many threads
 doesn't allows any null key or values
Hash Map#
 non synchronized
 nonthread safe
 can't be shared between many threads without proper synchronization code
 allows one null key and multiple null values
 preferred over hashtable
Tree#
Desc#
 Are hierchical data structure
 made up of nodes or vertices and edges without having any cycle
 The tree with no nodes is called the null or empty tree
Terminologies#
 Root: The top node in a tree
 Child: A node directly connected to another node when moving away from the Root
 Parent: The converse notion of a child
 Siblings: A group of nodes with the same parent
 Descendant: A node reachable by repeated proceeding from parent to child
 Ancestor: A node reachable by repeated proceeding from child to parent
 Leaf: (less commonly called External node) A node with no children
 Branch: Internal node, A node with at least one child
 Degree: The number of subtrees of a node
 Edge: The connection between one node and another
 Path: A sequence of nodes and edges connecting a node with a descendant
 Depth of node: The depth of a node is the number of edges from the tree's root node to the node
 Level of node: All nodes of depth d are at level d
 Height of node: The height of a node is the number of edges on the longest path between that node and a leaf
 Depth & Level of root node is zero (some may say 1 as well  no problem)
 Depth & Level are measured from top (root) to bottom (leaf)
 Height is measured from bottom (leaf) to top (root)
 Height of the leaf in last level is zero
 Depth of tree: The number of edges between root & deepest leaf + 1
 Level of tree: The number of levels in the tree (i.e. number of edges between root & deepest leaf + 1; i.e. same as Depth of tree)
 Height of tree: The height of a tree is the height of its root node using longest path
 Forest: A forest is a set of n \ge 0 disjoint trees
Why#
 need to store in hierchical way, e.g. computer filesystem
 provides quicker insertion/deletion than array but slower than unordered linked list
 No upper limit on number of nodes (like linkedlist & unlike array)
Application#
 Easy to search (using traversal)
 Router Algorithm
 decision making
Hand Shaking Lemma#
In an undirected graph, Number of vertex of odd degree are always even
e.g. Vertex of odd degree = Vertex connected to 3 edges.
Binary Tree#
Desc#
 tree whose each node have at most 2 children
 children typically known as left and right child
Representation#
 a node consist of data, pointer to left child, pointer to right child
Array Representation#
 Root at index 0
 Left child at index 2 \times i
 Right child at index 2 \times i + 1
 Parent at index i / 2
Note:
 The array should be filled with nil
value for nonexisting child nodes
 i.e. if traverse levelbylevel, then if there are any missing nodes, then the index of those nodes should be filled with nil
values
Properties#
 Level(Root) = 0
 Height of tree with only root node = 0
 Maximum number of nodes possible at level i is 2^{i}
 Maximum number of nodes possible in a binary tree of height h is 2^{h+1}1
 no. of Levels in a Tree = Height of the Tree + 1
 Minimum Possible Height of a tree having N nodes: h = \lfloor \log_2{(N+1)} \rfloor
 A binary tree with L leaves is of at least height h = \lceil \log_2{L} \rceil
 Number of leaves = Number of internal nodes having 2 children + 1
Operations#
Depth First Traversal#
 Types:
 In Order: left, root, right
 Pre Order: root, left, right
 Post Order: left, right, root
 Ways:
 Standard (Recursive)
 Desc:
 Approach: Recursive
 DS Used:
 Using Stack (Iterative, Which is same as Recursive one)
 Desc:
 Apprach: Iterative
 DS Used: Stack
 Without Recursion, Without Stack: Morris Traversal: Based on Threaded Binary Tree
 Desc:
 Apprach:
 DS Used:
 Standard (Recursive)
 Time Complexity: O(n)
 Auxilary Space:
 Avg: O(h) due to Recursive call stack; where h is max height of the tree
 Worst: When tree is skewed tree, i.e. h at last level = O(\log_2{n})
 DisAdvantage:
 Recursive solution
 Traversal starts from Leaf, unlike BFS.
Breadth First Traversal (Level Order Traversal)#
 Way:
 Using Level By Level Looping:
 Desc:
 Find out total level of the tree : Traverse each subtree + Compare level of left & right
 Loop from first level to last
 For each level print all the nodes:
 If root is empty, return; If level reached 1, print the data
 Make Recursive call for each child by decreasing level by 1
 Approach: Recursive
 DS Used: None
 Desc:
 Using Queue
 Desc:
 If given node is not none, visit & print given node,
 Then enqueue left & right child in the queue (if they are not none).
 Dequeue one node from the queue & make recursive call for that node.
 Approach: Recursive
 DS Used: Queue
 Desc:
 Using Level By Level Looping:
 Time Complexity: \theta{(n)}
 Auxilary Space:
 Avg: O(w), where w is max width of the tree
 Worst: When tree is Perfect tree, i.e. w at last level = O(\lceil {n/2} \rceil)
 Advantage:
 Non recursive solution
 Traversal starts from root, unlike DFS. So, better if our finding is closer to root.
Vertical Order Traversal#
When nodes are traversed in vertical lines.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 

TODO
Views#
Top View of a binary tree#
Top view of a binary tree is the set of nodes visible when the tree is viewed from the top.
Imagine a real Xmas tree, and view it from sky.
Note: Order of nodes doesn't matter.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 

Bottom View of a binary tree#
Bottom view of a binary tree is the set of nodes visible when the tree is viewed from the bottom.
Horizontal distance of node: The distance of a node from root, when measured horizontally (say in xaxis). And suppose root node lies on yaxis (i.e. x=0).
So, another definition of bottom view:
Set of bottommost nodes at their horizontal distance, i.e. For each horizontal distance unit, pick the bottom most node.
Note: If there are two bottom most nodes at same horizontal distance, then pick the last/right one.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 

Left View of a binary tree#
Left view of a binary tree is the set of nodes visible when the tree is viewed from the leftside.
Right View of a binary tree#
Right view of a binary tree is the set of nodes visible when the tree is viewed from the rightside.
Full Binary Tree#
Every node at a particular level has 0 or 2 children
Complete Binary Tree#
All level are completely filled, except possibly last level and last level has all keys as left as possible.
Practical e.g.: Binary Heap
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 

Properties#
 height of a complete binary tree (having N nodes) = \log_2{N}
Perfect Binary Tree#
All internal nodes have 2 children and all leaves are at same level.
Degenerated / Pathological Tree#
Every internal node has one child. Performancewise same as linked list.
Skew (Binary) Tree#
A tree with every node having one child only
Left Skew (Binary) Tree#
A tree with every node having one Left child only
Right Skew (Binary) Tree#
A tree with every node having one Right child only
Binary Search Tree (BST)#
Desc#
Operations#
 SEARCH, MINIMUM, MAXIMUM, PREDECESSOR, SUCCESSOR, INSERT, DELETE
Self Balancing Binary Tree#
(aka Balanced / Height Balanced BST)
To over come the downside of the skewed BST, (when its search performance degrades to linear time), various selfbalancing BST were proposed around 1962  1973.
 AVL Tree [1962]
 Btree [1970]
 RedBlack Tree [1972]
 B+ tree [197273]
 Splay Tree
 Treap
AVL Tree#
First selfbalancing BST to overcome the limitations of BST when its skewed.
Invented  By 

1962  Georgy AdelsonVelsky & Evgenii Landis 
Specification#
 While creating/updating a BST, if the heights of the two child subtree of a node differs by more than 1, then rebalance that node
 Rebalancing is done by performing single or doublestep rotation
 Note: Try to rebalance the BST as soon as possible.
Rotation#
LL (leftleft) Rotation#
1 2 3 4 5 

Here, the balance factor of the:
 node 3
 = height of left subtree  height of right tree
 = 0  0
 = 0
 node 4
 = height of left subtree  height of right tree
 = 1  0
 = 1
 node 5
 = height of left subtree  height of right tree
 = 2  0
 = 2
The balance factor of 5 is not in 1,0,1 (i.e. imbalanced by more than 1). Thus node 5 is imbalanced due to recent addition of node 3, to which we can reach by following "Left>Left" i.e. LL.
Thus node 5 is said to be LL imbalanced.
To fix that, rotate the tree around 5 such that, 4 becomes parent of 3 & 5.
(Assume putting a nail under node 5 and pulling 5 1step towards right)
i.e.
1 2 3 

Thus, it is called LLrotation.
RR (rightright) Rotation#
1 2 3 4 5 

Similarly this is called RR imbalance.
If we rotate the tree around node 5, such that node 7 become the parent of the 5 & 9. i.e.
1 2 3 

Then, this is called RR rotation.
LR (leftright) Rotation#
1 2 3 4 5 

Here, the balance factor of the:
 node 6
 = height of left subtree  height of right tree
 = 0  0
 = 0
 node 5
 = height of left subtree  height of right tree
 = 0  1
 = 1
 node 7
 = height of left subtree  height of right tree
 = 2  0
 = 2
The balance factor of 7 is not in 1,0,1 (i.e. imbalanced by more than 1). Thus node 7 is imbalanced due to recent addition of node 6, to which we can reach by following "Left>Right" i.e. LR.
Thus node 7 is said to be LR imbalanced.
To fix that, follow a 2step rotation:

swap the node 5 & 6 (doing so, we achieved LL imbalanced state; Note: we changed the side as well)
1 2 3 4 5
7 / 6 / 5

now follow the LL rotation (i.e. rotate the tree around 7 such that, 6 becomes parent of 5 & 7. i.e.
1 2 3
6 / \ 5 7
RL (rightleft) Rotation#
1 2 3 4 5 

Similarly this is called RL imbalance.
If we:

swap the node 7 and 6
1 2 3 4 5
5 \ 6 \ 7

rotate the tree around node 5, such that node 6 become the parent of the 5 & 7. i.e.
1 2 3
6 / \ 5 7
Then, this is called RL rotation.
The shorthand for this 2step rotation could be:
Replace Parent (node 5) with node L (node 6) and make the Parent (node 5) L child of node 6.
Exercise#
Q1.
1 2 3 4 5 

A.
1 2 3 4 5 

Q2.
1 2 3 4 5 

A.
1 2 3 4 5 

Q3.
1 2 3 4 5 6 7 8 9 

A.
Addition of 4 causes LLRL (no just call it LL) imbalance of node 7.
Now, just focus on node 3, 6, and 7 (as we do in LL rotation).
1 2 3 4 5 6 7 

Complexity#
Of  Avg  Worst 

Space  \Theta(n)  O(n) 
Search  \Theta(log_2{n})  O(log_2{n}) 
Insert  \Theta(log_2{n})  O(log_2{n}) 
Delete  \Theta(log_2{n})  O(log_2{n}) 
BTree#
(aka Balanced/Bayer/Boeing/Broad/Bushy Tree)
First selfbalancing mway Search Tree (ST) to overcome the limitations of mway ST when its skewed.
Invented  By 

1970  Rudolf Bayer & Edward M. McCreight while working at Boeing Research Labs 
Idea: While creating/updating a BST, if the heights of the two child subtree of a node differs by more than 1, then rebalance that node.
I see it as:
 generalization of the AVL tree
 self balanced version of mway search tree
Application#
 Indexing in Databases
 Filesystem
Specification#
 defined by order of the tree
 order = number of children
 size of key = order  1
 nodes should be half full
 root can have less than half order
 keys are in sorted order for sequential traversal
 a Btree of height h with all its nodes completely filled has n = m^{h+1}–1 entries
 minimum height of a btree: TODO
 maximum height of a btree: TODO
Complexity#
Of  Avg  Worst 

Space  \Theta(n)  O(n) 
Search  \Theta(log_2{n})  O(log_2{n}) 
Insert  \Theta(log_2{n})  O(log_2{n}) 
Delete  \Theta(log_2{n})  O(log_2{n}) 
RedBlack Tree#
B+ Tree#
Kary Tree#
Mway Search Tree#
Heap#
 Heap are nothing but a speacial form of tree
 The special condition is:
 the value of a node MUST be \ge (or \le ) value of its children (heap property)
 notice its different than BST
 the value of a node MUST be \ge (or \le ) value of its children (heap property)
 Another good to have condition is (not mandatory, but good for performance)
 all the levels should be fullfilled, except the possible last
 i.e. all the leaves should be in the level l or l1 (suppose l is height of the tree/root, for l \gt 0)
 i.e. the heap should be a complete binary tree
 reason:
 it will guarantee O(log_{2} N) to build operation
 it will guarantee O(log_{2} N) to insert operation
 because both depends on the height of the heap
 and height = log_{2} N, where N is size of the heap
Binary Heap#
A heap with atmost 2 children
Applications#
 Heap Sort: Heap Sort uses Binary Heap to sort an array in O(nLogn) time.
 Priority Queue: Priority queues can be efficiently implemented using Binary Heap because it supports insert(), delete() and extractmax(), decreaseKey() operations in O(logn) time. Binomoial Heap and Fibonacci Heap are variations of Binary Heap. These variations perform union also efficiently.
 Graph Algorithms: The priority queues are especially used in Graph Algorithms like Dijkstra’s Shortest Path and Prim’s Minimum Spanning Tree.
 Many problems likes
 K’th Largest Element in an array.
 Sort an almost sorted array/
 Merge K Sorted Arrays.
Min Heap#
A binary heap in which the value of a node MUST be \le value of its children
Operations#
 Insertion
 Top/Min
 DeleteTop / ExtractMin (Get & Delete) Top
 Replace
Max Heap#
A binary heap in which the value of a node MUST be \ge value of its children
Operations#
 Insertion
 Top/Max
 DeleteTop / ExtractMax (Get & Delete) Top
 Replace
Implementations#
Implementation  Insert  DeleteMax  ExtractMin 

Unordered Array  1  n  n 
Unordered List  1  n  n 
Ordered Array  n  1  1 
Ordered List  n  1  1 
Binary Search Tree  log n  log n  log n 
Balanced Binary Search Tree  log n  log n  log n 
Binary Heaps  log n  log n  1 
Operations#
Trie#
Graph#
G = (V, E)
Desc#
 a data structure describing pairwise relations between objects
 made up of nodes/vertices and edges; with or without having any cycle
 sometimes called undirected graph for distinguishing from a directed graph, or simple graph for distinguishing from a multigraph
Terminologies#
 Vertex (Node/Point): fundamental unit of the graph;
 Edge (Link/Line/Arc): The connection between one vertex and another
 Degree (of a Vertex):
𝛿(v)
in a graph is the number of edges incident to it  In Degree:
𝛿 (v)
number of incoming edges  Out Degree:
𝛿 +(v)
number of outgoing edges  Adjacent Vertex: vertices directly connected to the given vertex
 Adjacency Matrix:
Size: VxV
a matrix denoting (ordered/unordered) relations/edge between all the vertices  Adjacency List: a map of size
V
of list/linkedlist denoting vertices connected to a particular vertex  Isolated Vertex: is a vertex with degree zero
 Leaf Vertex (Pendant Vertex): is a vertex with degree one
 Source vertex: is a vertex with indegree zero
 Sink vertex: is a vertex with outdegree zero
 Simplicial vertex: is one whose neighbors form a clique: every two neighbors are adjacent
 Universal vertex: is a vertex that is adjacent to every other vertex in the graph
 Cut vertex: is a vertex the removal of which would disconnect the remaining graph
 Tree: A graph with no cycle; aka Acyclic Connected Graph
 Forest: is a disjoint set (nonoverlapping elements in each set) iof trees

Spanning Tree: A subset of a graph, such that it covers all the vertices of the graph but with minimal edges.
 Thus, it does NOT have any cycle
 It is NOT disconncted
 Can say, every connected & undirected graphs have atleast 1 spanning tree

Connected components: A connected component or simply component of an undirected graph is a subgraph in which each pair of nodes is connected with each other via a path
 Strongly connected components (SCC): In a directed graph if every vertex is reachable from every other vertex
Why#
 need to store network of objects / relations between objects
 e.g.
 a map of cities
 social n/w
 to describe such complicated thing in less space
 e.g.
Application#
 transport networks
 computer networks
 database relationships
 relationships between electronic components
Representation#
Adjacency Matrix#
A matrix (2d array) denoting if there is an edge (directed/undirected) between 2 vertices or not.
1 2 3 4 5 6 

 Size: V*V
 Advantages
 Some operations are efficient and easy
 Disadvantages
 More space
 could be wastage if the graph is sparse (not dense)
 More space
Adjacency List#
A array of list (or a map of list) denoting adjacent vetices of a particular vertex.
1 2 3 4 

 Size: E + V
 Advantages
 less space than adjacency matrix
 especially when the graph is not dense
 less space than adjacency matrix
 Disadvantages
 some operations are not efficient using adjacency list
 where lookup (whether there is an edge between 2 vertices or not) is required
 deleting a vertex
 deleting an edge
 some operations are not efficient using adjacency list
Adjacency Set#
TODO
Operations#
 Addition
 Add Node
 Add Edge
 Removal
 Remove Node
 Remove Edge
 Search
 Contains  check if the graph contains a given value
 HasEdge  check if there is an edge between 2 given vertices
 Traversal  traverse the graph & its nodes/edges in various fashion
Undirected Graph#
Directed Graph#
Directed Acyclic Graph (DAG)#
Directed Cyclic Graph (DCG)#
Weighted Graph#
Subgraph#
A graph whose edge & veritices are subset of the other graph.
Bipartite Graph#
A graph whose vertices could be divided/partitioned into 2 sets. Such that vertices of one set incidents an edge to verices on the other set.
Complete Graph#
A graph whose all the vertices are connected to each other. aka A graph whose all the edges are present.
1 

Sparse Graph#
A graph with relatively less edges.
1 

Dense Graph#
A graph with relatively less edges are missing.
References#
 https://gist.github.com/toransahu/bb1c9f1cd6490ff29c42fa229e827a2a
TODO#
 avl
 mway tree
 b tree
 b+ tree
 redblack tree
 trie