Sat, Jul 13, 2024

4 min read

Essential Data Structures and Algorithms for Efficient Problem Solving

A formal discussion on the importance of deep knowledge of select data structures and algorithms.

Introduction

In my view, it's unnecessary to know every algorithm and data structure, contrary to common belief. Instead, a deep understanding of a select few is crucial; the rest you can learn when you need them.

So, what are the most important algorithms and data structures to know? I would say that the most important members of a set $S$ are the ones essential to solving the most common problems in computer science and software engineering. Let's formalize this idea.

$$\text{Let } S = \{ \text{Data Structures and Algorithms} \}$$

You can think of $S$ as a set containing all data structures and algorithms. Now, we need to define a subset $E$ containing the essential structures and algorithms.

$$\text{Let } E = \{ \text{Essential DS and Algorithms} \}$$

For example, an essential subset of $S$ could be an $E$ containing the following structures and algorithms:

$$E = \{ \text{AVL}, \text{BST}, \text{DAG}, \text{BFS}, \text{DFS}, \text{HT} \}$$

And, to use this subset, we need to define a set $P$ containing common problems in computer science and software engineering.

$$\text{Let } P = \{ \text{Common Problems} \}$$

For each problem $p \in P$, there exists a data structure or algorithm $e \in E$ that solves $p$ efficiently.

$$\forall p \in P, \ \exists e \in E \quad \vert \quad e \text{ solves } p \text{ efficiently}$$
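To make this statement concrete, here is a toy sketch in Python. The problem names and the `ESSENTIAL_CHOICE` mapping are hypothetical choices for illustration, not a canonical catalogue:

```python
# Hypothetical mapping from common problems (P) to essential
# structures/algorithms (E); the entries are illustrative only.
ESSENTIAL_CHOICE = {
    "membership test / lookup by key": "Hash Table",           # O(1) average
    "ordered keys with guaranteed log bounds": "AVL Tree",     # O(log n) worst
    "shortest path in an unweighted graph": "BFS",             # O(V + E)
    "dependency ordering (topological sort)": "DFS on a DAG",  # O(V + E)
}

def pick_structure(problem: str) -> str:
    """For a problem p in P, return an e in E that solves it efficiently."""
    return ESSENTIAL_CHOICE[problem]

print(pick_structure("shortest path in an unweighted graph"))  # BFS
```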

Please don't place excessive importance on this notation; it serves merely as an example. Evidently, some problems require data structures and algorithms not encompassed within $E$.

Let me define what I mean by "efficiently" in this context. Efficiency is measured by the time $T$ and space $S$ needed to solve $p$ using $e$. We say that $e$ is efficient if:

$$\begin{aligned} &T(e, p) = \mathcal{O}(\log n) \text{ or } T(e, p) = \mathcal{O}(n) \\ &S(e, p) = \mathcal{O}(n) \end{aligned}$$

where $n$ denotes the input size.

Note that $\mathcal{O}(n)$ is a worst-case time bound, which is acceptable for most problems, since the average cases are better. This definition is not strict, but it gives a general idea of what I mean by "efficiently".
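To make these bounds tangible, here is a minimal Python sketch (my own example, not part of the formalism above) contrasting an $\mathcal{O}(n)$ linear scan with an $\mathcal{O}(\log n)$ binary search; both use $\mathcal{O}(n)$ space for the input list:

```python
import bisect

def linear_search(items: list[int], target: int) -> bool:
    """T(e, p) = O(n): may inspect every element in the worst case."""
    for item in items:
        if item == target:
            return True
    return False

def binary_search(sorted_items: list[int], target: int) -> bool:
    """T(e, p) = O(log n): halves the search space at each step.
    Assumes the list is already sorted."""
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target

data = list(range(1_000_000))         # S(e, p) = O(n)
assert linear_search(data, 999_999)   # up to n comparisons
assert binary_search(data, 999_999)   # about log2(n) ≈ 20 comparisons
```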

Typically, $E$ includes structures and algorithms that are efficient for common problems. For example, in the best and average cases, these structures and algorithms have a time complexity of $\mathcal{O}(\log n)$ and a space complexity of $\mathcal{O}(n)$. In the worst case, they have a time complexity of $\mathcal{O}(n)$ and a space complexity of $\mathcal{O}(n)$.

| DS / A | Insertion (avg / worst) | Searching (avg / worst) | Deletion (avg / worst) | Traversal |
| --- | --- | --- | --- | --- |
| AVL Tree | $\Theta(\log n)$ / $\mathcal{O}(\log n)$ | $\Theta(\log n)$ / $\mathcal{O}(\log n)$ | $\Theta(\log n)$ / $\mathcal{O}(\log n)$ | $\mathcal{O}(n)$ |
| BST | $\Theta(\log n)$ / $\mathcal{O}(n)$ | $\Theta(\log n)$ / $\mathcal{O}(n)$ | $\Theta(\log n)$ / $\mathcal{O}(n)$ | $\mathcal{O}(n)$ |
| DAG | $\mathcal{O}(1)$ | $\mathcal{O}(V + E)$ | $\mathcal{O}(V + E)$ | $\mathcal{O}(V + E)$ |
| BFS | - | $\mathcal{O}(V + E)$ | - | $\mathcal{O}(V + E)$ |
| DFS | - | $\mathcal{O}(V + E)$ | - | $\mathcal{O}(V + E)$ |
| Hash Table | $\mathcal{O}(1)$ / $\mathcal{O}(n)$ | $\mathcal{O}(1)$ / $\mathcal{O}(n)$ | $\mathcal{O}(1)$ / $\mathcal{O}(n)$ | - |
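As one concrete row from the table, a minimal BFS sketch in Python (the graph below is my own toy example) shows where $\mathcal{O}(V + E)$ comes from: each vertex is enqueued at most once and each edge is examined once.

```python
from collections import deque

def bfs(adj: dict[str, list[str]], source: str) -> dict[str, int]:
    """Breadth-first search over an adjacency list.
    Time O(V + E): every vertex dequeued once, every edge scanned once."""
    dist = {source: 0}                 # also marks vertices as visited
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # enqueue each vertex at most once
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# A small DAG as an adjacency list (vertex names are arbitrary).
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs(graph, "a"))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```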

Therefore, focusing on the essential subset $E$ provides enough coverage to solve most common problems efficiently, eliminating the need to know all data structures and algorithms in $S$.
