
Scrivener Publishing
100 Cummings Center, Suite 541J
Beverly, MA 01915-6106

Publishers at Scrivener
Martin Scrivener (martin@scrivenerpublishing.com)
Phillip Carmical (pcarmical@scrivenerpublishing.com)

Data Structure and Algorithms Using C++

A Practical Implementation

Edited by

Sachi Nandan Mohanty

ICFAI Foundation For Higher Education, Hyderabad, India

and

Pabitra Kumar Tripathy

Kalam Institute of Technology, Berhampur, India


Preface

Welcome to the first edition of Data Structure and Algorithms Using C++. A data structure is the logical or mathematical arrangement of data in memory. To be effective, data has to be organized in a manner that adds to the efficiency of an algorithm, and a data structure must also describe the relationships between data items and the operations that can be performed on them. The choice of appropriate data structures and algorithms forms the fundamental step in the design of an efficient program. Thus, a deep understanding of data structure concepts is essential for students who wish to work on the design and implementation of system software written in C++, an object-oriented programming language that has gained popularity in both academia and industry. This book was therefore developed to provide comprehensive and logical coverage of data structures like stacks, queues, linked lists, trees, and graphs, which makes it an excellent choice for learning data structures. The objective of the book is to introduce the concepts of data structures and to apply these concepts to real-life problem solving. Most of the examples presented resulted from student interaction in the classroom. The book follows a systematic approach wherein the design of each data structure is followed by algorithms for the different operations that can be performed on it and by an analysis of these algorithms in terms of their running times.

This book was designed to serve as a textbook for undergraduate engineering students across all disciplines and postgraduate level courses in computer applications. Young researchers working on efficient data storage and related applications will also find it to be a helpful reference source to guide them in the newly established techniques of this rapidly growing research field.

Dr. Sachi Nandan Mohanty and Prof. Pabitra Kumar Tripathy
December 2020

1
Introduction to Data Structure

1.1 Definition and Use of Data Structure

A data structure is the representation of the logical relationships existing between individual elements of data. In other words, a data structure is a way of organizing data items that considers not only the elements stored but also their relationships to each other.

A data structure specifies:

  • Organization of data
  • Accessing methods
  • Degree of associativity
  • Processing alternatives for information

Data structures are the building blocks of a program; hence, the selection of a particular data structure rests on two considerations:

  • The data structure must be rich enough in structure to reflect the relationships existing between the data, and
  • The structure should be simple enough that we can process the data effectively whenever required.

Mathematically: Algorithm + Data Structure = Program

Finally, we can also define a data structure as the “logical and mathematical model of a particular organization of data.”

1.2 Types of Data Structure

Data structures can be broadly classified into two categories: linear and non-linear.

[Figure: Tree diagram depicting the classification of data structures into linear and non-linear categories.]

Linear Data Structures

In linear data structures, values are arranged in a linear fashion. Arrays, linked lists, stacks, and queues are examples of linear data structures, in which values are stored in sequence.

Non-Linear Data Structures

This type is the opposite of linear: the data values in these structures are not arranged in sequential order. Trees, graphs, tables, and sets are examples of non-linear data structures.

Operations Performed on Data Structures

On a data structure we can perform operations such as the following (a small C++ sketch follows the list):

  • Traversing
  • Insertion
  • Deletion
  • Merging
  • Sorting
  • Searching
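
As a simple illustration of the first two operations, here is a minimal C++ sketch that traverses an array and inserts an element into it (the function names traverse and insertAt are illustrative, not from any standard library):

	#include <iostream>
	using namespace std;

	// Traversing: visit and print every element of the array
	void traverse(int arr[], int n) {
		for (int i = 0; i < n; i++)
			cout << arr[i] << " ";
		cout << endl;
	}

	// Insertion: place value at index pos, shifting later elements right
	void insertAt(int arr[], int &n, int pos, int value) {
		for (int i = n; i > pos; i--)
			arr[i] = arr[i - 1];
		arr[pos] = value;
		n++;
	}

	int main() {
		int arr[10] = {10, 20, 40, 50};
		int n = 4;
		traverse(arr, n);         // 10 20 40 50
		insertAt(arr, n, 2, 30);  // insert 30 at index 2
		traverse(arr, n);         // 10 20 30 40 50
		return 0;
	}

Deletion, merging, sorting, and searching can be written in the same style.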

1.3 Algorithm

The step-by-step procedure for solving a problem is known as an ALGORITHM. An algorithm is a well-organized, pre-arranged, and defined computational module that receives some value or set of values as input and provides a value or set of values as output. These well-defined computational steps are arranged in sequence and process the given input into output.

An algorithm is said to be accurate and truthful only when it provides the exact desired output.

The efficiency of an algorithm depends on the time and space complexities. The complexity of an algorithm is the function which gives the running time and/or space in terms of the input size.

Steps Required to Develop an Algorithm

  • Find a method for solving the problem. Every step of the algorithm should be defined in a precise and clear manner. Pseudocode may also be used to describe the algorithm.
  • Validate the algorithm. Work through all of its steps manually with the required input, performing each step of the algorithm, and confirm that the required output is produced in a finite amount of time.
  • Finally, implement the algorithm in a programming language.

Mathematical Notations and Functions

  • Floor and Ceiling Functions
  • Floor function returns the greatest integer that does not exceed the number.
  • Ceiling function returns the least integer that is not less than the number.
    ⌊5.34⌋ = 5, ⌊-6.45⌋ = -7; ⌈5.34⌉ = 6, ⌈-6.45⌉ = -6
  • Remainder Function

    To find the remainder, the “mod” function is used: A mod B gives the remainder when A is divided by B. For example,

    17 mod 5 = 2, 25 mod 7 = 4
  • Integer and Absolute Value of a Number

    INT(5.34) = 5. This statement returns the integer part of the number.

    INT(-6.45) = 6. This statement returns the integer portion of the absolute value of the number.

  • Summation Symbol

    To add a series of numbers a1 + a2 + a3 + ... + an, the symbol Σ is used:

    a1 + a2 + a3 + ... + an = Σ ai, where i runs from 1 to n
  • Factorial of a Number

    The product of the positive integers from 1 to n is known as the factorial of n and it is denoted as n!.

    n! = 1 × 2 × 3 × ... × n, with 0! = 1 by convention. For example, 5! = 1 × 2 × 3 × 4 × 5 = 120.
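
These functions are also available, or easily written, in C++. The following minimal sketch (an illustration using the standard <cmath> header, with the factorial written by hand) evaluates the examples above:

	#include <iostream>
	#include <cmath>       // floor, ceil, trunc, fabs
	using namespace std;

	// Iterative factorial: the product of the positive integers 1..n
	long long factorial(int n) {
		long long result = 1;
		for (int i = 2; i <= n; i++)
			result *= i;
		return result;
	}

	int main() {
		cout << floor(5.34) << endl;         // 5  (greatest integer not exceeding 5.34)
		cout << ceil(5.34) << endl;          // 6  (least integer not less than 5.34)
		cout << 17 % 5 << endl;              // 2  (remainder: 17 mod 5)
		cout << trunc(5.34) << endl;         // 5  (integer part, like INT above)
		cout << fabs(trunc(-6.45)) << endl;  // 6  (integer portion of the absolute value)
		cout << factorial(5) << endl;        // 120 (5!)
		return 0;
	}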

Algorithmic Notations

While writing an algorithm, comments are enclosed within [ ].

Assignment uses the symbol “:=” instead of “=”.

For input, use Read: variable name

For output, use Write: message/variable name

Control structures may also be used inside an algorithm, but they are written somewhat differently, as shown below.

Simple If

	If condition, then:
		Statements
	[end of if structure]

If...else

	If condition, then: 	 
		Statements 	 
	Else : 	 
		Statements
	[end of if structure] 	 

If...else ladder

	If condition1, then:
		Statements
	Else If condition2, then:
		Statements
	Else If condition3, then:
		Statements
	…………………………………………
	…………………………………………
	…………………………………………
	Else If conditionN, then:
		Statements
	Else:
		Statements
	[end of if structure]

LOOPING CONSTRUCT

	Repeat for var = start_value to end_value by step_value
		Statements
	[end of loop]

	Repeat while condition:
		Statements
	[end of loop]
	Ex: Repeat for i = 1 to 10 by 2
		Write: i
	[end of loop]

OUTPUT

1 3 5 7 9
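
For comparison, the same loop written in C++ might look like this minimal sketch:

	#include <iostream>
	using namespace std;

	int main() {
		// repeat for i = 1 to 10 by 2
		for (int i = 1; i <= 10; i += 2)
			cout << i << " ";   // prints: 1 3 5 7 9
		cout << endl;
		return 0;
	}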

1.4 Complexity of an Algorithm

The complexity of a program can be judged by criteria such as whether it satisfies the original specification and whether the code is readable. These factors affect the computing time and storage requirements of the program.

Space Complexity

The space complexity of a program is the amount of memory it needs to run to completion. The space needed by a program is the sum of the following components:

  • A fixed part that includes space for the code, space for simple variables and fixed size component variables, space for constants, etc.
  • A variable part that consists of the space needed by component variables whose size is dependent on the particular problem instance being solved, and the stack space used by recursive procedures.

Time Complexity

The time complexity of a program is the amount of computer time it needs to run to completion. Two kinds of time are involved:

  • Compilation time
  • Runtime

The amount of time taken by the compiler to compile an algorithm is known as compilation time. During compilation, no executable statements are evaluated; the compiler processes the declaration statements and checks for syntax and semantic errors.

The run time depends on the size of an algorithm. If the number of instructions in an algorithm is large, then the run time is also large, and if the number of instructions in an algorithm is small, then the time for executing the program is also small. The runtime is calculated for executable statements and not for declaration statements.

If the space is fixed, then only the run time is considered in obtaining the complexity of an algorithm, which is analyzed in three cases:

  • Best case
  • Worst case
  • Average case

Best Case

The best case occurs when the algorithm performs the least possible amount of work for a given input size.

For example, in linear search, if the element being sought is found at the very first position, the algorithm behaves in its best case. The best case takes the shortest time to execute, as it causes the algorithm to do the least amount of work.

Worst Case

In the worst case, the algorithm performs the most work: the element is found at the very end, or the search fails. This can involve comparing the key to each list value, for a total of N comparisons.

For example, in linear search, if the element being sought is the last element of the array, or is not present in the array at all, the algorithm behaves in its worst case.
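
To make the two cases concrete, here is a minimal linear-search sketch in C++ (linearSearch is an illustrative name; it returns the index of key, or -1 if the key is absent):

	#include <iostream>
	using namespace std;

	// Compare key with each element in turn
	int linearSearch(int arr[], int n, int key) {
		for (int i = 0; i < n; i++)
			if (arr[i] == key)
				return i;   // found after i + 1 comparisons
		return -1;          // not found: all N comparisons made
	}

	int main() {
		int arr[] = {7, 3, 9, 5, 1};
		cout << linearSearch(arr, 5, 7) << endl;  // 0: best case, 1 comparison
		cout << linearSearch(arr, 5, 1) << endl;  // 4: last element, N comparisons
		cout << linearSearch(arr, 5, 8) << endl;  // -1: absent, N comparisons (worst case)
		return 0;
	}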

Average Case

Analyzing the average-case behavior of an algorithm is more complex than analyzing the best and worst cases. Here, we consider the probability distribution over the possible inputs. The average case should be the average number of steps taken, but since the sought data can be at any position, finding the exact behavior of the algorithm is difficult. As the volume of data increases, the average case of the algorithm tends toward the worst case.

1.5 Efficiency of an Algorithm

The efficiency of an algorithm can be determined by measuring the time, space, and other resources it uses in executing the program. The amount of time taken by an algorithm can be calculated by finding the number of steps the algorithm executes, while the space refers to the number of memory units it requires for storage.

1.6 Asymptotic Notations

Asymptotic notations are symbols used to describe the running time of an algorithm as a function of its input size. The common notations are:

  • Big Oh Notation (O)
  • Little Oh Notation (o)
  • Omega Notation (Ω)
  • Theta Notation (θ)

Big Oh (O) Notation

This notation gives an upper bound for a function to within a constant factor. We write f(n) = O(g(n)) if there are positive constants n0 and C such that, to the right of n0, the value of f(n) always lies on or below C·g(n).

Omega Notation (Ω)

This notation gives a lower bound for a function to within a constant factor. We write f(n) = Ω(g(n)) if there are positive constants n0 and C such that, to the right of n0, the value of f(n) always lies on or above C·g(n).

Theta Notation (θ)

This notation bounds a function to within constant factors. We write f(n) = Θ(g(n)) if there exist positive constants n0, C1, and C2 such that, to the right of n0, the value of f(n) always lies between C1·g(n) and C2·g(n) inclusive.
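
Restated compactly in standard mathematical form (an equivalent formulation of the prose definitions above):

	f(n) = O(g(n))      \iff \exists\, C > 0,\ n_0 > 0 : 0 \le f(n) \le C\,g(n) \text{ for all } n \ge n_0
	f(n) = \Omega(g(n)) \iff \exists\, C > 0,\ n_0 > 0 : 0 \le C\,g(n) \le f(n) \text{ for all } n \ge n_0
	f(n) = \Theta(g(n)) \iff \exists\, C_1, C_2 > 0,\ n_0 > 0 : C_1\,g(n) \le f(n) \le C_2\,g(n) \text{ for all } n \ge n_0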

Little Oh Notation (o)

This notation gives a strict upper bound that is not tight. We write f(n) = o(g(n)) if for every positive constant C there is a positive constant n0 such that, to the right of n0, the value of f(n) always lies strictly below C·g(n); informally, f grows more slowly than g.

Introduction

An important question is: How efficient is an algorithm or piece of code? Efficiency covers lots of resources, including:

  • CPU (time) usage
  • Memory usage
  • Disk usage
  • Network usage

All are important, but we will mostly talk about CPU time.

Be careful to differentiate between:

Performance: how much time/memory/disk/... is actually used when a program is running. This depends on the machine, compiler, etc., as well as the code.

Complexity: how the resource requirements of a program or algorithm scale, i.e., what happens as the size of the problem being solved gets larger. Complexity affects performance, but not the other way around.

The time required by a method is proportional to the number of “basic operations” that it performs. Here are some examples of basic operations:

  • one arithmetic operation (e.g., +, *)
  • one assignment
  • one test (e.g., x == 0)
  • one read
  • one write (of a primitive type)

Note: As examples,

  • O(1) refers to constant time;
  • O(n) indicates linear time;
  • O(n^k) (k fixed) refers to polynomial time;
  • O(log n) is called logarithmic time;
  • O(2^n) refers to exponential time; etc.

n^2 + 3n + 4 is O(n^2), since n^2 + 3n + 4 < 2n^2 for all n > 10. Strictly speaking, 3n + 4 is O(n^2), too, but big-O notation is often misused to mean equal to rather than less than.

1.7 How to Determine Complexities

In general, how can you determine the running time of a piece of code? The answer is that it depends on what kinds of statements are used.

1. Sequence of statements

	statement 1;
	statement 2;
	...
	statement k;

Note: this is code that really is exactly k statements; it is not a shorthand for a loop that executes k times. The total time is found by adding the times for all statements:

total time = time(statement 1) + time(statement 2) + ... + time(statement k)

If each statement is “simple” (only involves basic operations) then the time for each statement is constant and the total time is also constant: O(1). In the following examples, assume the statements are simple unless noted otherwise.

2. if-then-else statements

	if (cond) {
		sequence of statements 1
	}
	else {
		sequence of statements 2
	}

Here, either sequence 1 will execute, or sequence 2 will execute. Therefore, the worst-case time is the slower of the two possibilities: max(time(sequence 1), time(sequence 2)). For example, if sequence 1 is O(N) and sequence 2 is O(1), the worst-case time for the whole if-then-else statement would be O(N).

3. for loops

	for (i = 0; i < N; i++) {
		sequence of statements
	}

The loop executes N times, so the sequence of statements also executes N times. Since we assume the statements are O(1), the total time for the for loop is N * O(1), which is O(N) overall.

4. Nested loops

	for (i = 0; i < N; i++) {
		for (j = 0; j < M; j++) {
			sequence of statements
		}
	}

The outer loop executes N times. Every time the outer loop executes, the inner loop executes M times. As a result, the statements in the inner loop execute a total of N * M times. Thus, the complexity is O(N * M). In a common special case where the stopping condition of the inner loop is j < N instead of j < M (i.e., the inner loop also executes N times), the total complexity for the two loops is O(N^2).

5. Statements with method calls:

When a statement involves a method call, the complexity of the statement includes the complexity of the method call. Assume that you know that method f takes constant time, and that method g takes time proportional to (linear in) the value of its parameter k. Then the statements below have the time complexities indicated.

	f(k); // O(1)
	g(k); // O(k)
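
For concreteness, hypothetical definitions of f and g with exactly these costs might look like the following sketch (both function bodies are assumptions for illustration, not from any library):

	// f: constant time, a fixed amount of work regardless of k: O(1)
	int f(int k) {
		return 2 * k + 1;             // one multiplication, one addition
	}

	// g: linear time, work proportional to the value of its parameter k: O(k)
	long g(int k) {
		long sum = 0;
		for (int i = 0; i < k; i++)   // loop body runs k times
			sum += i;
		return sum;
	}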

When a loop is involved, the same rule applies. For example:

	for (j = 0; j < N; j++) g(N);

has complexity O(N^2). The loop executes N times, and each method call g(N) has complexity O(N).

Examples

Q1. What is the worst-case complexity of each of the following code fragments?

Two loops in a row:

	for (i = 0; i < N; i++) {
		sequence of statements
	}
	for (j = 0; j < M; j++) {
		sequence of statements
	}

Answer: The first loop is O(N) and the second loop is O(M). Since you do not know which is bigger, you say this is O(N+M). This can also be written as O(max(N,M)).

Q2. How would the complexity change if the second loop went to N instead of M?

Answer: The complexity would be O(N). You can see this from either expression above: O(N+M) becomes O(2N), and when you drop the constant it is O(N); O(max(N,M)) becomes O(max(N,N)), which is O(N).

A nested loop followed by a non-nested loop:

	for (i = 0; i < N; i++) {
		for (j = 0; j < N; j++) {
			sequence of statements
		}
	}
	for (k = 0; k < N; k++) {
		sequence of statements
	}

Answer: The first set of nested loops is O(N^2) and the second loop is O(N). This is O(max(N^2, N)), which is O(N^2).

Q3. A nested loop in which the number of times the inner loop executes depends on the value of the outer loop index:

	for (i = 0; i < N; i++) {
		for (j = i; j < N; j++) {
			sequence of statements
		}
	}

Answer: When i is 0, the inner loop executes N times. When i is 1, the inner loop executes N-1 times. In the last iteration of the outer loop, when i is N-1, the inner loop executes once. The number of times the inner loop statements execute is therefore N + (N-1) + ... + 2 + 1. This sum is N(N+1)/2, which gives O(N^2).
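
One way to check the N(N+1)/2 count is to instrument the loops with a counter, as in this minimal C++ sketch:

	#include <iostream>
	using namespace std;

	int main() {
		int N = 10;
		long count = 0;
		for (int i = 0; i < N; i++)
			for (int j = i; j < N; j++)
				count++;               // one execution of the inner loop body
		cout << count << endl;         // 55, which equals N*(N+1)/2 for N = 10
		return 0;
	}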

Q4. For each of the following loops with a method call, determine the overall complexity. As above, assume that method f takes constant time, and that method g takes time linear in the value of its parameter.

a. for (j = 0; j < N; j++) f(j);
b. for (j = 0; j < N; j++) g(j);
c. for (j = 0; j < N; j++) g(k);

Answer: a. Each call to f(j) is O(1). The loop executes N times so it is N x O(1) or O(N).

b. The first time the loop executes, j is 0 and g(0) takes “no operations.” The next time, j is 1 and g(1) takes 1 operation. The last time the loop executes, j is N-1 and g(N-1) takes N-1 operations. The total work is the sum 0 + 1 + ... + (N-1) = N(N-1)/2, which is O(N^2).

c. Each time through the loop g(k) takes k operations and the loop executes N times. Since you do not know the relative size of k and N, the overall complexity is O(N x k).

1.8 Questions

  1. What is data structure?
  2. What are the types of operations that can be performed with data structure?
  3. What is asymptotic notation and why is this used?
  4. What is complexity and its type?
  5. Find the complexity of 3n^2 + 5n.
  6. Distinguish between linear and non-linear data structure.
  7. Is it necessary to use data structures in every field? Justify your answer.