Monday, January 20, 2014

Research about Data Structures


Research Findings:
1.  A tuple is a finite ordered list of elements. In mathematics, an n-tuple is a sequence (or ordered list) of n elements, where n is a non-negative integer.
2.

Overview about Data Structure and Algorithms
A data structure is an arrangement of data in a computer's memory or even disk storage. An example of several common data structures are arrays, linked lists, queues, stacks, binary trees, and hash tables. Algorithms, on the other hand, are used to manipulate the data contained in these data structures as in searching and sorting.





Algorithms Types:
Insert a new data item
Search a specified item
Delete a specified item
Iterate through all item
Sorting
Recursion

Algorithms:
1.  Is it correct?
2.  What is the invariant ?
3.  What is the efficiency ?



Fibonacci


Use Cases:
Array vs Ordered Array:
Ordered arrays are therefore useful in situations in which searches are frequent, but insertions and deletions are not. An ordered array might be appropriate for a data- base of company employees, for example. Hiring new employees and laying off existing ones would probably be infrequent occurrences compared with accessing an existing employee’s record for information, or updating it to reflect changes in salary, address, and so on.

A retail store inventory, on the other hand, would not be a good candidate for an ordered array because the frequent insertions and deletions, as items arrived in the store and were sold, would run slowly.

Evaluate the performance of algorithms:
1. What you need is a comparison that tells how an algorithm’s speed is related to the number of items.
2. The time is proportional to the base 2 logarithm of N
3. When comparing algorithms, you don’t really care about the particular microprocessor chip or compiler; all you want to compare is how T changes for different values of N, not what the actual numbers are. Therefore, the constant isn’t needed.

Goal of Algorithms:
1.  To use the algorithms to solve real problems. We don't need to memory all the data structure and algorithms.
2.

Algorithms:
Bubble Sort - Swap close two items. In the first iteration, get tallest one on the right with n-1 swap, and then repeat the whole process. O(N2)
Selection Sort - Select the shortest, O(N2)
Insertion Sort - O(N2)

Understanding:
In many algorithms there are conditions that remain unchanged as the algorithm proceeds. These conditions are called invariants.

Stack:
Stack has pop, push, peek, empty method.

A stack can be used to check whether parentheses, braces, and brackets are balanced in a computer program source file. At the end of this chapter, we’ll see a stack playing a vital role in parsing (analyzing) arithmetic expressions such as 3*(4+5).

Most microprocessors use a stack-based architecture. When a method is called, its return address and arguments are pushed onto a stack, and when it returns, they’re popped off. The stack operations are built into the microprocessor.

Queue:
Queue is also used to model real-world situ- ations such as people waiting in line at a bank, airplanes waiting to take off, or data packets waiting to be transmitted over the Internet.

There’s a printer queue where print jobs wait for theprinter to be available. A queue also stores keystroke data as you type at the keyboard. This way, if you’re using a word processor but the computer is briefly doing something else when you hit a key, the keystroke won’t be lost; it waits in the queue until the word processor has time to read it. Using a queue guarantees the keystrokes stay in order until they can be processed.

Priority Queue:
In a preemptive multitasking operating system, for example, programs may be placed in a priority queue so the highest-priority program is the next one to receive a time-slice that allows it to execute.

Linked List:
Self-Referencial

Binary Search Tree:
1.  Use recursion and InOrder to traverse the tree.
2.  Prefix, Infix, Postfix notation.

Red Black Tree:
1.  In a red-black tree, balance is achieved during insertion
2.

Red Black Rules:
1. Every node is either red or black.
2. The root is always black.
3. If a node is red, its children must be black (although the converse isn’t necessarily true).
4. Every path from the root to a leaf, or to a null child, must contain the same number of black nodes.

2-3-4 Trees:
1. 2-3-4 Trees are Multiway Trees that can have up to four children and three data items per node, are balanced, multiway tree of order 4.
2. A leaf node, by contrast, has no children, but it can nevertheless contain one, two, or three data items. Empty nodes are not allowed. 2-node, 3-node, 4-node.
3. Trees have different rules, especially for Binary Tree, Red-Black Tree, and 2-3-4 Trees. The rules are compliant, when we insert the data nodes.
4. The 2-3-4 tree’s self-balancing capability results from the way new data items are inserted, as we’ll see in a moment.
5. In short, a non-leaf node must always have one more child than it has data items.

External Storage:
1. For this reason, and to simplify the drive control mechanism, data is stored on the disk in chunks called blocks, pages, allocation units, or some other name, depending on the system. We’ll call them blocks.

B Trees:
1. Strictly speaking, 2-3 trees and 2-3-4 trees are B-trees of order 3 and 4, respec- tively, but the term B-tree is often taken to mean many more children per node.


Hash Table:
1. If you don’t need to visit items in order, and you can predict in advance the size of your database, hash tables are unparalleled in speed and convenience.
2. This is an example of a hash function. It hashes (converts) a number in a large range into a number in a smaller range. This smaller range corresponds to the index numbers in an array. An array into which data is inserted using a hash function is called a hash table.
3.  In this array we expect that, on the average, there will be one word for every two cells. Some cells will have no words; and others, more than one.
4.  Open Address(Linear Probing, Quadratic Probing, Double Probing), filled sequence, clustering, rehashing,
5.  The more full the array is, the worse clustering becomes. It’s not usually a problem when the array is half full, and still not too bad when it’s two-thirds full. Beyond this, however, performance degrades seriously as the clusters grow larger and larger. For this reason it’s critical when designing a hash table to ensure that it never becomes more than half, or at the most two-thirds, full.

Challenges:
1.  There are many data structures and algorithms
2.  Need to understand their pros, cons, performance.
3.  Need to apply them and solve problems.

References:
http://www.idevelopment.info/data/Programming/data_structures/overview/Data_Structures_Algorithms_Introduction.shtml#Abstract Data Types
http://www.learnalgorithms.in/
https://hbfs.wordpress.com/2008/12/23/the-10-classes-of-algorithms-every-programmer-must-know-about/



No comments:

Post a Comment