Time to sow, time to reap
In computer science, the term "data structure" refers to any technique of organizing data so that certain operations on that data become faster than they would be if the data were just kept in a single unorganized pile. For a real-world example, the names in a phone book are kept in alphabetical order instead of just any order to allow fast lookup by name. Since sorting the names needs to be done only once, the total time saved across all the lookups outweighs the one-time cost, compared to leaving the names unsorted and spending more time on each lookup.
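To make the phone book idea concrete, here is a minimal Python sketch of my own (the names and numbers are made up) that contrasts a linear scan through an unorganized pile with binary search over the same data sorted once by name:

    import bisect

    # An unorganized "pile" of (name, number) pairs: lookup must scan everything.
    pile = [("Maya", "555-0131"), ("Arto", "555-0144"), ("Zoe", "555-0107")]

    def lookup_unsorted(name):
        for n, number in pile:          # O(n) scan through the whole pile
            if n == name:
                return number
        return None

    # The same data sorted by name once, like a printed phone book.
    book = sorted(pile)                  # one-time organizing cost
    names = [n for n, _ in book]

    def lookup_sorted(name):
        i = bisect.bisect_left(names, name)    # O(log n) binary search
        if i < len(names) and names[i] == name:
            return book[i][1]
        return None

    print(lookup_unsorted("Zoe"))   # 555-0107, found by scanning
    print(lookup_sorted("Zoe"))     # 555-0107, found by binary search

The one-time sort is exactly the "sow once, reap many times" trade the phone book makes: every later lookup pays logarithmic instead of linear time.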
Sorted arrays and lists are perhaps the simplest nontrivial data structures, but many others are available, consisting of a more complex network of objects and relationships between them than a single linear array of elements. These structures differ in which operations they support efficiently. Other operations may remain inefficient (for example, looking up a number, rather than a name, in an ordinary phone book), but the idea is to always use a data structure that makes efficient exactly those operations your program actually needs; the operations that you don't need at all, or need only rarely, don't have to be fast.
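As a small illustration of that trade-off (again my own sketch, not anything from a particular library), a Python dict keyed by name answers "what is this person's number?" quickly, while the reverse question gets no help from the organization at all unless you maintain a second structure keyed by number:

    phone = {"Arto": "555-0144", "Maya": "555-0131", "Zoe": "555-0107"}

    # Fast: the structure is organized for exactly this question.
    number = phone["Maya"]                     # hash lookup, O(1) on average

    # Slow: the organization doesn't help, so we scan every entry, O(n).
    owner = next((n for n, num in phone.items() if num == "555-0107"), None)

    # If reverse lookups were frequent, we would maintain a second index:
    by_number = {num: n for n, num in phone.items()}
    owner = by_number["555-0107"]              # now also O(1) on average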
Of course, there is always the cost of organizing and maintaining the structure with each operation, which you don't need to do if you don't care how your data is organized. But it usually pays off to do this extra work to save (a lot) more time elsewhere in the future. This is especially true if the organization work needs to be done just once in the beginning, as is the case with a simple printed phone book.
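To put that per-operation maintenance cost in concrete terms (a sketch of my own, with made-up values), keeping a list sorted means every insertion has to find and open up the right slot, work you skip entirely if you are happy with an unorganized pile:

    import bisect

    unsorted_pile = []
    sorted_list = []

    def add_unorganized(x):
        unsorted_pile.append(x)        # O(1): no organization to maintain

    def add_sorted(x):
        bisect.insort(sorted_list, x)  # O(n): shift elements to keep order

    for value in (42, 7, 99, 23):
        add_unorganized(value)
        add_sorted(value)

    print(unsorted_pile)   # [42, 7, 99, 23]
    print(sorted_list)     # [7, 23, 42, 99]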
Dynamic sets are data structures that represent a collection of elements. As the program runs, it can add new elements to the set, remove existing elements, and check whether the set currently contains a given element. Some well-known dynamic set data structures are binary search trees and hash tables. Countless variants exist, tuned for specific purposes, but in a very interesting fashion they all seem to have one thing in common: at least for every dynamic set data structure that I can think of right now, inserting a new element into the structure is both computationally cheaper and conceptually simpler (or at least, not more expensive or more complex) than removing an existing element from it, once all the reorganization work needed to keep the structure valid after each operation is taken into account.
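A binary search tree makes the asymmetry easy to see. In the sketch below (my own, simplified and non-balancing), insertion is a single walk down the tree that always ends by hanging a new leaf, while deletion has to distinguish three cases, the last of which must first locate the in-order successor and then delete that node in turn:

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def insert(root, key):
        # One walk down the tree; the new node always becomes a leaf.
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        elif key > root.key:
            root.right = insert(root.right, key)
        return root

    def delete(root, key):
        # Deletion handles three cases and may trigger a second removal.
        if root is None:
            return None
        if key < root.key:
            root.left = delete(root.left, key)
        elif key > root.key:
            root.right = delete(root.right, key)
        else:
            if root.left is None:          # case 1: no left child
                return root.right
            if root.right is None:         # case 2: no right child
                return root.left
            succ = root.right              # case 3: two children:
            while succ.left is not None:   # find the in-order successor,
                succ = succ.left
            root.key = succ.key            # copy its key into this node,
            root.right = delete(root.right, succ.key)  # then delete it below
        return root

    root = None
    for k in (50, 30, 70, 20, 40, 60, 80):
        root = insert(root, k)
    root = delete(root, 50)    # deleting the root exercises the two-child case

Self-balancing variants such as red-black trees show the same pattern even more starkly: the deletion rebalancing cases are notoriously harder to state and to implement than the insertion ones.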
This asymmetry is quite astonishing when I think about it, and I kinda suspect that there is some interesting and fundamental information-theoretic aspect of the very reality itself behind it. The physical universe of course constantly erases massive quantities of information (which, as a side note, is one reason why your computer will generate heat as it computes stuff, unless we could achieve sufficiently reversible computation that doesn't erase information), but then again, the physical reality is not a data structure whose purpose is to allow efficient access to and operations on the data that it contains.