The creation and destruction of data

pre-dawn musings on the creation and destruction of data

One of the problems a systems developer encounters when designing a new system is the finite nature of storage space. Admittedly, the amount of data it is possible to store grows every day, but there is still an upper limit, especially once data access speeds enter the equation.

Let's postulate a theoretical system that stores data for anyone in the world. *sheepish grin* The first step is to decide what data to destroy. Two obvious criteria spring to mind: relevance and age.

Let's examine 'relevance' first. Obviously, if the data is not relevant to the task at hand, it need not be stored. This has been the saving grace for the majority of current systems. Problems occur, however, when a system stores whatever is handed to the database. In that situation one starts to wonder whether it is really necessary to store everything a user enters. What if it is a seemingly random set of characters? Those characters might appear random to the administrator, but they might be vitally important to the user. Take, for example, the word "barracuda". To most people a barracuda is a fish that is easily provoked. To me, however, it also represents a cancelled World War II plan for an assault on Naples. At this point most sysadmins group the data by the person entering it and set an upper limit for each user. Problem solved, that is, until you realize how much data one user can create in a year. Whatever upper limit you set, by the time it is realistic for the user it is unrealistic for the system.
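To make the per-user limit concrete, here is a minimal sketch, assuming data is grouped by the user who entered it and capped by a fixed byte budget. The class name `QuotaStore` and the `limit_bytes` parameter are hypothetical, chosen only for illustration; a real system would persist records rather than hold them in memory.

```python
from collections import defaultdict


class QuotaStore:
    """Hypothetical store that groups records by the user who entered them
    and refuses writes once that user's byte budget is used up."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes       # per-user upper limit (assumed fixed)
        self.records = defaultdict(list)     # user -> list of stored strings
        self.used = defaultdict(int)         # user -> bytes consumed so far

    def store(self, user: str, data: str) -> bool:
        size = len(data.encode("utf-8"))
        if self.used[user] + size > self.limit_bytes:
            return False                     # over quota: reject the write
        self.records[user].append(data)
        self.used[user] += size
        return True


store = QuotaStore(limit_bytes=16)
print(store.store("alice", "barracuda"))     # True: 9 bytes fit in the budget
print(store.store("alice", "assault plan"))  # False: 9 + 12 bytes exceeds 16
```

The trade-off described above shows up directly in `limit_bytes`: raise it until it suits a prolific user, and multiplying it by every user in the world quickly outgrows the system.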

Trying to accommodate the end-users (because that's why we're designing the system in the first place, right?), we take a look at 'age'. Immediately we hit a brick wall. Just because data is old doesn't mean it will not be needed or wanted sometime in the future!

Let's examine some more possibilities:
1. Usage
2. User moderation

Usage is a distinct possibility. If we monitor data access, we can pinpoint data in the system that is 'dead'. But we still need to set a limit. Who says when dead is dead? There have been many occurrences of famous works and information being rediscovered years, even centuries, after they were thought lost. A famous example: AYBABTU (All Your Base Are Belong To Us).
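Monitoring usage could look something like the following minimal sketch, assuming we record a last-access timestamp per item and flag anything untouched for longer than a chosen threshold. `AccessTracker`, `touch`, `dead_items` and `dead_after` are hypothetical names. The clock is injectable so the example is deterministic. Note that the sketch only reports candidates; it deliberately leaves the "when dead is dead" decision to a human or a policy.

```python
import time
from typing import Callable, Dict, List


class AccessTracker:
    """Hypothetical tracker recording the last time each item was read.
    Items untouched for longer than `dead_after` seconds are reported as
    'dead' candidates; nothing is deleted here."""

    def __init__(self, dead_after: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.dead_after = dead_after          # the arbitrary "when dead is dead" limit
        self.clock = clock                    # injectable so tests can fake time
        self.last_access: Dict[str, float] = {}

    def touch(self, key: str) -> None:
        """Record an access to `key` at the current clock time."""
        self.last_access[key] = self.clock()

    def dead_items(self) -> List[str]:
        """Return keys whose last access is older than `dead_after`."""
        now = self.clock()
        return sorted(k for k, t in self.last_access.items()
                      if now - t > self.dead_after)


fake_now = [0.0]
tracker = AccessTracker(dead_after=100, clock=lambda: fake_now[0])
tracker.touch("aybabtu")
tracker.touch("barracuda")
fake_now[0] = 50
tracker.touch("barracuda")
fake_now[0] = 120
print(tracker.dead_items())   # ['aybabtu']
tracker.touch("aybabtu")      # rediscovered: it comes back to life
print(tracker.dead_items())   # []
```

The last two lines illustrate the AYBABTU problem: had "aybabtu" been deleted at time 120, the later access would have found nothing.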

User moderation is also a distinct possibility. If the users decide for themselves which information to keep and which to destroy, then there can be no complaints when data is destroyed... the users will have voted to destroy it. There's a lot more to examine on this train of thought, such as how to call a vote and whether all users should have the right to vote. Another method of user moderation is to have super-users who edit the content of the database. These super-users have dictatorial powers, so most current systems select them from dedicated users who have demonstrated a commitment to the system.
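A deletion vote can be sketched in a few lines. This is a minimal sketch, assuming a fixed set of eligible voters, a quorum, and a simple majority where ties keep the data. All of these rules are assumptions standing in for the open questions above. `DeletionVote`, `eligible`, `quorum`, `cast` and `outcome` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DeletionVote:
    """Hypothetical vote on whether a single record should be destroyed."""
    eligible: Set[str]      # who may vote (an open question in the text)
    quorum: int             # minimum number of ballots before deciding (assumed rule)
    ballots: Dict[str, bool] = field(default_factory=dict)  # voter -> True means "destroy"

    def cast(self, voter: str, destroy: bool) -> None:
        if voter not in self.eligible:
            raise PermissionError(f"{voter} is not allowed to vote")
        self.ballots[voter] = destroy   # a later ballot replaces an earlier one

    def outcome(self) -> str:
        if len(self.ballots) < self.quorum:
            return "undecided"
        destroy = sum(self.ballots.values())    # True counts as 1
        keep = len(self.ballots) - destroy
        return "destroy" if destroy > keep else "keep"  # ties keep the data


vote = DeletionVote(eligible={"ann", "bob", "cy"}, quorum=2)
vote.cast("ann", True)
print(vote.outcome())   # undecided
vote.cast("bob", True)
print(vote.outcome())   # destroy
```

In this model a super-user is simply a vote with one eligible voter and a quorum of 1. That is one way to see why choosing who sits in `eligible` matters so much.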

To sum up, there are two extremes: make all data transient, or make all data permanent. A chat room or a conversation represents the all-transient extreme. Perhaps the internet represents the other? Somewhere in between lies the system I want to design for users, hopefully closer to the internet than to a chat room. Until then, this is my bit of permanence.
