Don’t Erase
So, I recently finished Dunbar’s “Grooming, Gossip, and the Evolution of Language”. Although I am not convinced about Nostratic, Dunbar gave more than enough material to convince me I was right about past/future tenses in Chinese and English. I think it is pretty obvious, and was probably common sense to people with access to the Library of Alexandria.
I think it was Meister Eckhart who once said “Only the hand that erases can write the true thing”. And Saint-Exupéry said “Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away”.
These philosophies of willful filing of gear teeth lead to a “snaggle-toothed thought machine” that “clicks and whirs with the imprecision of a cuckoo clock in hell.” (Vonnegut).
The proponents of “document deletion^H^H^H retention policies” argue that e-mail takes up too much space. Let’s examine that claim.
Suppose that the average human can spew forth language at 200 words per minute, rarely listens, and spews language for 10 hours a day (speech, typing, or otherwise).
200 words/min
* 2 bytes/word
* 60 min/hour
* 10 hours/day
* 365 days/year
* 80 years
/ 1024 bytes/KB
/ 1024 KB/MB
/ 1024 MB/GB
= about 6.5 GB
Every bit of language you emit over your entire lifetime would fit on less than 7GB of space. This is basically uncompressed. Add modest compression, some timestamping and geocoding, and your whole life still fits in less than 10GB. There is no excuse for EVER deleting anything that any human being ever says.
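For anyone who wants to check the arithmetic, here is a quick Python sketch of the same estimate. It uses the figures from the calculation above (200 words per minute, 2 bytes per word, 10 hours a day, 365 days a year, 80 years); the function and parameter names are just illustrative.

```python
def lifetime_bytes(words_per_minute=200, bytes_per_word=2,
                   hours_per_day=10, days_per_year=365, years=80):
    """Estimate the raw bytes of language one person emits in a lifetime."""
    minutes_per_hour = 60
    return (words_per_minute * bytes_per_word * minutes_per_hour
            * hours_per_day * days_per_year * years)


def to_gb(num_bytes):
    """Convert bytes to gigabytes using 1024-based units, as in the post."""
    return num_bytes / 1024 ** 3


if __name__ == "__main__":
    total = lifetime_bytes()
    print(f"{total:,} bytes is roughly {to_gb(total):.2f} GB")
```

Changing any one assumption is a one-argument change, for example `lifetime_bytes(hours_per_day=8)`.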
Imagine if linguists 1,000 years from today have access to a corpus containing every word spoken/written by every human for the previous millennium. Imagine how much easier it would be to track dialect migrations (à la Brothers Grimm), repetition of memes, and so on. Imagine if a person asking questions on Yahoo Answers can have instant access to every answer given for the same question in the past 1,000 years, and the personal histories and outcomes of every one.
It would be a lot harder for Chomskyans to bluff and bluster and shout “IT IS ALL AN ACCIDENT; LANGUAGE MEANS NOTHING!”
September 18th, 2006 at 11:25 am
The storage goes up if you store everything the camera in your pantechnicon “sees”. But this might just mean that the numbers of GB won’t seem as large when the storage system is sort of “astral”.
When one’s libertarian streak balks at the committee of gossips, that’s because we like to pretend that there are secrets inside our skin, which should, but may not have, bothered Bros. Dahmer/Gacy/Bundy.
The “Web of Trust” FOAF presages is only possible (and this seems strange) if barnraising.org accepts postings offering/requesting help from killthewhales.net.
Love.
September 18th, 2006 at 12:50 pm
This reminds me of a Microsoft Research Project reported in IEEE Spectrum. http://www.spectrum.ieee.org/nov05/2153
But, I think it takes more than 2 bytes to store a word, text, speech, or otherwise. Perhaps compression might help, but 2 bytes isn’t much.
September 18th, 2006 at 2:32 pm
The 2 bytes is based on the idea that there are only 8,000-11,000 words in common use in the English language; maybe 22,000 in other languages. So the most naive “compression” (just a dictionary lookup with 16 bits, and more bytes for less common words) should do. But that assumes you have speech-to-text or are just using e-mail (recording keystrokes, for example). I guess that could balloon; but it seems like there have to be MUCH better compression schemes.
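A rough sketch of what that naive dictionary lookup could look like, in Python. Everything here is an assumption made for illustration: the vocabulary is whatever word list you hand it, each known word becomes a 2-byte index, and the reserved code 0xFFFF marks a less common word that is stored literally (one length byte plus its UTF-8 bytes, so literal words are capped at 255 bytes).

```python
import struct

ESCAPE = 0xFFFF  # reserved code: "a literal word follows" (assumed convention)


def build_vocab(words):
    """Map each distinct word to a 16-bit code, leaving ESCAPE unused."""
    vocab = sorted(set(words))
    if len(vocab) >= ESCAPE:
        raise ValueError("vocabulary does not fit in 16 bits")
    return {word: code for code, word in enumerate(vocab)}


def encode(text, vocab):
    """Encode whitespace-separated words as 2-byte codes, literals when unknown."""
    out = bytearray()
    for word in text.split():
        code = vocab.get(word)
        if code is not None:
            out += struct.pack(">H", code)
        else:
            raw = word.encode("utf-8")  # must be at most 255 bytes
            out += struct.pack(">HB", ESCAPE, len(raw)) + raw
    return bytes(out)


def decode(data, vocab):
    """Invert encode(), returning the words joined by single spaces."""
    by_code = {code: word for word, code in vocab.items()}
    words, pos = [], 0
    while pos < len(data):
        (code,) = struct.unpack_from(">H", data, pos)
        pos += 2
        if code == ESCAPE:
            length = data[pos]
            pos += 1
            words.append(data[pos:pos + length].decode("utf-8"))
            pos += length
        else:
            words.append(by_code[code])
    return " ".join(words)
```

Common words cost exactly 2 bytes each; a rare word costs 3 bytes plus its spelling. Punctuation, capitalization, and spacing are ignored here, which a real scheme would have to handle.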
September 19th, 2006 at 2:42 pm
Tell that to my boss who has a 2GB mailbox. That’s for this year - he archives it every year.
I, sadly, am the Exchange administrator who has to deal with a 16GB database because he, and several others, have mailboxes that big. I’ve suggested introducing limits and, surprise, surprise, was shot down. It probably doesn’t help that my actual full-time job is as a software developer.
There must be some enormous binaries in that mailbox.
September 20th, 2006 at 3:14 pm
Well, it’s also because every e-mail reply to a thread usually includes all the text from every previous reply. And Exchange does all sorts of duplication, indexing, logging, etc. The first thing perhaps would be to introduce your boss to Outlook thread compressor. My point is that the explosion of disk space is caused by the implementation, not the person. Hotmail and Gmail provide 2GB per user (for free), but the average Exchange admin can’t do so even at great expense. Obviously it’s not impossible, since Hotmail does it. Really, the same goes for binaries, IMO: Word, PowerPoint, etc. are all gated by a human being’s ability to type (and read), so it’s absurd that they still take up so much space.
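To put a rough number on the quoting problem, here is a small Python sketch. It assumes a simplified model that is mine, not a measurement of Exchange: every message adds the same amount of new text, and every reply quotes the whole thread so far.

```python
def thread_storage(messages, bytes_per_message):
    """Return (bytes_with_full_quoting, bytes_without_quoting) for one thread.

    Assumed model: message k carries its own text plus the text of the
    k - 1 messages before it, so stored size grows quadratically.
    """
    without_quoting = messages * bytes_per_message
    with_quoting = bytes_per_message * messages * (messages + 1) // 2
    return with_quoting, without_quoting


if __name__ == "__main__":
    quoted, plain = thread_storage(messages=20, bytes_per_message=1000)
    print(f"quoted: {quoted:,} bytes, plain: {plain:,} bytes")
```

Under that model a 20-message thread stores about ten times more text than was ever typed, before any indexing or logging overhead is counted.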
November 1st, 2006 at 3:35 pm
Have you read the Jim Gray paper on storage prediction? It’s called ‘Long Term Storage Trends and You’ - updated Sept 2006. It also links quite nicely into the MyLifeBits project run by Gordon Bell at Microsoft Research. There are some great technical papers and presentations.
http://research.microsoft.com/users/GBell/
Best Regards
Trevor