herme3, posted July 4, 2007:

I don't really understand the problem described in this article: http://news.bbc.co.uk/2/hi/technology/6265976.stm

They seem to be saying that the contents of a document will be lost if nobody is able to find a copy of the software that was used to create the document. In all word processing programs I'm aware of, the text itself is stored as plain text within the file unless you encrypt it. You can open an old Word document in Notepad and still retrieve the text. You would have to set the font and style information again, but why is it such a large problem?
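What herme3 describes, opening a binary .doc in Notepad and skipping the unreadable parts, is roughly what the Unix `strings` tool automates: pull out runs of printable characters and ignore everything else. Below is a minimal Python sketch of that idea. It assumes the text you want is ASCII. Word 97-2003 files may also store text as UTF-16LE, so a second function looks for that layout. The function names and sample bytes are made up for illustration, and no formatting survives either way.

```python
import re

def ascii_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Runs of at least min_len printable ASCII bytes (like Unix `strings`)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]

def utf16le_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Runs of printable ASCII characters stored as UTF-16LE (char byte + 0x00)."""
    pattern = rb"(?:[\x20-\x7e]\x00){%d,}" % min_len
    return [m.group().decode("utf-16-le") for m in re.finditer(pattern, data)]

# Made-up bytes standing in for a binary word-processor file.
sample = (b"\xd0\xcf\x11\xe0junk\x00\x01Hello world\x00\x02"
          + "Unicode text".encode("utf-16-le"))
print(ascii_runs(sample))    # ['junk', 'Hello world']
print(utf16le_runs(sample))  # ['Unicode text']
```

Note that stray runs like `junk` come out alongside the real text. Separating the two is the manual "scroll past the unreadable code" step, and it does not scale well.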
insane_alien, posted July 4, 2007:

how hard could it be to write up a script to convert them all to a different format (say .odt)? all you would have to do is run it every time there is a major format change, and you would still be able to access the old files.
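Here is a sketch of the kind of batch-conversion script insane_alien has in mind. It assumes a modern LibreOffice install (OpenOffice's successor) with `soffice` on the PATH, which can convert files headlessly via `--headless --convert-to odt --outdir DIR FILE`. The folder names are placeholders. With `dry_run=True` the script only returns the commands it would run, which is worth checking before pointing it at terabytes of records.

```python
import subprocess
from pathlib import Path

def convert_command(src: Path, outdir: Path) -> list[str]:
    """Command line for LibreOffice's headless converter (assumes `soffice` is on PATH)."""
    return ["soffice", "--headless", "--convert-to", "odt",
            "--outdir", str(outdir), str(src)]

def convert_all(folder: Path, outdir: Path, dry_run: bool = True) -> list[list[str]]:
    """Convert every .doc under folder to .odt; with dry_run, just list the commands."""
    commands = []
    # On case-sensitive filesystems this matches lowercase ".doc" only.
    for src in sorted(folder.rglob("*.doc")):
        cmd = convert_command(src, outdir)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if a conversion fails
    return commands

# Usage (placeholder paths):
# for cmd in convert_all(Path("records"), Path("records-odt")):
#     print(" ".join(cmd))
```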
Dak, posted July 4, 2007:

or, if worst comes to worst, just keep several computers knocking about, with copies of archaic OSs and word processors on them?
insane_alien, posted July 4, 2007:

Quote (Dak): "or, if worst comes to worst, just keep several computers knocking about, with copies of archaic OSs and word processors on them?"

or a virtual machine. less hardware involved.
Cap'n Refsmmat, posted July 4, 2007:

Quote (herme3): "They seem to be saying that the contents of a document will be lost if nobody is able to find a copy of the software that was used to create the document. In all word processing programs I'm aware of, the text itself is stored as plain text within the file unless you encrypt it. You can open an old Word document in Notepad and still retrieve the text. You would have to set the font and style information again, but why is it such a large problem?"

A lot of the time it's a lot harder than that. Suppose you have government records stored in Word files and you can't use Office. How do you retrieve that many files? It would be a monumental effort to apply the correct formatting to hundreds of documents, and if features like 'Track Changes' have been used, it would be even harder.

Picking a format you know you'll be able to read fifty years from now, even if it means writing a piece of software for your shiny new quantum computers, is a lot nicer. Store files in the OpenDocument format and, even if all of the OpenOffice and OASIS people are dead and the software is long gone, you'll be able to dig up a copy of the standard and write a program to extract all of the text.
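To show how small that program can be, here is a Python sketch that pulls paragraph text out of an OpenDocument text file using only the standard library. It relies on two things the standard specifies: the package is a ZIP archive, and the body lives in `content.xml`, with paragraphs and headings in `text:p` and `text:h` elements. It is rough. It ignores spacing elements such as `text:s` and `text:tab`, so runs of spaces and tabs are lost. `make_sample_odt` is a hypothetical helper that builds a tiny document so the example runs on its own.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
BLOCKS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}  # paragraphs and headings

def odt_text(odt) -> str:
    """Plain text of an .odt (path or file object), one line per paragraph."""
    with zipfile.ZipFile(odt) as z:
        root = ET.fromstring(z.read("content.xml"))
    return "\n".join("".join(el.itertext()) for el in root.iter() if el.tag in BLOCKS)

def make_sample_odt() -> io.BytesIO:
    """Hypothetical helper: a minimal in-memory .odt for trying odt_text."""
    content = f"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="{TEXT_NS}">
  <office:body><office:text>
    <text:h>Minutes</text:h>
    <text:p>The budget was <text:span>approved</text:span>.</text:p>
  </office:text></office:body>
</office:document-content>"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                   compress_type=zipfile.ZIP_STORED)
        z.writestr("content.xml", content)
    return buf

print(odt_text(make_sample_odt()))
# Minutes
# The budget was approved.
```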
Cap'n Refsmmat, posted July 4, 2007:

Quote (insane_alien): "how hard could it be to write up a script to convert them all to a different format (say .odt)? all you would have to do is run it every time there is a major format change, and you would still be able to access the old files."

If you have hundreds of terabytes of data, like the article describes, it's not that easy. And this is a government agency, too, so inefficiency is at work.
insane_alien, posted July 4, 2007:

ahh, so they'll go for the method of getting people to write it down on paper and then type it back into the computer.
Cap'n Refsmmat, posted July 4, 2007:

Right.
doG, posted July 4, 2007:

If they're only worried about preserving the information itself, they should just archive it as plain old ASCII text. It's not like vi is going to become unavailable.
D H, posted July 4, 2007:

Quote (herme3): "I don't really understand the problem described in this article: http://news.bbc.co.uk/2/hi/technology/6265976.stm They seem to be saying that the contents of a document will be lost if nobody is able to find a copy of the software that was used to create the document. In all word processing programs I'm aware of, the text itself is stored as plain text within the file unless you encrypt it. You can open an old Word document in Notepad and still retrieve the text. You would have to set the font and style information again, but why is it such a large problem?"

Try opening one of these in Notepad and you might see what the problem is. Note: the visible contents of the two files are identical.

http://www.caam.rice.edu/~caam452/CAAM452Lecture4b.ppt
http://www.caam.rice.edu/~caam452/CAAM452Lecture4b.ppf
herme3 (thread author), posted July 5, 2007:

Quote (Cap'n Refsmmat): "A lot of the time it's a lot harder than that. Suppose you have government records stored in Word files and you can't use Office. How do you retrieve that many files? It would be a monumental effort to apply the correct formatting to hundreds of documents, and if features like 'Track Changes' have been used, it would be even harder."

Yes, reformatting a large number of documents might be difficult. However, I wouldn't consider any knowledge to be lost if the text itself is kept as plain text. You could just scroll past the unreadable code and copy-paste the plain text contained within the document.

Quote (Cap'n Refsmmat): "Picking a format you know you'll be able to read fifty years from now, even if it means writing a piece of software for your shiny new quantum computers, is a lot nicer. Store files in the OpenDocument format and, even if all of the OpenOffice and OASIS people are dead and the software is long gone, you'll be able to dig up a copy of the standard and write a program to extract all of the text."

This is very strange. I just compared the raw text of a Word document and an OpenOffice document in Notepad. The Word document had a little formatting data at the top, but the rest of the text was easy to read. In the OpenOffice document, the whole document appears to be written in unreadable code. Why is the document encrypted if I didn't choose to encrypt it?
Cap'n Refsmmat, posted July 5, 2007:

Quote (herme3): "Yes, reformatting a large number of documents might be difficult. However, I wouldn't consider any knowledge to be lost if the text itself is kept as plain text. You could just scroll past the unreadable code and copy-paste the plain text contained within the document."

And if you have terabytes of government data, do you really want to spend the time to copy and paste it all?

Quote (herme3): "This is very strange. I just compared the raw text of a Word document and an OpenOffice document in Notepad. The Word document had a little formatting data at the top, but the rest of the text was easy to read. In the OpenOffice document, the whole document appears to be written in unreadable code. Why is the document encrypted if I didn't choose to encrypt it?"

It isn't encrypted. OpenOffice documents are actually several files zipped together. Rename the file to file.zip and open it with a decompression program. The same is true for Office 2007 documents.
herme3 (thread author), posted July 6, 2007:

Quote (Cap'n Refsmmat): "And if you have terabytes of government data, do you really want to spend the time to copy and paste it all?"

It could be done whenever the information is needed. There won't be a time when every terabyte of data needs to be opened at once. When a certain document is needed, just open it in Notepad instead of the original application. From there, the data can be copied into an open format.

Quote (Cap'n Refsmmat): "OpenOffice documents are actually several files zipped together. Rename the file to file.zip and open it with a decompression program. The same is true for Office 2007 documents."

I wonder why the documents are zipped. Unless you are saving a major server's log files or other large files, compressing each text document shouldn't make much of a difference on modern hard drives and flash memory devices. Wouldn't it be better to reduce the open/save time by not compressing them?
insane_alien, posted July 6, 2007:

Quote (herme3): "I wonder why the documents are zipped. Unless you are saving a major server's log files or other large files, compressing each text document shouldn't make much of a difference on modern hard drives and flash memory devices. Wouldn't it be better to reduce the open/save time by not compressing them?"

zipped as in archive. it keeps the files together as one document, which makes it easier to handle. also, any compression used will not have a noticeable effect on speed, as today's processors are generally multi-core and extremely fast.
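insane_alien's point is easy to check. ODF and Office 2007 packages normally compress their members with deflate, the same algorithm Python's `zlib` module implements. Document XML is so repetitive that it shrinks dramatically, and decompressing it is cheap. The sketch below uses an invented markup line, and the exact ratio will vary with real documents.

```python
import zlib

def compression_ratio(text: str, level: int = 6) -> float:
    """Compressed size / original size for UTF-8 text under deflate (zlib)."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, level)) / len(raw)

# Invented, repetitive markup of the kind found in content.xml.
markup = '<text:p text:style-name="P1">Item approved at the meeting.</text:p>\n' * 500
print(f"{compression_ratio(markup):.3f}")  # a small fraction of the original size

# Decompression restores the exact bytes, so nothing is lost.
assert zlib.decompress(zlib.compress(markup.encode("utf-8"))) == markup.encode("utf-8")
```

Very small inputs can actually grow, because zlib adds a few bytes of header and checksum. That overhead hardly matters next to the savings on a whole document.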
Dak, posted July 6, 2007:

Quote (herme3): "It could be done whenever the information is needed. There won't be a time when every terabyte of data needs to be opened at once."

unless you're searching through the text for a key phrase, or unless you want casual access without having to copy and paste before you can read anything.
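Dak's key-phrase case is where scattered, unconverted files hurt most, because every document has to be opened before it can be searched. If the archive were in OpenDocument format, a crude full-text search would need nothing beyond the standard library. The sketch below assumes the documents are `.odt` packages (paths or file objects) with their text in `content.xml`. It joins all text nodes without marking paragraph breaks, so a phrase split across two paragraphs can be missed. `tiny_odt` and the file names are hypothetical and exist only to make the demo self-contained.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def document_text(odt) -> str:
    """All character data in content.xml (odt may be a path or a file object)."""
    with zipfile.ZipFile(odt) as z:
        root = ET.fromstring(z.read("content.xml"))
    return "".join(root.itertext())

def search(docs: dict, phrase: str) -> list[str]:
    """Names of the documents whose text contains phrase (case-insensitive)."""
    needle = phrase.casefold()
    return [name for name, odt in docs.items()
            if needle in document_text(odt).casefold()]

def tiny_odt(body: str) -> io.BytesIO:
    """Hypothetical helper: a minimal in-memory .odt with one paragraph."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("content.xml",
                   '<office:document-content '
                   'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
                   'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
                   f"<office:body><office:text><text:p>{body}</text:p>"
                   "</office:text></office:body></office:document-content>")
    return buf

docs = {"1998-minutes.odt": tiny_odt("Budget approved"),
        "1999-memo.odt": tiny_odt("Roof repairs")}
print(search(docs, "budget"))  # ['1998-minutes.odt']
```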
Cap'n Refsmmat, posted July 6, 2007:

Quote (herme3): "I wonder why the documents are zipped. Unless you are saving a major server's log files or other large files, compressing each text document shouldn't make much of a difference on modern hard drives and flash memory devices. Wouldn't it be better to reduce the open/save time by not compressing them?"

It's because an OpenDocument file (and an OpenXML file) consists of more than one file. There's usually a file with the text structure, one with a style sheet, and a folder containing any images used in the document. To make them easy to handle, they're all zipped up into one file, so you don't end up with dozens of files lying around for just three documents.
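Cap'n Refsmmat's description can be checked directly, and Python's `zipfile` module opens an .odt without renaming it. The sketch below lists a package's members and labels the ones he mentions. The member names are the usual ones in OpenOffice-produced files. `Pictures/` is the conventional folder for embedded images rather than something every producer must use, so treat that label as an assumption. The demo package is built in memory with placeholder contents.

```python
import io
import zipfile

ROLES = {
    "mimetype": "package type (first entry, stored uncompressed)",
    "content.xml": "text structure",
    "styles.xml": "style sheet",
    "meta.xml": "metadata (author, dates)",
    "META-INF/manifest.xml": "list of every file in the package",
}

def describe(odt) -> dict[str, str]:
    """Map each member of an ODF package to a rough description of its role."""
    with zipfile.ZipFile(odt) as z:  # works on .odt directly, no renaming needed
        names = z.namelist()
    return {name: ROLES.get(name, "image" if name.startswith("Pictures/") else "other")
            for name in names}

# Demo package with placeholder contents; describe("minutes.odt") would work too.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("mimetype", "application/vnd.oasis.opendocument.text",
               compress_type=zipfile.ZIP_STORED)
    for name in ("content.xml", "styles.xml", "meta.xml",
                 "META-INF/manifest.xml", "Pictures/chart.png"):
        z.writestr(name, "")

for name, role in describe(buf).items():
    print(f"{name:24} {role}")
```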
Ndi, posted July 11, 2007:

There are ways of doing this, and I have the solution for you. It's simple: convert the data to instructions for rebuilding it. Like this:

Type: Hello
Make font 1 cm high, and red: world
end document.

Then we need something to separate commands from text in a clear way. I know, let's use angle brackets:

<text>Hello<yey high, red>World

Great, let's refine it a little. Text is already there, and color needs to be more flexible, so:

hello<font color=#FF0000 size=4>world</font>

Cool. Let's make something like this. Even if nobody can open it, someone can easily write a program that parses this and saves it in any format I want. I'm a genius. It's so simple it can even be used in jokes. Look: </sarcasm>
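Ndi's joke works because that last line is just HTML-style markup, and the "program that parses this" already exists in most standard libraries. The sketch below uses Python's `html.parser` to turn the example into (text, formatting) runs. It only tracks the `<font>` tag from the post, and `FontRuns` and `parse_runs` are illustrative names.

```python
from html.parser import HTMLParser

class FontRuns(HTMLParser):
    """Collect (text, formatting) pairs, tracking nested <font> attributes."""

    def __init__(self):
        super().__init__()
        self.stack = [{}]  # formatting currently in effect; starts with none
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            fmt = dict(self.stack[-1])  # inherit the enclosing formatting
            fmt.update(attrs)           # attrs is a list of (name, value) pairs
            self.stack.append(fmt)

    def handle_endtag(self, tag):
        if tag == "font" and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.runs.append((data, self.stack[-1]))

def parse_runs(markup: str) -> list[tuple[str, dict]]:
    parser = FontRuns()
    parser.feed(markup)
    parser.close()
    return parser.runs

print(parse_runs("hello<font color=#FF0000 size=4>world</font>"))
# [('hello', {}), ('world', {'color': '#FF0000', 'size': '4'})]
```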