Thought I'd post this in this section, as it's most applicable to the writers.
Anyway, it turns out that Word .doc files can contain a lot more than simply the words on the screen - get a load of this list:
- Text from other documents open at the same time
- Previously deleted text
- E-mail headers and server information
- Printer names
- Data about the machine where the document was written
- Where the document was saved
- Word version number and document format
- Names and usernames of document authors
all embedded in the actual .doc file code.
Here's an article from the BBC about the issue:
http://news.bbc.co.uk/1/hi/technology/3154479.stm
The hidden dangers of documents
Your Microsoft Word document can give readers more information about you than you might think. Even Alastair Campbell has fallen foul of the snippets of invisible data few of us realise our documents contain.
Usually with Microsoft Word, what you see is what you get.
If you make a change to a document, then that is what you see when it gets printed out.
But in fact, in many cases it is what you cannot see at first glance that proves more interesting.
Analysis of hidden information in the so-called Iraq "dodgy dossier" showed, among other things, the names of the four civil servants who worked on it.
Downing Street press office head Alastair Campbell had to explain who these people were to the House of Commons Foreign Affairs Select Committee investigating the genesis of the plagiarised document.
"The time when most information tends to leak is when you are using a document that has a number of revisions or a number of people working on it," says Nick Spenceley, founder director of computer forensics firm Inforenz.
The UK government has now largely abandoned Microsoft Word for official documents and has turned to documents created using Adobe Acrobat, which uses the Portable Document Format (PDF).
"I'm not sure many people check Word documents before they go out or are published," says Mr Spenceley.
He says he knows of a case in which someone found previous versions of an employment contract buried in the Word copy he was sent. Reading the hidden extras gave the person applying for the job a big advantage during negotiations.
Sometimes the mistakes are even more public.
During the hunt for the Washington sniper the police allowed the Washington Post to publish a letter sent to the police that included names and telephone numbers.
The newspaper tried to hide these details using black boxes, but these were easily removed, exposing the sensitive details for all to see.
But it is not just governments, businesses and newspapers that can be embarrassed in this way.
You could be too.
There is a function in many versions of the Microsoft Office programs, which include Word, Excel and PowerPoint, that can leave fragments of data (which Microsoft refers to as metadata) hidden in any document you save. These fragments can come from other files you deleted or were working on at the same time.
This could be embarrassing for any home workers whose colleagues find out that they have been applying for jobs while working at home or being less than complimentary about their co-workers.
Look and learn
With the right tools this hidden data can easily be extracted.
Unix and Linux users can turn to tools such as Antiword and Catdoc to turn the document, including its formatting information, into a simple text file.
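For anyone curious what such extraction looks like, here is a rough Python sketch of my own (not from the article, and not how Antiword or Catdoc actually work). It simply scans the raw bytes of a file for runs of readable characters, on the assumption that leftover text sits in the file either as plain 8-bit characters or as UTF-16LE. The function name and the minimum run length are made up for illustration.

```python
import re
import sys


def extract_strings(data: bytes, min_len: int = 6) -> list[str]:
    """Return runs of printable text found in raw bytes.

    Assumption: hidden text is stored either one byte per character
    or as UTF-16LE (each character followed by a zero byte).
    """
    found = []
    # Runs of printable ASCII characters stored one byte each.
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # The same characters stored as UTF-16LE.
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for match in ascii_run.finditer(data):
        found.append(match.group().decode("ascii"))
    for match in utf16_run.finditer(data):
        found.append(match.group().decode("utf-16-le"))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python extract_strings.py some_document.doc
    with open(sys.argv[1], "rb") as handle:
        for text in extract_strings(handle.read()):
            print(text)
```

This will also print plenty of junk (font names, style names and so on), so treat it as a quick way to see that there is more in the file than the visible text, not as a proper forensic tool.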
Computer researcher Simon Byers has conducted a survey of Word documents available on the net and found that many of them contain sensitive information.
He gathered about 100,000 Word documents from sites on the web and every single one of them had hidden information.
In a research paper about the work Mr Byers wrote that about half the documents gathered had up to 50 hidden words, a third up to 500 words hidden and 10% had more than 500 words concealed within them.
The hidden text revealed the names of document authors, their relationship to each other and earlier versions of documents.
Occasionally it revealed very personal information such as social security numbers that are beloved of criminals who specialise in identity theft.
Also available was useful information about the internal network the document travelled through, which could be useful to anyone looking for a route into a network.
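To give a feel for the kind of network details that can turn up, here is a small Python sketch of my own (again, not from the article). It assumes you already have the text pulled out of a document and looks for two things: Windows-style network paths such as \\server\share\file and e-mail addresses. The patterns are deliberately loose and the names are hypothetical.

```python
import re

# Assumption: network locations appear as UNC paths (\\server\share\...).
UNC_PATH = re.compile(r"\\\\[A-Za-z0-9._$-]+\\[^\s\\]+(?:\\[^\s\\]+)*")
# A loose e-mail pattern; good enough for spotting addresses, not for validating them.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_network_hints(text: str) -> dict[str, list[str]]:
    """Return UNC paths and e-mail addresses found in extracted text."""
    return {
        "unc_paths": UNC_PATH.findall(text),
        "emails": EMAIL.findall(text),
    }
```

Run over text recovered from a document, the server names in any matching paths are exactly the sort of detail the article says could help someone looking for a way into a network.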
Mr Byers wrote that the problem of leaky Word documents is pervasive, and that anyone worried about losing personal information might want to consider using a different word processing program.
Alternatively he recommends using utility programs that scrub information from Word documents or following Microsoft's advice about how to make documents safer.
"Microsoft is aware of the functionality of metadata being stored within Word 97 documents and would advise users to follow the instructions laid out in [the Microsoft Knowledge Base - see Related Internet Links]," says a spokesperson. "However, Microsoft do not wish to comment on how customers use the functionality within our software."