A tip for removing unwanted data from text

Mirannan

One problem with most word processors (and probably web design programs and the like) is file bloat caused by all the invisible data that forms part of the file. It can also be a headache to adjust line spacing, font, character size and so on when copying text around.

One way to get just the text, and know that's all you are getting, is to use some simple text editor such as Notepad as an intermediary between two different files - for example, copying Web text into a word processor. This strips out all the junk, guaranteed, because Notepad doesn't support it in the first place.

I suspect that the same applies for basic editors such as Paint for graphics files.
 
Absolutely don't paste Web content directly into a word processor unless you really understand it! Yes, indeed, it should go through a text editor first. Excel at least has Paste Options that let you choose plain text, rich text, or an attempt at matching the formatting.

I use Notepad++:
Multiple tabs for multiple docs.
Much more powerful search and replace, with optional regex.
Only JPEGs are a problem for data in graphics. Um ... don't use Paint for JPEG!
Do save all graphics separately from the word processor, and paste different resolutions for eBook, web and print / PDF use rather than resizing in the WP program. Set the Screen DPI option in the word processor to suit the target system, not your own screen.
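
To give an idea of what regex search and replace can do for pasted text, here is a small sketch in Python rather than in Notepad++ itself. The function name tidy and the particular patterns are just my picks for illustration; Notepad++ has its own regex flavour, so the patterns may need adjusting there.

```python
import re


def tidy(text):
    """Clean up common whitespace mess left behind after pasting text."""
    text = text.replace("\r\n", "\n")         # Windows line endings -> plain newlines
    text = re.sub(r"[ \t]+\n", "\n", text)    # drop spaces/tabs at the end of lines
    text = re.sub(r"[ \t]{2,}", " ", text)    # runs of spaces/tabs -> one space
    text = re.sub(r"\n{3,}", "\n\n", text)    # several blank lines -> one blank line
    return text.strip()


if __name__ == "__main__":
    messy = "One  two   \r\n\r\n\r\n\r\nThree\t\tfour  "
    print(tidy(messy))
```

Note that this also flattens indentation made of two or more spaces or tabs, so it suits prose better than anything where leading whitespace matters.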
 
For those who like using keyboard shortcuts, you'll also find that some applications have a "Special Paste" shortcut that pastes the text on your clipboard without any of the formatting.

For example, when visiting Chrons in Chrome, using Ctrl + Shift + V when pasting into the reply box, or right-clicking and choosing "Paste as plain text", will paste the text without formatting. It's supposed to be a fairly universal shortcut, but that doesn't mean it is supported by every browser on every system, so definitely keep a simple text editor like Notepad handy.
 
If you're dealing with programs, most have options to export or save in other formats, which are there because of this problem.

If you're scabbing off a web page then it makes a difference how you choose to copy the content. There is usually one way to get it sans the formatting. Placing it into a text file also works, though I tend to use programming text editors that will accept the strange formatting characters, because I usually want to see what they are doing in their formatting. Often the formatting is at the top and bottom and the mid section is clear and contains the content, though with recent web pages there are lots of strange things embedded in between.
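
For anyone who would rather script it, here is a minimal sketch of pulling just the text out of a page's source, using only Python's standard html.parser module. The names TextOnly, html_to_text and SAMPLE are made up for illustration, and the lists of tags to skip or treat as line breaks are my own guesses at a sensible minimum. Real pages are messier, so treat it as a starting point rather than a finished tool.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the visible text of a page, ignoring tags, scripts and styles."""

    SKIP = {"script", "style", "head"}           # contents are never shown to the reader
    BLOCK = {"p", "div", "br", "li", "tr",
             "h1", "h2", "h3", "h4"}             # treated as line breaks (assumed minimal set)

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &nbsp; and friends into characters
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif tag in self.BLOCK and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    lines = []
    for line in "".join(parser.parts).splitlines():
        line = " ".join(line.split())            # collapse whitespace, incl. non-breaking spaces
        if line:
            lines.append(line)
    return "\n".join(lines)


SAMPLE = """<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body><p>Hello <b>world</b>!</p><script>track();</script><p>Second&nbsp;line.</p></body></html>"""

if __name__ == "__main__":
    print(html_to_text(SAMPLE))
```

The stuff at the top (head, style) and the scripts embedded in between are dropped, and only the content in the middle comes out.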
 
As Lenny points out: Paste Special. It's a feature that's pretty much becoming standard. For instance, in the more recent versions of Word and Google Docs, you have the option to "Keep source formatting" or "Keep text only." If you choose the text-only option, it strips away all the formatting and special code, leaving just the text. The result is exactly the same as pasting into Notepad and removes the need for an intermediary application. I believe even OpenOffice/LibreOffice has the same Paste Special options.

You also have the option to "Merge Formatting." As I understand it, in Word this makes the pasted text take on the formatting of the text around it while keeping basic emphasis such as bold and italics, so it blends in without you having to restyle anything.

Whenever you paste something in Word, it comes up with a tooltip to pick which option you want. Right-clicking within the document also gives you a submenu with the three choices before you've even pasted anything. Super easy and painless.

Of course, going in reverse, from word processor to browser, is a little more complicated. Only some browsers support Paste Special. But then, most forum software strips out the majority of hidden code anyway, keeping only the basics like bold, italics and sizes: the things you typically want it to keep. (Imagine all the 1st person italics in your 3rd person story that you would have to hunt down and put back in.) The Chrons forum software even strips out indenting (which I wish it wouldn't).
 
On the formatting issue when you copy-paste: I love that Scrivener has a context-sensitive right-click action for 'Paste and match style'. I have a lot of half-written stuff and half-baked ideas in Word, .rtf or .txt format that I wrote a few years ago, before I invested in Scriv.

pH
 
