Mail Archives: djgpp/1998/10/15/14:16:25
From: "Harlan Grove" <HrlnGrv AT aol DOT com>
Newsgroups: comp.lang.awk,comp.os.msdos.djgpp
Subject: Re: Anyone have code to strip text from HP-PCL5 files?
Date: Thu, 15 Oct 1998 11:17:50 -0700
Organization: Planet Access Network Inc.
Lines: 36
Message-ID: <705drn$k84@jupiter.planet.net>
References: <36258305 DOT 18274931 AT news3 DOT banet DOT net>
NNTP-Posting-Host: 207.3.98.50
X-Newsreader: Microsoft Outlook Express 4.72.3110.5
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Peter J. Farley III wrote in message <36258305 DOT 18274931 AT news3 DOT banet DOT net>...
>I know there are programs (e.g., pstotext) to strip text from
>Postscript files, but has anyone got any code to do the same thing for
>HP-PCL5 files?
>
>Alternatively, are there any editors or word processors that
>understand HP-PCL5, and can present an on-screen image of the text
>that the printer would produce, maybe with an option to save-as a
>simple text file (i.e., with the PCL5 codes stripped out)?
>
>TIA for any help, info or url's you can provide.
>
>----------------------------------------------------
>Peter J. Farley III (pjfarley AT nospam DOT dorsai DOT org OR
> pjfarley AT nospam DOT banet DOT net)

I have a very basic awk utility that strips almost all HP-PCL escape
sequences from files that otherwise contain plain text. It doesn't translate
positioning sequences, so if your text relies on overwriting, tab stops or
other positioning, the output will be garbled. Also, it quits as soon as it
encounters embedded binary data. Without further ado, here it is.

# Choke on binary data, embedded fonts, etc.
/\x1B\&p[0-9]+X/ || /\x1B[()]s[0-9]+W/ || /\x1B\*b[0-9]+W/ {
    print "Encountered binary data block. Unable to proceed."
    exit
}

{
    gsub("\x1B[9=]", "")             # simple sequences
    gsub("\x1B[^A-Z@]*[A-Z@]", "")   # complex sequences
    print
}
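
A quick way to try the script from a shell might look like the sketch below.
It is only an illustration: the function name strip_pcl and the sample input
are invented here, and ESC is spelled as the octal escape "\033" on the
assumption that not every awk accepts "\x1B". The second sample shows the
limitation noted above: the column-positioning sequence is simply dropped, so
the two words run together.

```sh
#!/bin/sh
# strip_pcl is a hypothetical wrapper name used only for this sketch.
# The awk program follows the script above, with ESC held in a variable.
strip_pcl() {
    awk '
    BEGIN { esc = "\033" }   # ESC character, octal escape (POSIX awk)

    # Choke on binary data, embedded fonts, etc.
    $0 ~ (esc "&p[0-9]+X") || $0 ~ (esc "[()]s[0-9]+W") || $0 ~ (esc "\\*b[0-9]+W") {
        print "Encountered binary data block. Unable to proceed."
        exit
    }

    {
        gsub(esc "[9=]", "")             # simple sequences
        gsub(esc "[^A-Z@]*[A-Z@]", "")   # complex sequences
        print
    }
    ' "$@"
}

# Made-up sample: reset, portrait orientation, bold on/off around one word.
printf '\033E\033&l0OHello, \033(s3Bworld\033(s0B\033E\n' | strip_pcl
# prints: Hello, world

# Made-up sample: a move to column 20 between two words is stripped, not applied.
printf 'Name\033&a20CQty\n' | strip_pcl
# prints: NameQty
```

To run it on a real file, something like "strip_pcl report.pcl > report.txt"
should do, where report.pcl stands in for your own print file.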