Mail Archives: djgpp/1998/10/15/19:50:32
"Harlan Grove" <HrlnGrv AT aol DOT com> wrote:
<Snipped>
>I have a very basic awk utility that simply strips almost all HP-PCL escape
>sequences from files that otherwise contain plain text. It doesn't translate
>positioning sequences, so if your text contains overwriting, tabstops or
>other positioning formatting, it'll be garbled. Also, it dies when it
>encounters embedded binary data. Without further ado, here it is.
>
># Choke on binary data, embedded fonts, etc.
>/\x1B\&p[0-9]+X/ || /\x1B[()]s[0-9]+W/ || /\x1B\*b[0-9]+W/ {
> print "Encountered binary data block. Unable to proceed."
> exit
>}
>{
> gsub("\x1B[9=]", "") # simple sequences
> gsub("\x1B[^A-Z@]*[A-Z@]", "") # complex sequences
> print
>}
Thanks for the code, Harlan. Unfortunately, I made an incorrect
assumption, and it looks like the files I've got are not PCL5, but
something called PCLXL. Here are the headers in the file:
%-12345X@PJL COMMENT HP LaserJet 6P/6MP - Enhanced Driver
@PJL COMMENT 1.20.0.0
@PJL SET PAGEPROTECT=AUTO
@PJL SET ECONOMODE=OFF
@PJL SET RESOLUTION=600
@PJL SET TIMEOUT=90
@PJL DEFAULT MPTRAY=FIRST
@PJL ENTER LANGUAGE = PCLXL
) HP-PCL XL;1;1;Comment Copyright Hewlett-Packard Company 1989-1996
Have you or anyone else ever seen this printer language? I'm not
familiar with it myself. I wonder if it's an extension of HPGL, the
plotting language?
I can see the text I want to extract when I browse the file, but there
is a *LOT* of binary stuff and what looks like font information in
between chunks of text.
I guess I'll go to one of the HP forums and ask around there.
Thanks again for your code. It may well come in handy one day!
----------------------------------------------------
Peter J. Farley III (pjfarley AT nospam DOT dorsai DOT org OR
pjfarley AT nospam DOT banet DOT net)