word documents

Igor Gueths igueths at attbi.com
Sun Mar 16 09:56:00 EST 2003


Hi Dave. I had a not-so-good result with pdftotext. Basically, I converted
a fairly large manual from PDF to text, and the resulting text file didn't
have the line wrap set properly. Each line came out with only about half of
its text, and the other half of the line was lost. Needless to say, the
conversion was not accurate. Have you been able to get around this problem,
and if so, how?
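
In case it helps to compare notes, here is a rough sketch of how the
conversion could be scripted. It assumes the installed pdftotext accepts
the -layout option (which asks it to keep the physical page layout; older
builds may not have it), and the function names and file names are just
placeholders. I can't say whether it cures the truncated lines.

```python
import shutil
import subprocess
import sys


def build_command(pdf_path, txt_path, keep_layout=True):
    """Return the pdftotext argument list for one conversion."""
    cmd = ["pdftotext"]
    if keep_layout:
        # Assumption: this pdftotext build supports -layout, which tries
        # to preserve the original physical layout of each page.
        cmd.append("-layout")
    cmd += [pdf_path, txt_path]
    return cmd


def convert(pdf_path, txt_path, keep_layout=True):
    """Run pdftotext; raise if it is missing or exits with an error."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    subprocess.run(build_command(pdf_path, txt_path, keep_layout), check=True)


if __name__ == "__main__":
    # Usage: python convert.py manual.pdf manual.txt
    if len(sys.argv) == 3:
        convert(sys.argv[1], sys.argv[2])
```

Running it once with keep_layout=True and once with keep_layout=False
and comparing the two text files should at least show whether the
layout handling is what drops the second half of each line.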

May you code in the power of the source,
may the kernel, libraries, and utilities be with you,
throughout all distributions until the end of the epoch.

On Sun, 16 Mar 2003, Dave Hunt wrote:

> There are Word document viewers for Linux console.  The one I use is
> called wv.  Another is called antiword.  No doubt, there are more.
> Because Word is a proprietary format, and the specification is not
> available, the authors of programs such as wv have had to
> reverse-engineer a bit.  Because of this, certain things in the Word
> document may not decode as well as we'd like.  Nonetheless, I use wv
> and get reasonable results when converting from Word to html.  The
> resulting html source is quite bloated, but, it's there.
>
> For pdf conversion, there's pdftotext.  This is part of the xpdf
> package, and may already be on your system.  Surprise, it was already
> on my stock installation of RH 7.2.  The one thing I don't like about
> pdftotext's rendering is that hyperlinks get lost.  To preserve the
> navigability of pdf documents, I visit <access.adobe.com>, and submit
> the url of a pdf document (assuming I've found it on the web) to the
> form.  What comes back is a nice html rendering (links and all).
>
>
> Hope this helps,
>
>
> -Dave




