Anyone able to OCR a PDF file?

Wed Jan 4 05:24:26 EST 2012

Willem van der Walt wrote:
> The different OCR engines require different image formats.
> Some of them are really dumb.

They probably derive from old code written without a
format-independent graphics library.

> I find that the best of the open-source engines is cuneiform.

Aha, interesting.  I've always used tesseract.  cuneiform is
in Debian wheezy (testing) but not yet in Debian stable.

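Depending on how the PDF was produced (i.e. if it contains real
text rather than just scanned page images), it's possible that
  ps2txt filename.pdf
(a.k.a. ps2ascii) might get the text out without any OCR at all;
I think it comes with ghostscript.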
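
For what it's worth, here is a rough sketch of how the two approaches
might be combined: try plain text extraction first, and only fall back
to OCR if nothing comes out.  It assumes ghostscript (gs, ps2ascii) and
tesseract are installed.  The function names has_text and pdf_to_text
are made up for illustration, and the tiffg4 device and 300 dpi are
just plausible choices, not something I have tuned.

```shell
#!/bin/sh
# Sketch only: extract text from a PDF, falling back to OCR.
# has_text and pdf_to_text are hypothetical helper names.

# has_text FILE: succeeds if FILE contains at least one
# non-whitespace character.
has_text() {
    grep -q '[^[:space:]]' "$1" 2>/dev/null
}

# pdf_to_text PDF OUT: write the text of PDF into OUT.
pdf_to_text() {
    pdf=$1
    out=$2

    # Step 1: direct extraction; works only if the PDF holds real text.
    ps2ascii "$pdf" > "$out" 2>/dev/null
    if has_text "$out"; then
        return 0
    fi

    # Step 2: render each page to a bilevel TIFF (assumed settings),
    # then run tesseract on every page image.
    tmp=$(mktemp -d) || return 1
    gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -r300 \
       -sOutputFile="$tmp/page-%03d.tif" "$pdf"
    : > "$out"
    for img in "$tmp"/page-*.tif; do
        [ -f "$img" ] || continue
        # tesseract writes its result to <outputbase>.txt
        tesseract "$img" "${img%.tif}" >/dev/null 2>&1
        [ -f "${img%.tif}.txt" ] && cat "${img%.tif}.txt" >> "$out"
    done
    rm -rf "$tmp"
    has_text "$out"
}
```

After sourcing that, something like
  pdf_to_text filename.pdf filename.txt
would leave whatever text could be recovered in filename.txt, and
return non-zero if neither method produced anything.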

Regards,  Peter Billam

http://www.pjb.com.au       pj at pjb.com.au      (03) 6278 9410
"Was der Meister nicht kann,   vermöcht es der Knabe, hätt er
 ihm immer gehorcht?"   Siegfried to Mime, from Act 1 Scene 2