command line scanned pdf to text

Tom Fowle wa6ivgtf at fastmail.fm
Mon Nov 2 23:15:32 EST 2015


Cheryl,
I arbitrarily chose to convert the pdf to jpeg, as tesseract doesn't read
pdf directly.

Then I just ran
tesseract filename.jpg outfile
which produces
outfile.txt
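
For what it's worth, here is a rough sketch of the two steps as one shell function. It is not something I have run as a script. The function name ocr_pdf and the 300 DPI setting are just my assumptions, and it assumes pdftocairo (from poppler-utils) and tesseract are both installed.

```shell
#!/bin/sh
# Sketch: convert each page of a scanned PDF to JPEG, then OCR each page.
# ocr_pdf is a hypothetical helper name; 300 DPI is an assumed resolution.

ocr_pdf() {
    pdf=$1                # input PDF, for example scan.pdf
    base=${pdf%.pdf}      # strip the .pdf suffix to build output names

    # Writes one JPEG per page, named like base-1.jpg, base-2.jpg, ...
    pdftocairo -jpeg -r 300 "$pdf" "$base" || return 1

    for img in "$base"-*.jpg; do
        [ -e "$img" ] || continue     # skip if the glob matched nothing
        # tesseract adds the .txt extension to the output name itself
        tesseract "$img" "${img%.jpg}" || return 1
    done
}
```

Running ocr_pdf scan.pdf should leave one .txt file next to each page image, which can then be joined with cat if a single file is wanted.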

Sorry, I haven't tried .tif, and I couldn't find a list of supported file types.

tom fowle

On Mon, Nov 02, 2015 at 02:53:45PM -0600, Cheryl Homiak wrote:
> Would you mind enlarging on this if you can and have time? What kind of file did you use and what did you put in your command-line? I am asking this because I have tried to use tesseract a couple of times with tiff files and have gotten mostly gibberish so obviously I am doing something wrong. I am running debian testing if that makes a difference.
> 
> Thanks.
> 
> -- 
> Cheryl
> 
> May the words of my mouth
> and the meditation of my heart
> be acceptable to You, Lord,
> my rock and my Redeemer.
> (Psalm 19:14 HCSB)
> 
> 
> 
> 
> 
> > On Nov 2, 2015, at 2:13 PM, John G Heim <jheim at math.wisc.edu> wrote:
> > 
> > 
> > I've been scanning in the D&D 5th Edition player's handbook. I tried every open source OCR program I could find and tesseract was easily the best. On pages that are just prose, it probably does about 99% accuracy. Even on pages that are 2 columns of prose, it does really well if you tell it to look for that. Somebody sent me a pdf of the same book done with a professional OCR program for Windows. The results are approximately equal. Tesseract may lack the bells & whistles of commercial products but for accuracy, it's pretty good.
> > 
> > 
> > 
> > On 11/01/2015 11:24 PM, Tom Fowle wrote:
> >> Am I the last to find this?
> >>  command line ocr tesseract
> >> won't directly support .pdf but
> >> pdftocairo
> >> produces .jpg among others which tesseract will read.
> >> 
> >> May not do well with columns but not too bad.
> >> 
> >> Is there anything better?
> >> 
> >> Thanks
> >> tom Fowle
> >> _______________________________________________
> >> Speakup mailing list
> >> Speakup at linux-speakup.org
> >> http://linux-speakup.org/cgi-bin/mailman/listinfo/speakup
> >> 
> > 
> > -- 
> > John Heim, jheim at math.wisc.edu, 608-263-4189, skype:john.g.heim, sip:jheim at sip.linphone.org
> 

