command line scanned pdf to text

Tom Fowle wa6ivgtf at fastmail.fm
Wed Nov 4 21:36:17 EST 2015


Just installed tesseract as a Debian package, and the English language
pack came with it automatically.
Tom Fowle
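
A quick way to confirm which language packs tesseract can see, sketched
as a small shell helper. The function name has_lang is made up for
illustration, and the sketch assumes a tesseract build that supports
--list-langs. Some older 3.x builds print that list on stderr, hence the
2>&1 in the usage comment.

```shell
#!/bin/sh
# Sketch only: has_lang is a hypothetical helper, not part of tesseract.
# It reads the output of `tesseract --list-langs` on stdin and succeeds
# if the given language code appears as a whole line.
has_lang() {
    grep -qxF -- "$1"
}

# Usage (assumes tesseract is on PATH and a Debian-style package name):
#   tesseract --list-langs 2>&1 | has_lang eng \
#       || sudo apt-get install tesseract-ocr-eng
```

If the check fails on Debian, installing tesseract-ocr-eng (the package
name given in the quoted reply below) should add the English data.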

On Wed, Nov 04, 2015 at 09:49:03AM -0600, Cheryl Homiak wrote:
> On Debian, it is tesseract-ocr-eng. It may or may not be installed with the main package; I don't remember having to install it separately, but I have it.
> 
> -- 
> Cheryl
> 
> May the words of my mouth
> and the meditation of my heart
> be acceptable to You, Lord,
> my rock and my Redeemer.
> (Psalm 19:14 HCSB)
> 
> > On Nov 4, 2015, at 9:11 AM, John G Heim <jheim at math.wisc.edu> wrote:
> > 
> > On Ubuntu it's tesseract-ocr-eng.
> > 
> > 
> > 
> > On 11/04/2015 09:01 AM, Jude DaShiell wrote:
> >> Which tesseract data pack contains English? I'm being prompted to
> >> download a data pack, and I figure it's best to get only the language
> >> I understand rather than the whole data set, since both memory and
> >> disk space here are limited.
> >> 
> >> On Mon, 2 Nov 2015, Cheryl Homiak wrote:
> >> 
> >>> Date: Mon, 2 Nov 2015 17:39:38
> >>> From: Cheryl Homiak <cah4110 at icloud.com>
> >>> Reply-To: Speakup is a screen review system for Linux.
> >>>    <speakup at linux-speakup.org>
> >>> To: Speakup is a screen review system for Linux.
> >>> <speakup at linux-speakup.org>
> >>> Subject: Re: command line scanned pdf to text
> >>> 
> >>> Thanks much. No, the way to get into a turned-off computer far away
> >>> hasn't been invented yet, unless you can turn it on by remote control
> >>> somehow - :-)
> >>> I suspect the error was mine so I won't give up on it yet.
> >>> 
> >>> Thanks.
> >>> 
> >>> --
> >>> Cheryl
> >>> 
> >>> May the words of my mouth
> >>> and the meditation of my heart
> >>> be acceptable to You, Lord,
> >>> my rock and my Redeemer.
> >>> (Psalm 19:14 HCSB)
> >>> 
> >>>> On Nov 2, 2015, at 4:06 PM, John G Heim <jheim at math.wisc.edu> wrote:
> >>>> 
> >>>> Huh, it strikes me as strange that tesseract didn't work for you. I
> >>>> used tesseract last week to read a page in a pdf document that was
> >>>> stored as an image. I used pdftohtml to extract the image and then
> >>>> tesseract to convert it to text. I also pretty routinely use
> >>>> tesseract to read screen capture images. It's not very accurate there
> >>>> but it's usually good enough to make sense of.
> >>>> 
> >>>> Just "tesseract <infile> <outfile>" should work. The infile can be
> >>>> the string "stdin", in which case it reads from standard input. The
> >>>> outfile can be "stdout", in which case it writes the text to stdout.
> >>>> Offhand, I do not have the command line I use to scan the D&D
> >>>> book. It's on a computer at home that is turned off at the moment.
> >>>> But I can post the whole thing tonight. Here are some lines from a
> >>>> backup version of the script:
> >>>> 
> >>>> scanimage --format=tiff --mode Lineart --resolution 600 > /tmp/page.tiff
> >>>> tesseract /tmp/page.tiff stdout
> >>>> 
> >>>> 
> >>>> On 11/02/2015 02:53 PM, Cheryl Homiak wrote:
> >>>>> Would you mind enlarging on this if you can and have time? What kind
> >>>>> of file did you use and what did you put in your command-line? I am
> >>>>> asking this because I have tried to use tesseract a couple of times
> >>>>> with tiff files and have gotten mostly gibberish so obviously I am
> >>>>> doing something wrong. I am running debian testing if that makes a
> >>>>> difference.
> >>>>> 
> >>>>> Thanks.
> >>>>> 
> >>>> 
> >>>> --
> >>>> John Heim, jheim at math.wisc.edu, 608-263-4189, skype:john.g.heim,
> >>>> sip:jheim at sip.linphone.org
> >>>> _______________________________________________
> >>>> Speakup mailing list
> >>>> Speakup at linux-speakup.org
> >>>> http://linux-speakup.org/cgi-bin/mailman/listinfo/speakup
> >>> 
> >> 
> > 
> > -- 
> > John Heim, jheim at math.wisc.edu, 608-263-4189, skype:john.g.heim, sip:jheim at sip.linphone.org
> 

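For the original question (scanned PDF to text from the command line),
the steps discussed in this thread could be strung together roughly as
below. This is only a sketch: it swaps in pdftoppm from poppler-utils to
render the pages, instead of the pdftohtml step John mentions, and the
function name ocr_pdf, the 300 dpi resolution, and the "page" file prefix
are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch only: ocr_pdf is a hypothetical helper name.
# Assumes pdftoppm (poppler-utils), tesseract, and mktemp are installed.
# Usage: ocr_pdf FILE.pdf [LANG]   (LANG defaults to eng)
ocr_pdf() {
    pdf=$1
    lang=${2:-eng}
    workdir=$(mktemp -d) || return 1

    # Render every page of the PDF to a PNG named <workdir>/page-<n>.png.
    if ! pdftoppm -r 300 -png "$pdf" "$workdir/page"; then
        rm -rf "$workdir"
        return 1
    fi

    # OCR each page in glob order and write the text to standard output.
    # pdftoppm normally zero-pads page numbers, so glob order is page order.
    for img in "$workdir"/page-*.png; do
        [ -e "$img" ] || continue
        tesseract "$img" stdout -l "$lang"
    done

    rm -rf "$workdir"
}
```

Something like "ocr_pdf scanned.pdf > scanned.txt" would then collect the
recognized text in one file, and a second argument selects a different
language pack. As John notes above, accuracy depends a lot on the quality
of the input image.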
