command line scanned pdf to text
Jude DaShiell
jdashiel at panix.com
Wed Nov 4 11:32:16 EST 2015
Thanks. On Arch Linux it's tesseract-data-eng, and it's installed as a
separate package. Got it too.
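
For anyone searching the archive later, here is a rough sketch of the two
package names mentioned in this thread, plus a check that tesseract can
actually see the English data. The function names eng_pack_for and has_lang
are made up for illustration. The sketch assumes a tesseract new enough to
support --list-langs, and it only covers the distributions named in this
thread.

```shell
#!/bin/sh
# Hypothetical helper: print the English language-data package name
# for a distribution ID (only the names given in this thread).
eng_pack_for() {
    case $1 in
        arch)   echo tesseract-data-eng ;;
        debian) echo tesseract-ocr-eng ;;
        *)      echo "unknown distro: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: succeed if tesseract lists the given language code.
# Some versions print the list on stderr, so both streams are searched.
has_lang() {
    tesseract --list-langs 2>&1 | grep -qx "$1"
}
```

Something like "has_lang eng || echo install the English data" would then
tell you whether the data pack is still missing.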
On Wed, 4 Nov 2015, Cheryl Homiak wrote:
> Date: Wed, 4 Nov 2015 10:49:03
> From: Cheryl Homiak <cah4110 at icloud.com>
> Reply-To: Speakup is a screen review system for Linux.
> <speakup at linux-speakup.org>
> To: Speakup is a screen review system for Linux. <speakup at linux-speakup.org>
> Subject: Re: command line scanned pdf to text
>
> On Debian, it is tesseract-ocr-eng. It may or may not be installed with the main package; I don't remember having to install it separately, but I have it.
>
> --
> Cheryl
>
> May the words of my mouth
> and the meditation of my heart
> be acceptable to You, Lord,
> my rock and my Redeemer.
> (Psalm 19:14 HCSB)
>
>> On Nov 4, 2015, at 9:11 AM, John G Heim <jheim at math.wisc.edu> wrote:
>>
>> On Ubuntu it's tesseract-ocr-eng.
>>
>>
>>
>> On 11/04/2015 09:01 AM, Jude DaShiell wrote:
>>> What data pack for tesseract has the english language in it? I'm being
>>> prompted to download a data pack and I figure best get what language I
>>> understand rather than the whole data set since both memory and disk
>>> space over here are not unlimited.
>>>
>>> On Mon, 2 Nov 2015, Cheryl Homiak wrote:
>>>
>>>> Date: Mon, 2 Nov 2015 17:39:38
>>>> From: Cheryl Homiak <cah4110 at icloud.com>
>>>> Reply-To: Speakup is a screen review system for Linux.
>>>> <speakup at linux-speakup.org>
>>>> To: Speakup is a screen review system for Linux.
>>>> <speakup at linux-speakup.org>
>>>> Subject: Re: command line scanned pdf to text
>>>>
>>>> Thanks much. No, the way to get into a turned-off computer far away
>>>> hasn't been invented yet, unless you can turn it on by remote control
>>>> somehow - :-)
>>>> I suspect the error was mine so I won't give up on it yet.
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> Cheryl
>>>>
>>>> May the words of my mouth
>>>> and the meditation of my heart
>>>> be acceptable to You, Lord,
>>>> my rock and my Redeemer.
>>>> (Psalm 19:14 HCSB)
>>>>
>>>>> On Nov 2, 2015, at 4:06 PM, John G Heim <jheim at math.wisc.edu> wrote:
>>>>>
>>>>> Huh, it strikes me as strange that tesseract didn't work for you. I
>>>>> used tesseract last week to read a page in a pdf document that was
>>>>> stored as an image. I used pdftohtml to extract the image and then
>>>>> tesseract to convert it to text. I also pretty routinely use
>>>>> tesseract to read screen capture images. It's not very accurate there
>>>>> but it's usually good enough to make sense of.
>>>>>
>>>>> Just "tesseract <infile> <outfile>" should work. The infile can be
>>>>> the string "stdin" in which case it reads from standard input. The
>>>>> outfile can be "stdout" in which case it writes the text to stdout.
>>>>> Right off hand, I do not have the command line I use to scan the D&D
>>>>> book. It's on a computer at home that is turned off at the moment.
>>>>> But I can post the whole thing tonight. Here are some lines from a
>>>>> backup version of the script:
>>>>>
>>>>> scanimage --format=tiff --mode Lineart --resolution 600 > /tmp/page.tiff
>>>>> tesseract /tmp/page.tiff stdout
>>>>>
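
Since the subject is scanned pdf to text, the two steps described above
(get the page images out of the pdf, then run tesseract on each one) might
be strung together roughly as follows. This is only a sketch under some
assumptions: it uses pdftoppm from poppler-utils to render the pages rather
than pdftohtml, it assumes the eng data is installed, the 300 dpi
resolution is a guess, and ocr_pdf is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: OCR every page of a scanned PDF to standard output.
# Usage: ocr_pdf FILE.pdf [LANG]   (LANG defaults to eng)
ocr_pdf() {
    pdf=$1
    lang=${2:-eng}
    workdir=$(mktemp -d) || return 1

    # Render each page to a PNG named page-<number>.png in the work directory.
    if ! pdftoppm -r 300 -png "$pdf" "$workdir/page"; then
        rm -rf "$workdir"
        return 1
    fi

    # OCR the pages in glob order and write the text to stdout.
    for img in "$workdir"/page-*.png; do
        [ -e "$img" ] || continue    # skip the literal pattern if no pages
        tesseract "$img" stdout -l "$lang"
    done

    rm -rf "$workdir"
}
```

With a placeholder file name, "ocr_pdf scanned.pdf > scanned.txt" would
collect the text of all pages in one file.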
>>>>>
>>>>> On 11/02/2015 02:53 PM, Cheryl Homiak wrote:
>>>>>> Would you mind enlarging on this if you can and have time? What kind
>>>>>> of file did you use and what did you put in your command-line? I am
>>>>>> asking this because I have tried to use tesseract a couple of times
>>>>>> with tiff files and have gotten mostly gibberish so obviously I am
>>>>>> doing something wrong. I am running debian testing if that makes a
>>>>>> difference.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>
>>>>> --
>>>>> John Heim, jheim at math.wisc.edu, 608-263-4189, skype:john.g.heim,
>>>>> sip:jheim at sip.linphone.org
>>>>> _______________________________________________
>>>>> Speakup mailing list
>>>>> Speakup at linux-speakup.org
>>>>> http://linux-speakup.org/cgi-bin/mailman/listinfo/speakup
>>>>
>>>
>>
>> --
>> John Heim, jheim at math.wisc.edu, 608-263-4189, skype:john.g.heim, sip:jheim at sip.linphone.org
>