Automating Bookshare

Gaijin gaijin at clearwire.net
Sat Feb 23 13:47:39 EST 2008


On Fri, Feb 22, 2008 at 09:11:15PM -0700, Steve Holmes wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: RIPEMD160
> 
> Yeah, it is the login sequence that isn't working for me.  My last
> attempt to login with a script used two iterations of wget and I
> specified the wget command with the post variables for email and
> password, but I didn't get any real meaningful feedback to tell me if
> the login was successful or not.  The second pass worked in the sense
> that it requested the desired file, but then I got back the error page
> telling me that I was never logged in.
> 
> My ultimate goal was to have a script pull my daily newspaper from
> Bookshare and dump the thing onto an SD card, so I could come up to my
> computer in the morning, grab the card, pop it into my Victor
> Reader Stream, and groove on the day's news. :)  I heard of people doing
> that with the Icon small computer, so I figured there could be a way for
> us to do it with a shell script or something similar and get the same
> results.  In any case, it would be a fun project if I could get it to
> work.
:End-Quote:

	I was on a similar project a few years back, just before losing
my sight in fact, which put a halt to things.  I was using lynx to grab
web pages off Yahoo Finance (because I didn't know about wget at the
time), and using sed, grep, and tr to boil the pages down into link
lists.  To make a long story short, the tools are there.  You just need
to carefully examine the web source, find the command they use to log
in, fill in the blanks, and then do the same with the next page.  I
still have yet to strip the stock quote data itself from Yahoo Finance
and feed it into PostgreSQL, but I did manage to get a listing of over
30,000 URLs, one for each stock.  My method was pretty much as follows
(a rough sketch appears after the list):

1) Use lynx to get a web page and redirect it to a file.
2) Use tr to convert all angle brackets to newlines.
3) Use grep and sed to find and alter URLs, and redirect the output to a
	file.
4) Repeat the process with the newly created list until I had a listing
	of all 30,000 NASDAQ stock pages.
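
	For what it's worth, a rough sketch of that pipeline might look
like the script below.  The starting URL and the grep/sed patterns are
placeholders, not the ones I actually used; the real expressions depend
on what the page source looks like.

    #!/bin/sh
    # Fetch the raw HTML of a page (placeholder URL).
    lynx -source "http://finance.yahoo.com/lookup" > page.html

    # Break the markup apart so each tag sits on its own line.
    tr '<>' '\n\n' < page.html > page.lines

    # Keep only the href targets and strip the surrounding markup.
    grep 'href=' page.lines | \
        sed 's/.*href="\([^"]*\)".*/\1/' > urls.txt

Step 4 is then just a loop that feeds each line of urls.txt back
through the same three commands until no new pages turn up.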

	I was refining things at that time when I lost my sight, so I
hadn't gotten to stripping the data out of each page to feed it into the
SQL database, nor was Java as prevalent back then as it is today.  Hey,
I was bored, alright?
	Anyway, if you can redirect the web page to a file, you can
certainly use Linux commands to strip out and fill in the same blanks as
if you had performed it manually.  It's all there in the source code of
the web page.  I would probably use wget instead of lynx though, now
that I know it exists.
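
	For the Bookshare login in particular, the piece that usually
goes missing in a two-pass wget approach is the session cookie: the
first request has to save it and the second has to send it back.
Something along these lines might work; the URLs and the form field
names here are only guesses, so read the login page's source for the
real form action and input names.

    #!/bin/sh
    # First pass: POST the login form and save the session cookie.
    # The URL and the email/password field names are placeholders.
    wget --save-cookies=cookies.txt --keep-session-cookies \
         --post-data='email=me@example.com&password=secret' \
         -O login-result.html \
         'https://www.bookshare.org/login'

    # Second pass: send the cookie back when asking for the file.
    wget --load-cookies=cookies.txt -O todays-paper.zip \
         'https://www.bookshare.org/download/todays-paper'

Looking through login-result.html for the site's error text is also a
cheap way to tell whether the login actually took before firing off the
second request.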

			Michael




