milianw/springer_download

chapter sorting

mortbauer opened this issue · 5 comments

Sometimes the chapters of a book are sorted alphabetically on the SpringerLink contents page. Since the script takes its chapter order from that page alone, the downloaded chapters end up mixed up, which isn't very nice.
Maybe the chapters could be sorted by their page numbers instead. I think this should be possible, but I'm not very good with regexes, so I can't offer a solution myself.

It would be pretty helpful if you could provide us with an example. A simple URL will be enough...

But I am not sure if this is possible. Are you asking to extract page numbers out of the PDFs?

OK, sorry about that. Here is an example: 978-3-540-23957-4 is the ISBN of the book Springer Handbook of Robotics. It has 66 chapters, and when I download it with the script they are not in the correct order. While browsing the book's contents page (http://www.springerlink.com/content/978-3-540-23957-4/contents/), I noticed that the page numbers are listed next to each chapter, so I thought it shouldn't be too difficult to order the chapters based on these numbers.
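Sorting by page number, as suggested here, could look roughly like the sketch below. It assumes each chapter has already been scraped into a (title, link, page_range) tuple, with page_range holding the text shown next to the chapter on the contents page (for example "1-30"). The tuple layout, the sample records and the helpers start_page and sort_by_page are hypothetical, not taken from springer_download.py.

```python
import re

# Hypothetical chapter records: (title, link, page_range). The page_range text
# mimics the numbers shown next to each chapter on the contents page; its exact
# format is an assumption. Note the alphabetical (wrong) order.
chapters = [
    ("Actuators", "/content/x1/", "95-130"),
    ("Kinematics", "/content/x2/", "1-30"),
    ("Sensors", "/content/x3/", "31-94"),
]

# The first run of digits in the range is taken to be the start page.
START_PAGE = re.compile(r"\d+")


def start_page(page_range):
    """Return the first page of a range such as '1-30' or 'pp 1-30'."""
    match = START_PAGE.search(page_range)
    if match is None:
        raise ValueError("no page number in %r" % page_range)
    return int(match.group(0))


def sort_by_page(chapters):
    """Order chapters by their start page instead of by title."""
    return sorted(chapters, key=lambda chapter: start_page(chapter[2]))


if __name__ == "__main__":
    for title, link, pages in sort_by_page(chapters):
        print("%s %s (pages %s)" % (title, link, pages))
```

Converting the start page to an integer matters: compared as strings, "100-120" would sort before "31-94" because "1" < "3".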

I have implemented something that might handle this... please give it a try and report back if it is what you intended.

The sorting seems to work, though I've only tried it with one example so far. However, if I run the script without sorting, I now get the following error:

$ python2 springer_download.py -l http://www.springerlink.com/content/978-3-540-77876-9/
fetching book information...
    http://springerlink.com/content/978-3-540-77876-9/contents/

Now Trying to download book 'VDI Heat Atlas'

found 68 chapters
Traceback (most recent call last):
  File "springer_download.py", line 310, in <module>
    main(sys.argv[1:])
  File "springer_download.py", line 194, in main
    chapterLink = baseLink + chapterLink
TypeError: cannot concatenate 'str' and 'tuple' objects
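The traceback suggests that chapterLink now holds a tuple rather than a string. That would happen if the sorting change started storing chapter entries as (link, page) pairs while the unsorted code path still concatenates each entry directly onto baseLink. Under that assumption, a minimal fix is to keep one record shape for both paths and unpack it before joining. The (link, start_page) layout, the sample links and the chapter_urls helper are hypothetical.

```python
# Assumed record shape after the sorting change: (link, start_page).
baseLink = "http://springerlink.com"
chapters = [("/content/b2/", 21), ("/content/a1/", 1)]


def chapter_urls(base, chapters, sort_by_pages=False):
    """Build absolute chapter URLs from (link, start_page) records.

    Both the sorted and the unsorted path use the same record shape and
    unpack it before concatenating, so base + link is always str + str.
    """
    if sort_by_pages:
        chapters = sorted(chapters, key=lambda record: record[1])
    return [base + link for link, _start in chapters]


if __name__ == "__main__":
    # Reproduces the reported failure: str + tuple raises TypeError.
    try:
        baseLink + chapters[0]
    except TypeError as error:
        print("unpatched path fails: %s" % error)
    print(chapter_urls(baseLink, chapters))
    print(chapter_urls(baseLink, chapters, sort_by_pages=True))
```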

As I already commented inline, your modification cannot handle the front matter because of its Roman page numbers. Additionally, there are back matters whose page numbers start again at 1, e.g. www.springerlink.com/content/978-3-540-25202-3/
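One way to handle both cases is a composite sort key that ranks the section first and the page number second: Roman page numbers are treated as front matter, entries titled "Back Matter" are pushed after the body even though their numbering restarts at 1, and everything else is sorted by its Arabic start page. This sketch assumes page ranges look like "i-xiv" or "1-20" and that back-matter entries can be recognized by their title; neither assumption has been checked against the real SpringerLink pages, and all names and sample records are hypothetical.

```python
# Assumptions (unverified against SpringerLink): page ranges look like
# "i-xiv" or "1-20", and back-matter entries contain "Back Matter" in the title.
ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}
FRONT, BODY, BACK = 0, 1, 2


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'xiv' (any case) to an integer."""
    total = 0
    previous = 0
    for char in reversed(numeral.lower()):
        value = ROMAN_VALUES[char]
        # A smaller numeral before a larger one (the 'i' in 'iv') is subtracted.
        if value < previous:
            total -= value
        else:
            total += value
        previous = value
    return total


def chapter_sort_key(title, page_range):
    """Return (section, page) so that front matter < body < back matter."""
    first = page_range.split("-")[0].strip().lower()
    if first and all(char in ROMAN_VALUES for char in first):
        return (FRONT, roman_to_int(first))
    page = int(first)
    lowered = title.lower()
    if "front matter" in lowered:
        return (FRONT, page)
    if "back matter" in lowered:
        return (BACK, page)
    return (BODY, page)


# Hypothetical (title, page_range) records in the order scraped.
chapters = [
    ("Back Matter", "1-12"),
    ("Heat Transfer", "21-40"),
    ("Front Matter", "i-xiv"),
    ("Introduction", "1-20"),
]

if __name__ == "__main__":
    for title, pages in sorted(chapters, key=lambda c: chapter_sort_key(*c)):
        print("%s (%s)" % (title, pages))
```

Matching on the title is fragile; if the contents page marks back matter in some other way, that marker would be a more reliable signal for the BACK rank.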