Page 1 is not scraped in this case

Question

Closed this issue 2 years ago · 3 comments

Only page 2 of the PDF is scraped. Page 1 could be scraped as well, with the output stored inside csv_files under separate folders for page 1 and page 2.

The header for page 1 is at index 2, something like this:

header_row = lines[2]

The for loop can then start from index 3:

for line in lines[3:]:

The rest of the code remains the same for page 1.
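
To make the suggestion concrete, here is a rough sketch of how one parser could handle either page by taking the header index as a parameter, and how each page's output could go into its own folder. This is not the project's actual code: parse_page, write_page_csv, the whitespace split, and the folder name are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def parse_page(lines, header_index):
    """Split text lines into a header and data rows.

    Assumes columns are whitespace-separated, which may not match
    how the real PDF text is laid out.
    """
    header = lines[header_index].split()
    rows = []
    for line in lines[header_index + 1:]:
        cells = line.split()
        if cells:  # skip blank lines
            rows.append(cells)
    return header, rows


def write_page_csv(header, rows, out_dir, filename):
    """Write one page's table to out_dir/filename, creating the folder if needed."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / filename
    with target.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return target
```

With something like this, page 1 would be parsed with parse_page(lines, header_index=2) and written to a hypothetical folder such as csv_files/page_1, while page 2 would pass whatever header index it uses today and write to its own folder.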

Answer 1 · 2023-01-20T13:49:43.000Z

Page 1 was intentionally ignored because the reference rates are only on Page 2. SBI calls this out at the top of Page 2.
Why do you think Page 1 data is useful?

Answer 2 · 2023-01-21T12:17:04.000Z

@sahilgupta I feel both pages are equally important. The first one's header mentions it is for transactions below 10 lacs.

Answer 3 · 2023-01-21T13:47:35.000Z

The goal of this project is to track forex rates relevant for Income Tax purposes, which are quoted on the 2nd page.

Reference rates to be used for converting capital gains/foreign income are independent of the size of the transaction.
Even a 100 USD transaction must use the same reference rates as a 10000 USD transaction.

Rates on the 1st page are thus not relevant.