sewardlee337/finreportr

Reports until 2018 are available, those of 2019 & 2020 are not

After successfully reading SEC financial data up to and including 2018, my attempts to read 2019 and 2020 fail.

I get the following message:

```
> GetIncome("NVDA", 2019)
Error in fileFromCache(file) :
Error in download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1045810/000104581019000023/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd'

In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1045810/000104581019000023/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd': HTTP status was '404 Not Found'
```

Reports for 2019 & 2020 (and 2021) are available

```
> AnnualReports(symbol = "NVDA", foreign = FALSE)
   filing.name filing.date         accession.no
1         10-K  2021-02-26 0001045810-21-000010
2         10-K  2020-02-20 0001045810-20-000010
3         10-K  2019-02-21 0001045810-19-000023
4         10-K  2018-02-28 0001045810-18-000010
5         10-K  2017-03-01 0001045810-17-000027
6         10-K  2016-03-17 0001045810-16-000205
7         10-K  2015-03-12 0001045810-15-000036
8         10-K  2014-03-13 0001045810-14-000030
9         10-K  2013-03-12 0001045810-13-000008
10        10-K  2012-03-13 0001045810-12-000013
11        10-K  2011-03-16 0001045810-11-000015
12        10-K  2010-03-18 0001045810-10-000006
13        10-K  2009-03-13 0001045810-09-000013
14        10-K  2008-03-21 0001045810-08-000011
15        10-K  2007-03-16 0001045810-07-000008
16      10-K/A  2006-11-29 0001193125-06-243224
17        10-K  2006-03-16 0001045810-06-000014
18        10-K  2005-03-22 0001045810-05-000008
19      10-K/A  2004-05-20 0001045810-04-000014
20        10-K  2004-03-29 0001045810-04-000007
21        10-K  2003-04-25 0001045969-03-001196
22        10-K  2002-05-14 0001012870-02-002262
23   10-K405/A  2001-05-25 0001012870-01-501023
24     10-K405  2001-04-27 0001012870-01-500492
25     10-K405  2000-03-13 0001012870-00-001346
26     10-K405  1999-04-29 0000929624-99-000772
```
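The folder at the start of the failing URL is simply the 2019 filing from row 3 of this listing. As a minimal sketch, assuming EDGAR's usual archive layout (CIK without leading zeros, accession number without dashes), the mapping can be written as below; `edgar_folder()` is a hypothetical helper, not part of finreportr, and the CIK `1045810` is taken from the URL in the error message.

```r
# Hypothetical helper (not part of finreportr): build the EDGAR archive folder
# for a filing from its CIK and accession number.
# Assumed layout: CIK without leading zeros, accession number without dashes.
edgar_folder <- function(cik, accession_no) {
  cik <- sub("^0+", "", cik)                        # "0001045810" -> "1045810"
  acc <- gsub("-", "", accession_no, fixed = TRUE)  # drop the dashes
  paste0("https://www.sec.gov/Archives/edgar/data/", cik, "/", acc)
}

# Row 3 of the listing: the 10-K filed on 2019-02-21
edgar_folder("0001045810", "0001045810-19-000023")
#> [1] "https://www.sec.gov/Archives/edgar/data/1045810/000104581019000023"
```

The CIK is passed separately because the first ten digits of an accession number identify the filer agent, which is not always the company itself (see rows 16 and 21 to 26).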

However, I suspect there might be a problem with a date, since finreportr is trying to fetch a file dated "2018-01-31" (`dei-2018-01-31.xsd`) that it cannot find.

Is it possible that there might be a problem with the dates?

And congratulations on the package; it can be very useful.

VS

I'm experiencing this as well. I'll take a look into it.

I have the same problem.

I spent the day looking into this. The problem is actually in the XBRL library that finreportr uses to do the heavy lifting of pulling and parsing XBRL data from the SEC. In particular, the bit of XBRL code that downloads supporting schemas fails to recognise an https URL as a URL rather than a local file path; an http URL works fine. Because it treats the https URL as a file path, it appends it to the dirname of the cache directory (which is itself an https URL). That's why it tries to download the doubled URL in the error below:

```
In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
cannot open URL 'https://www.sec.gov/Archives/edgar/data/1045810/000104581019000023/https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd': HTTP status was '404 Not Found'
```
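To make the failure mode concrete, here is a hypothetical sketch of the URL check described above. The functions `is_url_buggy()`, `is_url_fixed()` and `resolve_schema()` are illustrative only and are not taken from the XBRL package source; they just reproduce the symptom, where a check that only recognises `http://` treats an `https://` schema location as a relative path and glues it onto the filing's folder.

```r
# Hypothetical illustration only; NOT the actual XBRL package code.
# Buggy check: only "http://" counts as a URL, so "https://..." looks like a path.
is_url_buggy <- function(x) grepl("^http://", x)
# Fixed check: accept both schemes, case-insensitively.
is_url_fixed <- function(x) grepl("^https?://", x, ignore.case = TRUE)

# Absolute URLs are used as-is; anything else is treated as relative
# to the filing's folder and appended to it.
resolve_schema <- function(schema, base_url, is_url) {
  if (is_url(schema)) schema else paste0(base_url, "/", schema)
}

base   <- "https://www.sec.gov/Archives/edgar/data/1045810/000104581019000023"
schema <- "https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd"

resolve_schema(schema, base, is_url_buggy)  # the doubled URL that returns 404
resolve_schema(schema, base, is_url_fixed)  # "https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd"
```

Under this reading, a schema reference that still starts with `http://` passes the buggy check, which is consistent with older filings working.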

I found the XBRL source code on the CRAN GitHub mirror, forked it, and implemented a fix. If you install my version of XBRL using `devtools::install_github("riazarbi/XBRL")` and restart your session, this error should go away.

Obviously it would be better if the maintainer of XBRL implemented the fix in the CRAN version. I’ve emailed him to ask how I should submit the fix, as the package's original source code is not available anywhere I can find. In the meantime, the patched version of XBRL above should work.

Incidentally, I suspect that this issue does not occur with earlier reports because at some point companies started migrating from http to https endpoints for their schema definitions.

Here’s a reproducible example of how to use my patched XBRL package.

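```r
# Install the patched XBRL fork once, then restart your R session
devtools::install_github("riazarbi/XBRL")
library(XBRL)
library(finreportr)
options(stringsAsFactors = FALSE)
# SEC EDGAR expects a User-Agent that identifies you; use your own contact details
options(HTTPUserAgent = "REDACTED USERNAME@REDACTED.COM")
GetIncome("NVDA", 2019)
```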

This returns:

```
> GetIncome("NVDA", 2019) |> dplyr::glimpse()
Rows: 51
Columns: 5
$ Metric    <chr> "Cost of Goods and Services Sold", "Cost of Goods and Services Sold", …
$ Units     <chr> "usd", "usd", "usd", "usd", "usd", "usd", "usd", "usd", "usd", "usd", …
$ Amount    <chr> "2847000000", "787000000", "928000000", "1067000000", "1110000000", "3…
$ startDate <chr> "2016-02-01", "2017-01-30", "2017-05-01", "2017-07-31", "2017-10-30", …
$ endDate   <chr> "2017-01-29", "2017-04-30", "2017-07-30", "2017-10-29", "2018-01-28", …
Warning message:
In roleId == role.id :
  longer object length is not a multiple of shorter object length
```

Bear in mind that the original XBRL package uses `download.file` under the hood, which times out after 60 seconds by default. If your internet connection is slow and you get a timeout error, you might need to download some of these schema files manually and drop them into the cache.
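One way to work around this is to raise R's download timeout and pre-fetch the schema files yourself. This is a minimal sketch: `prefetch_schema()` is a hypothetical helper, and the assumption that XBRL looks up cached files by their base name inside its default `xbrl.Cache` folder should be checked against the XBRL version you have installed. Set `options(HTTPUserAgent = ...)` as in the example above before downloading from sec.gov.

```r
# download.file() gives up after getOption("timeout") seconds (60 by default);
# raise it before calling finreportr/XBRL on a slow connection.
options(timeout = max(300, getOption("timeout")))

# Hypothetical helper: fetch one schema file by hand into a local cache folder.
# ASSUMPTION: the folder name "xbrl.Cache" and the basename-based layout are
# illustrative; verify where your XBRL version actually looks for cached files.
prefetch_schema <- function(url, cache_dir = "xbrl.Cache") {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(cache_dir, basename(url))
  if (!file.exists(dest)) {
    utils::download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Example (requires network access):
# prefetch_schema("https://xbrl.sec.gov/dei/2018/dei-2018-01-31.xsd")
```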

Recently, I tried using the R package finreportr to retrieve Apple's balance sheet. I have struggled to find a proper solution, as the package does not seem to be robust or accurate. Will there be any material updates to this package in the foreseeable future?

I believe this work has tremendous value. Many end users can benefit from proper fundamental analysis of the balance sheets, income statements, and cash flow statements of individual stocks. Most of the current issues (XBRL, XML locations on SEC EDGAR, and some bugs in the R package finreportr) are highly technical for the average retail user.