sewardlee337/finreportr

HTTP error 403

Opened this issue · 15 comments

rrik commented

Hello,

I am getting a 403 error when attempting the following

GetIncome("FB", 2016)
Error in fileFromCache(file) :
  Error in download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1326801/000132680116000043/fb-20151231.xsd'

In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1326801/000132680116000043/fb-20151231.xsd': HTTP status was '403 Forbidden'

Do the source links need updating? Thank you!

Hello,
I'm having a similar issue, but with "404 Not Found":

GetIncome("TSLA", 2020)
Error in fileFromCache(file.inst) : 
  Error in download.file(file, cached.file, quiet = !verbose) : 
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1318605/000156459020004475/tsla-20191231.xml'

In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1318605/000156459020004475/tsla-20191231.xml': HTTP status was '404 Not Found'

@darh78 That file doesn't exist; try:
https://www.sec.gov/Archives/edgar/data/1318605/000156459020004475/tsla-10k_20191231_htm.xml

@rrik That happens to me too with older submissions; it seems to be related to the SEC's fair-access policy. You can try downloading the file manually and putting it in the cache folder, or you can run the code a few times and it will eventually download.
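A sketch of the manual workaround. Assumptions: the cache is the XBRL package's default `xbrl.Cache` directory in the working directory, the cached file name is the basename of the URL, and the User-Agent string is a placeholder you must replace (`download.file()` accepts a `headers` argument since R 4.0); check where finreportr actually caches on your setup before relying on this:

```r
# Hypothetical helper: download one filing file by hand into the XBRL cache
# directory so that fileFromCache() finds it on the next run.
fetch_to_cache <- function(url, cache.dir = "xbrl.Cache",
                           agent = "Your Name my_name@domain.com") {
  dir.create(cache.dir, showWarnings = FALSE)
  dest <- file.path(cache.dir, basename(url))       # e.g. xbrl.Cache/fb-20151231.xsd
  download.file(url, dest, headers = c(`User-Agent` = agent))
  dest
}

# Usage (commented out to avoid hitting the SEC unintentionally):
# fetch_to_cache("https://www.sec.gov/Archives/edgar/data/1326801/000132680116000043/fb-20151231.xsd")
```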

Hi,

I also ran into the same error. The URL is built like this inside the package:

    if (foreign == FALSE) {
        url <- paste0("http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=",
                      symbol, "&type=10-k&dateb=&owner=exclude&count=100")
    }
    filings <- xml2::read_html(url)

I tried changing count to 1 and it works, so it seems the page detects that we are not a browser and blocks the request. We may need to use RSelenium :(

I have been receiving the same error. Is there any workaround?

same error here:

CompanyInfo("GOOG")
Error in open.connection(x, "rb") : HTTP error 403.

Same error 403 in all functions

AnnualReports ("TSLA")
Error in open.connection(x, "rb") : HTTP error 403.

R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] edgarWebR_1.1.0 finreportr_1.0.2

loaded via a namespace (and not attached):
[1] xml2_1.3.2 magrittr_2.0.1 tidyselect_1.1.1 rvest_1.0.1 R6_2.5.1 rlang_0.4.11
[7] fansi_0.5.0 stringr_1.4.0 httr_1.4.2 dplyr_1.0.7 tools_4.1.0 utf8_1.2.2
[13] DBI_1.1.1 selectr_0.4-2 ellipsis_0.3.2 assertthat_0.2.1 tibble_3.1.4 lifecycle_1.0.0
[19] crayon_1.4.1 purrr_0.3.4 vctrs_0.3.8 curl_4.3.2 glue_1.4.2 stringi_1.7.4
[25] compiler_4.1.0 pillar_1.6.2 generics_0.1.0 pkgconfig_2.0.3

I am also experiencing this problem.

Here is my workaround to your problem.

The problem is that the SEC wants the scraper to identify itself in what is called the User-Agent header.

Before placing my request for data I execute ...

     options(HTTPUserAgent = "your name here   my_name@domain.com")

The setting is only remembered for the current session.

With this workaround everything works fine for me; no more 403 errors.

VS

I used vsoler's suggestion to use the options statement and I'm still having trouble:

GetIncome("MA", 2020)
Error in fileFromCache(file.inst) : 
  Error in download.file(file, cached.file, quiet = !verbose) : 
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1141391/000114139120000032/ma-20191231.xml'

In addition: Warning messages:
1: In download.file(file, cached.file, quiet = !verbose) :
  downloaded length 0 != reported length 324
2: In download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1141391/000114139120000032/ma-20191231.xml': HTTP status was '404 Not Found'

According to the SEC the user-agent must be used in the request header.
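If you make the requests yourself, the header can also be set per request with httr (already attached in the sessionInfo above). A minimal sketch, with a placeholder address and the actual request commented out so it isn't fired by accident:

```r
library(httr)

# SEC fair-access guidance asks for a User-Agent that identifies you.
ua <- user_agent("Your Name my_name@domain.com")

# resp <- GET(
#   "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=GOOG&type=10-K&count=10",
#   ua
# )
# stop_for_status(resp)  # raises an R error if the SEC still answers 403
```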

Hi guys,

Any chance of an update solving the problem here?
I am still running into errors despite using the user agent, but only for specific years.

My workaround for this problem was to install two missing packages, 'XBRL' and 'Rcpp'.

Could you please suggest a current solution for this problem (HTTP error 403)?
Also, is this package still actively maintained?
Thanks in advance!

There are several errors being conflated in this issue.

The 403 errors are because your client is not authorised: you have not set (or have improperly set) your User-Agent header, and the SEC is refusing access.

The 404 error mentioned by @eweiss99 is because the file that finreportr is trying to download does not exist. The package guesses the name of the submission file by appending the date to the ticker (ma-20191231.xml), but, for whatever reason, the filer didn't name their submission file that way. If you go to the actual accession web page, you'll see the file is actually called ma12312019-10xk_htm.xml. This is a genuine bug in finreportr: it does not correctly determine the file name.

IMO the best fix here would be for finreportr to actually download the header file for the accession number, extract the table with the file descriptions, and select the correct file name on the basis of the description.

I've got a bit of momentum here, so I'll see if it's a simple fix and make a pull request.
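A sketch of that idea. Assumptions: each EDGAR accession directory exposes a machine-readable `index.json` listing, and the helper names and the name-matching heuristic below are illustrative, not finreportr's actual code (a proper fix should select by the file descriptions, as suggested above):

```r
# Illustrative sketch: pick the XBRL instance document from a directory
# listing instead of guessing "<ticker>-<date>.xml".
pick_instance <- function(files) {
  # Heuristic: an .xml file that is neither a linkbase (_cal/_def/_lab/_pre)
  # nor FilingSummary.xml; real code should read the descriptions instead.
  xml <- files[grepl("\\.xml$", files)]
  xml[!grepl("_(cal|def|lab|pre)\\.xml$|FilingSummary", xml)][1]
}

find_instance_url <- function(cik, accession) {
  base <- sprintf("https://www.sec.gov/Archives/edgar/data/%s/%s/",
                  cik, gsub("-", "", accession))
  # Remember to set options(HTTPUserAgent = ...) first, or the SEC returns 403.
  idx <- jsonlite::fromJSON(paste0(base, "index.json"))
  paste0(base, pick_instance(idx$directory$item$name))
}
```

For the Mastercard filing above, the idea is to land on ma12312019-10xk_htm.xml rather than the guessed ma-20191231.xml.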

@vsoler's answer on

options(HTTPUserAgent = "your name here   my_name@domain.com")

worked like a charm. I hope this makes it onto the main README page!