Finnhub-Stock-API/finnhub-python

Latest version on PyPi doesn't fetch stock candles

stephenrs opened this issue · 22 comments

I recently discovered that the library that I was using previously is no longer being served by PyPi, and the project has been archived on Github (https://github.com/s0h3ck/finnhub-api-python-client). So, I switched to this client library.

Unfortunately, however, it doesn't appear to be working properly. When fetching data using the stock_candles method (for any stock that actually has data), the following result is returned:

{'c': None, 'h': None, 'l': None, 'o': None, 's': None, 't': None, 'v': None}

So, I'm currently just using the python requests library and hitting the REST endpoints directly, which obviously isn't as convenient.
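For context, the direct-requests workaround looks roughly like this (the base URL matches the public Finnhub REST docs; the token and date values are placeholders, and `fetch_candles` is just my own helper name):

```python
import requests
import pandas as pd

BASE = "https://finnhub.io/api/v1"

def build_candle_url(symbol, resolution, start, end, token):
    """Build the /stock/candle query URL by hand (what the wrapper would normally do for us)."""
    return (f"{BASE}/stock/candle?symbol={symbol}&resolution={resolution}"
            f"&from={start}&to={end}&token={token}")

def fetch_candles(symbol, resolution, start, end, token):
    # The endpoint returns plain JSON, so the response drops straight into a DataFrame.
    resp = requests.get(build_candle_url(symbol, resolution, start, end, token), timeout=5.0)
    resp.raise_for_status()
    return pd.DataFrame(resp.json())
```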

Any thoughts would be appreciated.

Hi @stephenrs thank you for reporting this, we are on it.

Hi @stephenrs we have made the fix, can you try 1.1.6 and let us know. Thank you for reporting, we are so sorry for this issue

@nongdenchet Thanks for looking into this. v1.1.6 is returning data, but there's something different about the structure of the json compared to the json that is returned by the other library that I was using, and compared to the json returned by requests.get(). I haven't looked closely enough to see what the problem is, or if the response needs to be transformed in some way, but it can be summarized like this:

```
resp = requests.get(url, timeout=5.0)
df = pd.DataFrame(resp.json())

# df is a proper DataFrame, ready for additional processing

json = fh.stock_candles(
    symbol,
    resolution,
    first,
    last,
    adjusted=True
)

df = pd.DataFrame(json)

# throws the exception "DataFrame constructor not properly called!"
```

@stephenrs it would be nice if you could be more specific about the problem you are facing.

@nongdenchet The problem is that I have 2 techniques for fetching data from the Finnhub API that both return JSON that can be readily consumed by the DataFrame constructor. One is the previous library that I mentioned, and the other is a simple call using the requests.get() method. Why the response data from this API client throws an exception in the DataFrame constructor is the mystery.

So, at this point, trying to get the official API client to drop in and "just work" has been costing me time that I unfortunately can't afford at the moment. And, since there isn't any documentation or examples for how to get the data returned by this client into a usable form (like a DataFrame), the only way for me to be more specific about the problem I'm having would be for me to closely inspect the response data to find out why the DataFrame constructor is rejecting it, which again, I just don't have time for, particularly since I already have working code, as above.

As a side note, I took a quick look at the code for this client, and it appears to come from some kind of code generator, which IMHO is a bad idea for this use case. There's just too much unnecessary mystery code and too many dependencies, and I would predict that it will be hard to get and keep this library stable. So, I've started writing my own little API client based on requests.get() that includes retry logic and exception handling. Right now it only handles a few endpoints, but I'll add to it as needed.
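(For the curious, the retry logic is nothing fancy. A sketch of the pattern, with the attempt count and backoff numbers made up, and `fetch` standing in for any callable that wraps a requests.get call:)

```python
import time

def fetch_with_retry(fetch, attempts=3, base_delay=0.5):
    """Call fetch() up to `attempts` times, doubling the delay after each failure."""
    last_exc = None
    for i in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # in real code, catch requests.RequestException
            last_exc = exc
            if i < attempts - 1:
                time.sleep(base_delay * (2 ** i))  # 0.5s, 1s, 2s, ...
    raise last_exc
```

Wrapped around the actual HTTP call, this keeps a transient network error from killing a whole fetch loop.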

Hi @stephenrs, it seems like you're running into the models <-> dictionary conversion. Can you try the suggestion from @finnhubio in issue #2 and let us know? We will try to make this friendlier on our side.
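For what it's worth, clients generated this way typically expose a `to_dict()` on each model (that method name is an assumption about the generated code; `StockCandles` below is a stand-in class, not the real one), so the extra step the #2 workaround amounts to looks roughly like:

```python
import pandas as pd

# Stand-in for the generated candle model; the real class and its
# to_dict() come from the openapi-generated client code (assumed here).
class StockCandles:
    def __init__(self, c, h, l, o, s, t, v):
        self.c, self.h, self.l, self.o = c, h, l, o
        self.s, self.t, self.v = s, t, v

    def to_dict(self):
        return {"c": self.c, "h": self.h, "l": self.l, "o": self.o,
                "s": self.s, "t": self.t, "v": self.v}

candles = StockCandles(c=[1.0], h=[1.1], l=[0.9], o=[0.95],
                       s="ok", t=[1590000000], v=[100])
# The conversion step the thread is complaining about: model -> dict -> DataFrame.
df = pd.DataFrame(candles.to_dict())
```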

@nongdenchet Thanks for the reply, but I think the issue is that I was expecting the client to return json, but it doesn't...maybe I'm just lazy or have unreasonable expectations...nothing to do with models or dictionaries.

So, I think the suggestion on #2 seems more like a workaround than a solution. I agree with the OP...it seems to me that an API client in this context should directly return json or some other common format that you can configure/specify (without writing additional conversion code).

So I think I'll stick with what I'm doing with the requests library for now. It provides a lot of the built-in features that I want (like timeouts and returning json), and I don't need full FH API coverage for now.

I mean...think about it...the REST endpoint already returns json directly, so why is the API client converting to a new nonstandard format that needs to get converted back to json? Doesn't make much sense to me, and it creates unnecessary extra work for people who want to use it. The whole point of an API client is to make it so that you have to write less code to get stuff done.

I just realized that the /stock_candles endpoint already allows you to specify json or csv for the response, with json being the default. Shouldn't the API client preserve this feature? What's the point of having to iterate through the response to get json, as in the suggestion in #2? Not to mention the obvious performance impact this unnecessary iteration will have.

It really comes down to a design choice in this case. Most languages other than Python prefer to receive data as an object instead of a map/json. That said, we understand where you are coming from; we will take the feedback into account and improve the design in the next release.

P.S.: Having objects/models increases the consistency of the data. This approach is widely adopted by big tech companies, even in Python. If you look at Google's Python libraries, using objects is almost always the case: for example, the Python protobuf library.

I hear you, but I think those design principles might be misapplied in this case. For my purposes, the most important thing about moving price data around is speed, every millisecond counts. There are no structural data relationships to keep "consistent", it's isolated read-only data, and the best way to make sure it gets from point A to point B as fast as possible without being altered is to not alter it unless necessary.

Right now, the api client is altering the source data, then suggesting that I alter it again before I can use it. Not only is this going to slow things down with no justification/benefit that I can see, it's also going to introduce new points of failure in a sensitive part of any system. We've already seen that happen with this particular ticket/issue.

So, for my project, in my judgment, it will lead to a more solid build if I rely on the super mature requests library to deliver data as fast as humanly possible in exactly the format(s) I need, while providing all sorts of useful features at the same time.

So, in practical terms, at least in the case of the /stock_candles endpoint, this api saves a little coding time by not having to write the URL yourself, but in every other way it serves as a detriment to the task of fetching candles: it makes it harder to code with, removes features of the underlying FH API, and it ensures that fetches will take longer than other readily available alternatives. My 2 cents.

Ultimately, I guess it depends on the individual use case, and what folks plan to do with the data once they've fetched it.

And perhaps the best question is: is it the job of an API client/wrapper to make design choices for developers?....

"Just give me the data, save me some time/code, and stay out of my way"....isn't that a reasonable set of objectives for an API wrapper?

Maybe build a middleware library for people who want to work with price data in an object/relational world. Working with this wrapper should give you exactly what you would get from the underlying REST endpoints. I'm not sure there exists a good argument against that - that's what a wrapper is - it's meant to be transparent. No?

And I'll just chime in with...we've come full circle. The point I was making with the code snippet above is that an API wrapper that doesn't return the same data as the underlying API is...well, just broken. I was actually surprised that @nongdenchet asked me what the problem was.

The problem is that if I have to learn a new API in order to use a wrapper to make use of the underlying API that I already understand, then the API wrapper is just of no use to me - it's just getting in the way without any tangible benefit.

I hope I don't sound overly critical, because that's not my intention. I'm just throwing out dev/design thoughts, and hopefully some of it resonates. We're on the same team.

Most languages other than python always prefer to get the data in an object instead of a map/json.

Also...if Finnhub believes this as an organization, then the REST endpoints should make the same choices and assumptions...what you're doing now with this library is internally inconsistent, no?

In my humble opinion, from an architecture point of view, https://github.com/s0h3ck/finnhub-api-python-client/blob/master/finnhub/client.py is a more rational and easily maintained starting point for a client/wrapper, and it's about 10% of the code of the official library. What exactly has been gained by the extra 90% of the code (aside from doc generation, project bloat, inconsistency, instability, and guaranteed developer/consumer confusion)?

...just read some docs on protobuf...and I get it, seems quite useful for some use cases...just doesn't seem to have natural application to rapidly fetching read-only time series data. "design overkill" is the phrase that comes to mind.

...and if you want to make a design choice that will do a favor for the vast majority of python developers...make "dataframe" an option for return data format...wrappers are about convenience and accelerating development...know your audience.
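Concretely, the sort of convenience helper I mean (names are mine, not the library's; `candle_json` is just the plain dict the endpoint already returns, and the epoch-seconds `t` column is the only thing worth post-processing):

```python
import pandas as pd

def candles_to_frame(candle_json):
    """Turn the raw /stock/candle JSON dict into a DataFrame indexed by timestamp."""
    # Pick out the per-bar arrays; 's' is a scalar status flag, not column data.
    df = pd.DataFrame({k: candle_json[k] for k in ("o", "h", "l", "c", "v", "t")})
    df["t"] = pd.to_datetime(df["t"], unit="s")  # epoch seconds -> datetime
    return df.set_index("t")
```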

As I said earlier, we do understand where you are coming from and we will incorporate this feedback into our next release.

....and for the benefit, adoption, and sustainability of the project, I'd tend to hope the next release is a total reboot/rethink that considers that this library is a wrapper to an existing API - it's not a new API or a development framework...some code can't be fixed, sometimes it just needs to be replaced. I personally can't see trusting this library if I have to dig through layers of model transformation code to be sure that it's not polluting the source data in some way or impacting performance needlessly (and the auto-generated code looks like a nightmare to maintain). Or, maybe it would make sense to publish the requirements/design objectives of this library, in case I'm alone in wanting a simple python wrapper. just 2 more cents.

I know I'm probably being a pain, but I'm really just trying to help...

One other reason this massive openapi-generated project is a bad idea is that it makes it difficult/impossible for developers in the community to help, or to help themselves. I can look at the @s0h3ck project and in 5 minutes I can see that it's doing everything that it needs to do, it's clean, not doing anything stupid/wasteful, and it's ready to be extended with new features if desired. And, if a flaw surfaces (which is unlikely, given how tight/small it is), I know that I can quickly point out the exact problem or modify the code myself.

With the official library, if another flaw surfaces I can look forward to hours/days of trying to figure out how openapi works (a completely unnecessary dependency), and reading github issues and SO discussions before I can even take a guess at what might be going wrong. This makes it a dangerous piece of code to integrate into any serious project, given the simple task it needs to implement. There's just way too much code and way too many levels of architectural indirection for the job at hand...and it looks like a lot of the code just duplicates what you can already get from python (executed in C) for free.

I know it's hard to throw away code that you've worked hard on, but I guess I'm just trying to encourage you to see that maybe this is one of those occasions to do it. Better now than later. This project in its current form looks like a dead end to me. The fact that a critical endpoint like /stock_candles got released in a totally non-functional state is a pretty bad sign for the health of a project. Seems like simplifying the code base would be a strong positive move for everyone involved.

I guess also it just concerns me to see that using a list comprehension to iterate through every single record of every single response to get json from a high-usage JSON endpoint was considered a viable solution (#2). Something is not right here (I don't need to "try it" to see if it works, I can look at it and know that it's a bad idea).

In other words, making the client return proper json in the next release wouldn't solve the more profound problems I see in this library, from a wider engineering perspective...and if you can't tell, I love to talk about architecture. I've been building software for a long time.

I'll try to make this the last comment to encourage killing this beast of a project: If you can accomplish a task with x amount of code, what possible reason is there to use 10x to 20x more code? Please kill this beast or tell me what I'm missing.

Just swap in the project by @s0h3ck and forget about this monster - and add features as people seem to want/need them..so you have time to focus on other things...and/or make this project the middleware for people who are building CRUD apps for market data.

@stephenrs we have released version 2.0.0, which addresses your issue. Please try it out and let us know.

This is great. I haven't had a chance to try it out yet, but the code looks great. A massive improvement, nice work. Thanks for putting up with my rambling and taking this seriously.

One minor suggestion I might make is that maybe the request timeout should be configurable. It looks like the default is 10 seconds, which might be long for some ("fail fast") use cases.
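The kind of knob I have in mind looks something like this (a sketch only; `CandleClient` and its parameters are hypothetical, not the library's actual API, and the 10-second default just mirrors what the current code appears to use):

```python
import requests

class CandleClient:
    """Hypothetical wrapper sketch: the caller picks the timeout once, at construction."""

    def __init__(self, api_key, timeout=10.0):
        self.api_key = api_key
        self.timeout = timeout
        self.session = requests.Session()  # reuse one connection pool across calls

    def get_json(self, url, **params):
        params["token"] = self.api_key
        resp = self.session.get(url, params=params, timeout=self.timeout)
        resp.raise_for_status()
        return resp.json()

# "Fail fast" configuration for latency-sensitive fetch loops:
fast = CandleClient("my-key", timeout=2.0)
```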