๐๏ธ merge `types-scipy-sparse` into `scipy-stubs`
Closed this issue ยท 12 comments
@BarakKatzir has been developing the types-scipy-sparse
stub package for scipy.sparse
. A large portion appears to more complete that the scipy.sparse
stubbs in scipy-stubs
.
I have talked with @BarakKatzir about joining forces, and we both agreed that gradually merging types-scipy-sparse
into scipy-stubs
is not only a good idea, but also a feasible one.
@BarakKatzir, how about we could use this issue as central planning on how we can go about this? So e.g. how we can "chop it up" into PR's, and in which order we do so.
related:
Good idea. Currently we talked about the first PR including:
1.Moving a single sparse array type (csc / csc etc.) from types-scipy-sparse
package to scipy-stubs
. Add inheritance from _spbase
class (since in my package _spbase
isn't type-annotated and you mentioned that despite being private it is used by users).
2. I probably have to add generics to _spbase
as well, since there are some methods which need it (e.g., real
). This might also be in the form of generated code.
2. Regarding the code generation - I'll start by copying my tool (just for the relevant sparse type in the PR). It's a rough tool, but it's a good starting point IMO.
After the first PR we'll have a better idea of issues in the merger and we can plan ahead.
- I probably have to add generics to
_spbase
as well, since there are some methods which need it (e.g.,real
). This might also be in the form of generated code.
Do you think that it would help if we defer the Generic
bits, and re-introduce them once we have the fundamentals in place? You used __ceil__
as example to show how generics lead to unavoidable code duplication. From that perspective, omitting the generics would be like compressing a file before you transfer it, and decompressing it when all files are received.
- Regarding the code generation - I'll start by copying my tool (just for the relevant sparse type in the PR). It's a rough tool, but it's a good starting point IMO.
The codegen/
directory seems like the right place for that. But I'm not really sure on what the best approach for integrating this in the CI, and perhaps also pre-commit, would be.
The current codegen stuff is mostly meant for one-time only use, and is therefore not part of the CI.
But if we decide to defer the generics, do you still think it would be needed?
Because as I mentioned in #47, I plan to integrate unpy
in scipy-stubs
soon, which also does code-generation. So I'm sure you can imagine that having two codegens together could become a bit messy, if not done right.
I think that the ideal scenario would be to achieve the same DRY result as your template-based approach through unpy
, because unpy
code is (and probably always will be) syntactically valid Python. This means that IDE's still understand it, and all of our linters and type-checkers can also be used in the same way.
After the first PR we'll have a better idea of issues in the merger and we can plan ahead.
Yea that makes sense. Sometimes I forget that premature optimization is the root of all evil.
Okay, you're right. Since this is a big change we should aim at doing it as small as possible, but still in a way we can see how it will scale up. How about I add generic to the child class but only use it on a single method that isn't in _spbase
? that way the changes will be contained to a single pyi
file?
I don't think unpy
is the correct code generation tool for the job, but maybe I'm misunderstanding it. The code generation here is very specific to scipy-stubs
, and is not directly related to downgrading stubs to previous python versions. Adding overrides that depend on generics and the specific method at hand doesn't sound like something unpy
is intended to do, no?
I'll put a suggestion here and let me know what you think:
We can have some meta stub file _csr_meta.pyi
that contains a compressed type annotations. This file will not be distributed in scipy-stubs
and sit elsewhere in the repo, maybe under codegen/compressed_stubs/
. We can write a tool that decompresses it to _csr.pyi
which will be distributed under scipy-stubs
. Regarding CI we can add a check that compares the generated _csr.pyi
with the one currently present in the commit.
In this approach, having the _csr.pyi
in the repo is redundant and can be annoying since we have this _csr.pyi
as a source file which we don't actually want to edit by hand, but this keeps the build simple.
Later on, when we're ready, we can apply unpy
to the generated _csr.pyi
stub and downgrade it to older python versions.
Okay, you're right. Since this is a big change we should aim at doing it as small as possible, but still in a way we can see how it will scale up. How about I add generic to the child class but only use it on a single method that isn't in
_spbase
? that way the changes will be contained to a singlepyi
file?
Yea that makes sense ๐๐ป
I don't think
unpy
is the correct code generation tool for the job, but maybe I'm misunderstanding it. The code generation here is very specific toscipy-stubs
, and is not directly related to downgrading stubs to previous python versions. Adding overrides that depend on generics and the specific method at hand doesn't sound like somethingunpy
is intended to do, no?
You're right in saying that at the moment unpy
won't be able to help much with this. But I can imagine that once jorenham/unpy#97 and jorenham/unpy#98 are implemented, that we'd be able to achieve similar DRY results as by using a template-based approach in the case of methods.
I'll put a suggestion here and let me know what you think: We can have some meta stub file
_csr_meta.pyi
that contains a compressed type annotations. This file will not be distributed inscipy-stubs
and sit elsewhere in the repo, maybe undercodegen/compressed_stubs/
. We can write a tool that decompresses it to_csr.pyi
which will be distributed underscipy-stubs
. Regarding CI we can add a check that compares the generated_csr.pyi
with the one currently present in the commit.
NumPy does something similar with template-based code generation. They use a {module}.py.in
naming scheme for templates that generate {module}.py
. See e.g. numpy/__config__.py.in
. This way IDE's, linters, and type-checkers won't try to parse the template file, which would inevitably cause them to report errors. For the most part, keeping the template in a separate directory could avoid this, you'd still have the IDE that wouldn't be able to parse it (e.g. VSCode only considers filename extensions when selecting a file type I believe).
As I also mentioned in jorenham/unpy#61, I think that for any code-generation approach it is important to make it clear to the user that the code they're looking at is fully generated. Otherwise there's the risk of them submitting PR's that (only) change this generated code, instead of its source template.
I also like numpy's approach of having the template in the same place as the source, so that it's easier to find it if you're looking for it, but forgot it's template-based (which, given my scatterbrain, is bound to happen to me, lol).
In this approach, having the
_csr.pyi
in the repo is redundant and can be annoying since we have this_csr.pyi
as a source file which we don't actually want to edit by hand, but this keeps the build simple. Later on, when we're ready, we can applyunpy
to the generated_csr.pyi
stub and downgrade it to older python versions.
That seems like a good plan actually; kickstarting it in a dry manner with a (no offense) "quick-and-dirty" approach, and then gradually introducing unpy
once we're ready to do so.
I have some experience with jinja
as a templating language. It has great python-like syntax that you can use to create macros and such, and it has very decent IDE integration, which includes support for the the jinja + python combo. So perhaps we could use that? Syntactically, it's actually rather similar to what you've already been using.
See https://github.com/koxudaxi/datamodel-code-generator/ for an example of how it can be used in a python codegen context.
Wow, I didn't know of jinja
. Looks like adapting my tool to a more comprehensive tool will not be such a hassle. The IDE integration is a great feature!
I'll start working on a PR and we can continue refining the framework over there.
How's it going @BarakKatzir ?
Hi, @jorenham. I'm progressing a bit slowly due to deadlines at work (and a family vacation and a sick kid).
It took a bit of trial and error to learn how to work with poethepoet
and jinja2
, but I think I got the basics of them.
Also, I'm trying not to downgrade some improvements that scipy-stubs
has over types-scipy-sparse
such as using @override
- but maybe I should just open a PR and we can refine it over there.
I'll hopefully open a PR in a couple of days.
I'll hopefully open a PR in a couple of days.
๐
Also, I'm trying not to downgrade some improvements that scipy-stubs has over types-scipy-sparse such as using @OverRide - but maybe I should just open a PR and we can refine it over there.
I believe that basedpyright is configured to require @override
at the moment.
If there are a lot of missing @overloads
, then maybe we could also consider writing a libcst codemod for it. But I suspect it'll probably be faster to manually do this when there are <100 or so cases.
It took a bit of trial and error to learn how to work with poethepoet and jinja2, but I think I got the basics of them.
Ah yea there's quite a bit of tooling involved. But in the long run those should all save time. If there's anything I can help, then don't hesitate to ask :)
@BarakKatzir Any updates on this @BarakKatzir?