Dataclass Support + Hashing Support
kabirkhan opened this issue · 0 comments
Hi, I'm using this project to parse requirements and have a couple of features I'd love to build into the package itself that I think other users would benefit from. Happy to submit a PR if both sound good. Thanks!
1. Dataclass Support
This would be a very small change: add type annotations for all members (basically moving the type definitions from #65 out of __init__ and into class-level annotations) and add the dataclass decorator.
import typing as ty
from dataclasses import dataclass

@dataclass
class Requirement(BaseModel):
    line: str
    editable: bool = False
    local_file: bool = False
    specifier: bool = False
    vcs: ty.Optional[str] = None
    name: ty.Optional[str] = None
    subdirectory: ty.Optional[str] = None
    uri: ty.Optional[str] = None
    path: ty.Optional[str] = None
    revision: ty.Optional[str] = None
    hash_name: ty.Optional[str] = None
    hash: ty.Optional[str] = None
    extras: ty.Optional[ty.List[str]] = None
    specs: ty.Optional[ty.List[ty.Tuple[str, str]]] = None

    # @dataclass keeps an __init__ that the class already defines
    def __init__(self, line: str) -> None:
        ...
Benefits
- Lots of tools parse dataclass types.
- Easy Pydantic support, so a Requirement could be passed around in FastAPI requests/responses.
Cons
- No real downsides: it's a very simple change and none of the existing functionality goes away.
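As a rough illustration of the idea, here is a minimal sketch using a simplified, hypothetical stand-in for Requirement with only three of the fields and a trivial __init__ (the real class parses the line). It assumes nothing about the package beyond what is described above, and shows that the decorator leaves a hand-written __init__ in place while still exposing the fields to tooling via `dataclasses.fields` and `dataclasses.asdict`.

```python
import typing as ty
from dataclasses import asdict, dataclass, fields


@dataclass
class Requirement:
    """Simplified, hypothetical stand-in with only three of the fields."""

    line: str
    editable: bool = False
    name: ty.Optional[str] = None

    # The hand-written __init__ is kept: @dataclass does not replace
    # an __init__ that the class already defines.
    def __init__(self, line: str) -> None:
        self.line = line
        self.editable = False
        self.name = None


req = Requirement("requests>=2.0")
req.name = "requests"  # the real class would fill this in while parsing

# Tools that understand dataclasses can introspect the declared fields
field_names = [f.name for f in fields(Requirement)]
as_dict = asdict(req)
print(field_names)
print(as_dict)
```

The generated __repr__ and __eq__ come along for free, so two stand-in instances built from the same line compare equal.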
2. Hashing support
My use case is building remote virtual environments that users can execute code in. Building venvs can be a little expensive so I'd like to add caching support by hashing the Environment name + set of requirements.
Currently, the Requirement type is not hashable or sortable.
This can be achieved by adding a __hash__ method that returns the hash of the line member, plus the built-in comparison methods to sort by the Requirement's name or line member.
Proposed methods
@dataclass
class Requirement:
    ...

    # Hash on the raw requirement line
    def __hash__(self) -> int:
        return hash(self.line)

    # Dict-style access to attributes (values are not always str)
    def __getitem__(self, key: str) -> ty.Any:
        return getattr(self, key)

    # Ordering compares names; a None name raises TypeError here
    def __lt__(self, other: "Requirement") -> bool:
        return self.name < other.name

    def __gt__(self, other: "Requirement") -> bool:
        return self.name > other.name

    def __le__(self, other: "Requirement") -> bool:
        return self.name <= other.name

    def __ge__(self, other: "Requirement") -> bool:
        return self.name >= other.name
With these methods, I can run a good cache check to see if a list of requirements actually changed.
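To make the effect concrete, here is a reduced sketch with a hypothetical two-field stand-in for Requirement (the real class would parse `line` itself). It only defines __hash__ and __lt__, since `sorted()` needs nothing beyond `<`; the fallback to `line` when `name` is None is my own assumption, not part of the proposal.

```python
import typing as ty
from dataclasses import dataclass


@dataclass
class Requirement:
    """Hypothetical stand-in; the real class parses `line` itself."""

    line: str
    name: ty.Optional[str] = None

    # An explicit __hash__ is preserved by @dataclass
    def __hash__(self) -> int:
        return hash(self.line)

    def __lt__(self, other: "Requirement") -> bool:
        # Assumption: fall back to the raw line when name is None
        return (self.name or self.line) < (other.name or other.line)


first = [Requirement("httpx", "httpx"), Requirement("requests", "requests")]
second = [Requirement("requests", "requests"), Requirement("httpx", "httpx")]

# Order-insensitive comparison: de-duplicate, sort, then compare
same = sorted(set(first)) == sorted(set(second))
print(same)  # True
```

If all four comparison methods are wanted, `functools.total_ordering` could derive the rest from __eq__ and __lt__ instead of writing each one by hand.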
Usage
requirements.txt
httpx
requests
other_requirements.txt
requests
httpx
The above requirements files are technically the same even though the file contents are seen as different.
test.py
from requirements import parse

with open("./requirements.txt") as f:
    requirements = list(parse(f.read()))

with open("./other_requirements.txt") as f:
    other_requirements = list(parse(f.read()))

sorted_requirements = sorted(set(requirements))  # set and sorted currently won't work
sorted_other_requirements = sorted(set(other_requirements))
assert sorted_requirements == sorted_other_requirements
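One caveat for the caching use case: Python's built-in `hash()` of a string is randomized per process by default, so its value is fine for sets and dicts but not for a cache key that is stored or shared between machines. A possible sketch of a stable key follows; `environment_cache_key` and its parameters are hypothetical names for illustration, not part of this package.

```python
import hashlib
import typing as ty


def environment_cache_key(env_name: str, requirement_lines: ty.Iterable[str]) -> str:
    """Hypothetical helper: digest of the environment name plus the
    de-duplicated, sorted requirement lines."""
    # Drop blank lines and duplicates, then sort so file order is irrelevant
    normalized = sorted({line.strip() for line in requirement_lines if line.strip()})
    payload = "\n".join([env_name, *normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = environment_cache_key("my-env", ["httpx", "requests"])
key_b = environment_cache_key("my-env", ["requests", "httpx"])
print(key_a == key_b)  # True
```

With a hashable, sortable Requirement, the lines passed in could simply be `req.line` for each parsed requirement.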