madpah/requirements-parser

Dataclass Support + Hashing Support

kabirkhan opened this issue · 0 comments

Hi, I'm using this project to parse requirements, and there are a couple of features I'd love to build into the package itself that I think other users would benefit from. Happy to submit a PR if both sound good. Thanks!

1. Dataclass Support

This would be a very small change: declare type annotations for all members at class level and add the dataclass decorator. In practice that means moving the type definitions from #65 out of __init__ and into class-level annotations.

import typing as ty
from dataclasses import dataclass


@dataclass
class Requirement(BaseModel):
    line: str
    editable: bool = False
    local_file: bool = False
    specifier: bool = False
    vcs: ty.Optional[str] = None
    name: ty.Optional[str] = None
    subdirectory: ty.Optional[str] = None
    uri: ty.Optional[str] = None
    path: ty.Optional[str] = None
    revision: ty.Optional[str] = None
    hash_name: ty.Optional[str] = None
    hash: ty.Optional[str] = None
    extras: ty.Optional[ty.List[str]] = None
    specs: ty.Optional[ty.List[ty.Tuple[str, str]]] = None

    def __init__(self, line: str) -> None:
        ...

Benefits

  • Lots of tools can introspect dataclass types.
  • Easy Pydantic support, so a Requirement could be passed in FastAPI requests and responses.
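
To illustrate the first benefit, here is a minimal sketch using only the standard library. RequirementSketch is a hypothetical, trimmed stand-in for the proposed class, not the package's actual Requirement; it only shows what generic tooling gets for free once fields are declared as annotations.

```python
import typing as ty
from dataclasses import asdict, dataclass, fields


@dataclass
class RequirementSketch:
    """Hypothetical, trimmed stand-in for the proposed Requirement dataclass."""

    line: str
    name: ty.Optional[str] = None
    editable: bool = False
    extras: ty.Optional[ty.List[str]] = None


req = RequirementSketch(line="requests[security]", name="requests", extras=["security"])

# Generic tooling can discover the declared fields without custom code.
field_names = [f.name for f in fields(req)]

# asdict() yields a plain dict that is straightforward to serialize as JSON.
payload = asdict(req)
```

Both calls rely only on the dataclass decorator, so they would not need any per-field code in the package.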

Cons

  • No real downsides: it's a very simple change, and none of the existing functionality goes away.

2. Hashing support

My use case is building remote virtual environments that users can execute code in. Building venvs can be a little expensive, so I'd like to add caching support by hashing the environment name plus the set of requirements.
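
A rough sketch of such a cache key, assuming the cache has to survive process restarts. env_cache_key is a hypothetical helper, not part of this package. It uses hashlib rather than the built-in hash(), because hash() on strings is salted per process by default and so is not stable across runs.

```python
import hashlib
import typing as ty


def env_cache_key(env_name: str, requirement_lines: ty.Iterable[str]) -> str:
    """Hypothetical helper: digest of an environment name plus its requirements."""
    # Deduplicate and sort so the key does not depend on line order.
    normalized = sorted({line.strip() for line in requirement_lines if line.strip()})
    digest = hashlib.sha256()
    digest.update(env_name.encode("utf-8"))
    for line in normalized:
        digest.update(b"\n")
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()


# The same requirements in a different order produce the same key.
key_a = env_cache_key("my-env", ["httpx", "requests"])
key_b = env_cache_key("my-env", ["requests", "httpx"])
```

Here the raw lines stand in for parsed requirements; with hashable, sortable Requirement objects the same normalization could be done on the objects directly.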

Currently, the Requirement type is neither hashable nor sortable.
Both can be achieved by adding a __hash__ method that returns the hash of the line member, plus the built-in comparison methods so that requirements sort by name (or by line).

Proposed methods

@dataclass
class Requirement:
    ...
    def __hash__(self) -> int:
        # Hash on the raw requirement line so instances can go in sets and dict keys.
        return hash(self.line)

    def __getitem__(self, key: str) -> ty.Any:
        return getattr(self, key)

    # Ordering compares names; this assumes name is set on both sides,
    # since comparing None with a str raises TypeError.
    def __lt__(self, other: "Requirement") -> bool:
        return self.name < other.name

    def __gt__(self, other: "Requirement") -> bool:
        return self.name > other.name

    def __le__(self, other: "Requirement") -> bool:
        return self.name <= other.name

    def __ge__(self, other: "Requirement") -> bool:
        return self.name >= other.name

With these methods, I can run a good cache check to see if a list of requirements actually changed.
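
As a possible variation on the methods above, the comparison methods could be derived with functools.total_ordering from a single __lt__. The sketch below is an assumption on my side, not existing package code: RequirementSketch is a hypothetical stand-in, and the sort key falls back to an empty string when name is None so that sorting never compares None with a str.

```python
import typing as ty
from dataclasses import dataclass
from functools import total_ordering


@total_ordering
@dataclass
class RequirementSketch:
    """Hypothetical stand-in with only the members needed for this example."""

    line: str
    name: ty.Optional[str] = None

    def __hash__(self) -> int:
        # An explicitly defined __hash__ is kept by the dataclass decorator.
        return hash(self.line)

    def _sort_key(self) -> ty.Tuple[str, str]:
        # Sort by name first, then by line; a missing name sorts first.
        return (self.name or "", self.line)

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, RequirementSketch):
            return NotImplemented
        return self._sort_key() < other._sort_key()


first = [RequirementSketch("httpx", "httpx"), RequirementSketch("requests", "requests")]
second = [RequirementSketch("requests", "requests"), RequirementSketch("httpx", "httpx")]

# Same requirements in a different order compare equal once deduplicated and sorted.
same = sorted(set(first)) == sorted(set(second))

# A requirement without a name (for example a local path) still sorts without error.
mixed = sorted(first + [RequirementSketch("./local/pkg")])
```

Equality still comes from the dataclass-generated __eq__, which compares every field, so two objects that compare equal always share the same line and therefore the same hash.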

Usage

requirements.txt

httpx
requests

other_requirements.txt

requests
httpx

The two requirements files above list the same requirements, even though their contents differ in line order.

test.py

from requirements import parse

with open("./requirements.txt") as f:
    requirements = list(parse(f.read()))

with open("./other_requirements.txt") as f:
    other_requirements = list(parse(f.read()))

# set() needs __hash__ and sorted() needs __lt__; neither works on Requirement today.
sorted_requirements = sorted(set(requirements))
sorted_other_requirements = sorted(set(other_requirements))

assert sorted_requirements == sorted_other_requirements
