biglocalnews/court-scraper

Docs do not explain how to iterate and experiment when drafting a scraper

palewire opened this issue · 5 comments

There's a little bit of a gap in the contributing docs. Say you've got the repo installed. Say you want to write a scraper. Say you have identified a site to scrape. Say you have even started writing your Python Site class to implement a scraper.

Now, how do you run your scraper as you iterate and experiment?

My guess based on past experience is that you would install the CLI in development mode using setup.py. But there are other techniques as well, like test-driven development, where you add features as you write new unit tests.

How do you do it? How would you recommend a newbie take this on?

In the past, I've used a technique like this:

pipenv run pip install --editable .

@palewire Great point. Definitely think it would be helpful for docs to fill in that workflow gap. I'd vote for docs that initially suggest a simple editable install for local manual testing, as opposed to requiring TDD up front, in the hope that this makes contributing less of a slog for folks who haven't yet done formal unit testing.

In terms of documenting the workflow for a parallel editable install, what do you think about a basic example that uses the built-in venv? Or even a global/system install?

This is a situation where I don't know the different options well enough to have a discerning opinion. The --editable trick worked well for me, but it may have shortcomings I'm unaware of.

@palewire Oh, forgot to directly address your question about my workflow. I tend to use TDD (with pytest runs and then tox once the dust has settled), along with manual testing in an editable install. But I'd worry a little about requiring that workflow (which can be a PITA) for all potential contributors. I think if you've written code and tested manually and/or perhaps even have a few basic unit tests that you checked with pytest, we could rely on CI or testing by core maintainers for the last mile of automated testing. Feels like that could lower the bar to entry for contributions, but let me know what you think.
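To give a sense of how small "a few basic unit tests" could be, here is a minimal sketch of a pytest-style test. Everything in it is an assumption for illustration: `ExampleSite`, `parse_case_numbers`, the `case-number` CSS class, and the sample case numbers are all hypothetical and are not court-scraper's actual API. The idea is only that a contributor can check parsing logic against canned HTML without hitting a live court site.

```python
import re


class ExampleSite:
    """Hypothetical stand-in for a scraper's Site class (not court-scraper's real API)."""

    def parse_case_numbers(self, html):
        # Pull case numbers out of table cells marked with a made-up CSS class.
        return re.findall(r'<td class="case-number">([^<]+)</td>', html)


# Canned HTML so the test never touches the network.
SAMPLE_HTML = """
<table>
  <tr><td class="case-number">CJ-2021-1</td></tr>
  <tr><td class="case-number">CJ-2021-2</td></tr>
</table>
"""


def test_parse_case_numbers():
    # pytest collects any function whose name starts with "test_".
    site = ExampleSite()
    assert site.parse_case_numbers(SAMPLE_HTML) == ["CJ-2021-1", "CJ-2021-2"]
```

Saved as, say, `test_example_site.py`, this should be picked up by a plain `pytest test_example_site.py` run. A real scraper would likely use a proper HTML parser rather than a regular expression; the regex here just keeps the sketch dependency-free.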

@palewire I think editable is a solid option, fwiw, and as mentioned above, makes it easier to get up and running. But shout back if you think we should recommend the TDD + editable workflow or some other variation that requires folks to get more deeply into unit testing before submitting PRs.