Pathos315/sciscraper

Feedback

Opened this issue · 7 comments

Here's some feedback after a quick look over the code.

Some of the feedback here is mostly related to style and formatting. Although that's usually not going to keep the code from working, that can impact readability.

I've linked some sections of the PEP8 and other sources that I usually follow.

One thing I would suggest, but this is mostly personal preference, is to use a code formatter and linter, such as Black and Flake8. Those, among other tools, will greatly help with keeping a consistent style and making the code easier to read.

If you're interested, here's a nice list of the tools I personally use:
https://github.com/MicaelJarniac/BuildURL/blob/main/CONTRIBUTING.md (expand the "Quick Reference" section)


https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L55-L59
I think that this for loop will iterate over the first element only, and immediately return, without iterating through the other elements.

One way of working around that would be to create an empty list before the for, and instead of returning inside the loop, appending to that list, and then returning that list outside the for loop, after it's done.

Another option would be to use yield instead of return, thus turning that function into a generator.


https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L1

Imports should usually be on separate lines
https://pep8.org/#imports


https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L16
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L17
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L26
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L34

Avoid extraneous whitespace in the following situations
Immediately before the open parenthesis that starts the argument list of a function call
https://pep8.org/#whitespace-in-expressions-and-statements


https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L33

# Add default value for an argument after the type annotation
def f(num1: int, my_float: float = 3.5) -> float:
    return num1 + my_float

https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html#functions

Another thing, on many try, except blocks, there's an explicit pass statement after other expressions. I believe the pass is only necessary if the block would otherwise be empty. If there are already other expressions in a block, the pass is redundant.

Needs pass:

try:
    do_stuff()
except SomeException:
    pass

Doesn't need pass:

try:
    do_stuff()
except SomeException:
    print("workn't")

Also, if using an empty except block, it might be nice to instead use the suppress helper:

from contextlib import suppress
with suppress(SomeException):
    do_stuff()

https://docs.python.org/3/library/contextlib.html#contextlib.suppress

https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L61-L69
Multiple exceptions can be handled by the same except block:

 except (json.decoder.JSONDecodeError, KeyError) as err: 
     print(f"{search_text} failed to serialize to Python object, moving to next available object. Reason Given:", err) 
     logging.error(err) 

https://docs.python.org/3/tutorial/errors.html#handling-exceptions

https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L80-L81
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L91-L92
You could avoid repetition, keeping the code "DRY" (Don't Repeat Yourself) by assigning those strings to variables.
That way, if you ever decide to change what those strings say, you only need to do this once, and it'll propagate correctly.

https://en.wikipedia.org/wiki/Don't_repeat_yourself

Agreed on all points, will work these into the next release/patch

https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L38-L51
On this first try block, the response of the request is assigned to r, but that request could fail.

If it did fail, one of the except blocks would run, but they don't do anything special to the exception, other than printing it and letting the code continue running.

So subsequent parts of code that rely on r will still run, as if everything was successful, even if the request wasn't successful.

It could be a good idea to re-raise the exception on those except blocks, as to halt the execution of the rest of the function, since r would be a bad response.

One way of doing that is simply adding raise to the end of the except blocks, like:

 try:
     r = requests.request("GET", URL_DMNSNS, params=querystring)
     r.raise_for_status()
     logging.info(r.status_code)
      
 except requests.exceptions.HTTPError as errh:
     print("HTTP Error Encountered, moving to next available object. Reason Given:", errh)
     logging.error(r.status_code, errh)
     raise
  
 except requests.exceptions.RequestException as e:
     print("Unexcepted error encountered, moving to next available object. Reason Given:", e)
     logging.error(r.status_code, e)
     raise

https://docs.python.org/3/tutorial/errors.html#raising-exceptions

The problem of doing it this way is that this function will then raise exceptions, that will need to be handled elsewhere.

Another option would be to explicitly return None when those exceptions happen. That'd immediately exit the function, skipping the parts of it that rely on a good r, while not having to worry about raising and handling exceptions elsewhere too.

And technically, this is the current behavior of this function, because if r is malformed (as in, there was an error in the request), the rest of this function will catch possible exceptions caused by that, and will suppress those, implicitly returning None at the end.

So by explicitly returning None on those first few except blocks, you'd be avoiding further exceptions down the line, while mostly maintaining the current behavior of the code.