Feedback
Opened this issue · 7 comments
Here's some feedback after a quick look over the code.
Some of the feedback here is mostly related to style and formatting. Although that's usually not going to keep the code from working, that can impact readability.
I've linked some sections of the PEP8 and other sources that I usually follow.
One thing I would suggest, but this is mostly personal preference, is to use a code formatter and linter, such as Black and Flake8. Those, among other tools, will greatly help with keeping a consistent style and making the code easier to read.
If you're interested, here's a nice list of the tools I personally use:
https://github.com/MicaelJarniac/BuildURL/blob/main/CONTRIBUTING.md (expand the "Quick Reference" section)
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L55-L59
I think that this for
loop will iterate over the first element only, and immediately return, without iterating through the other elements.
One way of working around that would be to create an empty list before the for
, and instead of returning inside the loop, appending to that list, and then returning that list outside the for
loop, after it's done.
Another option would be to use yield
instead of return
, thus turning that function into a generator.
Imports should usually be on separate lines
https://pep8.org/#imports
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L16
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L17
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L26
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L34
Avoid extraneous whitespace in the following situations
Immediately before the open parenthesis that starts the argument list of a function call
https://pep8.org/#whitespace-in-expressions-and-statements
# Add default value for an argument after the type annotation
def f(num1: int, my_float: float = 3.5) -> float:
return num1 + my_float
https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html#functions
Another thing, on many try
, except
blocks, there's an explicit pass
statement after other expressions. I believe the pass
is only necessary if the block would otherwise be empty. If there are already other expressions in a block, the pass
is redundant.
Needs pass
:
try:
do_stuff()
except SomeException:
pass
Doesn't need pass
:
try:
do_stuff()
except SomeException:
print("workn't")
Also, if using an empty except
block, it might be nice to instead use the suppress
helper:
from contextlib import suppress
with suppress(SomeException):
do_stuff()
https://docs.python.org/3/library/contextlib.html#contextlib.suppress
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L71
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L74
# Use Optional[] for values that could be None
x: Optional[str] = some_function()
https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html#built-in-types
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L20
There are better ways of removing NaN
values from Pandas objects, like:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L61-L69
Multiple exceptions can be handled by the same except
block:
except (json.decoder.JSONDecodeError, KeyError) as err:
print(f"{search_text} failed to serialize to Python object, moving to next available object. Reason Given:", err)
logging.error(err)
https://docs.python.org/3/tutorial/errors.html#handling-exceptions
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L80-L81
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L91-L92
You could avoid repetition, keeping the code "DRY" (Don't Repeat Yourself) by assigning those strings to variables.
That way, if you ever decide to change what those strings say, you only need to do this once, and it'll propagate correctly.
Agreed on all points, will work these into the next release/patch
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L38-L51
On this first try
block, the response of the request is assigned to r
, but that request could fail.
If it did fail, one of the except
blocks would run, but they don't do anything special to the exception, other than printing it and letting the code continue running.
So subsequent parts of code that rely on r
will still run, as if everything was successful, even if the request wasn't successful.
It could be a good idea to re-raise the exception on those except
blocks, as to halt the execution of the rest of the function, since r
would be a bad response.
One way of doing that is simply adding raise
to the end of the except
blocks, like:
try:
r = requests.request("GET", URL_DMNSNS, params=querystring)
r.raise_for_status()
logging.info(r.status_code)
except requests.exceptions.HTTPError as errh:
print("HTTP Error Encountered, moving to next available object. Reason Given:", errh)
logging.error(r.status_code, errh)
raise
except requests.exceptions.RequestException as e:
print("Unexcepted error encountered, moving to next available object. Reason Given:", e)
logging.error(r.status_code, e)
raise
https://docs.python.org/3/tutorial/errors.html#raising-exceptions
The problem of doing it this way is that this function will then raise exceptions, that will need to be handled elsewhere.
Another option would be to explicitly return None
when those exceptions happen. That'd immediately exit the function, skipping the parts of it that rely on a good r
, while not having to worry about raising and handling exceptions elsewhere too.
And technically, this is the current behavior of this function, because if r
is malformed (as in, there was an error in the request), the rest of this function will catch possible exceptions caused by that, and will suppress those, implicitly returning None
at the end.
So by explicitly returning None
on those first few except
blocks, you'd be avoiding further exceptions down the line, while mostly maintaining the current behavior of the code.