elsevierlabs-os/AnnotationQueryPython

Incorrect import style masks builtin Python functions


A few modules in the code import everything from the PySpark SQL functions module, like so:
from pyspark.sql.functions import *

This is an anti-pattern: the star import shadows any builtin Python function that shares a name with a PySpark function, e.g. sum, max, etc. The end result is that you can't invoke the builtin functions within UDFs, because you get signature mismatches and odd errors in Python notebooks. The imports should be of the format:
from pyspark.sql import functions as F

and then reference the PySpark variants using the 'F' prefix as needed (e.g. F.sum, F.max), which leaves the bare names sum and max bound to the Python builtins.
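
To illustrate the shadowing without needing a Spark installation, here is a minimal sketch. The module `fake_functions` and its `sum`/`max` are hypothetical stand-ins for pyspark.sql.functions (they are not PySpark's real implementations), and `names_after` and `try_builtin_style_call` are helpers made up for this example. It only shows how the two import styles differ in what they bind; the exact error you see with real PySpark may differ.

```python
import builtins
import sys
import types

# Hypothetical stand-in for pyspark.sql.functions: a module exporting its own
# `sum` and `max`, which take a single column name rather than an iterable.
fake_functions = types.ModuleType("fake_functions")


def _column_sum(col):
    return f"sum({col})"


def _column_max(col):
    return f"max({col})"


fake_functions.sum = _column_sum
fake_functions.max = _column_max
sys.modules["fake_functions"] = fake_functions


def names_after(import_statement):
    """Run an import statement in a fresh namespace and return that namespace."""
    namespace = {}
    exec(import_statement, namespace)
    return namespace


star = names_after("from fake_functions import *")
aliased = names_after("import fake_functions as F")

# Star import: the module's `sum` is bound in the namespace and hides the builtin.
star_shadows_builtin = star["sum"] is not builtins.sum

# Aliased import: `sum` is not rebound, so lookups still reach the builtin.
aliased_keeps_builtin = "sum" not in aliased


def try_builtin_style_call(namespace):
    """Call `sum` the way the builtin is normally used; return result or error."""
    try:
        return eval("sum([1, 2, 3], 10)", namespace)
    except TypeError as exc:
        return exc


print(star_shadows_builtin, aliased_keeps_builtin)
print(repr(try_builtin_style_call(star)))
print(try_builtin_style_call(aliased))
```

In the star-import namespace the builtin-style call fails with a TypeError because the stand-in `sum` accepts only one argument, which is the kind of signature mismatch described above. In the aliased namespace the same call reaches the builtin, while the module's version stays available as `F.sum`.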