voc_scraper_nlp_api

image

Click the link for more details about the project: Medium Article


Preprocessing for NLP

image image

image

  • Dropping NaN reviews (rows where both 'Title' and 'Content' are empty) image

  • Post_date

image

The date format 'Apr 1, 2023' appears to fail when parsed with the new date parser introduced in Spark 3.0, which handles certain date formats differently from the earlier parser. To work around this, set the Spark configuration to use the legacy time parser policy by adding the following line: spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
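The two preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes `spark` is an existing SparkSession and `df` is a PySpark DataFrame with the 'Title', 'Content' and 'Post_date' columns. The pattern "MMM d, yyyy", the helper function names, and the output column `post_date_parsed` are assumptions made for the example.

```python
# Sketch only: `spark` is an existing SparkSession and `df` a PySpark DataFrame
# with the 'Title', 'Content' and 'Post_date' columns named in this README.

LEGACY_POLICY_KEY = "spark.sql.legacy.timeParserPolicy"
DATE_PATTERN = "MMM d, yyyy"  # assumed pattern for values such as 'Apr 1, 2023'


def drop_empty_reviews(df):
    """Drop rows where both 'Title' and 'Content' are null."""
    # how="all" removes a row only when every column in `subset` is null.
    return df.dropna(how="all", subset=["Title", "Content"])


def use_legacy_time_parser(spark):
    """Switch the session to the legacy (pre-3.0) time parser policy."""
    spark.conf.set(LEGACY_POLICY_KEY, "LEGACY")
    return spark


def parse_post_date(df, pattern=DATE_PATTERN):
    """Add a date-typed column parsed from the 'Post_date' strings."""
    # 'post_date_parsed' is a hypothetical column name used for illustration.
    return df.selectExpr("*", f"to_date(Post_date, '{pattern}') AS post_date_parsed")
```

A typical call order would be `use_legacy_time_parser(spark)` first, then `parse_post_date(drop_empty_reviews(df))`. Note that `dropna` only treats null values as missing, so reviews stored as empty strings would need a separate filter.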

image

image
