Jigsaw Unintended Bias in Toxicity Classification
This repository uses the Jigsaw comment data to analyze, visualize, and classify toxic comments.
Part 1: EDA on comment text
The data contains a target (toxicity) score along with several auxiliary targets that could also be used; however, this repository focuses on the toxicity score / class.
The notebook demonstrates how to build a word cloud with an image mask. The code below generates the word cloud for the toxic class.
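As a minimal sketch (assuming the standard train.csv from the Kaggle competition with target and comment_text columns, and the usual 0.5 threshold), the binary toxic class can be derived from the toxicity score like this:
import pandas as pd

# Hypothetical loading step; the actual path and columns used in the notebook may differ.
train = pd.read_csv("train.csv", usecols=["target", "comment_text"])

# Treat a comment as toxic when its toxicity score is at least 0.5.
train["toxic"] = (train["target"] >= 0.5).astype(int)
print(train["toxic"].value_counts())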
import os
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator

# PATH, SEED, and toxic_freq_dist are defined earlier in the notebook.
mask = np.array(Image.open(os.path.join(PATH, "../img/trump2.png")))
wc = WordCloud(background_color="white", max_words=2000, mask=mask,
               contour_width=1, contour_color='grey',
               max_font_size=40, random_state=SEED)
wc.generate_from_frequencies(toxic_freq_dist)

# Recolor the words with the colors taken from the mask image.
image_colors = ImageColorGenerator(mask)
plt.figure(figsize=(12, 6))
plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off")
plt.title(r"Top words in $\bf{Toxic}$ comments")
plt.show()
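The snippet above assumes toxic_freq_dist, a word-to-frequency mapping built from the toxic comments. One possible (hypothetical) way to construct it, reusing the train frame from the earlier sketch:
import re
from collections import Counter

# Hypothetical construction; the notebook may tokenize and filter stop words differently.
toxic_comments = train.loc[train["toxic"] == 1, "comment_text"]
tokens = (word
          for text in toxic_comments
          for word in re.findall(r"[a-z']+", text.lower()))
toxic_freq_dist = Counter(tokens)  # maps each word to its frequency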
Similar code can be used to visualize the top words of the non-toxic class as well.
Part 2: Model
TBC