/memgraph-wordcloud

async redis scraper & fancy wordcloud generator

creating a gradient word cloud

[Example images: gradient, on gradient, white on gradient, masked, image colour generator, masked custom colouring]

generating words

In theory, we would like to choose a domain and collect all the relevant words from it. One way to do this is to scrape the text of every heading and paragraph tag, which we can do with Python and Beautiful Soup. Done synchronously, this could take an hour or more, depending on the size of the domain, so we use Python's asyncio package instead. The bottleneck then becomes the network speed and the maximum number of open files, since every open connection uses a file descriptor. On a Linux system the default per-process limit is usually 1024. For the task queue and the in-memory set we can use Redis together with the asynchronous aioredis package. See scraper.py for the implementation.
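The idea can be sketched with the standard library alone. This is a minimal illustration, not the code in scraper.py: `html.parser` stands in for Beautiful Soup, an `asyncio.Queue` and a plain `set` stand in for the Redis task queue and set, and `fetch` is a hypothetical coroutine (not part of any library) that returns a page's HTML and the links found on it. The number of workers caps the number of simultaneous connections, which keeps the scraper under the open-file limit.

```python
import asyncio
from html.parser import HTMLParser

TEXT_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p"}


class TextExtractor(HTMLParser):
    """Collects text found inside heading and paragraph tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many TEXT_TAGS we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in TEXT_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in TEXT_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


async def crawl(start_urls, fetch, max_workers=100):
    """Crawl from start_urls; fetch(url) is assumed to return (html, links)."""
    queue = asyncio.Queue()    # stand-in for the Redis task queue
    seen = set(start_urls)     # stand-in for the Redis set of known URLs
    words = []
    for url in start_urls:
        queue.put_nowait(url)

    async def worker():
        while True:
            url = await queue.get()
            try:
                html, links = await fetch(url)
                words.extend(extract_text(html))
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        queue.put_nowait(link)
            except Exception:
                pass  # a real scraper would log the error and maybe retry
            finally:
                queue.task_done()

    # max_workers bounds the number of pages being fetched at once.
    workers = [asyncio.create_task(worker()) for _ in range(max_workers)]
    await queue.join()         # returns once every queued URL is processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return words
```

Keeping `max_workers` well below the open-file limit (1024 by default on many Linux systems) leaves headroom for the other file descriptors the process needs.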

creating a mask

black & white: Created by fitting the logo to the center of the image with GIMP, selecting alpha to selection on the layer, and growing the selection by a couple of pixels.
colourful: Created by fitting the logo to the center of the image with GIMP and then dilating it with ImageMagick.
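To make "growing" or "dilating" the mask concrete, here is a small pure-Python sketch of binary dilation. It is only an illustration of the operation, not how GIMP or ImageMagick implement it. The square neighbourhood and the list-of-rows mask format are assumptions made for the example.

```python
def dilate(mask, radius=1):
    """Grow the set pixels of a binary mask by `radius` pixels.

    `mask` is a list of rows containing 0 or 1. Each set pixel is
    expanded into a (2 * radius + 1) square, clipped at the borders.
    """
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        out[ny][nx] = 1
    return out


logo = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
grown = dilate(logo, radius=1)  # the single pixel becomes a 3x3 block
```

Growing the logo this way leaves a small margin around it, so the words in the cloud do not touch its outline.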

colourize

The simplest way to colour the word cloud is to use one of the predefined matplotlib colormaps. A step up is to take the colours from the colourful mask we created. The most flexible option is to render the text as transparent and do the rest of the colouring in GIMP.
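A custom colouring can also be done in code. The sketch below assumes the cloud is rendered with the `wordcloud` package, whose `color_func` callbacks receive the word, font size, position and orientation and return a colour string. The helper name `make_gradient_color_func` and the top-to-bottom gradient are illustrative choices, and the example assumes `position` is given as (row, column).

```python
def make_gradient_color_func(start_rgb, end_rgb, image_height):
    """Return a color_func that blends start_rgb into end_rgb by row."""

    def color_func(word, font_size, position, orientation,
                   random_state=None, **kwargs):
        row = position[0]  # assumption: position is (row, column)
        # Fraction of the way down the image, clamped to [0, 1].
        t = min(max(row / max(image_height - 1, 1), 0.0), 1.0)
        r, g, b = (round(s + (e - s) * t) for s, e in zip(start_rgb, end_rgb))
        return f"rgb({r}, {g}, {b})"

    return color_func


gradient = make_gradient_color_func((0, 0, 0), (200, 100, 50), image_height=101)
```

With the `wordcloud` package, such a function could be passed as `WordCloud(color_func=gradient)` or applied afterwards with `recolor(color_func=gradient)`. For the simplest case, `WordCloud(colormap="plasma")` selects a matplotlib colormap by name.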

[Example images: colormap plasma, colormap magma, colormap inferno, mask colouring, custom colouring, custom colouring]