ByMykel/spanish-cities

Add the city's flag and coat of arms.

Opened this issue · 5 comments

We have to add the city's flag and coat of arms.

If anyone wants to help, just open cities.json and look for cities where those attributes are null.
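A minimal sketch of how the remaining work could be listed, assuming cities.json is a top-level JSON array of objects. The key names `name`, `flag` and `coat_of_arms` are assumptions for illustration, so check them against the actual file first.

```python
import json

# Assumed key names; adjust them to whatever cities.json actually uses.
IMAGE_KEYS = ("flag", "coat_of_arms")


def load_cities(path="cities.json"):
    """Read the city list from disk (assumes a top-level JSON array)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def cities_missing_images(cities, keys=IMAGE_KEYS):
    """Return (name, missing_keys) for every city with a null or absent image attribute."""
    missing = []
    for city in cities:
        empty = [key for key in keys if city.get(key) is None]
        if empty:
            missing.append((city.get("name"), empty))
    return missing
```

Calling `cities_missing_images(load_cities())` would then give a to-do list that can be split between contributors, for example by province.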

I would like to help with this task as well.

I will do something later today. I want to try convincing ChatGPT to help with extracting the image data from Wikipedia. I made some attempts yesterday with provinces as well, but it kept getting distracted and returning wrong URLs.

@AloisSeckar That would be very cool, because there are around 8000 cities. I have added a couple, but it seems like a long task to do by "hand".

So I am trying, but it is not very effective right now. It says it has to fetch each image separately and often asks for permission to proceed. It is working, but it is slow.

However, I have learned a few things we might use to create some "import script":

UPDATE: The linked list of flag images for cities in A Coruña province is surely incomplete, and the list of coats of arms has several duplicate entries. But it is at least something to start with.

I made a first version of a custom web crawler to get the actual Wiki image URLs - https://github.com/AloisSeckar/wiki-image-crawler

So far it "only" retrieves the list of image URLs from Wiki category pages (example), but unlike ChatGPT, it does it quickly. I will try to improve it soon, so that it can write the retrieved data directly into the cities.json file.
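As a rough idea of what the "write directly into cities.json" step might look like, here is a sketch. It is not the crawler's actual code. It assumes cities.json is a JSON array of objects with a `name` key, and that the crawler can produce a hypothetical `urls_by_name` dictionary mapping city names to image URLs. Matching on the exact name is a simplification and real data would probably need some normalisation.

```python
import json


def fill_missing(cities, urls_by_name, key):
    """Set city[key] from urls_by_name wherever it is currently null.

    Existing values are never overwritten. Returns the number of cities updated.
    """
    filled = 0
    for city in cities:
        url = urls_by_name.get(city.get("name"))
        if city.get(key) is None and url:
            city[key] = url
            filled += 1
    return filled


def update_file(path, urls_by_name, key):
    """Load the JSON file, fill the missing values and write it back."""
    with open(path, encoding="utf-8") as fh:
        cities = json.load(fh)
    filled = fill_missing(cities, urls_by_name, key)
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps names such as "Ábalos" readable in the file
        json.dump(cities, fh, ensure_ascii=False, indent=2)
        fh.write("\n")
    return filled
```

Only filling null values means the script can be re-run safely and will not clobber entries that were already added by hand.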

A simple Python script to list all the images:

# List of flags of municipalities:
# https://commons.wikimedia.org/wiki/Category:SVG_flags_of_municipalities_of_Spain_by_province

# List of coats of arms of municipalities:
# https://commons.wikimedia.org/wiki/Category:SVG_coats_of_arms_of_municipalities_of_Spain_by_province

import posixpath
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# URL of the Wikimedia Commons category
url = "https://commons.wikimedia.org/wiki/Category:SVG_coats_of_arms_of_municipalities_of_La_Rioja_(Spain)"

# Fail early on HTTP errors instead of parsing an error page
response = requests.get(url, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

with open("output.txt", "w", encoding="utf-8") as file:
    # Each file in the category is rendered as <li class="gallerybox">
    for gallery_box in soup.find_all('li', class_='gallerybox'):
        img = gallery_box.find('img')
        link = gallery_box.find('a', class_='galleryfilename')
        if img is None or link is None:
            continue  # skip entries without a thumbnail or a file link

        image_url = urljoin(url, img['src'])
        if '/thumb/' in image_url:
            # Thumbnail URLs end in an extra ".../<width>px-<name>.png" segment;
            # dropping "/thumb" and that last segment gives the original file
            image_url = posixpath.dirname(image_url.replace('/thumb/', '/'))

        file.write(f"Image URL: {image_url}\n")
        file.write(f"File Name: {link['title']}\n")
        file.write("\n")

output.txt:

Image URL: https://upload.wikimedia.org/wikipedia/commons/1/1b/Escudo_de_%C3%81balos_%28La_Rioja%29.svg
File Name: File:Escudo de Ábalos (La Rioja).svg

Image URL: https://upload.wikimedia.org/wikipedia/commons/d/d7/Escudo_de_Agoncillo-La_Rioja.svg
File Name: File:Escudo de Agoncillo-La Rioja.svg

Image URL: https://upload.wikimedia.org/wikipedia/commons/8/8c/Escudo_de_Albelda_de_Iregua-La_Rioja.svg
File Name: File:Escudo de Albelda de Iregua-La Rioja.svg

Image URL: https://upload.wikimedia.org/wikipedia/commons/4/4e/Escudo_de_Alberite-La_Rioja.svg
File Name: File:Escudo de Alberite-La Rioja.svg

Image URL: https://upload.wikimedia.org/wikipedia/commons/4/46/Escudo_de_Alcanadre-La_Rioja.svg
File Name: File:Escudo de Alcanadre-La Rioja.svg
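To get from this output to something that can be matched against cities.json, the file names need to be reduced to city names. A possible sketch follows. The helper names `parse_output` and `guess_city` are made up for illustration, and the naming patterns handled ("Escudo de ", then a trailing " (La Rioja)" or "-La Rioja") are only the ones visible in the sample above, so other provinces will likely need more cases.

```python
import re


def parse_output(text):
    """Return (image_url, file_name) pairs from text in the output.txt format."""
    pairs = []
    url = None
    for line in text.splitlines():
        if line.startswith("Image URL: "):
            url = line[len("Image URL: "):].strip()
        elif line.startswith("File Name: ") and url is not None:
            pairs.append((url, line[len("File Name: "):].strip()))
            url = None
    return pairs


def guess_city(file_name, province="La Rioja", prefix="Escudo de "):
    """Strip "File:", the prefix, the extension and the province marker."""
    name = re.sub(r"^File:", "", file_name)
    name = re.sub(r"\.svg$", "", name, flags=re.IGNORECASE)
    if name.startswith(prefix):
        name = name[len(prefix):]
    p = re.escape(province)
    # Drop a trailing " (La Rioja)" or "-La Rioja" style marker
    return re.sub(rf"\s*(?:\({p}\)|-\s*{p})$", "", name).strip()


sample = """Image URL: https://upload.wikimedia.org/wikipedia/commons/d/d7/Escudo_de_Agoncillo-La_Rioja.svg
File Name: File:Escudo de Agoncillo-La Rioja.svg
"""
for image_url, file_name in parse_output(sample):
    print(guess_city(file_name), image_url)
```

The guessed names are only candidates, so anything that does not match a city in cities.json exactly should be reviewed by hand.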