Creates an HTML page and a corresponding Excel file listing all Wikipedia articles (in all languages) in which (one or more) images from a given category on Wikimedia Commons are used.
Latest update: 1 March 2024
The script GLAMorousToHTML.py creates an HTML page and a corresponding Excel file listing all Wikipedia articles (in all languages) in which (one or more) images/media from a given category on Wikimedia Commons are used. It does so by converting the XML output of the GLAMorous tool.
The KB uses the GLAMorous tool to measure the use of KB media files (as stored in Wikimedia Commons) in Wikipedia articles. This tool reports four things:
- 1 The total number of KB media files in Category:Media contributed by Koninklijke Bibliotheek (Category "Media contributed by Koninklijke Bibliotheek" has XXXX files.)
- 2 The total number of times that KB media files are used in WP articles (Total image usages).
- 3 The number of Wikipedia language versions in which KB media files are used (length of the table)
- 4 The number of unique KB media files that are used in Wikipedia articles in all those languages. (Distinct images used)
Please note: 'Total image usages' does NOT equal the number of unique WP articles! A single unique KB image can illustrate multiple unique WP articles, and/or the other way around, 1 unique WP article can contain multiple unique KB images. In other words: images-articles have many-to-many relationships.
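The difference between these counts can be illustrated with a small, made-up data set. The sketch below is not taken from the script; the file names, article titles and the tuple layout are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical usage records: (image file, Wikipedia language, article title).
# Each record counts as one "image usage".
usages = [
    ("Atlas_map_1.jpg", "nl", "Amsterdam"),
    ("Atlas_map_1.jpg", "en", "Amsterdam"),
    ("Atlas_map_1.jpg", "nl", "Cartografie"),
    ("Bird_print_7.jpg", "nl", "Amsterdam"),
    ("Bird_print_7.jpg", "de", "Vogel"),
]

total_usages = len(usages)
distinct_images = {image for image, _, _ in usages}
# An article is identified by its language version plus its title.
unique_articles = {(lang, title) for _, lang, title in usages}

# Group the unique article titles per language version.
articles_per_language = defaultdict(set)
for _, lang, title in usages:
    articles_per_language[lang].add(title)

print(total_usages)          # 5
print(len(distinct_images))  # 2
print(len(unique_articles))  # 4
```

Here five usages involve only two distinct images and four unique articles, because one image appears in several articles and one article (nl, "Amsterdam") contains two images.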
What was still missing was the functionality to measure
- 5 The number of unique WP articles in which KB media files are used,
- 6 An explicit overview of those articles, grouped per WP language version,
- 7 A structured output format that can be easily processed by tools, such as Excel.
That is why we made the GLAMorousToHTML tool. This script uses the XML-output of GLAMorous to make an HTML page listing unique WP articles (in which one or more KB media files are used), grouped by language.
As of 14-02-2024 it also produces an Excel file with equivalent data.
The script relies on the XML output of GLAMorous, which needs to be configured so that it only lists pages from Wikipedia
- that are in the main namespace (a.k.a. Wikipedia articles) (&ns0=1)
- and not pages from Wikimedia Commons, Wikidata or other Wikimedia projects (projects[wikipedia]=1)
The base URL looks like https://glamtools.toolforge.org/glamorous.php?doit=1&use_globalusage=1&ns0=1&projects[wikipedia]=1&format=xml&category=. The Commons category of interest needs to be added to the end, omitting the Category: prefix. It is defined (and can be adapted) in the xml_base_url variable in setup.py.
By default the depth of the GLAMorous output is set to 0, meaning no subcategories are read. If you want to include images from subcategories in your outputs, you can change the depth variable in setup.py.
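As a rough illustration of how such a request URL could be assembled, here is a minimal sketch. The helper `build_glamorous_url` is hypothetical and not part of the repo; percent-encoding the category name and passing the depth as a `depth` query parameter are assumptions about how the URL is completed, not a description of the actual code in setup.py.

```python
from urllib.parse import quote

# Base URL as quoted in the text above (the value of xml_base_url).
xml_base_url = (
    "https://glamtools.toolforge.org/glamorous.php"
    "?doit=1&use_globalusage=1&ns0=1&projects[wikipedia]=1&format=xml&category="
)


def build_glamorous_url(category: str, depth: int = 0) -> str:
    """Return a GLAMorous XML URL for a Commons category (hypothetical helper)."""
    prefix = "Category:"
    # Drop a leading "Category:" prefix, since the URL expects the bare name.
    name = category[len(prefix):] if category.startswith(prefix) else category
    # Percent-encode the name so spaces and special characters are URL-safe.
    url = xml_base_url + quote(name)
    # ASSUMPTION: subcategory depth is passed as a "depth" query parameter.
    if depth > 0:
        url += f"&depth={depth}"
    return url


print(build_glamorous_url("Category:Atlas de Wit 1698"))
print(build_glamorous_url("Der naturen bloeme - KB KA 16", depth=2))
```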
If you want to run this script for your own Commons category and create HTML and Excel overviews for your own institution, you can clone/download the repo and run it on your own machine. You will need to make some simple adaptations to the existing code to make it work for the Commons category of your choice. These are:
- Adapt the category_logo_dict.json for your own needs, making sure the existing syntax is maintained.
  - If not yet available, make a new top-level country key (similar to "Netherlands", "USA", "Norway" etc.) to include your country.
  - Under this country key, add a line with a syntax identical to the one starting with "Media contributed by Koninklijke Bibliotheek", but with modifications for three things:
    - The exact name (without underscores '_') of the Wikimedia Commons category you want to run the script for ("Media contributed by Koninklijke Bibliotheek")
    - A shortname of the institution ("KoninklijkeBibliotheekNL"). This is used for the name of the sheet in the Excel file, so keep it shorter than 32 characters.
    - Name of an institutional logo file, starting with "icon_", followed by a unique and descriptive letter code for the institution, and ending with a .png or .jpg extension. This logo/icon is displayed at the top of the HTML page. Don't forget the next step!
- Add a small logo of the institution (256x256 px or so) as a .png or .jpg to the site/logos folder, and add the filename "icon_xxxxx.png/jpg" to the json file.
- In setup.py, change
  - the country_key variable to the new country key you added to the json file (default = "Netherlands")
  - the institute_index to the index of the line corresponding to your institution in the json file (default = 0; first line under a country key)
That's all. You should now be able to run the main GLAMorousToHTML script. The generated HTML page will be added to the site/ folder and the Excel file to the data/ folder.
In case you can't get the script up and running, please open an issue in this repo.
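The naming rules from the setup steps above can be checked before running the script. The following is a minimal sketch and not part of the repo: the function `check_institution_entry` and the logo name `icon_kb.png` are hypothetical, and the sketch makes no assumption about the actual layout of category_logo_dict.json, only about the three values the steps describe.

```python
def check_institution_entry(category: str, shortname: str, logo_file: str) -> list:
    """Return a list of problems found in one institution entry (empty if none)."""
    problems = []
    # The Commons category name must be written with spaces, not underscores.
    if "_" in category:
        problems.append("category name contains underscores")
    # The shortname becomes the Excel sheet name: keep it shorter than 32 characters.
    if len(shortname) >= 32:
        problems.append("shortname is 32 characters or longer")
    # The logo file must start with "icon_" and be a .png or .jpg file.
    if not logo_file.startswith("icon_"):
        problems.append('logo filename does not start with "icon_"')
    if not logo_file.lower().endswith((".png", ".jpg")):
        problems.append("logo filename does not end with .png or .jpg")
    return problems


print(check_institution_entry(
    "Media contributed by Koninklijke Bibliotheek",
    "KoninklijkeBibliotheekNL",
    "icon_kb.png",  # hypothetical logo filename
))
```

An empty list means the entry satisfies all three rules; otherwise each string names one rule that was violated.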
- Input: Commons category = Media contributed by Koninklijke Bibliotheek
- Output:
- this output dated 14-02-2024, together with this Excel file.
- this output dated 16-01-2024 or this output dated 10-01-2024,
- this result dated 20-12-2022, related to the article Public outreach and reuse of KB images via Wikipedia, 2014-2022, or
- this output dated 16-02-2022, related to this analysis on Dutch Wikipedia dated 16-02-2022, or
- this output dated 27-01-2022
- Input: Commons category = Atlas de Wit 1698
- Output: AtlasdeWit1698_Wikipedia_NS0_27012022.html
- Input: Commons category = Atlas van der Hagen
- Output: AtlasvanderHagen_Wikipedia_NS0_27012022.html
- Input: Commons category = Media from Atlas of Mutual Heritage - Koninklijke Bibliotheek
- Output: MediafromAtlasofMutualHeritage-KoninklijkeBibliotheek_Wikipedia_NS0_27012022.html
- Input: Commons category = Nederlandsche vogelen van Nozeman en Sepp
- Output: NederlandschevogelenvanNozemanenSepp_Wikipedia_NS0_27012022.html
- Input: Commons category = Der naturen bloeme - KB KA 16
- Output: Dernaturenbloeme-KBKA16_Wikipedia_NS0_27012022.html (incl. images in the subcategories, depth=2)
- Input: Commons category = Catchpenny prints from Koninklijke Bibliotheek
- Output: CatchpennyprintsfromKoninklijkeBibliotheek_Wikipedia_NS0_27012022.html
- Input: Commons category = Bookbindings from Koninklijke Bibliotheek
- Output: BookbindingsfromKoninklijkeBibliotheek_Wikipedia_NS0_27012022.html
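The output filenames listed above appear to follow a fixed pattern: the category name with spaces removed, then `_Wikipedia_NS0_`, then the date as ddmmyyyy. The sketch below reproduces that pattern; it is inferred from the listed filenames only, and the helper `report_filename` is hypothetical rather than the script's own function.

```python
from datetime import date


def report_filename(category: str, run_date: date) -> str:
    """Build a report filename following the pattern seen in the examples."""
    # Remove all spaces from the category name; other characters are kept.
    compact = category.replace(" ", "")
    # Date is formatted as ddmmyyyy, e.g. 27012022 for 27 January 2022.
    return f"{compact}_Wikipedia_NS0_{run_date.strftime('%d%m%Y')}.html"


print(report_filename("Atlas de Wit 1698", date(2022, 1, 27)))
# AtlasdeWit1698_Wikipedia_NS0_27012022.html
```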
See also this LinkedIn post
- Nationaal Archief : Output on 16-01-2024
- Rijksmuseum Amsterdam : Output on 16-01-2024
- Beeld en Geluid : Output on 16-01-2024
- Tropenmuseum (former) : Output on 16-01-2024
- Afrika Studiecentrum (Universiteit Leiden) : Output on 17-01-2024
- Universiteitsbibliotheek Maastricht : Output on 17-01-2024 and 15-02-2024
- Het Utrechts Archief : Output on 17-01-2024
- Rijksdienst voor het Cultureel Erfgoed : Output on 17-01-2024
- University of Amsterdam (Special Collections) : Output on 17-01-2024
- Naturalis Biodiversity Center : Output on 17-01-2024
- Stadsarchief Amsterdam : Output on 17-01-2024
- Museum Catharijneconvent : Output on 17-01-2024
- Nationaal Museum van Wereldculturen : Output on 17-01-2024
See also this LinkedIn post
- National Park Service Gallery : Output on 24-01-2024
- Boston Public Library : Output on 24-01-2024
- Los Angeles County Museum of Art : Output on 24-01-2024
- U.S. Navy Museum : Output on 24-01-2024
- Walters Art Museum : Output on 24-01-2024
- Smithsonian Institution : Output on 24-01-2024
- Library of Congress : Output on 24-01-2024 - Warning: big file, loading might take some time
- National Archives and Records Administration (NARA) : Output on 24-01-2024 - Warning: big file, loading might take some time
- Metropolitan Museum of Art : Output on 24-01-2024
- New York Public Library : Output on 24-01-2024
- National Gallery of Art (Washington, D.C.) : Output on 24-01-2024
See also this LinkedIn post
- Nasjonalbiblioteket : Output on 01-03-2024
- Norwegian Directorate for Cultural Heritage : Output on 01-03-2024
- Digitalt Museum, Norway : Output on 01-03-2024
- National Archives of Norway : Output on 01-03-2024
- Kartverket : Output on 01-03-2024
- Oslo Museum : Output on 01-03-2024
- Municipal Archives of Trondheim : Output on 01-03-2024
- Nationalmuseum Stockholm : Output on 01-03-2024
- National Archives of Sweden : Output on 01-03-2024
- National Library of Sweden : Output on 01-03-2024
- National Museums of World Culture : Output on 01-03-2024
- National Museum of Science and Technology : Output on 01-03-2024
- Livrustkammaren : Output on 01-03-2024
- Helsinki City Museum : Output on 01-03-2024
- National Archives of Finland : Output on 01-03-2024
- Finnish Society of Swedish Literature : Output on 01-03-2024
- Statens Museum for Kunst : Output on 01-03-2024
- Royal Danish Library, Portraits : Output on 01-03-2024
See also this LinkedIn post
- Australian Paralympic Committee : Output on 14-03-2024
- Australian National Maritime Museum : Output on 14-03-2024
- PaDIL : Output on 14-03-2024
- New South Wales Heritage Database : Output on 14-03-2024
- State Archives and Records Authority of New South Wales : Output on 14-03-2024
- State Library of New South Wales : Output on 14-03-2024
- State Library of Queensland : Output on 14-03-2024
- State Library of South Australia : Output on 14-03-2024
- State Library of Victoria : Output on 14-03-2024
- Australian War Memorial : Output on 14-03-2024
- Auckland Museum : Output on 14-03-2024
- Archives New Zealand : Output on 14-03-2024
- New Zealand Defence Force : Output on 14-03-2024
- New Zealand Tertiary Education Union : Output on 14-03-2024
- https://commons.wikimedia.org/wiki/Commons:GLAMorousToHTML
- Public outreach and reuse of KB images via Wikipedia, 2014-2022 (20-12-2022)
- Included reports for 14 institutions from Australia and New Zealand.
- Included reports for institutions from Norway, Sweden, Finland and Denmark.
- README.md: Added an explanation of how you can run the script yourself.
- Refactored all code into multiple separated modules: setup.py, general.py, buildHTML.py and buildExcel.py. This has reduced the complexity of the main script GLAMorousToHTML.py significantly and made the total suite of code much more modular and easier to understand, maintain and expand.
- Moved all HTML report pages into a separate site/ folder. This has made the repo much cleaner, clearer and more maintainable.
- Created five HTML files that redirect the old KB HTML pages (from 27-01-2022 to 16-01-2024) to the new equivalent ones in the site/ folder. Did not implement redirection for other institutions.
- As of 14-02-2024, added Excel outputs in the data/ folder, to be used as structured input for data applications, such as OpenRefine.
- In the process of updating the data structure in category_logo_dict.json, where the new structure can be seen under the 'Netherlands' key.
- Improved pagetemplate.html to be key based ({numarticles} Wikipedia articles) rather than index based ({0} Wikipedia articles)
- Export reports to Wiki format and put on Commons: (work in progress)
- https://commons.wikimedia.org/wiki/Commons:GLAMorousToHTML/Reports (index page)
- https://commons.wikimedia.org/wiki/Commons:GLAMorousToHTML/Reports/Media_contributed_by_Koninklijke_Bibliotheek (index page for KB)
- https://commons.wikimedia.org/wiki/Commons:GLAMorousToHTML/Reports/Media_contributed_by_Koninklijke_Bibliotheek/14022024 (KB report dd 14 Feb 2024)
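The change from index-based to key-based placeholders in pagetemplate.html, mentioned in the list above, corresponds to named fields in Python's `str.format`. This is a minimal sketch: apart from `{numarticles}`, the placeholder `{numlanguages}`, the template text and the values are invented, and the actual script may fill the template differently.

```python
# Index-based: the meaning of each value depends on its position.
index_based = "{0} Wikipedia articles in {1} languages"
# Key-based: each value is tied to a descriptive name.
key_based = "{numarticles} Wikipedia articles in {numlanguages} languages"

old_text = index_based.format(1234, 56)
# Keyword arguments can be given in any order without changing the result.
new_text = key_based.format(numlanguages=56, numarticles=1234)

print(old_text)  # 1234 Wikipedia articles in 56 languages
print(new_text)  # 1234 Wikipedia articles in 56 languages
```

Named fields keep the template readable and make it harder to swap two values by accident when placeholders are added or reordered.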