salgo60/Wikidata_riksdagen-corpus

Feedback Riksdagens Corpus

salgo60 opened this issue · 2 comments

copy of

The big steps forward I see in the Riksdagens Corpus project

  • excellent use of GitHub
  • showing that TEI may be a way forward
  • even a community-driven site like Wikidata can add value if it has persistent identifiers for, e.g., Swedish MPs...

What I find lacking in the Riksdagens Corpus project - October 2023

  • You have not explored Wikidata extensively to determine its potential for enhancing research, particularly political research on an international scale, such as linking the corpora of different countries.

    • I miss PROV provenance (#167?). I would like to see the National Archives, RAÄ (Swedish National Heritage Board), research projects like the Riksdagen Corpus, and Umeå University Familia have better data and work in a similar manner. It seems to me that much of the knowledge in the Riksdagen Corpus currently rests on a single book, "Tvåkammar-riksdagen 1867–1970". When we also scan portrait books from the early 1900s, it turns out that this book classifies political outliers with different terminology. Without provenance we lose a significant part of traceability and credibility, something the Wikipedia world is often rightfully criticized for because of missing sources. I now observe that the research data from 2023 is affected by this.
    • You have made no effort to evaluate tools like Scholia, which is built on Wikidata and has been used by other researchers. See Swedish MP Anna Lindh: scholia.toolforge.org/author/Q208591 --> Scholia --> GitHub WDscholia/scholia
    • A vision would be a unified framework in which your data, structured as Linked Data, integrates seamlessly with the broader body of research data across Europe. Such integration can foster a collaborative research ecosystem that shares knowledge and insights across borders. Aligning your data with common standards and using Persistent Identifiers makes a higher level of interoperability and data consistency possible. This would enhance the scalability and international scope of political research and contribute to a more cohesive understanding of complex, transnational issues. Platforms like Wikidata and tools like Scholia make it possible to build on existing infrastructure, enriching the collective knowledge base and advancing the broader research objectives.
    • #269 Using Persistent Identifiers (PIDs) is a practice aimed at ensuring the long-term accessibility and traceability of digital items. Wikimedia Commons, the media repository used alongside Wikidata, for instance, assigns a PID to every uploaded image, which makes the collection better organized and searchable and supports more than 300 languages. It also lets parts of an image be annotated to indicate what or who is depicted, which improves information retrieval. An example is the annotated image of "The Coronation of Napoleon", where labels are provided in various languages including Chinese (zh), Swedish (sv), and English (en). By adopting such practices, you can significantly improve the management and sharing of digital resources within the European research data framework. A more structured and interlinked data environment can facilitate collaborative research and potentially unlock new insights through cross-referencing and analysis of a rich, multilingual data repository.
  • There appears to have been no clear initiative to challenge organizations like Riksdagens Öppna data, Riksarkivet, Riksarkivet SBL, Kungliga biblioteket, and Digital museum about the quality of the data they provide. Understanding how much support these organizations offer, or fail to offer, is crucial because it may significantly affect research outcomes. The gaps in support may concern ease of communication (for example, using GitHub), data accuracy, completeness, accessibility, or interoperability, any of which could hinder the progress and quality of research. Addressing these issues and advocating for better data practices could pave the way for more reliable and comprehensive research. Collaborating with these organizations to improve data quality and availability could also lead to more insightful findings and a richer knowledge base.

    • Note my attempt to draw your attention to some Swedish MPs with poor data, using Q120143028 (#359 - #324...), where I believe you should concentrate on improving data quality. I advise applying the same approach to the aforementioned organizations, to avoid isolated data silos (#25 / #24...).
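
To make the point about persistent identifiers and provenance concrete, here is a minimal sketch of how records from two silos could be joined on a shared Wikidata QID while keeping track of where each value came from. Everything in it is an assumption for illustration: the field names, the source labels, the placeholder values, and the `merge_by_pid` helper are hypothetical and not taken from any real export. Only the QID Q208591 comes from the Scholia link above.

```python
from collections import defaultdict

# Hypothetical records from two separate silos. Field names and values are
# illustrative placeholders, not real data from any organization.
corpus_records = [
    {"qid": "Q208591", "party": "party-A", "source": "riksdagen-corpus"},
    {"qid": "Q_PLACEHOLDER", "party": "party-B", "source": "riksdagen-corpus"},
]
archive_records = [
    {"qid": "Q208591", "portrait_page": 12, "source": "national-archive"},
]


def merge_by_pid(*datasets):
    """Join records that share a QID, keeping the source of every value."""
    merged = defaultdict(dict)
    for dataset in datasets:
        for record in dataset:
            for field, value in record.items():
                if field in ("qid", "source"):
                    continue  # join key and provenance label are not payload
                merged[record["qid"]].setdefault(field, []).append(
                    {"value": value, "source": record["source"]}
                )
    return dict(merged)


graph = merge_by_pid(corpus_records, archive_records)
```

Because every value carries its source, conflicting classifications from different books or archives stay visible side by side instead of silently overwriting each other. A real setup would more likely express this with PROV and Linked Data than with plain dictionaries.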

Why can't they work together, produce ONE knowledge graph, and support citation graphs?

What I observe is a lack of an ecosystem - #datasilos.

While the project is a commendable example of GitHub use, it appears to overlook fundamentals such as semantic skills. There also seems to be little collaboration with Riksarkivet, SBL, museums, etc., which suggests the project is becoming yet another data silo.

My conclusion

The project had machine learning professionals, and their use of GitHub was commendable. However, we need people with digital foresight who can confidently communicate expectations to other organizations.

Those overseeing finances must acquire new competencies and possess a vision for building an ecosystem.

See nfdi4culture.de/resources/user-stories
