Feedback Riksdagens Corpus
salgo60 opened this issue · 2 comments
The big steps forward I see in the Riksdagens Corpus project:
- excellent use of GitHub
- showing that TEI may be a way forward
- even a community-driven site like Wikidata can add value if there are persistent identifiers for e.g. Swedish PM people...
What I miss in the Riksdagens Corpus project (October 2023)
- You have not explored Wikidata extensively to determine its potential for enhancing research, particularly political research on an international scale, for example by adding relations between different countries' corpora. I compared the productivity of SBL (text strings) and SKBL (structured data), and SKBL, using structured data, produced 100 times more female bios. My feeling, from what I see with Wikipedia/Wikidata, is that moving to Linked Data is the real game changer; see Tim Berners-Lee, The Next web of open linked data, and Google, "Applied semantics: beyond the catalog". The representative from Google talks about the importance of good metadata and asserts that if they do their job well, there should be no need for anyone else to look up information such as Angela Merkel's birthdate...
- Seeing all the cutting and pasting, and that SKBL conducts research on the same women that SBL has already researched, does not belong in 2023. It cannot be justified that tax money is not used more efficiently and that everyone has to start from scratch. It is sad that SKBL produced such poor data and did not use Linked Data....
- I miss any semantic discussion, for example using SKOS, of how two knowledge domains should connect, how to handle differences between sources (#222,#222), and how to describe a source with uncertainty so that metadata roundtripping will work
- how to handle differences in Wikidata: see Ranking (see the blog post "On truths and lies": "Unlike many other databases, Wikidata can contain contradicting statements, supported by different references"); compare Wikidata, see #35
- See my talk about metadata roundtripping and persistent identifiers / video: identifiers were already used for rune stones in 1750, yet in 2023 they are often missing in projects like Riksdagens Corpus. They work for ORCID and DOI but are needed everywhere, and they should be available from day one in a project like Riksdagens Corpus, so that you can track the development of good data and not just when publishing data. GitHub is great, but we need to track every data point, and tombstone pages would also be nice... We see this problem everywhere: even when persistent identifiers are available, people have excuses for not using them and owl:sameAs. See issue #269, which should have been solved on day one of the project... Now Wikidata can't reference the data points or track the differences between the two domains... I would also like to see SKOS used, and a way to handle sources that we trust more or less; see WD P1480 "sourcing circumstances"
- A good thought from Katherine McDonough is that the digital humanities need to start working together and create curated data that works together... Right now I feel Wikidata is an enabler, but that is not serious....
- I miss PROV provenance #167? I would like to see that the National Archives, RAÄ (Swedish National Heritage Board), research projects like the Riksdagen Corpus, and Umeå University Familia had better data and worked in a similar manner. It seems to me that in the Riksdagen Corpus, a lot of knowledge is currently built on a book "Tvåkammar-riksdagen 1867–1970," which, when we also scan portrait books from the early 1900s, shows that this book classifies political outliers with different terminology. Without having provenance, we lose a significant part of traceability and credibility, something that the Wikipedia world is often rightfully criticized for due to the lack of sources. I now observe that the research data from 2023 is affected by this
- there hasn't been an effort on your part to evaluate tools like Scholia, which is built upon Wikidata and has been used by other researchers; see Swedish MP Anna Lind scholia.toolforge.org/author/Q208591 --> Scholia --> GitHub WDscholia/scholia
- a vision entails creating a unified framework where your data, structured as Linked Data, can seamlessly integrate with the broader body of research data across Europe. This integration can foster a collaborative research ecosystem, enabling the sharing of knowledge and insights across borders. By aligning your data with common standards and leveraging tools like Persistent Identifiers, it's possible to achieve a higher level of interoperability and data consistency. This will not only enhance the scalability and international scope of political research but also contribute to a more cohesive understanding of complex, transnational issues. Through platforms like Wikidata and tools like Scholia, there's potential to build upon existing infrastructures, thereby enriching the collective knowledge base and advancing the broader research objectives.
- #269 Utilizing Persistent Identifiers (PIDs) is a practice aimed at ensuring the long-term accessibility and traceability of digital items. Wikidata, for instance, has applied PIDs to all uploaded images, enabling a more organized and searchable database with support for more than 300 languages. Moreover, they've introduced a feature allowing parts of an image to be annotated to indicate what or who is depicted, enhancing the information retrieval process. An example of this is seen in the annotations of "The Coronation of Napoleon" image, where labels are provided in various languages including Chinese (zh), Swedish (sv), and English (en). By integrating such practices, you can significantly improve the management and sharing of digital resources within the European research data framework. This move towards a more structured and interlinked data environment can facilitate collaborative research efforts, and potentially unlock new insights through cross-referencing and analysis of a rich, multilingual data repository.
- merely copying and pasting from an old book does not significantly advance the frontier of possible research in 2023. I observe the same lack of vision in another research project, Familia; see my attempt with them, where I challenge the term "vilde" by using SPARQL and linked data to compare what was published in books from 1900 with "the bible" you trust
- with the network iNaturalist we have
- It appears that there hasn't been a clear initiative to challenge organizations like Riksdagens Öppna data, Riksarkivet, Riksarkivet SBL, Kungliga biblioteket, and Digital museum regarding the quality of data they provide. Understanding the level of support or the lack thereof from these organizations is crucial as it may significantly impact research outcomes. The gaps in support might relate to various factors, including ease of communication (such as using GitHub), data accuracy, completeness, accessibility, or interoperability, which could hinder the progress and quality of research. Addressing these issues and advocating for better data practices could pave the way for more reliable and comprehensive research, fostering a conducive environment for scholarly endeavors. Furthermore, collaborating with these organizations to improve data quality and availability could potentially lead to more insightful findings and a richer knowledge base, thus advancing the broader research objectives.
- Note my attempt to draw your attention to some Swedish MPs with poorer data using Q120143028 - (#359 - #324..., where I believe you should concentrate and enhance the data quality. It's advisable to apply the same approach to the aforementioned organizations, avoiding isolated data silos #25 / #24...
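A minimal sketch of the idea behind the points above on ranking, contradicting statements and provenance. It assumes a much simplified statement model, not Wikidata's actual data model or API: every claim keeps its source and a rank, and the "best" claims are the preferred ones if any exist, otherwise the normal ones, while deprecated claims are never returned. The `Statement` class, the person identifier and the example values are hypothetical and only there for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# Ranks that may be returned as "best"; "deprecated" is deliberately absent.
RANK_ORDER = {"preferred": 0, "normal": 1}


@dataclass(frozen=True)
class Statement:
    subject: str   # persistent identifier of the person (placeholder here)
    prop: str      # what is being claimed, e.g. "party"
    value: str     # the claimed value
    source: str    # where the claim comes from (provenance)
    rank: str = "normal"
    sourcing_circumstances: Optional[str] = None  # e.g. "circa", "disputed"


def best_statements(statements: List[Statement], subject: str, prop: str) -> List[Statement]:
    """Return preferred claims if there are any, else normal ones."""
    candidates = [
        s for s in statements
        if s.subject == subject and s.prop == prop and s.rank in RANK_ORDER
    ]
    if not candidates:
        return []
    top = min(RANK_ORDER[s.rank] for s in candidates)
    return [s for s in candidates if RANK_ORDER[s.rank] == top]


# Hypothetical, made-up claims about one person from three sources.
statements = [
    Statement("person:example-1", "party", "vilde", "Tvåkammar-riksdagen 1867–1970"),
    Statement("person:example-1", "party", "liberal", "portrait book, early 1900s"),
    Statement("person:example-1", "party", "unknown", "unsourced list", rank="deprecated"),
]

# Both sourced claims are kept side by side until someone ranks them.
for s in best_statements(statements, "person:example-1", "party"):
    print(s.value, "<-", s.source)

# After review, one claim is promoted; the other stays in the data with its source.
promoted = [replace(s, rank="preferred") if s.value == "liberal" else s for s in statements]
for s in best_statements(promoted, "person:example-1", "party"):
    print(s.value, "<-", s.source)
```

The point of the design is that a disagreement between two sources is stored, with its references, instead of being silently overwritten by whichever book was used first.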
Why can't they work together and produce ONE knowledge graph and support citation graphs?
- Kungliga biblioteket #25 LIBRISXL: link "same as" to Riksdagens Öppna data / #17 LIBRISXL: citation graphs
- Riksarkivet SBL #12 SBL: deliver data as data - Things not strings
- #4 Nationell dataverkstad should work with FAIR data and persistent identifiers
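The "same as" links asked for in the issues above could be sketched like this. It is a minimal example, assuming each organisation publishes stable identifiers together with pairwise owl:sameAs-style links. All identifiers below are made-up placeholders, and `same_as_clusters` is a hypothetical helper, not part of any existing tool. It groups identifiers that refer to the same entity, so every source can keep its own identifiers and still be resolved across silos.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def same_as_clusters(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group identifiers connected by sameAs links (simple union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters: Dict[str, Set[str]] = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


# Placeholder identifiers only; none of these are real records.
links = [
    ("riksdagen-corpus:person-A", "wikidata:Q-A"),
    ("libris:auth-A", "wikidata:Q-A"),
    ("sbl:article-A", "libris:auth-A"),
    ("riksdagen-corpus:person-B", "wikidata:Q-B"),
]

for cluster in same_as_clusters(links):
    print(sorted(cluster))
```

With links like these in place, a correction made in one source can be traced to the matching record in the others, which is what metadata roundtripping needs.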
What I observe is a lack of an ecosystem - #datasilos.
While the project serves as a commendable example of GitHub utilization, it appears to overlook fundamental aspects such as semantic skills. Additionally, there seems to be little collaboration with Riksarkivet, SBL, museums, etc., suggesting they operate within another new data silo.
- Riksarkivet SBL - works in 2023 as they did in 1918; see my question whether they could create some structured data in 2023. It feels like achieving a world record in being a non-learning organisation.
- Museums - don't have the skills to do the same as Wikidata; nothing has happened with the 2022 vision they had in 2014 in this video - see "Museums as non-learning organisations #33"
- RAÄ - doesn't have a public backlog, creates too much link rot #36 and doesn't use tombstone pages -> see 2023 oct Maintenance hell
- Riksarkivet - is an archive that doesn't support persistent identifiers #17 --> they haven't created lifecycle management of information --> The Swedish "Domstolsverket" doesn't know where to find documents #1, and there is no vision of an ecosystem; see "Lack of information lifecycle thinking and handling 'persistent' identifiers that 'disappear' #63."
- Riksarkivet SBL - how wrongly they work, with text strings and not Things: "#7 Riksarkivet SBL: family articles lack persistent identifiers for the persons in the biographies"
- Familia - a research project whose excuse is that they are a research project and don't need to create linked data
- .....
My conclusion
The aforementioned project had machine learning professionals, and their use of GitHub was commendable. However, we require individuals with digital foresight who can confidently communicate expectations to other organizations.
Those overseeing finances must acquire new competencies and possess a vision for building an ecosystem.
- Riksarkivet #25 Structured requirements gathering
- Riksarkivet SBL #26 Structured requirements gathering
- LIBRISXL #24 Structured requirements gathering
- DIGG #99 Structured requirements gathering
- Nationell dataverkstad #27 Structured requirements gathering
- Riksdagens Öppna data #96 Structured requirements gathering
- RAÄ #29 Structured requirements gathering