anansi-project/comicinfo

New Element: LocalizedSeries

ocgineer opened this issue · 43 comments

Where does this come from?

Myself and Discord

What is the rationale for adding support for this element?

LocalizedTitle: to have a proper field available for the title when the comic/manga originates from a country whose language differs from the language the book is translated and published in.

LocalizedTitleScript could also be added alongside LocalizedTitle to hold the title in its actual native script (Japanese or Korean script).

Can you provide example of what this looks like?

What if instead a language attribute was added to the title tag and multiple title tags are allowed so long as they specify a separate language?

<Series lang="eng" sort="Batman">The Batman</Series>
<Series lang="spa" sort="Batman">El Batman</Series>

I think the language is not relevant, at least not for the published title, given we already have the LanguageISO field that should provide the language of the publication, so it would be redundant.

So for a Japanese publication translated into English, the English publication would have these, if I understand correctly:

  • Series: Batman
  • OriginalSeries in romanized script: Battoman (Batman in Romaji)
  • OriginalSeries in local script: バットマン (Batman in Katakana)

But that wouldn't tell us the language of origin of the publication.

It has as far as I know two uses, mostly for manga/manhwa/manhua and webtoons.

The manga is a scanlation

In the case of a scanlation there is no 'official' English title, thus the main title is in the language it was written in. Therefore the LocalizedTitle would be used for the 'common' fan title in English or in whatever language it was translated to.

The manga is an official translation publication

In this case, there is an official English title and it can be used as the main title. Many manga users still prefer to also have the Japanese title available, as they are used to it and can search by this name when it is in the metadata.

Kavita has a localized title field that maps to AlternateSeries, but then a discussion on Discord brought up this specialized LocalizedTitle or LocalizedName element for this specific use case.

AlternateSeries can then be used for manga when there is another 'English title' in the case of a scanlation, or it can be used for manga with long names that have a common shorter name among fans, who would also use these names in search. E.g.:

  • SAO => Sword Art Online
  • BakaRina => My Next Life as a Villainess: All Routes Lead to Doom!
  • DanMachi => Is It Wrong to Try to Pick Up Girls in a Dungeon?

I would just say, based on another issue, it's been said that comic users use AlternateSeries as Story Arc type tags. I think having LocalizedSeries would be the best way to ensure we can store the mappings, like DanMachi (which is how most people would look up a series), without creating confusion between manga and comics about what a tag is used for.

I would like to clarify a few things:

  • this is an exchange data format, we need to think about the data it represents, not how the data is used in a particular consuming application (like Komga or Kavita)
  • as mentioned elsewhere and also in the documentation, and as @majora2007 said well above also, AlternateSeries fields are historically designed for cross-over usage (example). We shouldn't try to shoehorn something entirely different in there, while many applications already use those fields for the very specific purpose of cross-overs or story arcs.

@ocgineer can you provide detailed examples of real life cases for both scanlation and official translation, of what the different series titles are?

What i'd like to say is also this: please try to explain the problem at stake (what kind of data is missing or what you cannot express with the current data model), and not jump at the solution (what fields need to be added).

Not to derail this too much but as a practical matter, i suspect most tagging is done from comicvine data (with comictagger i'd suspect). I can't recall having ever seen any of these tags used in the wild: AlternateSeries, AlternateNumber, AlternateCount, StoryArc, StoryArcNumber, SeriesGroup, AgeRating.

Unless a new popular, available source of metadata appears paired with new popular, convenient tagging software, whatever tags you might dream up for a spec, however logical and appropriate, will end up being a mostly academic exercise.

@ajslater A lot of manga users also use My Anime List for tagging through programs like MangaTagger. There are other efforts to bring manga based metadata into ComicInfo, hence why I think a single additional tag of LocalizedSeries would be nice.

I know for my application, Kavita, that LocalizedSeries would be used and I'm working on external metadata support as well.

Does manga have a popular metadata file format that's stored with the archive analogous to comicinfo.xml?

It does not. The best we have is ComicInfo.xml and we have to use workarounds for how it works.

For example, the number of issues is used for the number of volumes, not actual chapters.

As Majora2007 said, there is no specific manga metadata format, so we try to use what is available in ComicInfo.xml, sometimes bending the intention of fields defined for comics so that they work for manga, or trying to get new fields added. The manga we work with is (always) translated, so it has a title in the original language and a translated title, and both are used equally in the community. The Chapter # and Volume # (collection of chapters) that manga uses is another matter.

I will say, and I think this is starting to veer off into more of a philosophical discussion on the purpose of ComicInfo, that it would be really nice if ComicInfo could accommodate some stores of data for manga itself. Currently, manga cannot fully use ComicInfo as it is designed with one metadata source in mind and a small set of producers. If the standard cannot be open enough to allow the potential to store some information even if there isn't a major player yet in the field, then this effort is pretty moot.

Kavita, for example, is looking to synchronize metadata from its database into metadata in the file, to allow users the choice to move to another consuming application in the future and not be tied to Kavita. But without the means to write some essential information, another standard will have to be created and we will continue to be stuck in a future of fragmented software and lock-in.

We don't need to bloat the standard, but we should be able to put a few tags in where there is potential for automation, like LocalizedSeries, where the information is available from MAL, AL, MangaDex, all with APIs, but no program writes these fields because there is no standard there. But we don't need to be overly ambitious: looking at the Mangaka tag, that is overly verbose for something that already exists (Writer or Artist).

Just to be pedantic, ComicInfo wasn't designed with a data source in mind. It was created by ComicRack to store local data from the ComicRack internal metadata; the scrapers for different services came after as Python plugins. This post on reddit has some of them and it has the manual for ComicRack.

This project is just a collection of people who have comics of any kind, including manga, and who are trying to create new elements for a format that has been abandoned by its creator.
I think that @ajslater and @gotson are just trying to determine the proper usage for elements before they are added and are just being extra cautious about not including bloat.

Honestly, the format that ComicRack created doesn't even have all the tags needed to account for ComicVine or GCD, and those two don't agree on how to represent comic series.

@ocgineer for the third time, can you provide detailed examples of real life cases for both scanlation and official translation, of what the different series titles are?

This Tag
I can't say I have feelings one way or the other about it, but as comicvine doesn't supply it, it won't be filled in by automated processes for western comics. If it did, I'd be more inclined to add it, as I'd be pretty sure it would get used en masse.

Meta
I think if the purpose of comicinfo 2+ were clearer, then the issue of what elements to include or not would be easier.

Manga having no common embedded metadata format and being forced to abuse comicinfo seems less than ideal for manga readers. With that in mind, I do think a mission of some future version of comicinfo could be to support Manga extensions where no comicinfo tag can do the job. But should that be v2.1?

Every bruce, dick and barbara is going to be requesting their own special tag on this project. A rubric or a statement of purpose for this format specification seems like a good idea. Not for the entire future of the project, but perhaps for the next version. Maybe supporting Manga isn't part of v2.1, idk. I think it would be nice if some future version did, but you might want some hard cutoffs and reasons to say no to ship this thing.

I can provide some examples:
English Title: Brynhildr in the Darkness
Localized Title: Gokukoku no Brynhildr

English Title: We Never Learn
Localized Title: Bokutachi wa Benkyou ga Dekinai!

This is why he is requesting it (and I am 100% advocating for it). Imagine you are searching for We Never Learn but your series is tagged as Bokutachi wa Benkyou ga Dekinai! because that is what the metadata service provides.

I agree with ajslater, there seems to be no defined goal of what this spec is for and the rules around it. If we can't add new things because it has to be supported by ComicVine and has to suit only Comic users, then it really defeats the purpose in my eyes of pushing the spec forward. Yes, we need to be in agreement on how much we let through, but at the moment, it would be nice to have some basic support for Manga without having hard requirements that a program write the tags out explicitly (or already exist).

To answer the Meta part:

ComicInfo is not specific to Manga, Comics, or any other type of publication. The problem with Manga is that people started using the format without any guidance or documentation, and decided to use some fields that are actually intended for something else.

v2.1 should be an intermediate step between v2, and the target model we are discussing in https://github.com/anansi-project/rfcs

Not everything should be added to 2.1, only things that make sense. That's why there are lots of questions asked, to make sure we get it right. It's not an easy process. Then you may ask, "what does make sense to be added?". I don't have an absolute answer. That's why we need the discussions, the arguing, and the constructive disagreement.

But to state it again, ComicInfo is not just for Comics. ComicInfo is not just for ComicVine.

I can provide some examples:
English Title: Brynhildr in the Darkness
Localized Title: Gokukoku no Brynhildr

English Title: We Never Learn
Localized Title: Bokutachi wa Benkyou ga Dekinai!

Thanks, but that doesn't really explain the script part that @ocgineer mentioned above.

The naming also seems weird to me, localized would mean translated to the language of the publication, but here it's the original title.

Let's leave the script for some other PR. It's getting muddled with the discussion of this actual tag itself. @ocgineer please create a separate issue if you want the script included and have concrete examples.

The example is just that, you can swap them around based on your preferences. If writing from external sources (which exist), then you'll be forced into that convention.

But that's semantically wrong.

If you look at that book published in English, it should have:

  • English Title: Brynhildr in the Darkness
  • Original Title: Gokukoku no Brynhildr

But if you look at the original publication, in Japanese, it should only have the original title in the Title field.

I think all those are related, in the discussions above what i get is:

  • for translated publications, the original title is of interest (for example the Japanese title)
  • for translated publications, the original title in its original script can also be of interest (mostly makes sense for Asian languages)
  • for normal publications, the title could have multiple writings, depending on the scripts. So if you have a Japanese manga, untranslated, it still has multiple titles, the romaji one, and the hiragana/kanji one for example.
  • there is also the matter of shorter names, which i would put into a synonym/alias basket

We could imagine having multiple synonyms/alias to dump all those notions, however it would make it difficult for consuming applications to consume if there is no intent or hint as to what the titles are.

@majora2007 notwithstanding what's in the ComicInfo.xml, how does Kavita handle those things in its internal metadata model?

We provide the user 2 titles to work with: Title and the LocalizedTitle. We don't care what is in which. The title is what renders to the screen, but the user can lookup with either.

Usually, in manga, users will look up with English or Japanese. Different sites show different titles, but it's usually either the English or romaji (unless you are using pure Japanese, which is an edge case).

For example, I use a mix for the Title, depending on what I know the series as. Sometimes the Japanese is not easy to remember and I use the English or some shorthand as the title to best suit my needs of finding and remembering something. Having the localized title available makes it really nice when I search after reading about a series, so I don't have to google a translation and then check on that.

My opinion is to not take care of every edge case, but the most common. Then the user can decide what system they want to represent their files. The consuming applications just need to respect their tagging choices for display.
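The two-title lookup described above can be sketched roughly as follows. This is only an illustration: the `SeriesEntry` record and the `search` helper are hypothetical names, not Kavita's actual model or code, and the matching rule (case-insensitive substring) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SeriesEntry:
    """Hypothetical in-memory record; not Kavita's real data model."""
    title: str                              # the name rendered on screen
    localized_title: Optional[str] = None   # second name, used for lookup only


def search(library: List[SeriesEntry], query: str) -> List[SeriesEntry]:
    """Return entries whose title or localized title contains the query."""
    needle = query.casefold()
    return [
        entry
        for entry in library
        if needle in entry.title.casefold()
        or (entry.localized_title and needle in entry.localized_title.casefold())
    ]


# The user decides which name goes where; both stay searchable.
library = [
    SeriesEntry("We Never Learn", "Bokutachi wa Benkyou ga Dekinai!"),
    SeriesEntry("Gokukoku no Brynhildr", "Brynhildr in the Darkness"),
]

print([entry.title for entry in search(library, "benkyou")])
print([entry.title for entry in search(library, "darkness")])
```

Either name finds the series, while only `title` is shown, which matches the "we don't care what is in which" stance.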

We provide the user 2 titles to work with: Title and the LocalizedTitle. We don't care what is in which. The title is what renders to the screen, but the user can lookup with either.

So IIUC the LocalizedTitle is for search only ?

FYI Komga has a series title, used for display and search, and a series sort title, used for sorting and search.


If we don't need any specific meaning/hint/intent on the additional series title, we could use something like aliases or synonyms.

We could imagine something like that:

<xs:element minOccurs="0" maxOccurs="unbounded" default="" name="SeriesAlias" type="xs:string" />

Which would be used like that, for example for this series:

<Series>Is It Wrong to Try to Pick Up Girls in a Dungeon?</Series>
<SeriesAlias>Dungeon ni Deai wo Motomeru no wa Machigatteiru Darou ka</SeriesAlias>
<SeriesAlias>DanMachi</SeriesAlias>
<SeriesAlias>ダンジョンに出会いを求めるのは間違っているだろうか</SeriesAlias>

We could also use hints with XML attributes, for example:

<Series>Is It Wrong to Try to Pick Up Girls in a Dungeon?</Series>
<SeriesAlias hint="romaji">Dungeon ni Deai wo Motomeru no wa Machigatteiru Darou ka</SeriesAlias>
<SeriesAlias hint="short">DanMachi</SeriesAlias>
<SeriesAlias hint="original">ダンジョンに出会いを求めるのは間違っているだろうか</SeriesAlias>

The hint would not be typed, and would be free text. Not sure if that would bring a lot of value for consuming applications though.

Maybe hint is not a very good name, we can discuss about it if we want to go that route.
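As a rough sketch of how a consuming application might read this proposal, the snippet below parses the example with Python's standard `xml.etree.ElementTree`. `SeriesAlias` and its `hint` attribute exist only as a suggestion in this thread, and `read_series_names` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# SeriesAlias and hint are only a proposal from this discussion,
# not part of any published ComicInfo schema.
SAMPLE = """<ComicInfo>
  <Series>Is It Wrong to Try to Pick Up Girls in a Dungeon?</Series>
  <SeriesAlias hint="romaji">Dungeon ni Deai wo Motomeru no wa Machigatteiru Darou ka</SeriesAlias>
  <SeriesAlias hint="short">DanMachi</SeriesAlias>
  <SeriesAlias hint="original">ダンジョンに出会いを求めるのは間違っているだろうか</SeriesAlias>
</ComicInfo>"""


def read_series_names(xml_text):
    """Return (series, aliases), where aliases is a list of (hint, name)."""
    root = ET.fromstring(xml_text)
    series = root.findtext("Series", default="")
    aliases = [
        (alias.get("hint"), (alias.text or "").strip())
        for alias in root.findall("SeriesAlias")
    ]
    return series, aliases


series, aliases = read_series_names(SAMPLE)

# Because the hint is free text, the safest use is to treat every alias
# as searchable and keep the hint only as an optional label.
searchable = [series] + [name for _hint, name in aliases]
print(searchable)
```

An application that does not understand a given hint can still index the alias, which is why free-text hints cost little to ignore.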

Note that the proposed <xs:element minOccurs="0" maxOccurs="unbounded" default="" name="SeriesAlias" type="xs:string" /> would be incompatible with #10, as in XSD 1.0 an element inside xs:all cannot have maxOccurs greater than 1.

Just want to point out that it should use romanized instead of romaji to keep it global, if you want to go this route. I like the original hint as well; it can then contain the original title in whatever language the work originates from.

The examples given were Japanese manga (as that is the most prominent use) thus romaji would be correct but there are also Korean and Chinese 'manga' and webtoons that are starting to get officially translated.

As I said, the hint would be free text, there wouldn't be any convention.

I know we had a bunch of back and forth on here, I decided to implement LocalizedSeries as a tag within Kavita (no one currently writes this) so I can get some basic functionality for Manga. I think we have to be cognizant of the medium we have and that it cannot solve all possible scenarios. Giving some functionality is better than holding back because we can't cover all use cases.

It seems like you are tying Kavita's metadata model to ComicInfo?

Not exactly tying it to ComicInfo, but ComicInfo and Epub are the only ways to have metadata in a self-contained system. When possible, I'd like to import data from ComicInfo, while Kavita offers metadata above what ComicInfo can provide. So for LocalizedSeries, this is a field I already had available, but I wanted to give users and myself the ability to set it in the ComicInfo and have it work between Kavita installs, without having to find the series and update it in both installs.

FWIW, I prefer lordwelch's attribute suggestion, which allows multiple languages and sort schemes via attributes. While in the other thread I voiced support for the consistency of relying on more tags, it looks like both these cases should be tightly tied to the Series tag, and lordwelch's example lets you specify as many language variations as you might like for a variety of consumers. With that schema the LanguageISO tag still represents the printed language found inside the comic.

<LanguageISO>eng</LanguageISO>
<Series lang="eng" sort="Batman">The Batman</Series>
<Series lang="spa" sort="Batman">El Batman</Series>
<Series lang="fra" sort="Batman">Le Batman</Series>
<Series>The Batman</Series>

If you wanted you could match the language tag to the series lang attribute and find the 'original language' series or use a tag without the lang attribute to represent the original. This schema has the benefit of being extensible to other tags like Title, AlternateSeries, Imprint and Publisher, Summary, Notes, and possibly other string tags.
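The matching described here could look roughly like the sketch below, using Python's standard `xml.etree.ElementTree`. The `lang` and `sort` attributes on `Series` are only proposed in this thread, and `pick_series` is a hypothetical helper; the fallback to the untagged element is one possible reading of "use a tag without the lang attribute to represent the original".

```python
import xml.etree.ElementTree as ET

# Multiple <Series> elements with lang/sort attributes are a proposal only.
SAMPLE = """<ComicInfo>
  <LanguageISO>eng</LanguageISO>
  <Series lang="eng" sort="Batman">The Batman</Series>
  <Series lang="spa" sort="Batman">El Batman</Series>
  <Series lang="fra" sort="Batman">Le Batman</Series>
  <Series>The Batman</Series>
</ComicInfo>"""


def pick_series(xml_text, wanted_lang=None):
    """Pick the Series whose lang matches wanted_lang (default: LanguageISO).

    Falls back to the first Series without a lang attribute, or None.
    """
    root = ET.fromstring(xml_text)
    lang = wanted_lang or root.findtext("LanguageISO")
    untagged = None
    for element in root.findall("Series"):
        if element.get("lang") is None:
            untagged = untagged or element.text
        elif element.get("lang") == lang:
            return element.text
    return untagged


print(pick_series(SAMPLE))          # title in the publication language
print(pick_series(SAMPLE, "spa"))   # title for a Spanish-speaking consumer
print(pick_series(SAMPLE, "deu"))   # no match, falls back to the untagged one
```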

The schema shown above is focused on language and localization solutions and does not take into account gotson's suggestion of a "short" hint.

<SeriesAlias hint="short">DanMachi</SeriesAlias>

which is interesting, but i'm guessing that such abbreviations aren't all that commonly desired for sorting or display?

I have implemented the tag on my metadata editor and my own fork of Manga-Tagger.

Also Manga-Tagger currently maps the English name to AlternateSeries.

I plan to add a setting to let the user use English by default in which case English would go to Series and romaji to LocalizedSeries

Since you wanted more use cases, this is what AniList provides through its API:
Romaji: Kage no Jitsuryokusha ni Naritakute!
English: The Eminence in Shadow
Native: 陰の実力者になりたくて!
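A minimal sketch of the mapping setting mentioned above, using the three titles just listed. The `prefer_english` flag and `build_comicinfo` function are hypothetical, and writing the non-preferred title to `LocalizedSeries` in both cases is an assumption of this sketch rather than Manga-Tagger's documented behaviour.

```python
import xml.etree.ElementTree as ET

# Titles as listed above for one series.
titles = {
    "romaji": "Kage no Jitsuryokusha ni Naritakute!",
    "english": "The Eminence in Shadow",
    "native": "陰の実力者になりたくて!",  # not written anywhere in this sketch
}


def build_comicinfo(titles, prefer_english):
    """Write one title to Series and the other to LocalizedSeries."""
    if prefer_english:
        main, other = "english", "romaji"
    else:
        main, other = "romaji", "english"
    root = ET.Element("ComicInfo")
    ET.SubElement(root, "Series").text = titles[main]
    # LocalizedSeries is the tag requested in this issue, not a v2.0 element.
    ET.SubElement(root, "LocalizedSeries").text = titles[other]
    return root


root = build_comicinfo(titles, prefer_english=True)
print(ET.tostring(root, encoding="unicode"))
```

With a single extra element there is nowhere to put the native-script title, which is the limitation the alias and attribute proposals try to address.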

As to the implementation of the tag I'd just leave it simple and assume that the LocalizedTitle language is the same as LanguageISO.

Personally I don't see myself adding how the series is named in French if the content is in Spanish, for example.

I think that this issue has been open too long already for something that is really simple. I understand that there has to be agreement on what tags should be added and whatnot, and that to get there, there has to be some discussion. The problem is that a few comments every few months will get us nowhere.

Edit:
I just recalled that I've been typing LocalizedTitle because it's the name of the issue, but I'm referring to LocalizedSeries.
I suggest renaming the issue.

Trying to pick this up again, as someone mentioned it today on a Komga issue.

There's a couple of requirements from what i can see:

  1. ability to specify alternative titles in a different language/script. Script is important because some languages have multiple scripts that would need differentiating (mostly for Japanese in Kanji/Hiragana/Katakana/Romaji)
  2. ability to specify alternative titles that are just aliases. It could be an accepted shortening (like SAO for Sword Art Online), or an alternative title altogether (example: Valérian, agent spatio-temporel was renamed to Valérian et Laureline)
  3. it also ties up to #4, but given a series should only have a single sort, I don't see this working well together

As for the suggestions:

  • since it's a new tag, using the same element multiple times should be possible. We don't want to handle comma-separated values.
  • the element attributes are a nice addition to enrich the element's value
  • attributes proposed:
    • lang: If we want this to handle different scripts for a single language, we would need to use BCP47 tags, which can have a subtag for script. For example, ja-Hira-JP stands for Japanese in Hiragana, ja-Kana-JP for Japanese in Katakana, and ja-Jpan-JP for Japanese in Han (Kanji), Hiragana and Katakana. It would not handle shortening or aliases.
    • hint or label: could be used to accommodate whatever. Could be language (French), script (Hiragana, Katakana), alias (alias, aka), or shortening (short).
    • sort: I don't see how that would work, as a series would have a single sort order. How would client applications be expected to handle that? I think we should tackle series sort in #4
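To illustrate what a consumer would get out of a BCP47-style lang value, here is a deliberately simplified sketch that splits a tag into language, script and region subtags. The `split_tag` helper is hypothetical, and the pattern ignores variants, extensions and private-use subtags that full BCP47 tags may carry.

```python
import re

# Simplified shape: language[-Script][-REGION].
# Full BCP 47 tags can carry more subtags, which this sketch rejects.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag):
    """Return (language, script, region); missing subtags are None."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    return match.group("language"), match.group("script"), match.group("region")


for tag in ("ja-Hira-JP", "ja-Jpan-JP", "ja"):
    print(tag, split_tag(tag))
```

A client that only cares about language can ignore the script subtag, while one that wants to tell scripts apart (for search or display) can group aliases by it.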

I would be in favor of using both label and lang.

I can't envision a use case for either "Japanese in Hiragana" or "Japanese in Katakana" except for sort ordering purposes.

Kavita has supported this for a while now. As mentioned in my post here, I do not believe it is ComicInfo's job to cater to all potential ways to represent Series data. Since this is mainly used for Manga, I believe offering an additional field LocalizedSeries is sufficient and their ingestion software of choice can implement alternative fields.

I'm not personally interested in adjusting the implementation in Kavita to support multiple languages as there is no added benefit to the user other than having it in ComicInfo.

I keep my position in my previous comment and agree with @majora2007.

LocalizedSeries is too limited in my opinion. It seems it was added in Manga Manager and Kavita because no consensus was reached, but that doesn't mean we should go for that option now just because it is already implemented somewhere.

I do not believe it is ComicInfo's job to cater to all potential ways to represent Series data. Since this is mainly used for Manga, I believe offering an additional field LocalizedSeries is sufficient and their ingestion software of choice can implement alternative fields.

It's a bit contradictory. My reading of this is "we don't need something so complex, but we already implemented something simpler that works for us, so we should use that instead".

Since this is mainly used for Manga,

Mainly but not only. The model, even though it was initially done for Comics by ComicRack, should aim to be agnostic as much as possible.

I'm not personally interested in adjusting the implementation in Kavita to support multiple languages

The way consuming applications handle the metadata is up to them. You could always decide to ignore that new field, or use the first one found.

there is no added benefit to the user other than having it in ComicInfo.

No added benefit to Kavita users, maybe. But there is one for users of other applications, or for those who ever want to migrate from Kavita to something else.

I can't envision a use case for either "Japanese in Hiragana" or "Japanese in Katakana" except for sort ordering purposes.

Search is a very good use case. You could search by using whatever script. Some people may like the Hiragana/Kanji display, but could not write it, so they could search using romaji.

Sort ordering would most likely be done using the romaji titles, otherwise titles in Japanese characters would always end up at the end.

If you're talking about foreign-language libraries, that's probably correct. If you're talking about Japanese monolingual libraries, this will result in an incorrect sort order -- The only feasible way to sort Japanese is to sort based on kana readings.

that's why we also have a proposal for #4

In my opinion, a non-English manga book usually has its own official title in each language or region, tied to its publisher and ISBN.
I prefer to use its localized title as the main title. In this case, a Native Title field would be more useful than a Localized one, especially if you have multiple versions like Japanese, Chinese and others in one library.

It's going to be almost 2 years since opened and 10 months since re-discussed and to this day there is no consensus? You are like Ents.

On a serious note, please consider adding this to the spec. It is extremely useful, even essential, for foreign users. I might know a series by its original name, but that is because I am involved in the world of metadata and I'm a nerd. A normal user is not going to recognize a series by its original name, because the translation often differs completely from the original title. And that is with English titles, a language most people have some knowledge of; if we talk about titles in Japanese, then 90% of people (at least in my country, and I would even dare to say continent) are not going to recognize them at all. You can't even call this an exotic request: it is supported in media centers like Kodi (https://kodi.wiki/view/NFO_files/Templates) and Jellyfin (https://jellyfin.org/docs/general/server/metadata/nfo/).

Not sure about the status of this officially, but Kavita and Manga Manager have added support for this due to the immense need among manga users that you mentioned.

I can't speak to the proposal getting merged, as it is up to gotson to give the final approval.

gotson commented

No consensus could be reached, so there's no decision and no merge.