euzu/m3u-filter

automatically map xml tvg attributes based on a config file

Closed this issue · 48 comments

kumy commented

I use tvHeadend as my local backend, and I have to manually configure each tvg-id to match my XMLTV provider. The same goes for TV channel numbers, logo URLs, and groups.

Describe the solution you'd like
Given a custom mapping file (example below), I would like to fill the fields

  • tvg-id
  • tvg-name
  • group-title
  • tvg-chno
  • tvg-logo

with static values from the mapping file. To match a line, I compare multiple regexes against the actual tvg-name. When a match is found, the tvg-* attributes are replaced by the values from the mapping. For group-title, | is the field separator.

MAP = [
    {
        "tvg-name": "TF1",   # Name to display
        "tvg-names": [       # Name to match. multiple regex allowed
            r"^TF1$",
        ],
        "tvg-id": "TF1.fr",  # XMLTV id
        "tvg-chno": "1",     # Channel number
        "tvg-logo": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3c/TF1_logo_2013.svg/320px-TF1_logo_2013.svg.png",
        "group-title": [     # Groups to add to existing one
            "FR",
            "TNT",
        ],
    },
    {
        "tvg-name": "TF1 Series Films",
        "tvg-names": [rf"^TF1{SP}Series{SP}Films$"],
        "tvg-id": "TF1SeriesFilms.fr",
        "tvg-chno": "20",
        "tvg-logo": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3c/TF1_logo_2013.svg/320px-TF1_logo_2013.svg.png",
        "group-title": ["FR", "TNT"],
    },
    {
        "tvg-name": "TF1 +1",
        "tvg-names": [rf"^TF1{SP}\+1$"],
        "tvg-id": "TF1Plus1.fr",
        "tvg-chno": "1",
        "tvg-logo": "https://upload.wikimedia.org/wikipedia/commons/8/83/Logo_TF1_%2B1.png",
        "group-title": ["FR", "TNT"],
    },
# ...
]

The output would be:

#EXTM3U
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1 - ALI" group-title="|TNT|FR|FRANCE FHD" tvg-chno="1" tvg-logo="https://upload.wikimedia.org/wikipedia/commons/thumb/3/3c/TF1_logo_2013.svg/320px-TF1_logo_2013.svg.png",TF1 - ALI
http://foo.bar:80/user/pass/1
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1 Series Films - ALI" group-title="|TNT|FR|FRANCE FHD" tvg-chno="20" tvg-logo="https://upload.wikimedia.org/wikipedia/commons/thumb/3/3c/TF1_logo_2013.svg/320px-TF1_logo_2013.svg.png",TF1 Series Films - ALI
http://foo.bar:80/user/pass/2
#...
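
The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the request, not the attached m3u-mapper script and not m3u-filter's implementation; `apply_mapping` and the channel dict layout are hypothetical names chosen for the example, and the MAP table is trimmed to one entry.

```python
import re

# Trimmed copy of the MAP example; only the fields used below are kept.
MAP = [
    {
        "tvg-name": "TF1",
        "tvg-names": [r"^TF1$"],
        "tvg-id": "TF1.fr",
        "tvg-chno": "1",
        "group-title": ["FR", "TNT"],
    },
]


def apply_mapping(channel: dict, mapping: list) -> dict:
    """Return a copy of `channel` with the first matching mapper applied."""
    for mapper in mapping:
        if any(re.search(p, channel["tvg-name"]) for p in mapper["tvg-names"]):
            mapped = dict(channel)
            # Static values overwrite whatever the provider sent.
            for key in ("tvg-name", "tvg-id", "tvg-chno", "tvg-logo"):
                if key in mapper:
                    mapped[key] = mapper[key]
            # Groups are appended to the existing ones; "|" separates them.
            groups = [g for g in channel.get("group-title", "").split("|") if g]
            mapped["group-title"] = "|".join(groups + mapper["group-title"])
            return mapped
    return channel  # no mapper matched: leave the entry untouched


channel = {"tvg-name": "TF1", "tvg-id": "", "group-title": "FRANCE FHD"}
print(apply_mapping(channel, MAP))
```

Taking the first matching mapper is an assumption of this sketch; the request does not say what should happen when several mappers match the same line.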

Describe alternatives you've considered
I wrote a draft Python script that does that, but it would be nice if all that logic could be handled by a single tool. Right now I have to store temp files and call multiple scripts.

m3u-mapper.zip

Additional context
My workflow:

  • download m3u using curl (save locally to prevent downloading while configuring m3u-filter)
  • use m3u-filter to get a subset of channels (output as temp file)
  • pass the temp files through "the script" (output as final file)
  • start iptv-proxy using the file from previous step
  • configure tvheadend to get from iptv-proxy
    • (without "the script", I have to manually set the tvg-* attributes for each line, and I have to start again and again while testing)
    • map channels

Thanks!

euzu commented

Please download and check the latest release 0.9.4.
The mapping file is not in the release zip but in the repository; download mapping.yml from there.
Don't forget to set the mapping property in the config file.

kumy commented

You are amazingly fast 🚀 🚀 🚀
I'll test that ASAP
Thank you!!

kumy commented

Unfortunately I'm not able to get the expected result yet.

From what I understand of my issue, each line matching a mapping is removed from the target file 😬. It's really visible with a mapping like the one below, where I expect every entry to get the same properties. Instead, I get an empty file with just the m3u header.

mappings:
  - id: wildcard
    tag: ""
    mapper:
      - tvg_name: FOOBAR
        tvg_names:
          - .*
        tvg_id: FOO.BAR
        tvg_chno: "42"
        tvg_logo: ""
        group_title: 
          - TEST

euzu commented

Can you give me an anonymized input m3u file, a config.yml, a mapping.yml file, and an expected result m3u file for testing?

The mapper matches the tvg_names regexps to each entry and replaces the attributes with the given properties in the mapper.
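
To make "each entry" concrete, here is a minimal Python sketch of how one #EXTINF line breaks down into the attributes a mapper can match and replace. It is an illustration only, not m3u-filter's parser (which is written in Rust); `parse_extinf` and both regexes are hypothetical helpers, and the sketch assumes attribute values are double-quoted and contain no escaped quotes.

```python
import re

# "#EXTINF:<duration>", any number of key="value" attributes, a comma, the title.
EXTINF_RE = re.compile(r'^#EXTINF:(-?\d+)((?:\s+[\w-]+="[^"]*")*)\s*,(.*)$')
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_extinf(line: str):
    """Split an #EXTINF line into (attributes, title)."""
    m = EXTINF_RE.match(line)
    if m is None:
        raise ValueError(f"not an #EXTINF line: {line!r}")
    attributes = dict(ATTR_RE.findall(m.group(2)))
    return attributes, m.group(3)


attrs, title = parse_extinf(
    '#EXTINF:-1 tvg-id="TF1.mu" tvg-name="FR:TF1" group-title="|FR| FRANCE",FR:TF1'
)
print(attrs["tvg-name"], "/", title)
```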

kumy commented

Here is a manually crafted config. I tried to include many cases so that it could be used as a unit test. The expected output may contain some typos or human errors.

I took some notes, with ideas and things I probably forgot to mention in the first comment:

  1. SIMPLIFY REGEX: normalize the tvg_name input for comparison (éè -> e ; à -> a)
  2. SIMPLIFY REGEX: regex match as case insensitive?
  3. DRY: generic pattern for suffixes? (?P<quality>[\s_-]*(hd|lq|4k|uhd)?)
  4. DRY: generic pattern for separator? [\s_-]*
  5. DRY: define custom reusable patterns? PLUS1=[\s_-]*(\+|plus)1
  6. DRY: ignore spaces in front/end of tvg-name by default? but we can set that in the regex anyway
  7. BONUS: matching currently checks only the tvg-name field (that is what was initially requested ;)), but it could be nice to allow matching on any field, like it's done with filters?
  8. DRY: common/default group title to always add? Allow to add FR for all
  9. CLEANING: a way force remove groups (per mapper or globally)? FRANCE|FOOBAR|SOMETHING
  10. Also overwrite the Title (I think I forgot to ask for it)
  11. BONUS: Capture regex group (quality in this example) and have a syntax to replace pattern in Name+Title if group is not empty (to prevent leading spaces or -)
    • mapping:TF1 + m3u:TF1 HD => TF1
    • mapping:TF1{ quality} + m3u:TF1 => TF1
    • mapping:TF1{ quality} + m3u:TF1_HD => TF1 HD
    • mapping:TF1{ - quality} + m3u:TF1 => TF1
    • mapping:TF1{ - quality} + m3u:TF1_HD => TF1 - HD
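
The brace syntax proposed in point 11 could behave as in the following Python sketch. It is a guess at the intended semantics, not an existing m3u-filter feature: `expand` and `PLACEHOLDER_RE` are hypothetical names, and the rule assumed here is that everything inside the braces before the group name is a literal prefix that is emitted only when the group captured something.

```python
import re

# "{<literal prefix><group name>}", e.g. "{ quality}" or "{ - quality}".
PLACEHOLDER_RE = re.compile(r"\{(?P<prefix>[^{}]*?)(?P<name>\w+)\}")

# Pattern from the notes above, made case-insensitive for the example.
PATTERN = re.compile(r"^\s*TF1[\s_-]*(?P<quality>hd|lq|4k|uhd)?\s*$", re.IGNORECASE)


def expand(template: str, match: re.Match) -> str:
    """Fill placeholders; drop prefix and placeholder when the group is empty."""
    def repl(placeholder: re.Match) -> str:
        value = match.groupdict().get(placeholder.group("name"))
        return placeholder.group("prefix") + value if value else ""
    return PLACEHOLDER_RE.sub(repl, template)


for template in ("TF1{ quality}", "TF1{ - quality}"):
    for name in ("TF1", "TF1_HD"):
        print(f"{template!r} + {name!r} => {expand(template, PATTERN.match(name))!r}")
```

Keeping the separator inside the braces is what avoids a dangling space or dash when the channel has no quality suffix.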

~Also, really nice idea to have mapping to be an array and having an easy way to use/enable it 👍 ❤️ ~

And what is the purpose of the .mappings.tag field? Could it be related to point 8 above?

Thanks a lot 🦸‍♂️

config.yml

api: { host: 127.0.0.1, port: 8901, web_root: ./web }
working_dir: ./data
sources:
  - input: { persist: 'input.m3u.bak', url: 'input.m3u' }
    targets:
      - filename: output.m3u
        type: M3u
        filter: 'Group ~ "^(\s|\|)?FR(\s|\|).*" AND Name ~ "(FR[:|])?TF1.*"'
        options: { ignore_logo: true }
        rename:
          - { field: Name, new_name: $2, pattern: "^(FR[ :|])?(.*)" }
        sort: { order: Asc }
        #mapping: France

mapping.yml

mappings:
  - id: France
    tag: ""
    mapper:
      ## That one empty the file
      #- tvg_name: FOOBAR
      #  tvg_names:
      #    - .*
      #  tvg_id: FOO.BAR
      #  tvg_chno: "42"
      #  tvg_logo: ""
      #  group_title: 
      #    - TEST
      - tvg_name: TF1{ quality}
        # https://regex101.com/r/UV233E/1
        tvg_names:
          - ^\s*TF1[\s_-]*(?P<quality>hd|lq|4k|uhd)?\s*$
        tvg_id: TF1.fr
        tvg_chno: "1"
        tvg_logo: https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png
        group_title:
          - FR
          - TNT
      - tvg_name: TF1 Séries Films{ quality}
        # https://regex101.com/r/lU0hjK/1
        tvg_names:
          - ^\s*TF1[\s_-]*Series?[\s_-]*Films?([\s_-]*(?P<quality>hd|lq|4k|uhd)?)\s*$
        tvg_id: TF1SeriesFilms.fr
        tvg_chno: "20"
        tvg_logo: https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png
        group_title:
          - FR
          - TNT
      - tvg_name: TF1 +1{ - quality}
        # https://regex101.com/r/WGZlWa/1
        tvg_names:
          - ^\s*TF1[\s_-]*Series?[\s_-]*Films?[\s_-]*(\+|plus)1([\s_-]*(?P<quality>hd|lq|4k|uhd)?)\s*$
        tvg_id: TF1Plus1.fr
        tvg_chno: "1"
        tvg_logo: https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png
        group_title:
          - FR
          - TNT
          - PLUS1
      - tvg_name: TF1 Séries Films+1{ - quality}
        # https://regex101.com/r/mV6zOc/1
        tvg_names:
          - ^\s*TF1[\s_-]*Series?[\s_-]*Films?[\s_-]*(\+|plus)1([\s_-]*(?P<quality>hd|lq|4k|uhd)?)\s*$
        tvg_id: TF1SeriesFilmsPlus1.fr
        tvg_chno: "20"
        tvg_logo: https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png
        group_title:
          - FR
          - TNT
          - PLUS1

input.m3u

#EXTM3U
#EXTINF:-1 tvg-id="TF1.mu" tvg-name="FR:TF1" group-title="|FR| FRANCE",FR:TF1
http://foo.bar:80/user/password/1
#EXTINF:-1 tvg-id="" tvg-name="FR:TF1_HD" group-title="|FR| FRANCE",FR TF1_HD
http://foo.bar:80/user/password/2
#EXTINF:-1 tvg-id="TF1plus1.fr" tvg-name="FR|TF1+1" tvg-logo="https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png" group-title="|FR| ",FR|TF1+1_UHD
http://foo.bar:80/user/password/3
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1_Séries_Film" tvg-logo="" group-title="|FR| FRANCE",FR:TF1_Séries_Film
http://foo.bar:80/user/password/4
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="FR TF1_Séries_Film_HD" tvg-logo="" group-title="|FR| FRANCE",FR:TF1_Séries_Film_HD
http://foo.bar:80/user/password/5
#EXTINF:-1 tvg-id="" tvg-name="TF1_Séries_Film+1" tvg-logo="" group-title="FR|FRANCE",FR:TF1_Séries_Film+1 4K
http://foo.bar:80/user/password/6

expected-output.m3u

#EXTM3U
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1" tvg-logo="https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png" tvg-chrono="1" group-title="FR|FRANCE|TNT",TF1
http://foo.bar:80/user/password/1
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1 HD" tvg-logo="https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png" tvg-chrono="1" group-title="FR|FRANCE|TNT",TF1 HD
http://foo.bar:80/user/password/2
#EXTINF:-1 tvg-id="TF1Plus1.fr" tvg-name="TF1+1 - UHD" tvg-logo="https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png" tvg-chrono="1" group-title="FR|TNT|PLUS1",TF1+1 - UHD
http://foo.bar:80/user/password/3
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1 Séries Films" tvg-logo="https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png" tvg-chrono="20" group-title="FR|FRANCE|TNT",TF1 Séries Films
http://foo.bar:80/user/password/4
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1 Séries Films HD" tvg-logo="https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png" tvg-chrono="20" group-title="FR|FRANCE|TNT",TF1 Séries Films HD
http://foo.bar:80/user/password/5
#EXTINF:-1 tvg-id="TF1SeriesFilmsPlus1.fr" tvg-name="TF1 Séries Films+1 - 4K" tvg-logo="https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png" tvg-chrono="20" group-title="FR|FRANCE|TNT|PLUS1",TF1 Séries Films+1 - 4K
http://foo.bar:80/user/password/6
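
The group-title values in this expected output imply some cleanup when groups are merged: empty segments and surrounding spaces are dropped, and duplicates are removed. A minimal Python sketch of that rule, under those assumptions (`merge_groups` is a hypothetical helper, not something m3u-filter provides):

```python
def merge_groups(existing: str, extra: list) -> str:
    """Merge a "|"-separated group-title with extra groups, keeping first-seen order."""
    merged = []
    for group in existing.split("|") + list(extra):
        group = group.strip()
        # Skip empty segments (from "|FR| ") and groups already present.
        if group and group not in merged:
            merged.append(group)
    return "|".join(merged)


print(merge_groups("|FR| FRANCE", ["FR", "TNT"]))
print(merge_groups("|FR| ", ["FR", "TNT", "PLUS1"]))
```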

How I used it:

# The output without mapping enable in config
$ rm data/output.m3u; ./m3u-filter -c config.yml -m mapping.yml ; cat data/output.m3u 
#EXTM3U
#EXTINF:-1 tvg-id="" tvg-name="TF1_Séries_Film+1" group-title="FR|FRANCE",FR:TF1_Séries_Film+1 4K
http://foo.bar:80/user/password/6
#EXTINF:-1 tvg-id="TF1plus1.fr" tvg-name="TF1+1" group-title="|FR| ",FR|TF1+1_UHD
http://foo.bar:80/user/password/3
#EXTINF:-1 tvg-id="TF1.mu" tvg-name="TF1" group-title="|FR| FRANCE",FR:TF1
http://foo.bar:80/user/password/1
#EXTINF:-1 tvg-id="" tvg-name="TF1_HD" group-title="|FR| FRANCE",FR TF1_HD
http://foo.bar:80/user/password/2
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1_Séries_Film" group-title="|FR| FRANCE",FR:TF1_Séries_Film
http://foo.bar:80/user/password/4
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1_Séries_Film_HD" group-title="|FR| FRANCE",FR:TF1_Séries_Film_HD
http://foo.bar:80/user/password/5

$ # now enable the mapping in config.yml ^C
$ rm data/output.m3u; ./m3u-filter -c config.yml -m mapping.yml ; cat data/output.m3u 
#EXTM3U
#EXTINF:-1 tvg-id="" tvg-name="TF1_Séries_Film+1" group-title="FR|FRANCE",FR:TF1_Séries_Film+1 4K
http://foo.bar:80/user/password/6
#EXTINF:-1 tvg-id="TF1plus1.fr" tvg-name="TF1+1" group-title="|FR| ",FR|TF1+1_UHD
http://foo.bar:80/user/password/3
#EXTINF:-1 tvg-id="TF1.mu" tvg-name="TF1" group-title="|FR| FRANCE",FR:TF1
http://foo.bar:80/user/password/1
#EXTINF:-1 tvg-id="" tvg-name="TF1_HD" group-title="|FR| FRANCE",FR TF1_HD
http://foo.bar:80/user/password/2
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1_Séries_Film" group-title="|FR| FRANCE",FR:TF1_Séries_Film
http://foo.bar:80/user/password/4
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1_Séries_Film_HD" group-title="|FR| FRANCE",FR:TF1_Séries_Film_HD
http://foo.bar:80/user/password/5

$ # now enable the wildcard in mapping.yml ^C
$ rm data/output.m3u; ./m3u-filter -c config.yml -m mapping.yml ; cat data/output.m3u 
#EXTM3U

euzu commented

Thank you, I will test it.

> And what the purpose of .mappings.tag field? Could it be related to point 8 above?

I looked at your Python example and saw that you add the tag parameter as a suffix.
I think it is point 8.

kumy commented

> And what the purpose of .mappings.tag field? Could it be related to point 8 above?
> i have looked in your python example and saw that you add tag parameter as a suffix.
> I think it is point 8

Ah, then it's something else; it could be a point 12. I have 3 distinct providers and I wanted to add a suffix to distinguish them in the Kodi playlist.
I was passing it as an optional parameter to my command; the expected tvg-name + title would have been:

Without parameter:

  • TF1
  • TF1 HD
  • TF1+1 - UHD
  • TF1 Séries Films
  • TF1 Séries Films HD
  • TF1 Séries Films+1 - 4K

With parameter: - TV1

  • TF1 - TV1
  • TF1 HD - TV1
  • TF1+1 - UHD - TV1
  • TF1 Séries Films - TV1
  • TF1 Séries Films HD - TV1
  • TF1 Séries Films+1 - 4K - TV1

With parameter: |TV2

  • TF1 |TV2

Not sure yet how to represent that properly in config

euzu commented

In your examples, none of your regexps match any of the tvg-name attributes?
Did I miss something?

kumy commented

> In your examples, none of your regexp's match any of the tvg-name attributes?

Hmm, indeed: in the example above no lines are removed, unlike what I experienced yesterday 🤔 But the .* is still emptying the file.

On the other hand, the regexes worked on the regex101 links (e.g. https://regex101.com/r/UV233E/1), without the named capture, which I added at the last minute.

kumy commented

With a really simplified regex, it seems to match, and it doesn't remove the line as it did yesterday. I'm confused 😕

        tvg_names:
          - ^.*TF1.*$

$ rm data/output.m3u; ./m3u-filter -c config.yml -m mapping.yml ; cat data/output.m3u 
#EXTM3U
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1{ quality}" group-title="FR|FRANCE|FR|TNT" tvg-chno="1",FR:TF1_Séries_Film+1 4K
http://foo.bar:80/user/password/6
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1{ quality}" group-title="|FR||FR|TNT" tvg-chno="1",FR|TF1+1_UHD
http://foo.bar:80/user/password/3
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1{ quality}" group-title="|FR|FRANCE|FR|TNT" tvg-chno="1",FR:TF1
http://foo.bar:80/user/password/1
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1{ quality}" group-title="|FR|FRANCE|FR|TNT" tvg-chno="1",FR TF1_HD
http://foo.bar:80/user/password/2
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1{ quality}" group-title="|FR|FRANCE|FR|TNT" tvg-chno="1",FR:TF1_Séries_Film
http://foo.bar:80/user/password/4
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1{ quality}" group-title="|FR|FRANCE|FR|TNT" tvg-chno="1",FR:TF1_Séries_Film_HD
http://foo.bar:80/user/password/5

kumy commented

Any idea why these don't match #EXTINF:-1 tvg-id="TF1.mu" tvg-name="TF1" group-title="|FR| FRANCE",FR:TF1?

          - ^TF1$
          - "^TF1$"
          - "^TF1.*$"
          - "^TF.$"
          - "^TF.*$"

But that one has interesting results (check the lines with # That line should have matched)

          - "^TF.*$"

$ rm data/output.m3u; ./m3u-filter -c config.yml -m mapping.yml ; cat data/output.m3u 
#EXTM3U
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1{ quality}" group-title="FR|FRANCE|FR|TNT" tvg-chno="1",FR:TF1_Séries_Film+1 4K
http://foo.bar:80/user/password/6
#EXTINF:-1 tvg-id="TF1plus1.fr" tvg-name="TF1+1" group-title="|FR| ",FR|TF1+1_UHD
http://foo.bar:80/user/password/3
#EXTINF:-1 tvg-id="TF1.mu" tvg-name="TF1" group-title="|FR| FRANCE",FR:TF1
http://foo.bar:80/user/password/1   # That line should have matched
#EXTINF:-1 tvg-id="" tvg-name="TF1_HD" group-title="|FR| FRANCE",FR TF1_HD
http://foo.bar:80/user/password/2
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1{ quality}" group-title="|FR|FRANCE|FR|TNT" tvg-chno="1",FR:TF1_Séries_Film
http://foo.bar:80/user/password/4
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1_Séries_Film_HD" group-title="|FR| FRANCE",FR:TF1_Séries_Film_HD   # That line should have matched
http://foo.bar:80/user/password/5

Those are working

          - ".*TF1$"
          - "^.*TF1$"

kumy commented

Wondering: does the mapping happen before the rename? ;)

EDIT: That must be it, as this one matches:

          - "^FR:TF1$"

Now I get why I was confused: I was looking at the result file to mentally check my regexes, and I forgot about the rename...

euzu commented

Yes, the rename currently happens after the mapping (map, then rename). Should it happen before, i.e. rename, then map?

kumy commented

Rename then map is what I had in my workflow and what I expected, but someone else might prefer it the other way around. Maybe it could be configurable? If I have to choose, I prefer to stick with rename then map 🙏

euzu commented

I now have a new version which seems to work. Can you test it? Are you compiling your own binary, or should I provide you one?
The processing order is now configurable for each target; the default is Frm.

Frm, Fmr, Rfm, Rmf, Mfr, Mrf
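
Why the order matters can be shown with a small Python sketch. It assumes the three letters stand for filter, rename and map, applied left to right; the step functions below are simplified stand-ins built from the filter and rename rules of the test config.yml, not m3u-filter's code.

```python
import re


def filter_step(names):
    # Keep only names matching the target's Name filter.
    return [n for n in names if re.search(r"(FR[:|])?TF1.*", n)]


def rename_step(names):
    # Strip an optional "FR" prefix, like the rename rule in config.yml.
    return [re.sub(r"^(FR[ :|])?(.*)", r"\2", n) for n in names]


def map_step(names):
    # Stand-in mapper: only a name that is exactly "TF1" gets mapped.
    return ["TF1 (mapped)" if re.search(r"^TF1$", n) else n for n in names]


STEPS = {"F": filter_step, "R": rename_step, "M": map_step}


def process(names, order="Frm"):
    """Run the steps in the order spelled out by `order`, e.g. "Frm" or "Fmr"."""
    for letter in order.upper():
        names = STEPS[letter](names)
    return names


print(process(["FR:TF1"], "Frm"))  # ['TF1 (mapped)']
print(process(["FR:TF1"], "Fmr"))  # ['TF1']
```

With Fmr the mapper still sees the unrenamed "FR:TF1", so ^TF1$ never matches, which is the confusion described earlier in the thread.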

kumy commented

I can try to compile :)

euzu commented

OK, I have pushed the changes; let me know if you need a compiled binary. Don't test the .* regexp yet: I have not debugged that problem. First we should make it run, then we can look into the .* regexp. And use . in place of accented French letters in the regexp.

euzu commented

My test files:

OK, I see there is a problem: if the named capture has no value, the variable is not replaced.

mapping.yml

mappings:
  - id: France
    tag: ""
    mapper:
      #- tvg_name: FOOBAR
      #  tvg_names:
      #    - .*
      #  tvg_id: FOO.BAR
      #  tvg_chno: "42"
      #  tvg_logo: ""
      #  group_title: 
      #    - TEST
      - tvg_name: TF1 $quality
        # https://regex101.com/r/UV233E/1
        tvg_names:
          - '^\s*(FR)?[: |]?TF1[\s_-]*(?P<quality>HD|hd|LQ|lq|4K|4k|UHD|uhd)?\s*$'
        tvg_id: TF1.fr
        tvg_chno: "1"
        tvg_logo: https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png
        group_title:
          - FR
          - TNT
      - tvg_name: TF1 Séries Films $quality
        # https://regex101.com/r/lU0hjK/1
        tvg_names:
          - '^.*TF1[\s_-]*S.ries?[\s_-]*Films?([\s_-]*(?P<quality>HD|hd|LQ|lq|4K|4k|UHD|uhd)?)\s*$'
        tvg_id: TF1SeriesFilms.fr
        tvg_chno: "20"
        tvg_logo: https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png
        group_title:
          - FR
          - TNT
      - tvg_name: TF1 +1 - $quality
        # https://regex101.com/r/WGZlWa/1
        tvg_names:
          - '^.*TF1[\s_-]*S.ries?[\s_-]*Films?[\s_-]*(\+|plus)1([\s_-]*(?P<quality>hd|lq|4k|uhd)?)\s*$'
        tvg_id: TF1Plus1.fr
        tvg_chno: "1"
        tvg_logo: https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png
        group_title:
          - FR
          - TNT
          - PLUS1

config.yml

api: { host: 127.0.0.1, port: 8901, web_root: ./web }
working_dir: /home/euzu/projects/m3u-test/data
sources:
  - input: { persist: , url: '/home/euzu/projects/m3u-test/kumy-input.m3u' }
    targets:
      - filename: output.m3u
        type: M3u
        filter: 'Group ~ "^(\s|\|)?FR(\s|\|).*" AND Name ~ "(FR[:|])?TF1.*"'
        options: { ignore_logo: true }
        rename:
          - { field: Name, new_name: $2, pattern: "^(FR[ :|])?(.*)" }
        sort: { order: Asc }
        mapping: France

input.m3u

#EXTM3U
#EXTINF:-1 tvg-id="TF1.mu" tvg-name="FR:TF1" group-title="|FR| FRANCE",FR:TF1
http://foo.bar:80/user/password/1
#EXTINF:-1 tvg-id="" tvg-name="FR:TF1_HD" group-title="|FR| FRANCE",FR TF1_HD
http://foo.bar:80/user/password/2
#EXTINF:-1 tvg-id="TF1plus1.fr" tvg-name="FR|TF1+1" tvg-logo="https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png" group-title="|FR| ",FR|TF1+1_UHD
http://foo.bar:80/user/password/3
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1_Séries_Film" tvg-logo="" group-title="|FR| FRANCE",FR:TF1_Séries_Film
http://foo.bar:80/user/password/4
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="FR TF1_Séries_Film_HD" tvg-logo="" group-title="|FR| FRANCE",FR:TF1_Séries_Film_HD
http://foo.bar:80/user/password/5
#EXTINF:-1 tvg-id="" tvg-name="TF1_Séries_Film+1" tvg-logo="" group-title="FR|FRANCE",FR:TF1_Séries_Film+1 4K
http://foo.bar:80/user/password/6

output.m3u

#EXTM3U
#EXTINF:-1 tvg-id="TF1Plus1.fr" tvg-name="TF1 +1 - $quality" group-title="FR|FRANCE|FR|TNT|PLUS1" tvg-chno="1",FR:TF1_Séries_Film+1 4K
http://foo.bar:80/user/password/6
#EXTINF:-1 tvg-id="TF1plus1.fr" tvg-name="TF1+1" group-title="|FR| ",FR|TF1+1_UHD
http://foo.bar:80/user/password/3
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1 $quality" group-title="|FR|FRANCE|FR|TNT" tvg-chno="1",FR:TF1
http://foo.bar:80/user/password/1
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1 HD" group-title="|FR|FRANCE|FR|TNT" tvg-chno="1",FR TF1_HD
http://foo.bar:80/user/password/2
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1 Séries Films $quality" group-title="|FR|FRANCE|FR|TNT" tvg-chno="20",FR:TF1_Séries_Film
http://foo.bar:80/user/password/4
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1 Séries Films HD" group-title="|FR|FRANCE|FR|TNT" tvg-chno="20",FR:TF1_Séries_Film_HD
http://foo.bar:80/user/password/5
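
The literal $quality left in the output above is what happens when an optional named group does not take part in the match. One way to handle it, sketched in Python under the assumption that an unmatched group should simply expand to nothing (`substitute` and `VAR_RE` are hypothetical names; the real tool uses the Rust regex crate, whose API differs):

```python
import re

VAR_RE = re.compile(r"\$(\w+)")

# First pattern from the test mapping.yml above.
PATTERN = re.compile(
    r"^\s*(FR)?[: |]?TF1[\s_-]*(?P<quality>HD|hd|LQ|lq|4K|4k|UHD|uhd)?\s*$"
)


def substitute(template: str, match: re.Match) -> str:
    """Replace $name with the named capture, or with nothing if it did not match."""
    groups = match.groupdict()
    # A group that did not participate is None in Python; fall back to "".
    text = VAR_RE.sub(lambda var: groups.get(var.group(1)) or "", template)
    return " ".join(text.split())  # collapse the whitespace left behind


print(substitute("TF1 $quality", PATTERN.match("FR:TF1")))
print(substitute("TF1 $quality", PATTERN.match("FR:TF1_HD")))
```

This sketch does not remove a dangling separator such as the " - " in "TF1 +1 - $quality".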
kumy commented

First time I compile Rust :)

I have this error:

$ cargo build --release
    Updating crates.io index
  Downloaded alloc-stdlib v0.2.2
  Downloaded jobserver v0.1.25
[…]
   Compiling actix-rt v2.7.0
   Compiling actix-codec v0.5.0
   Compiling enum-iterator v1.2.0
error[E0658]: use of unstable library feature 'array_from_fn'
   --> /home/kumy/.cargo/registry/src/github.com-1ecc6299db9ec823/enum-iterator-1.2.0/src/lib.rs:554:18
    |
554 |             Some(core::array::from_fn(|_| unreachable!()))
    |                  ^^^^^^^^^^^^^^^^^^^^
    |
    = note: see issue #89379 <https://github.com/rust-lang/rust/issues/89379> for more information

error[E0658]: use of unstable library feature 'array_from_fn'
   --> /home/kumy/.cargo/registry/src/github.com-1ecc6299db9ec823/enum-iterator-1.2.0/src/lib.rs:557:18
    |
557 |             Some(core::array::from_fn(|_| x.clone()))
    |                  ^^^^^^^^^^^^^^^^^^^^
    |
    = note: see issue #89379 <https://github.com/rust-lang/rust/issues/89379> for more information

error[E0658]: use of unstable library feature 'array_from_fn'
   --> /home/kumy/.cargo/registry/src/github.com-1ecc6299db9ec823/enum-iterator-1.2.0/src/lib.rs:563:18
    |
563 |             Some(core::array::from_fn(|_| unreachable!()))
    |                  ^^^^^^^^^^^^^^^^^^^^
    |
    = note: see issue #89379 <https://github.com/rust-lang/rust/issues/89379> for more information

error[E0658]: use of unstable library feature 'array_from_fn'
   --> /home/kumy/.cargo/registry/src/github.com-1ecc6299db9ec823/enum-iterator-1.2.0/src/lib.rs:566:18
    |
566 |             Some(core::array::from_fn(|_| x.clone()))
    |                  ^^^^^^^^^^^^^^^^^^^^
    |
    = note: see issue #89379 <https://github.com/rust-lang/rust/issues/89379> for more information

   Compiling pest v2.5.3
   Compiling zstd v0.11.2+zstd.1.5.2
For more information about this error, try `rustc --explain E0658`.
error: could not compile `enum-iterator` due to 4 previous errors
warning: build failed, waiting for other jobs to finish...
$ git lol |head -1
* 3bdbf88 (HEAD -> master, origin/master, origin/HEAD) refactored processing order and mapping

$ rustc --version
rustc 1.61.0

$ cargo --version
cargo 1.61.0

Am I doing something wrong?

EDIT: commit 04d32da is building fine

euzu commented

$ rustc --version
rustc 1.66.1 (90743e729 2023-01-10)

kumy commented

OK, switching my rustc install from the deb package to rustup: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

EDIT: binary built 👍

kumy commented

I have the initial requirements working 🙇 (except the tvg-logo)

Adjusted mapping.yml for case-insensitive matches and for . in place of accented letters:

mappings:
  - id: France
    tag: ""
    mapper:
      ## That one empty the file
      #- tvg_name: FOOBAR
      #  tvg_names:
      #    - .*
      #  tvg_id: FOO.BAR
      #  tvg_chno: "42"
      #  tvg_logo: ""
      #  group_title: 
      #    - TEST
      - tvg_name: TF1{ quality}
        # https://regex101.com/r/UV233E/1
        tvg_names:
          - ^(?i)\s*TF1[\s_-]*(?P<quality>hd|lq|4k|uhd)?\s*$
        tvg_id: TF1.fr
        tvg_chno: "1"
        tvg_logo: https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png
        group_title:
          - FR
          - TNT
      - tvg_name: TF1 Séries Films{ quality}
        # https://regex101.com/r/lU0hjK/1
        tvg_names:
          - ^(?i)\s*TF1[\s_-]*S.ries?[\s_-]*Films?([\s_-]*(?P<quality>hd|lq|4k|uhd)?)\s*$
        tvg_id: TF1SeriesFilms.fr
        tvg_chno: "20"
        tvg_logo: https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png
        group_title:
          - FR
          - TNT
      - tvg_name: TF1 +1{ - quality}
        # https://regex101.com/r/WGZlWa/1
        tvg_names:
          - ^(?i)\s*TF1[\s_-]*(\+|plus)1([\s_-]*(?P<quality>hd|lq|4k|uhd)?)\s*$
        tvg_id: TF1Plus1.fr
        tvg_chno: "1"
        tvg_logo: https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png
        group_title:
          - FR
          - TNT
          - PLUS1
      - tvg_name: TF1 Séries Films+1{ - quality}
        # https://regex101.com/r/mV6zOc/1
        tvg_names:
          - ^(?i)\s*TF1[\s_-]*S.ries?[\s_-]*Films?[\s_-]*(\+|plus)1([\s_-]*(?P<quality>hd|lq|4k|uhd)?)\s*$
        tvg_id: TF1SeriesFilmsPlus1.fr
        tvg_chno: "20"
        tvg_logo: https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png
        group_title:
          - FR
          - TNT
          - PLUS1

Same input.m3u and config.yml as in #3 (comment)

output

#EXTM3U
#EXTINF:-1 tvg-id="TF1SeriesFilmsPlus1.fr" tvg-name="TF1 Séries Films+1{ - quality}" group-title="FR|FRANCE|FR|TNT|PLUS1" tvg-chno="20",FR:TF1_Séries_Film+1 4K
http://foo.bar:80/user/password/6
#EXTINF:-1 tvg-id="TF1Plus1.fr" tvg-name="TF1 +1{ - quality}" group-title="|FR||FR|TNT|PLUS1" tvg-chno="1",FR|TF1+1_UHD
http://foo.bar:80/user/password/3
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1{ quality}" group-title="|FR|FRANCE|FR|TNT" tvg-chno="1",FR:TF1
http://foo.bar:80/user/password/1
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1{ quality}" group-title="|FR|FRANCE|FR|TNT" tvg-chno="1",FR TF1_HD
http://foo.bar:80/user/password/2
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1 Séries Films{ quality}" group-title="|FR|FRANCE|FR|TNT" tvg-chno="20",FR:TF1_Séries_Film
http://foo.bar:80/user/password/4
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1 Séries Films{ quality}" group-title="|FR|FRANCE|FR|TNT" tvg-chno="20",FR:TF1_Séries_Film_HD
http://foo.bar:80/user/password/5

euzu commented

options: { ignore_logo: true }

euzu commented

You should wrap all regular expressions in single quotes (') so that YAML does not misinterpret them.

kumy commented

Ok, I've added the single quotes ✔️

I just found that changing the case of a single letter in the input file makes the entry disappear from the output:

--- data/input.m3u.orig	2023-01-13 17:11:27.989780320 +0100
+++ data/input.m3u	2023-01-13 17:11:31.141756979 +0100
@@ -1,5 +1,5 @@
 #EXTM3U
-#EXTINF:-1 tvg-id="TF1.mu" tvg-name="FR:TF1" group-title="|FR| FRANCE",FR:TF1
+#EXTINF:-1 tvg-id="TF1.mu" tvg-name="FR:Tf1" group-title="|FR| FRANCE",FR:TF1
 http://foo.bar:80/user/password/1
 #EXTINF:-1 tvg-id="" tvg-name="FR:TF1_HD" group-title="|FR| FRANCE",FR TF1_HD
 http://foo.bar:80/user/password/2
$ rm data/output.m3u; ./m3u-filter -c config.yml -m mapping.yml -v; cat data/output.m3u 
working dir: "/srv/IPTV/m3u-filter_v0.9.4_linux_x86_64/data"
persist file: Some("/srv/IPTV/m3u-filter_v0.9.4_linux_x86_64/data/input.m3u.bak")
#EXTM3U
#EXTINF:-1 tvg-id="TF1SeriesFilmsPlus1.fr" tvg-name="TF1 Séries Films+1{ - quality}" group-title="FR|FRANCE|FR|TNT|PLUS1" tvg-chno="20" tvg-logo="https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png",FR:TF1_Séries_Film+1 4K
http://foo.bar:80/user/password/6
#EXTINF:-1 tvg-id="TF1Plus1.fr" tvg-name="TF1 +1{ - quality}" group-title="|FR||FR|TNT|PLUS1" tvg-chno="1" tvg-logo="https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png",FR|TF1+1_UHD
http://foo.bar:80/user/password/3
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1{ quality}" group-title="|FR|FRANCE|FR|TNT" tvg-chno="1" tvg-logo="https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png",FR TF1_HD
http://foo.bar:80/user/password/2
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1 Séries Films{ quality}" group-title="|FR|FRANCE|FR|TNT" tvg-chno="20" tvg-logo="https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png",FR:TF1_Séries_Film
http://foo.bar:80/user/password/4
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1 Séries Films{ quality}" group-title="|FR|FRANCE|FR|TNT" tvg-chno="20" tvg-logo="https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png",FR:TF1_Séries_Film_HD
http://foo.bar:80/user/password/5

euzu commented

You have a filter:
filter: 'Group ~ "^(\s|\|)?FR(\s|\|).*" AND Name ~ "(FR[:|])?TF1.*"'

which eliminates anything not matching it.

kumy commented

🤦‍♂️ 🤦‍♂️ 🤦‍♂️

Added (?i) to make it case insensitive:
filter: 'Group ~ "^(\s|\|)?FR(\s|\|).*" AND Name ~ "(?i)(FR[:|])?TF1.*"'
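
The effect of the (?i) flag on the Name part of the filter can be checked with a short Python sketch. Python's re module stands in for the Rust regex crate here; both accept a leading (?i), but the sketch assumes the filter does an unanchored search, which the thread does not confirm.

```python
import re

CASE_SENSITIVE = r"(FR[:|])?TF1.*"
CASE_INSENSITIVE = r"(?i)(FR[:|])?TF1.*"


def passes(pattern: str, name: str) -> bool:
    """True if the name would be kept by a filter using this pattern."""
    return re.search(pattern, name) is not None


for name in ("FR:TF1", "FR:Tf1"):
    print(name, passes(CASE_SENSITIVE, name), passes(CASE_INSENSITIVE, name))
```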

kumy commented

Congratulations 🎉 you did that so quickly 🙇 🦸‍♂️ !
Looking forward to the polishing with the fresh ideas from this morning ❤️

euzu commented

If you want a case-insensitive regexp, you need to add the (?i) flag.

regex crate

(?flags) set flags within current group
i - case-insensitive: letters match both upper and lower case

kumy commented

Thanks, I already did it :) and here #3 (comment)

euzu commented

You need to replace {quality} with $quality.

- tvg_name: TF1 Séries Films $quality

regex crate named capture groups

kumy commented

Working as per the initial comment specs now.

I made some minor fixes to the input file

--- data/input.m3u.bak	2023-01-13 17:42:39.519068112 +0100
+++ data/input.m3u	2023-01-13 17:44:58.105257188 +0100
@@ -3,11 +3,11 @@
 http://foo.bar:80/user/password/1
 #EXTINF:-1 tvg-id="" tvg-name="FR:TF1_HD" group-title="|FR| FRANCE",FR TF1_HD
 http://foo.bar:80/user/password/2
-#EXTINF:-1 tvg-id="TF1plus1.fr" tvg-name="FR|TF1+1" tvg-logo="https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png" group-title="|FR| ",FR|TF1+1_UHD
+#EXTINF:-1 tvg-id="TF1plus1.fr" tvg-name="FR|TF1+1_UHD" tvg-logo="https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png" group-title="|FR| ",FR|TF1+1_UHD
 http://foo.bar:80/user/password/3
-#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1_Séries_Film" tvg-logo="" group-title="|FR| FRANCE",FR:TF1_Séries_Film
+#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="FR TF1_Séries_Film" tvg-logo="" group-title="|FR| FRANCE",FR:TF1_Séries_Film
 http://foo.bar:80/user/password/4
 #EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="FR TF1_Séries_Film_HD" tvg-logo="" group-title="|FR| FRANCE",FR:TF1_Séries_Film_HD
 http://foo.bar:80/user/password/5
-#EXTINF:-1 tvg-id="" tvg-name="TF1_Séries_Film+1" tvg-logo="" group-title="FR|FRANCE",FR:TF1_Séries_Film+1 4K
+#EXTINF:-1 tvg-id="" tvg-name="FR:TF1_Séries_Film+1 4K" tvg-logo="" group-title="FR|FRANCE",FR:TF1_Séries_Film+1 4K
 http://foo.bar:80/user/password/6
euzu commented

Your ideas:

  1. SIMPLIFY REGEX: normalize the tvg_name input for comparison (éè -> e ; à -> a) (done) 👍

i have two solutions for this:
a) use . instead of the letter (nothing to do)
b) use of unidecode before regexp matching. (done) 👍

  1. SIMPLIFY REGEX: regex match as case insensitive? (done) 👍
  2. DRY: generic pattern for suffixes? (?P<quality>[\s_-]*(hd|lq|4k|uhd)?) (done) 👍
  3. DRY: generic pattern for separator? [\s_-]* (done) 👍
  4. DRY: define custom reusable patterns? PLUS1=[\s_-]*(\+|plus)1 (done) 👍
  5. DRY: ignore spaces in front/end of tvg-name by default? but we can set that in the regex anyway
  6. BONUS: it currently only checks matches on the tvg-name field (that is what was initially requested ;)), but it could be nice to allow matching on any field, like it's done with filters?
  7. DRY: common/default group title to always add? Allow to add FR for all
  8. CLEANING: a way to force-remove groups (per mapper or globally)? FRANCE|FOOBAR|SOMETHING
  9. Also overwrite the Title (I think I forgot to ask for it)
  10. BONUS: Capture regex group (quality in this example) and have a syntax to replace pattern in Name+Title if group is not empty (to prevent leading spaces or -)
    • mapping:TF1 + m3u:TF1 HD => TF1
    • mapping:TF1 $quality + m3u:TF1 => TF1
    • mapping:TF1 $quality + m3u:TF1_HD => TF1 HD
    • mapping:TF1 - $quality + m3u:TF1 => TF1
    • mapping:TF1 - $quality + m3u:TF1_HD => TF1 - HD
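
The five cases above can be sketched in Python as follows. This is only an illustration of the requested behaviour, not m3u-filter's implementation: `render_name` and `NAME_PATTERN` are hypothetical names, and trimming leftover spaces and hyphens at both ends is an assumed rule.

```python
import re

# Matches "$name" placeholders in a mapping's tvg_name value.
PLACEHOLDER = re.compile(r"\$(\w+)")

# Hypothetical matcher for TF1 with an optional quality suffix.
NAME_PATTERN = re.compile(r"(?i)^\s*TF1[\s_-]*(?P<quality>hd|lq|4k|uhd)?\s*$")


def render_name(template: str, match: re.Match) -> str:
    """Fill $placeholders from named capture groups, then trim stray separators."""
    groups = match.groupdict(default="")  # groups that did not participate become ""

    def fill(placeholder: re.Match) -> str:
        return groups.get(placeholder.group(1), "")

    rendered = PLACEHOLDER.sub(fill, template)
    # Assumed rule: drop spaces and hyphens left dangling at either end.
    return rendered.strip(" -")


for template, name in [
    ("TF1", "TF1 HD"),
    ("TF1 $quality", "TF1"),
    ("TF1 $quality", "TF1_HD"),
    ("TF1 - $quality", "TF1"),
    ("TF1 - $quality", "TF1_HD"),
]:
    print(f"{template!r} + {name!r} => {render_name(template, NAME_PATTERN.match(name))!r}")
```

A placeholder in the middle of a template would still leave a double space when the group is empty; handling that would need an extra normalisation step.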

I don't understand this:
Also, really nice idea to have mapping to be an array and having an easy way to use/enable it

What is with the tag parameter ? Currently if set it is a suffix for the tvg-name.

kumy commented

i dont understand this?

Also, really nice idea to have mapping to be an array and having an easy way to use/enable it

I meant that having the ability to define multiple mapping rules in the mapping.yml file is cool 👍 I like that feature

mappings:
  - id: France
    tag: ""
    mapper:
  - id: Belgium
    tag: ""
    mapper:
  - id: United Kingdom
    tag: ""
    mapper:

What is with the tag parameter ? Currently if set it is a suffix for the tvg-name.

To link to #3 (comment)

Could the tag be moved to config.yml?

api: { host: 127.0.0.1, port: 8901, web_root: ./web }
working_dir: ./data
sources:
  - input: { persist: 'input.m3u.bak', url: 'input.m3u' }
    targets:
      - filename: output.m3u
        type: M3u
        filter: 'Group ~ "^(\s|\|)?FR(\s|\|).*" AND Name ~ "(?i)(FR[:|])?TF1.*"'
        options: { ignore_logo: true }
        rename:
          - { field: Name, new_name: $2, pattern: "^(FR[ :|])?(.*)" }
        sort: { order: Asc }
        mapping: France
        tag: 'TV1'
euzu commented

Added mapping parameter decode_to_ascii. Default is false.
If true, the text to match is converted to ASCII (using unidecode) before regexp matching.

If you set it to true you can use e instead of . for é inside the regexp.

mappings:
  - id: kumy
    tag: ""
    decode_to_ascii: true
    mapper:
      - tvg_name: TF1
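
As a rough illustration of why converting to ASCII simplifies the regexps, here is a Python sketch. It approximates the conversion with the standard library's `unicodedata` module instead of unidecode, so it only covers characters that decompose into a base letter plus accents; `to_ascii` is a hypothetical helper name.

```python
import re
import unicodedata


def to_ascii(text: str) -> str:
    """Approximate transliteration: decompose accents, then drop non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# The pattern is written with a plain "e"; no "." wildcard is needed for "é".
pattern = re.compile(r"(?i)^\s*TF1[\s_-]*Series?[\s_-]*Films?\s*$")

name = "TF1_Séries_Film"
ascii_name = to_ascii(name)

raw_match = pattern.search(name) is not None
ascii_match = pattern.search(ascii_name) is not None
print(ascii_name, raw_match, ascii_match)
```

Only the text used for matching is converted here; the original name stays available for the output.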
euzu commented

OK, I understand: you mean you want to apply more than one mapping to each target.

It is possible to define more than one mapping in the mapping.yml file, but you can only assign one to a target.

Do you want to assign multiple mappings to one target?

kumy commented

I'm sorry for the confusion, I was not requesting more code for that, I was just saying it was not in my original spec and I think it's a nice addition

You want to assign multiple mappings to one target ?

I never thought about it before, but now you mention it, it may be useful ;)

euzu commented
  • The mapping attribute for a target is now a list. You can assign multiple mappers to a target.
mapping:
  - France
  - Belgium
  - Germany
kumy commented

On commit 4178777:

  • ✔️ the final tvg-name seems to be the string from the input and no longer the tvg_name value from the mapping.yml. Before it was tvg-name="TF1 $quality", now it's tvg-name="Tf1". It's good not to see the $quality anymore, but the output is not aligned: "Tf1" vs "TF1 HD". Looks like a regression. Fixed
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="tf1" group-title="|FR|FRANCE|FR|TNT" tvg-chno="1",FR:TF1
http://foo.bar:80/user/password/1
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1 HD" group-title="|FR|FRANCE|FR|TNT" tvg-chno="1",FR TF1_HD
http://foo.bar:80/user/password/2
  • ✔️ can't get the decode_to_ascii working. EDIT solved by #3 (comment)
    my regex are now
'^(?i)\s*TF1[\s_-]*Series?[\s_-]*Films?[\s_-]*(\+|plus)1([\s_-]*(?P<quality>hd|lq|4k|uhd)?)\s*$'

but the output is

#EXTM3U
#EXTINF:-1 tvg-id="" tvg-name="TF1_Séries_Film+1 4K" group-title="FR|FRANCE",FR:TF1_Séries_Film+1 4K
http://foo.bar:80/user/password/6
#EXTINF:-1 tvg-id="TF1Plus1.fr" tvg-name="TF1+1 - UHD" group-title="|FR||FR|TNT|PLUS1" tvg-chno="1",FR|TF1+1_UHD
http://foo.bar:80/user/password/3
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="Tf1" group-title="|FR|FRANCE|FR|TNT" tvg-chno="1",FR:TF1
http://foo.bar:80/user/password/1
#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1 HD" group-title="|FR|FRANCE|FR|TNT" tvg-chno="1",FR TF1_HD
http://foo.bar:80/user/password/2
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1_Séries_Film" group-title="|FR| FRANCE",FR:TF1_Séries_Film
http://foo.bar:80/user/password/4
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="TF1_Séries_Film_HD" group-title="|FR| FRANCE",FR:TF1_Séries_Film_HD
http://foo.bar:80/user/password/5
mapping.yml

mappings:
  - id: France
    tag: ""
    decode_to_ascii: true
    mapper:
      ## That one empty the file
      #- tvg_name: FOOBAR
      #  tvg_names:
      #    - .*
      #  tvg_id: FOO.BAR
      #  tvg_chno: "42"
      #  tvg_logo: ""
      #  group_title: 
      #    - TEST
      - tvg_name: TF1 $quality
        # https://regex101.com/r/UV233E/1
        tvg_names:
          - '^(?i)\s*TF1[\s_-]*(?P<quality>hd|lq|4k|uhd)?\s*$'
        tvg_id: TF1.fr
        tvg_chno: "1"
        tvg_logo: https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png
        group_title:
          - FR
          - TNT
      - tvg_name: TF1 Séries Films $quality
        # https://regex101.com/r/lU0hjK/1
        tvg_names:
          - '^(?i)\s*TF1[\s_-]*Series?[\s_-]*Films?([\s_-]*(?P<quality>hd|lq|4k|uhd)?)\s*$'
        tvg_id: TF1SeriesFilms.fr
        tvg_chno: "20"
        tvg_logo: https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png
        group_title:
          - FR
          - TNT
      - tvg_name: TF1+1 - $quality
        # https://regex101.com/r/WGZlWa/1
        tvg_names:
          - '^(?i)\s*TF1[\s_-]*(\+|plus)1([\s_-]*(?P<quality>hd|lq|4k|uhd)?)\s*$'
        tvg_id: TF1Plus1.fr
        tvg_chno: "1"
        tvg_logo: https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png
        group_title:
          - FR
          - TNT
          - PLUS1
      - tvg_name: TF1 Séries Films+1 - $quality
        # https://regex101.com/r/mV6zOc/1
        tvg_names:
          - '^(?i)\s*TF1[\s_-]*Series?[\s_-]*Films?[\s_-]*(\+|plus)1([\s_-]*(?P<quality>hd|lq|4k|uhd)?)\s*$'
        tvg_id: TF1SeriesFilmsPlus1.fr
        tvg_chno: "20"
        tvg_logo: https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png
        group_title:
          - FR
          - TNT
          - PLUS1

config.yml

api: { host: 127.0.0.1, port: 8901, web_root: ./web }
working_dir: ./data
sources:
  - input: { persist: 'input.m3u.bak', url: 'input.m3u' }
    targets:
      - filename: output.m3u
        type: M3u
        filter: 'Group ~ "^(\s|\|)?FR(\s|\|).*" AND Name ~ "(?i)(FR[:|])?TF1.*"'
        options: { ignore_logo: true }
        rename:
          - { field: Name, new_name: $2, pattern: "^(FR[ :|])?(.*)" }
        sort: { order: Asc }
        mapping: France
        tag: 'TV1'

input

#EXTM3U
#EXTINF:-1 tvg-id="TF1.mu" tvg-name="FR:Tf1" group-title="|FR| FRANCE",FR:TF1
http://foo.bar:80/user/password/1
#EXTINF:-1 tvg-id="" tvg-name="FR:TF1_HD" group-title="|FR| FRANCE",FR TF1_HD
http://foo.bar:80/user/password/2
#EXTINF:-1 tvg-id="TF1plus1.fr" tvg-name="FR|TF1+1_UHD" tvg-logo="https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png" group-title="|FR| ",FR|TF1+1_UHD
http://foo.bar:80/user/password/3
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="FR TF1_Séries_Film" tvg-logo="" group-title="|FR| FRANCE",FR:TF1_Séries_Film
http://foo.bar:80/user/password/4
#EXTINF:-1 tvg-id="TF1SeriesFilms.fr" tvg-name="FR TF1_Séries_Film_HD" tvg-logo="" group-title="|FR| FRANCE",FR:TF1_Séries_Film_HD
http://foo.bar:80/user/password/5
#EXTINF:-1 tvg-id="" tvg-name="FR:TF1_Séries_Film+1 4K" tvg-logo="" group-title="FR|FRANCE",FR:TF1_Séries_Film+1 4K
http://foo.bar:80/user/password/6

euzu commented

Sorry, it is match_ascii, not decode_to_ascii.
I should rename it to match_as_ascii.

euzu commented

I have pushed a new version.

Use match_as_ascii: true.

I have added templates for regexps.

A template is referenced as !VAR_NAME! inside the regexp.

Should we use name instead of key for the property?

mappings:
  - id: France
    tag: ""
    match_as_ascii: true
    templates:
      - key: delimiter
        value: '[\s_-]*'
      - key: quality
        value: '(?i)(?P<quality>HD|LQ|4K|UHD)?'
    mapper:
      - tvg_name: TF1 $quality
        # https://regex101.com/r/UV233E/1
        tvg_names:
          - '^\s*(FR)?[: |]?TF1!delimiter!!quality!\s*$'
        tvg_id: TF1.fr
        tvg_chno: "1"
        tvg_logo: https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png
        group_title:
          - FR
          - TNT
      - tvg_name: TF1 Séries Films $quality
        # https://regex101.com/r/lU0hjK/1
        tvg_names:
          - '^.*TF1!delimiter!Series?!delimiter!Films?(!delimiter!!quality!)\s*$'
        tvg_id: TF1SeriesFilms.fr
        tvg_chno: "20"
        tvg_logo: https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/google/350/shrimp_1f990.png
        group_title:
          - FR
          - TNT 
      - tvg_name: TF1 +1 - $quality
        # https://regex101.com/r/WGZlWa/1
        tvg_names:
          - '^.*TF1!delimiter!Series?!delimiter!Films?!delimiter!(\+|plus)1(!delimiter!!quality!)\s*$'
        tvg_id: TF1Plus1.fr
        tvg_chno: "1"
        tvg_logo: https://emojipedia-us.s3.amazonaws.com/source/skype/289/shrimp_1f990.png
        group_title:
          - FR
          - TNT
          - PLUS1
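
The `!key!` substitution can be pictured with this small Python sketch. It is an assumption about how such templates could be expanded, not the tool's code: `expand` is a hypothetical helper, and because Python's `re` rejects an inline `(?i)` in the middle of a pattern, the flag is passed as `re.IGNORECASE` instead of being part of the `quality` template.

```python
import re

# Template values from the mapping above, minus the inline (?i) flag.
templates = {
    "delimiter": r"[\s_-]*",
    "quality": r"(?P<quality>HD|LQ|4K|UHD)?",
}


def expand(pattern: str, templates: dict) -> str:
    """Replace each !key! occurrence with the corresponding template value."""
    for key, value in templates.items():
        pattern = pattern.replace(f"!{key}!", value)
    return pattern


raw = r"^\s*(FR)?[: |]?TF1!delimiter!!quality!\s*$"
expanded = expand(raw, templates)
compiled = re.compile(expanded, re.IGNORECASE)

match = compiled.search("FR:TF1_HD")
print(expanded)
print(match.group("quality"))
```

Plain string replacement is enough here because the templates do not reference each other.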
kumy commented

Currently trying with real data, and it's working fine so far.

  1. ✔️ There seems to be a small issue with decode_to_ascii: it's not working anymore. -> fixed: match_as_ascii parameter

  2. New idea: It would be nice if the templates could be nested; see the commented-out lines in the example for the nested version

    templates:
      - key: '~'
        value: '[\s_-]*'
      - key: 'delimiter'
        value: '[\s_-]*'
        # value: '!~!'
      - key: quality
        value: '[\s_-]*(?i)(?P<quality>HD|LQ|4K|UHD)?[\s_-]*'
        # value: '!~!(?i)(?P<quality>HD|LQ|4K|UHD)?!~!'
      - key: plus1
        value: '[\s_-]*(\+|plus)1'
        # value: '!~!(\+|plus)1'
      - key: start
        value: '^(?i)[\s_-]*'
        # value: '^(?i)!~!'
      - key: end
        value: '[\s_-]*(?i)(?P<quality>HD|LQ|4K|UHD)?[\s_-]*$'
        # value: '!quality!$'

And some sample usage:

          - '!start!TF1!end!$'
          - '!start!TF1!plus1!!end!$'
          - '!start!TF1!~!S.ries?!~!Films?!end!'
          - '!start!TF1!~!S.ries?!~!Films?!plus1!!end!'
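
One way to picture the nested variant is to keep expanding until nothing changes. The Python sketch below is only an illustration of the idea: `expand_nested` and the `max_depth` guard are hypothetical, the templates use the commented (nested) values from above, and the inline `(?i)` is replaced by `re.IGNORECASE` because Python's `re` does not accept it mid-pattern.

```python
import re

# A template reference is any !key! where key contains no "!" or whitespace.
TEMPLATE_REF = re.compile(r"!([^!\s]+)!")

# Nested versions of the templates proposed above (case flag handled separately).
templates = {
    "~": r"[\s_-]*",
    "quality": r"!~!(?P<quality>HD|LQ|4K|UHD)?!~!",
    "plus1": r"!~!(\+|plus)1",
    "start": r"^!~!",
    "end": r"!quality!$",
}


def expand_nested(pattern: str, templates: dict, max_depth: int = 10) -> str:
    """Expand !key! references repeatedly until the pattern stops changing."""
    for _ in range(max_depth):
        # Unknown keys are left untouched.
        expanded = TEMPLATE_REF.sub(
            lambda ref: templates.get(ref.group(1), ref.group(0)), pattern
        )
        if expanded == pattern:
            return expanded
        pattern = expanded
    raise ValueError("template nesting too deep (possible cycle)")


expanded = expand_nested("!start!TF1!plus1!!end!", templates)
match = re.search(expanded, "TF1+1_UHD", re.IGNORECASE)
print(expanded)
print(match.group("quality"))
```

The depth limit is there so that two templates referencing each other fail loudly instead of looping forever.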
euzu commented

decode_to_ascii is now match_as_ascii

kumy commented
  1. New idea: Could you please make the capture groups usable in the tag 🙏
euzu commented

What do you mean by usable in the tag? Can you give me an example?

kumy commented

Sorry, sure, here is an example:

With such config:

mappings:
  - id: France
    tag: "$quality"
    match_as_ascii: true
    mapper:
      - tvg_name: TF1

$quality is not replaced

#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1- $quality" group-title="FR|TNT" tvg-chno="1" tvg-logo="https://broccoli.tvchannellists.com/images/8/85/TF1.svg",TF1

Expected result

#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1 - HD" group-title="FR|TNT" tvg-chno="1" tvg-logo="https://broccoli.tvchannellists.com/images/8/85/TF1.svg",TF1

or if quality was not captured

#EXTINF:-1 tvg-id="TF1.fr" tvg-name="TF1" group-title="FR|TNT" tvg-chno="1" tvg-logo="https://broccoli.tvchannellists.com/images/8/85/TF1.svg",TF1
euzu commented

yes that is a good idea.
I have opened a new issue: tag with captures

euzu commented

I want to close this issue because it covers more than one issue. Can you open a separate issue for each open request?
I have pushed a new version for the issue tag with captures. Could you please test it and reply in that issue?