prof18/RSS-Parser

Unexpected token (position:TEXT ���@1:4 in java.io.InputStreamReader@7982e8c)

skeie opened this issue · 5 comments

skeie commented

Hey man, really good job with this library, the API is 🔥 !

Describe the bug
When I try to parse this URL: https://podkast.nrk.no/program/loerdagsraadet.rss I get:
Unexpected token (position:TEXT ���@1:4 in java.io.InputStreamReader@7982e8c)
Any idea how to solve this?

The link of the RSS Feed
https://podkast.nrk.no/program/loerdagsraadet.rss

Thank you!

The issue is the presence of the BOM (byte order mark) at the start of the feed, which isn't necessary in UTF-8 😅


I'll check whether I can handle it during parsing, but I can't promise anything, since this is something that should be fixed by the feed itself.
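For context, the UTF-8 BOM is the byte sequence EF BB BF, which decodes to the single character U+FEFF. The JVM's UTF-8 decoder keeps that character, which is why the XML parser trips over unexpected text before the first tag. A minimal sketch of stripping it once the feed is already a String; `stripBom` is a hypothetical helper for illustration, not part of the library:

```kotlin
// Hypothetical helper: drops a leading U+FEFF (the decoded UTF-8 BOM) if present.
fun stripBom(xml: String): String = xml.removePrefix("\uFEFF")

fun main() {
    val bomBytes = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())
    // Decoding keeps the BOM as one extra character at the start of the string.
    val raw = String(bomBytes + "<rss/>".toByteArray(Charsets.UTF_8), Charsets.UTF_8)
    println(raw.length)     // 7: one BOM character plus the six characters of <rss/>
    println(stripBom(raw))  // <rss/>
}
```

A string without a BOM passes through unchanged, so the helper is safe to apply to every feed.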

skeie commented

Ah, good point!

I've been playing around with this a bit, and this might be a very naive way of doing it:

// Requires: import java.net.URL
val bom = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())

// Download the whole feed; use { } closes the stream afterwards
val byteArray = URL(channelUri).openStream().use { it.readBytes() }

// The size check avoids an out-of-bounds read on responses shorter than the BOM
val hasBom = byteArray.size >= bom.size &&
    bom.indices.all { byteArray[it] == bom[it] }

// Drop the first three bytes only when they are the UTF-8 BOM
val contentWithoutBom = if (hasBom) {
    byteArray.copyOfRange(bom.size, byteArray.size)
} else {
    byteArray
}

parser.parse(String(contentWithoutBom, Charsets.UTF_8))

I would love to make it more robust if that's needed to get it into the library, assuming you still think it makes sense for the library to handle this :)
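One way the snippet above could be made more robust is to skip the BOM on the stream itself, so the detection step does not depend on loading the whole feed first. A rough sketch, assuming the caller has an `InputStream` to hand over; `skipUtf8Bom` and `UTF8_BOM` are hypothetical names, and this is not necessarily what the library ended up doing:

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream
import java.io.PushbackInputStream

private val UTF8_BOM = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())

// Hypothetical helper: returns a stream positioned after a leading UTF-8 BOM, if any.
fun skipUtf8Bom(input: InputStream): InputStream {
    val pushback = PushbackInputStream(input, UTF8_BOM.size)
    val head = ByteArray(UTF8_BOM.size)

    // read() may return fewer bytes than requested, so loop until full or end of stream
    var read = 0
    while (read < head.size) {
        val n = pushback.read(head, read, head.size - read)
        if (n < 0) break
        read += n
    }

    val hasBom = read == UTF8_BOM.size && head.contentEquals(UTF8_BOM)
    // Not a BOM: push the peeked bytes back so the caller sees the stream unchanged
    if (!hasBom && read > 0) pushback.unread(head, 0, read)
    return pushback
}

fun main() {
    val withBom = UTF8_BOM + "<rss/>".toByteArray(Charsets.UTF_8)
    val text = skipUtf8Bom(ByteArrayInputStream(withBom)).readBytes().toString(Charsets.UTF_8)
    println(text) // <rss/>
}
```

The pushback buffer is sized to the BOM, so at most three bytes are ever peeked and inputs shorter than three bytes are handled without an index error.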

I think I found an optimization that fixes this issue and improves performance as well!

skeie commented

Amazing - thank you!

Do you have an ETA for the next release? :)
No rush, just curious!

In the next two weeks, hopefully! :)