feed pubDate RFC format
Describe the bug
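While parsing the feed, you assume the pubDate node always carries a numeric zone offset: rss.Layout is "Mon, _2 Jan 2006 15:04:05 -0700", which is essentially RFC1123Z. I've seen feeds that use RFC1123 with a named zone such as GMT instead. This causes time.Parse to fail, and the affected episodes are skipped altogether.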
To Reproduce
Try parsing a feed with:
<pubDate>Thu, 10 Nov 2016 19:41:48 GMT</pubDate>
Expected behavior
Each affected episode fails to parse and is skipped (the error is only logged), so presumably the whole feed ends up empty if it uses this date format consistently.
Additional context
The patch below provides a quick fix.
Some notes:
- Episode.Date is now a string instead of a time.Time. This is not an issue, as it was only printed (formatted with the original rss.Layout) anyway.
- Error handling is not great when the date matches none of the supported formats: stringToDate silently returns the zero time.Time.
- Sorting feed items by date is somewhat more robust: it now handles items in arbitrary order, not just reversed order.
diff --git a/pcd.go b/pcd.go
index 73773b2..a5bb20d 100644
--- a/pcd.go
+++ b/pcd.go
@@ -11,7 +11,6 @@ import (
"os"
"path/filepath"
"strings"
- "time"
"github.com/kvannotten/pcd/rss"
"github.com/pkg/errors"
@@ -33,7 +32,7 @@ type Podcast struct {
type Episode struct {
Title string
- Date time.Time
+ Date string
URL string
Length int
}
@@ -158,7 +157,7 @@ func (p *Podcast) String() string {
title = fmt.Sprintf("%s...", episode.Title[0:(titleLength-4)])
}
formatStr := fmt.Sprintf("%%-4d %%-%ds %%20s\n", tl)
- sb.WriteString(fmt.Sprintf(formatStr, index+1, title, episode.Date.Format(rss.Layout)))
+ sb.WriteString(fmt.Sprintf(formatStr, index+1, title, episode.Date))
}
return sb.String()
@@ -217,14 +216,10 @@ func parseEpisodes(content io.Reader) ([]Episode, error) {
var episodes []Episode
for _, item := range feed.Channel.Items {
- t, err := time.Parse(rss.Layout, item.Date.Date)
- if err != nil {
- log.Printf("Could not parse episode: %#v", err)
- continue
- }
+
episode := Episode{
Title: item.Title.Title,
- Date: t,
+ Date: item.Date.Date,
URL: item.Enclosure.URL,
Length: item.Enclosure.Length,
}
diff --git a/rss/parser.go b/rss/parser.go
index 17774f2..565ccec 100644
--- a/rss/parser.go
+++ b/rss/parser.go
@@ -6,6 +6,7 @@ import (
"io"
"io/ioutil"
"log"
+ "sort"
"time"
)
@@ -84,20 +85,22 @@ func Parse(content io.Reader) (*PodcastFeed, error) {
return &feed, nil
}
-const Layout string = "Mon, _2 Jan 2006 15:04:05 -0700"
+func stringToDate(d string) time.Time {
+ var t time.Time
+ var err error
-func sortFeedByDate(feed *PodcastFeed) {
- if len(feed.Channel.Items) < 1 {
- return
+ t, err = time.Parse(time.RFC1123, d)
+ if err != nil {
+ t, _ = time.Parse(time.RFC1123Z, d)
}
+ return t
+}
- firstDate, _ := time.Parse(Layout, feed.Channel.Items[0].Date.Date)
- lastDate, _ := time.Parse(Layout, feed.Channel.Items[len(feed.Channel.Items)-1].Date.Date)
+func sortFeedByDate(feed *PodcastFeed) {
+ sort.Slice(feed.Channel.Items, func(i, j int) bool {
+ d1 := stringToDate(feed.Channel.Items[i].Date.Date)
+ d2 := stringToDate(feed.Channel.Items[j].Date.Date)
- if firstDate.After(lastDate) {
- // reverse the feed
- for i, j := 0, len(feed.Channel.Items)-1; i < j; i, j = i+1, j-1 {
- feed.Channel.Items[i], feed.Channel.Items[j] = feed.Channel.Items[j], feed.Channel.Items[i]
- }
- }
+ return d2.After(d1)
+ })
}
Thanks for the report! You are right, my assumption was incorrect! After investigating a little more, I also found documentation indicating that RFC 2822 dates might be used (here).
You're also right to add a sturdier sort! The reverse-only sort was definitely a mistake on my part.
Would you mind putting this into a pull request? That way you can be properly credited, and I might also extend the stringToDate function to try even more 'standards'.