grafov/m3u8

move away from RegExp

3052 opened this issue · 0 comments

3052 commented

I like this package because it has no third party imports:

m3u8/go.mod

Lines 1 to 3 in 5abb127

module github.com/grafov/m3u8
go 1.12

but I noticed today the parsing is based on regular expression:

m3u8/reader.go

Line 25 in 5abb127

var reKeyValue = regexp.MustCompile(`([a-zA-Z0-9_-]+)=("[^"]+"|[^",]+)`)

using this file:

package hls

import (
   "regexp"
   "strings"
   "testing"
   "text/scanner"
   "unicode"
)

const media = `#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="English",LANGUAGE="eng",DEFAULT=YES,AUTOSELECT=YES,URI="QualityLevels(192000)/Manifest(audio_eng_aacl,format=m3u8-aapl,filter=desktop)"`

var reKeyValue = regexp.MustCompile(`([a-zA-Z0-9_-]+)=("[^"]+"|[^",]+)`)

func Benchmark_RegExp(b *testing.B) {
   for n := 0; n < b.N; n++ {
      _ = reKeyValue.FindAllStringSubmatch(media, -1)
   }
}

func Benchmark_Scanner(b *testing.B) {
   var s scanner.Scanner
   s.IsIdentRune = func(r rune, i int) bool {
      return r == '-' || unicode.IsLetter(r)
   }
   for n := 0; n < b.N; n++ {
      s.Init(strings.NewReader(media))
      for s.Scan() != scanner.EOF {
         _ = s.TokenText()
      }
   }
}

I get this result:

goos: windows
goarch: amd64
cpu: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Benchmark_RegExp-12               282069   4286 ns/op   914 B/op   15 allocs/op
Benchmark_Scanner-12              858847   1491 ns/op   224 B/op   16 allocs/op

so the RegExp option is using 4 times the memory, while the scanner option is double to triple the speed. its possible other options are better as well, the above implementation is just what I came up with.