dgryski/semgrep-go

yaml: errorprone with []byte

ainar-g opened this issue · 3 comments

Consider the code:

package main

import (
	"encoding/json"
	"fmt"
	"strings"

	"gopkg.in/yaml.v2"
)

type T struct {
	B []byte `json:"b" yaml:"b"`
}

const jsonData = `{"b":"aGVsbG8K"}`

const yamlData = `b: aGVsbG8K`

func main() {
	var err error
	var t T

	err = json.NewDecoder(strings.NewReader(jsonData)).Decode(&t)
	fmt.Printf("%v %v\n", err, t)

	t = T{}
	err = yaml.NewDecoder(strings.NewReader(yamlData)).Decode(&t)
	fmt.Printf("%v %v\n", err, t)
}

Here, the programmer assumed that []byte fields in gopkg.in/yaml.v2 behave the same way as in encoding/json, which decodes a base64 string into the byte slice. They don't; yaml.v2 rejects the string:

<nil> {[104 101 108 108 111 10]}
yaml: unmarshal errors:
  line 1: cannot unmarshal !!str `aGVsbG8K` into []uint8 {[]}

It seems you can still use []byte with that module, but only if the value is written as a YAML array of integers, which is probably not what most people want:

b:
- 104
- 101
- 108
- 108
- 111
- 10
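
For the base64 string form, one possible workaround is a wrapper type that does the decoding itself. The following is only a sketch: Base64Bytes is a hypothetical name, and the method assumes the yaml.v2 Unmarshaler interface, UnmarshalYAML(unmarshal func(interface{}) error) error. To keep the example free of third-party imports, main calls the method directly with a stub callback standing in for the one the decoder would normally pass.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// Base64Bytes is a hypothetical helper type (not part of yaml.v2) that
// accepts a base64-encoded YAML string, mirroring what encoding/json
// does for []byte fields.
type Base64Bytes []byte

// UnmarshalYAML has the signature of the yaml.v2 Unmarshaler interface.
// It reads the node as a string and base64-decodes it.
func (b *Base64Bytes) UnmarshalYAML(unmarshal func(interface{}) error) error {
	var s string
	if err := unmarshal(&s); err != nil {
		return err
	}
	decoded, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return fmt.Errorf("decoding base64: %w", err)
	}
	*b = decoded
	return nil
}

func main() {
	// Stub for the callback a YAML decoder would supply; it only
	// knows how to fill a *string with the value from the example.
	stub := func(v interface{}) error {
		p, ok := v.(*string)
		if !ok {
			return fmt.Errorf("unexpected target type %T", v)
		}
		*p = "aGVsbG8K"
		return nil
	}

	var b Base64Bytes
	if err := b.UnmarshalYAML(stub); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", []byte(b)) // prints "hello\n"
}
```

With such a type, the field in T would be declared as B Base64Bytes instead of B []byte. Encoding back to YAML would need a matching MarshalYAML method, which is omitted here.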

@disconnect3d, I think it might work in some libraries if they parse YAML 1.1 as opposed to YAML 1.2. In fact, the link to the !!binary type is for YAML 1.1, and YAML 1.2 has explicitly dropped it.

I'm not sure semgrep has enough type information to figure this out. Ruleguard might. If you can figure out a way to detect this with one of the tools, go ahead and open a PR.