jhump/protoreflect

might not be bug: false duplication report due to use of relative path instead of absolute path

MUCZ opened this issue · 6 comments

MUCZ commented

to reproduce this, here is my folder

├── test
│   └── f
│       ├── f1
│       │   ├── f1.proto
│       │   └── f2.proto
│       └── f2
│           └── f2.proto

with f1/f1.proto

syntax = "proto3";

package test;

option go_package = "test/f1";


message Msg {
    uint64 msg =1;
}

f1/f2.proto


syntax = "proto3";

package test;

option go_package = "test/f1";

import "test/f/f2/f2.proto";

message Msg3 {
    uint64 msg =1;
    Msg2 msg2 = 2;
}

and f2/f2.proto

syntax = "proto3";

package test;

option go_package = "test/f2";

import "test/f/f1/f1.proto";

message Msg2 {
    uint64 msg =1;
    Msg msg1 = 2;
}

when i tried to parse the protos with code

package main

import (
	"fmt"

	"github.com/jhump/protoreflect/desc/protoparse"
)

func main() {
	fmt.Println("")
	dashIs := []string{
		"./",
		"./test",
	}
	parser := protoparse.Parser{
		IncludeSourceCodeInfo: true,
		ImportPaths:           dashIs,
		InferImportPaths:      true,
		WarningReporter: func(e protoparse.ErrorWithPos) {
			fmt.Println("WarningReporter e:", e)
		},
	}
	files := []string{
		"f/f1/f1.proto",
		"f/f1/f2.proto",
	}
	fds, err := parser.ParseFiles(files...)
	if err != nil {
		panic(err)
	}
	if len(files) != len(fds) {
		panic("len(files) != len(fds)")
	}
}

there will show error:

panic: test/f/f1/f1.proto:8:9: symbol "test.Msg" already defined at f/f1/f1.proto:8:9

this can be reproduce with github.com/jhump/protoreflect v1.15.1

but there is in fact no duplication, so this is a false error report.

root cause

I dug a little bit into the implementation, here's my understanding

the dependency relation is f1/f2.proto-> f2/f2.proto->f1/f1.proto, and i'm trying to parse f1/f1.proto & f1/f2.proto

the parser parse f/f1/f1.proto, f1/f2.proto, f2/f2.proto, f1/f1.proto in order, the f/f1/f1.proto will be parsed twice

and the first parse result of f1/f1.proto will have name 'f1/f1.proto', as i passed in the main.go code

(the second parse happens because the parser is recursive, it will try to parse all the imports after it parsed the target file.)
the second parse result of f1/f1.proto will have name 'f/f1/f1.proto', cause this's the way it get imported.

the parser and the linker don't know they are the same file as no absolute path are stored in any of the data structure, they all use relative paths in data structures

fix suggestion

save somehow the absolute paths of the files during the parsing and/or in the parse result, so that either

  • one file will not be read and parsed twice,
  • or one file might be read twice like this example, but when the linker links them, the linker knows that they are the same file and do not report duplication definition errors
MUCZ commented

when i run protoc -I.. -I. --go_out=. f/f1/*.proto inside folder test, i found the same error, so it turns out that this is not a problem of protoreflect/parse but of the protoc design. it might not be the right place but i do want to ask this question: why not store the absolute path?

MUCZ commented

an interesting thing is that i can parse "f/f1/f1.proto" or "f/f1/f2.proto" singly with no problem, using the main.go above or with protoc, both work, but i can not parse both at the same time... i can understand how but not why it's designed this way...

jhump commented

The file's relative path is part of its identity and is also used for reflection purposes at runtime. For this reason, the absolute path is not suitable, because then the file's identity could change based on the environment in which it was compiled -- one machine could have the repo checked out into a different directory, so the absolute path is completely different from another user. Since the paths are baked into the resulting generated code, this would lead to runtime problems. For more information, check out this page: https://buf.build/docs/reference/protobuf-files-and-packages#imports

The protoparse package in this repo does actually provide a flag that may fix what you are seeing, though it is not a feature supported by protoc. On the Parser, set the InferImportPaths field to true, and it will try to rewrite imports so that file names and imports all match, even if you use different relative paths to refer to the same file.

MUCZ commented

The file's relative path is part of its identity and is also used for reflection purposes at runtime. For this reason, the absolute path is not suitable, because then the file's identity could change based on the environment in which it was compiled -- one machine could have the repo checked out into a different directory, so the absolute path is completely different from another user. Since the paths are baked into the resulting generated code, this would lead to runtime problems. For more information, check out this page: https://buf.build/docs/reference/protobuf-files-and-packages#imports

The protoparse package in this repo does actually provide a flag that may fix what you are seeing, though it is not a feature supported by protoc. On the Parser, set the InferImportPaths field to true, and it will try to rewrite imports so that file names and imports all match, even if you use different relative paths to refer to the same file.

Thank you for your reply! The doc you referred is really helpful, I was facing exactly what it describes:

If you're using protoc, this is an easy mistake to make: the flexibility of the command-line interface actually lends itself to doing this and ending up compiling with mismatching paths.

The introduce of InferImportPaths and it 'rewriting imports paths to fix mismatching ones' sounds a good idea to me, though it didn't work for this case, I'll look into it.

jhump commented

The introduce of InferImportPaths and it 'rewriting imports paths to fix mismatching ones' sounds a good idea to me, though it didn't work for this case, I'll look into it.

@MUCZ, it is possible that the InferImportPaths functionality might now work for your case, as of #575.

MUCZ commented

The introduce of InferImportPaths and it 'rewriting imports paths to fix mismatching ones' sounds a good idea to me, though it didn't work for this case, I'll look into it.

@MUCZ, it is possible that the InferImportPaths functionality might now work for your case, as of #575.

That's great! Thank you for the fix! I think this can be closed now.