might not be bug: false duplication report due to use of relative path instead of absolute path
MUCZ opened this issue · 6 comments
to reproduce this, here is my folder
├── test
│ └── f
│ ├── f1
│ │ ├── f1.proto
│ │ └── f2.proto
│ └── f2
│ └── f2.proto
with f1/f1.proto
syntax = "proto3";
package test;
option go_package = "test/f1";
message Msg {
uint64 msg =1;
}
f1/f2.proto
syntax = "proto3";
package test;
option go_package = "test/f1";
import "test/f/f2/f2.proto";
message Msg3 {
uint64 msg =1;
Msg2 msg2 = 2;
}
and f2/f2.proto
syntax = "proto3";
package test;
option go_package = "test/f2";
import "test/f/f1/f1.proto";
message Msg2 {
uint64 msg =1;
Msg msg1 = 2;
}
when i tried to parse the protos with code
package main
import (
"fmt"
"github.com/jhump/protoreflect/desc/protoparse"
)
func main() {
fmt.Println("")
dashIs := []string{
"./",
"./test",
}
parser := protoparse.Parser{
IncludeSourceCodeInfo: true,
ImportPaths: dashIs,
InferImportPaths: true,
WarningReporter: func(e protoparse.ErrorWithPos) {
fmt.Println("WarningReporter e:", e)
},
}
files := []string{
"f/f1/f1.proto",
"f/f1/f2.proto",
}
fds, err := parser.ParseFiles(files...)
if err != nil {
panic(err)
}
if len(files) != len(fds) {
panic("len(files) != len(fds)")
}
}
there will show error:
panic: test/f/f1/f1.proto:8:9: symbol "test.Msg" already defined at f/f1/f1.proto:8:9
this can be reproduce with github.com/jhump/protoreflect v1.15.1
but there is in fact no duplication, so this is a false error report.
root cause
I dug a little bit into the implementation, here's my understanding
the dependency relation is f1/f2.proto-> f2/f2.proto->f1/f1.proto, and i'm trying to parse f1/f1.proto & f1/f2.proto
the parser parse f/f1/f1.proto, f1/f2.proto, f2/f2.proto, f1/f1.proto in order, the f/f1/f1.proto will be parsed twice
and the first parse result of f1/f1.proto will have name 'f1/f1.proto', as i passed in the main.go code
(the second parse happens because the parser is recursive, it will try to parse all the imports after it parsed the target file.)
the second parse result of f1/f1.proto will have name 'f/f1/f1.proto', cause this's the way it get imported.
the parser and the linker don't know they are the same file as no absolute path are stored in any of the data structure, they all use relative paths in data structures
fix suggestion
save somehow the absolute paths of the files during the parsing and/or in the parse result, so that either
- one file will not be read and parsed twice,
- or one file might be read twice like this example, but when the linker links them, the linker knows that they are the same file and do not report duplication definition errors
when i run protoc -I.. -I. --go_out=. f/f1/*.proto
inside folder test, i found the same error, so it turns out that this is not a problem of protoreflect/parse but of the protoc design. it might not be the right place but i do want to ask this question: why not store the absolute path?
an interesting thing is that i can parse "f/f1/f1.proto" or "f/f1/f2.proto" singly with no problem, using the main.go above or with protoc
, both work, but i can not parse both at the same time... i can understand how but not why it's designed this way...
The file's relative path is part of its identity and is also used for reflection purposes at runtime. For this reason, the absolute path is not suitable, because then the file's identity could change based on the environment in which it was compiled -- one machine could have the repo checked out into a different directory, so the absolute path is completely different from another user. Since the paths are baked into the resulting generated code, this would lead to runtime problems. For more information, check out this page: https://buf.build/docs/reference/protobuf-files-and-packages#imports
The protoparse
package in this repo does actually provide a flag that may fix what you are seeing, though it is not a feature supported by protoc
. On the Parser
, set the InferImportPaths
field to true, and it will try to rewrite imports so that file names and imports all match, even if you use different relative paths to refer to the same file.
The file's relative path is part of its identity and is also used for reflection purposes at runtime. For this reason, the absolute path is not suitable, because then the file's identity could change based on the environment in which it was compiled -- one machine could have the repo checked out into a different directory, so the absolute path is completely different from another user. Since the paths are baked into the resulting generated code, this would lead to runtime problems. For more information, check out this page: https://buf.build/docs/reference/protobuf-files-and-packages#imports
The
protoparse
package in this repo does actually provide a flag that may fix what you are seeing, though it is not a feature supported byprotoc
. On theParser
, set theInferImportPaths
field to true, and it will try to rewrite imports so that file names and imports all match, even if you use different relative paths to refer to the same file.
Thank you for your reply! The doc you referred is really helpful, I was facing exactly what it describes:
If you're using protoc, this is an easy mistake to make: the flexibility of the command-line interface actually lends itself to doing this and ending up compiling with mismatching paths.
The introduce of InferImportPaths
and it 'rewriting imports paths to fix mismatching ones' sounds a good idea to me, though it didn't work for this case, I'll look into it.
The introduce of InferImportPaths and it 'rewriting imports paths to fix mismatching ones' sounds a good idea to me, though it didn't work for this case, I'll look into it.
@MUCZ, it is possible that the
InferImportPaths
functionality might now work for your case, as of #575.
That's great! Thank you for the fix! I think this can be closed now.