The problem of scanning the file with Chinese filename
N0body007 opened this issue · 14 comments
Hello, I found that go-yara can't scan a file with a Chinese filename. I have tried the new version of YARA (v4.2.2) on my Windows 10 machine, which uses a Chinese character set. Scanning 123.txt works, but when I use go-yara to scan a file with a Chinese filename, such as 123测试.txt, it doesn't work. Could you help me out?
Sure. Could you please provide a small Go program that fails to scan your file using a dummy ruleset? I am interested to see what exact function calls you are using.
Oh, sorry, I forgot to provide the demonstration program. When I change the target file name to a Chinese filename, it goes wrong. The content of the target file (123.txt) is "abc".
package main

import (
	"fmt"

	yara "github.com/hillu/go-yara"
)

func main() {
	rule := "rule test : tag1 { strings: $a = \"abc\" condition: $a }"
	compiler, err := yara.NewCompiler()
	if compiler == nil || err != nil {
		return
	}
	if err = compiler.AddString(rule, ""); err != nil {
		return
	}
	rules, err := compiler.GetRules()
	if err != nil {
		return
	}
	s, err := yara.NewScanner(rules)
	if err != nil {
		return
	}
	targetFile := "123.txt"
	//targetFile := "123测试.txt"
	var m yara.MatchRules
	if err := s.SetCallback(&m).ScanFile(targetFile); err != nil {
		fmt.Println("ScanFile err :", err)
		return
	} else {
		fmt.Printf("Matches: %+v", m)
	}
}
Great. What error are you getting when scanning the file that contains Chinese characters?
The error is “could not open file”, which comes from the YARA source code. You can try the code above. When the filename is 123.txt, the result is normal, but when the filename contains Chinese characters, it returns an error.
AFAIK, yara 4.1 does not support unicode file names. You can open the file with Go’s os.Open and provide file descriptor to go-yara to scan. I did it like this and solved it before.
@N0body007 what @ozanh said is correct. Go's internal filename representation is UTF-8, which works for your Chinese characters just fine. However, the filename is then passed as a C string to yr_scanner_scan_file, yr_filemap_map, and yr_filemap_map_ex, where it is finally passed as-is to the CreateFileA API. The A stands for ANSI: the function interprets the name in the system's active code page, so the UTF-8 multibyte representations of your non-ASCII characters will likely be misinterpreted … the file is simply not found.
I suggest the following change:
	s.SetCallback(&m)
	f, err := os.Open(targetFile)
	if err != nil {
		fmt.Println("Open: err: ", err)
		return
	}
	defer f.Close()
	if err := s.ScanFileDescriptor(f.Fd()); err != nil {
		fmt.Println("ScanFileDescriptor: err: ", err)
		return
	} else {
		fmt.Printf("Matches: %+v", m)
	}
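To make the mismatch described above concrete, here is a minimal sketch that prints the UTF-8 bytes of the filename and shows what comes out when those bytes are read under a different encoding. The helper misreadAsLatin1 is hypothetical, and Latin-1 is only a stand-in chosen for illustration; a real Windows system would apply its own active code page, so the exact garbage differs, but the name would still not match the file on disk.

```go
package main

import "fmt"

// misreadAsLatin1 is a hypothetical helper (not part of go-yara or YARA).
// It mimics an API that does not know the input is UTF-8 by turning every
// single byte into its own character, as a Latin-1 decoder would.
func misreadAsLatin1(b []byte) string {
	r := make([]rune, len(b))
	for i, c := range b {
		r[i] = rune(c)
	}
	return string(r)
}

func main() {
	name := "123测试.txt"
	raw := []byte(name) // the UTF-8 bytes that end up in the C string

	fmt.Printf("UTF-8 bytes: % x\n", raw)
	fmt.Printf("characters: %d, bytes: %d\n", len([]rune(name)), len(raw))
	fmt.Printf("misread name: %q\n", misreadAsLatin1(raw))
}
```

The ASCII part of the name survives unchanged, while each Chinese character turns into several unrelated characters, which is why only the non-ASCII filenames fail.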
Thank you. I think I understand the cause now.
Thank you. It really works. But I want to solve this problem by changing the source code of YARA. Do you have any suggestions?
@N0body007 There is an issue open in YARA: VirusTotal/yara#1487.
A workaround similar to what I described above has been implemented in yara.c (the command line program), but there hasn't been any feedback on whether it works. Note that currently, only the builds done with Visual Studio would be affected by this fix.
Or do you want to fix this in the libyara API?
Yeah, I want to fix this in the libyara API.
OK, in that case we ought to continue the conversation in a new issue within the YARA project. Would you like to open an issue there?
Hmm, I'll solve it by myself first, and then I'll open a new issue within the YARA project if I can't. Thank you.
My advice: The issue, at its core, is simple to fix: In yr_filemap_map_ex, replace the CreateFileA call with CreateFileW. Assume that the string that got passed in is encoded as UTF-8; convert that to UTF-16 so it can be used by CreateFileW. I am pretty sure that a pull request that implements this change would be accepted.
Since this should really be discussed as a YARA issue (or pull request), I am closing this issue.
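For illustration, the UTF-8 to UTF-16 conversion step suggested above can be sketched in Go with the standard library. This is not the libyara patch itself: the actual change would be written in C, presumably using a Win32 conversion routine such as MultiByteToWideChar with CP_UTF8. The function name toWidePath is hypothetical, and rejecting invalid UTF-8 and embedded NUL bytes is an assumption about how one might want to handle bad input.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
	"unicode/utf8"
)

// toWidePath is a hypothetical helper (not part of YARA or go-yara).
// It converts a UTF-8 path into a NUL-terminated slice of UTF-16 code
// units, the form a wide-character API such as CreateFileW expects.
func toWidePath(path string) ([]uint16, error) {
	if !utf8.ValidString(path) {
		return nil, fmt.Errorf("path is not valid UTF-8: %q", path)
	}
	if strings.IndexByte(path, 0) >= 0 {
		return nil, fmt.Errorf("path contains a NUL byte: %q", path)
	}
	wide := utf16.Encode([]rune(path)) // one or two code units per character
	return append(wide, 0), nil        // add the terminating NUL
}

func main() {
	wide, err := toWidePath("123测试.txt")
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Printf("UTF-16 code units: %04x\n", wide)
}
```

Each of the two Chinese characters in the example fits in a single UTF-16 code unit; characters outside the Basic Multilingual Plane would be encoded as surrogate pairs by utf16.Encode.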
I get it. Thank you.