hillu/go-yara

Problem scanning a file with a Chinese filename

N0body007 opened this issue · 14 comments

Hello, I found that go-yara can't scan a file with a Chinese filename. I am using a recent version of YARA, v4.2.2, on a Windows 10 machine with a Chinese character set. When I use go-yara to scan files such as 123.txt and 123测试.txt, scanning the file with the Chinese filename doesn't work. Could you help me out?

hillu commented

Sure. Could you please provide a small Go program that fails to scan your file using a dummy ruleset? I am interested to see what exact function calls you are using.

Oh, sorry, I forgot to provide a demonstration program. When I change the target file name to a Chinese filename, it goes wrong. The content of the target file (123.txt) is "abc".

package main

import (
	"fmt"

	yara "github.com/hillu/go-yara"
)

func main() {
	// Dummy ruleset: matches any file containing the string "abc".
	rule := `rule test : tag1 { strings: $a = "abc" condition: $a }`

	compiler, err := yara.NewCompiler()
	if err != nil {
		fmt.Println("NewCompiler err:", err)
		return
	}
	if err = compiler.AddString(rule, ""); err != nil {
		fmt.Println("AddString err:", err)
		return
	}
	rules, err := compiler.GetRules()
	if err != nil {
		fmt.Println("GetRules err:", err)
		return
	}
	s, err := yara.NewScanner(rules)
	if err != nil {
		fmt.Println("NewScanner err:", err)
		return
	}

	targetFile := "123.txt"
	// targetFile := "123测试.txt"
	var m yara.MatchRules
	if err := s.SetCallback(&m).ScanFile(targetFile); err != nil {
		fmt.Println("ScanFile err:", err)
		return
	}
	fmt.Printf("Matches: %+v\n", m)
}

hillu commented

Great. What error are you getting when scanning the file that contains Chinese characters?

The error is “could not open file”, which comes from the YARA source code. You can try the code above. When the filename is 123.txt, the result is normal. But when the filename contains Chinese characters, I get this error.

ozanh commented

AFAIK, yara 4.1 does not support Unicode file names. You can open the file with Go’s os.Open and pass the file descriptor to go-yara to scan. I did it this way and it solved the problem for me.

hillu commented

@N0body007 what @ozanh said is correct. Go's internal filename representation is UTF-8, which works just fine for your Chinese characters. However, the filename is then passed as a C string through yr_scanner_scan_file, yr_filemap_map, and yr_filemap_map_ex, where it is finally passed as-is to the CreateFileA API. The A stands for ANSI: the function interprets the byte string in the system's active code page, so the UTF-8 multibyte representations of your non-ASCII characters will likely be misinterpreted … the file is simply not found.

I suggest the following change:

s.SetCallback(&m)
f, err := os.Open(targetFile)
if err != nil {
    fmt.Println("Open: err: ", err)
    return
}
defer f.Close()
if err := s.ScanFileDescriptor(f.Fd()); err != nil {
    fmt.Println("ScanFileDescriptor: err: ", err)
    return
} else {
    fmt.Printf("Matches: %+v", m)
}

Thank you. I think I now understand what the cause is.

Thank you. It really works. But I would like to solve this problem by changing the YARA source code. Do you have any suggestions?

hillu commented

@N0body007 There is an issue open in YARA: VirusTotal/yara#1487.

A workaround similar to what I described above has been implemented in yara.c (the command line program), but there hasn't been any feedback on whether it works. Note that currently, only the builds done with Visual Studio would be affected by this fix.

Or do you want to fix this in the libyara API?

Yeah, I want to fix this in the libyara API.

hillu commented

OK, in that case we ought to continue the conversation in a new issue within the YARA project. Would you like to open an issue there?

Hmm, I'll try to solve it myself first, and if I can't, I'll open a new issue within the YARA project. Thank you.

hillu commented

My advice: The issue, at its core, is simple to fix: In yr_filemap_map_ex, replace CreateFileA with CreateFileW. Assume that the string that got passed in is encoded as UTF-8, and convert it to UTF-16 so it can be used by CreateFileW. I am pretty sure that a pull request that implements this change would be accepted.
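
The conversion step itself can be illustrated in Go. This is only a rough sketch, not libyara code: the helper name utf8ToUTF16Z is hypothetical, and an actual fix would be written in C inside yr_filemap_map_ex. It assumes the incoming path is valid UTF-8 and produces the NUL-terminated UTF-16 code units that a wide-character API such as CreateFileW expects.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf16"
	"unicode/utf8"
)

// utf8ToUTF16Z is a hypothetical helper (not part of go-yara or libyara).
// It converts a UTF-8 path into NUL-terminated UTF-16 code units.
// Invalid UTF-8 and embedded NUL bytes are rejected instead of being
// silently replaced or truncated.
func utf8ToUTF16Z(path string) ([]uint16, error) {
	if !utf8.ValidString(path) {
		return nil, errors.New("path is not valid UTF-8")
	}
	if strings.IndexByte(path, 0) >= 0 {
		return nil, errors.New("path contains a NUL byte")
	}
	units := utf16.Encode([]rune(path))
	return append(units, 0), nil
}

func main() {
	name := "123测试.txt"

	// The bytes that currently reach the narrow (A) API: each Chinese
	// character occupies three bytes in UTF-8.
	fmt.Printf("UTF-8 bytes:  % x\n", []byte(name))

	wide, err := utf8ToUTF16Z(name)
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	// The code units a wide (W) API would receive instead.
	fmt.Printf("UTF-16 units: %04x\n", wide)
}
```

In Go programs on Windows, the standard library's syscall.UTF16PtrFromString performs a comparable conversion, which is why os.Open can open the file where the narrow API cannot.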

But since this should really be discussed as a YARA issue (or pull request), I am closing this issue.

I get it. Thank you.