Checkmarx/2ms

Improve filesystem scanning with parallelism

baruchiro opened this issue · 1 comment

To scan with the filesystem plugin, we first walk over the whole directory tree, collect all the file names, and then loop over them.

	err := filepath.Walk(p.Path, func(path string, fInfo os.FileInfo, err error) error {
		if err != nil {
			log.Fatal().Err(err).Msg("error while walking through the directory")
		}
		for _, ignoredFolder := range ignoredFolders {
			if fInfo.Name() == ignoredFolder && fInfo.IsDir() {
				return filepath.SkipDir
			}
		}
		for _, ignoredPattern := range p.Ignored {
			matched, err := filepath.Match(ignoredPattern, filepath.Base(path))
			if err != nil {
				return err
			}
			if matched && fInfo.IsDir() {
				return filepath.SkipDir
			}
			if matched {
				return nil
			}
		}
		if fInfo.Size() == 0 {
			return nil
		}
		if !fInfo.IsDir() {
			fileList = append(fileList, path)
		}
		return err
	})

Instead, we can handle each file as soon as we find it, by sending it into a channel or launching a new goroutine.

(I'm not sure which approach is best. Launching a new goroutine for each file would probably give the highest degree of parallelism, but to keep track of all the goroutines and wait for them, we may need a channel from the walk function to a single goroutine that launches one goroutine per file.)

Hi, please assign this issue to me.