Netflix/suro

Bug found when using Hadoop with localFileSink


With Hadoop's configuration file on the classpath, even the nested localFileSink uses Hadoop's remote file system.

Digging into the code, I found there is no check on whether to use the local file system or not.

FileWriterBase's constructor should be changed to take a localFileSystem flag that controls this:

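    // Proposed change: localFileSystem forces the local file system even when
    // a Hadoop configuration on the classpath points somewhere else.
    public FileWriterBase(String codecClass, Logger log, Configuration conf, Boolean localFileSystem) {
        this.conf = conf;

        try {
            // Treat a missing flag (null) as false, i.e. keep the old behaviour.
            boolean useLocal = Boolean.TRUE.equals(localFileSystem);
            // FileSystem.get(conf) follows fs.defaultFS (possibly HDFS);
            // FileSystem.getLocal(conf) always returns the local file system.
            fs = useLocal ? FileSystem.getLocal(conf) : FileSystem.get(conf);
            fs.setVerifyChecksum(false);
            if (codecClass != null) {
                codec = createCodecInstance(codecClass);
                log.info("Codec:" + codec.getDefaultExtension());
            } else {
                codec = null;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }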

I'll submit a pull request for this ASAP.

FileWriterBase's constructor is called from the TextFileWriter or SequenceFileWriter constructor, and conf is created with new Configuration(), which should default to the local file system. If it still points to HDFS, that is the case I missed.
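The likely missing piece: Hadoop's Configuration loads core-default.xml and then core-site.xml from the classpath, so new Configuration() only resolves to the local file system when no core-site.xml overrides fs.defaultFS, and FileSystem.get(conf) returns whatever file system that key names. The JDK-only sketch below checks which default a deployment would pick up. The class name DefaultFsCheck is hypothetical, and it is a simplification: it reads only core-site.xml via the context class loader, prefers fs.defaultFS over the legacy fs.default.name, and ignores final properties and variable expansion.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical diagnostic, not part of Suro or Hadoop.
public class DefaultFsCheck {

    // Value Hadoop falls back to when nothing overrides fs.defaultFS.
    static final String LOCAL_DEFAULT = "file:///";

    // Returns fs.defaultFS (preferred) or the legacy fs.default.name from a
    // Hadoop-style XML config, or null if neither key is present.
    static String parseDefaultFs(InputStream xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            NodeList props = doc.getElementsByTagName("property");
            String legacy = null;
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String name = childText(prop, "name");
                if ("fs.defaultFS".equals(name)) {
                    return childText(prop, "value");
                }
                if ("fs.default.name".equals(name)) {
                    legacy = childText(prop, "value");
                }
            }
            return legacy;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unreadable Hadoop config XML", e);
        }
    }

    // Trimmed text of the first <tag> element under parent, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Looks up core-site.xml through the context class loader and reports
    // the default file system URI it would select.
    static String defaultFsFromClasspath() {
        URL coreSite = Thread.currentThread().getContextClassLoader().getResource("core-site.xml");
        if (coreSite == null) {
            return LOCAL_DEFAULT; // no override visible: local file system
        }
        try (InputStream in = coreSite.openStream()) {
            String value = parseDefaultFs(in);
            return value != null ? value : LOCAL_DEFAULT;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Default FS from core-site.xml: " + defaultFsFromClasspath());
    }
}
```

If this prints an hdfs:// URI on the Suro classpath, FileSystem.get(conf) in FileWriterBase will resolve to HDFS, which matches the reported behaviour; FileSystem.getLocal(conf) sidesteps the lookup entirely.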

Suro has no reason to use a remote file system directly. So, instead of a localFileSystem boolean flag, feel free to send a PR that simply uses FileSystem.getLocal(conf).

Thanks for the review. If Suro never uses a remote file system, FileSystem.getLocal(conf) is the clearer fix.