facebook/rocksdb

Strange leak of open file handles in Java using BlobDB

vmv890 opened this issue · 3 comments

I am noticing a strange issue with open file handles for deleted blob files in Java using rocksdbjni. I do not see this issue in 8.11.x, but I do see it in 9.5.x and 9.6.x. Running lsof shows a growing number of handles to deleted blob files. I do not create those files myself, so I am not sure how to close them properly, or whether some API call on my side is supposed to close or clear them. Is this expected in the 9.x series as opposed to 8.x?

Tested in JDK 17 and 21

Expected behavior

0 file handles for deleted blob files (or at least not growing)

Actual behavior

Run lsof -p <PID> | grep deleted | wc -l to see open file handles grow in 9.5.x and 9.6.x but not 8.11.x

-- lsof showing handles to deleted files (not on disk) --
java ... /data/testLeakingFileHandles/000106.blob (deleted)
java ... /data/testLeakingFileHandles/000088.blob (deleted)
java ... /data/testLeakingFileHandles/000101.blob (deleted)
java ... /data/testLeakingFileHandles/000095.blob (deleted)
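
The same count can also be taken from inside the JVM, which may be handy for asserting on it in a test. The following is a minimal sketch, assuming a Linux host where `/proc/self/fd` is available; the class and method names (`DeletedHandleCounter`, `countDeleted`) are made up for illustration and are not part of rocksdbjni.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, Linux-only: not part of rocksdbjni.
final class DeletedHandleCounter {

    /** Counts this process's open descriptors that point at a deleted file with the given suffix. */
    static long countDeleted(String suffix) throws IOException {
        // Each entry in /proc/self/fd is a symlink to the file behind that descriptor.
        try (Stream<Path> fds = Files.list(Paths.get("/proc/self/fd"))) {
            return fds.filter(fd -> {
                try {
                    String target = Files.readSymbolicLink(fd).toString();
                    // The kernel appends " (deleted)" to the link target once the file is unlinked.
                    return target.endsWith(suffix + " (deleted)");
                } catch (IOException e) {
                    return false; // descriptor was closed while we were scanning
                }
            }).count();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("deleted .blob handles: " + countDeleted(".blob"));
    }
}
```

Calling `countDeleted(".blob")` periodically from the reproduction program should show the same trend as the `lsof` pipeline, under the assumptions above.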

Steps to reproduce the behavior

package test;

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

import java.lang.management.ManagementFactory;
import java.util.Random;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RocksDeletedFileHandleTest {

    static {
        RocksDB.loadLibrary();
    }

    public static void main(String[] argv) {
        var random = new Random();
        var dbPath = "/tmp/testLeakingFileHandles"; // <-- Use whatever directory works on your system with enough space

        var dbQueryThread = new ThreadPoolExecutor(1, 1, 100L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(10));
        try (var opts = new Options().setCreateIfMissing(true).setEnableBlobFiles(true).setMinBlobSize(0).setBlobFileSize(1024 * 1024); // 1 MiB per blob file
             var db = RocksDB.open(opts, dbPath)) {

            System.out.println("----------- Running Test - PID: " + ManagementFactory.getRuntimeMXBean().getName().replace("@", " Host: "));

            // Run `lsof -p <PID> | grep deleted | wc -l` to see open file handles grow in 9.6.1 but not 8.11.4

            // -- Run queries in background thread
            dbQueryThread.submit(() -> {
                while (true) {
                    try {
                        String randomKeyToQuery = "key." + random.nextInt(1_000_000);
                        db.get(randomKeyToQuery.getBytes());
                        Thread.sleep(100);
                    } catch (Exception e) {
                        System.out.println("----------- Exiting due to Error: " + e.getMessage());
                        return;
                    }
                }
            });

            // -- Insert Data in a loop
            for (int loop = 0; loop < 1_000_000; loop++) {
                long start = System.currentTimeMillis();
                for (int k = 0; k < 1_000_000; k++) { // exactly 1M keys, matching the range queried above
                    db.put(("key." + k).getBytes(), ("value." + k).getBytes());
                }
                System.out.println("----------- Inserted 1M keys in " + ((System.currentTimeMillis() - start) / 1000) + " seconds");
            }

            dbQueryThread.shutdownNow(); // interrupts the query loop; plain shutdown() would leave it running

        } catch (RocksDBException ex) {
            ex.printStackTrace();
        }
    }
}

Hi @vmv890 - thanks for the report. I don't think this is expected. I bisected it, and it looks like it was introduced in 9.4.0 by commit b34cef5.

@pdillinger I presume this is not intended behaviour of the change. Do you think increasing uncache_aggressiveness would mitigate it? We could think about adding it to the Java API.

The immediate cause of the issue is most likely the change to VersionSet::AddObsoleteBlobFile. Blob files do live in the same file cache (confusingly still called TableCache) as SST files because they are subject to the combined max_open_files limit. Cc @pdillinger

EDIT: Or rather, TableCache and BlobFileCache use the same cache under the hood.
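
To illustrate why a cached reader keeps a deleted file's descriptor alive, here is a toy model in plain Java. It is not RocksDB code: `ToyFileCache` and its methods are hypothetical, and the sketch assumes a POSIX filesystem where an open file can be unlinked. It only shows the general mechanism suggested above, namely that unlinking a file does not release its handle until the cache entry holding that handle is evicted.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Toy model for illustration only; not how RocksDB implements its file cache.
final class ToyFileCache {
    private final Map<Long, FileChannel> readers = new HashMap<>();

    /** Opens the file on first use and keeps the handle cached by file number. */
    FileChannel get(long fileNumber, Path path) throws IOException {
        FileChannel ch = readers.get(fileNumber);
        if (ch == null) {
            ch = FileChannel.open(path, StandardOpenOption.READ);
            readers.put(fileNumber, ch);
        }
        return ch;
    }

    /** Drops the cached reader; closing it is what releases the descriptor. */
    void evict(long fileNumber) throws IOException {
        FileChannel ch = readers.remove(fileNumber);
        if (ch != null) {
            ch.close();
        }
    }

    int openHandles() {
        return readers.size();
    }

    public static void main(String[] args) throws IOException {
        ToyFileCache cache = new ToyFileCache();
        Path blob = Files.createTempFile("000106", ".blob");
        Files.write(blob, new byte[] {42});

        FileChannel ch = cache.get(106L, blob);
        Files.delete(blob); // the file becomes obsolete and is unlinked

        // The cached handle is still readable: the OS keeps the data until the handle is closed.
        ByteBuffer buf = ByteBuffer.allocate(1);
        ch.read(buf, 0);
        System.out.println("read after delete: " + buf.get(0));
        System.out.println("handles still cached: " + cache.openHandles());

        cache.evict(106L); // without this step the descriptor would stay open
        System.out.println("handles after evict: " + cache.openHandles());
    }
}
```

In this model, skipping `evict` when a file becomes obsolete leaves exactly the kind of "(deleted)" entry that `lsof` reports.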

Fixed in v9.8.1, v9.7.4, v9.6.2
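
For anyone gating an upgrade on this, the version information in the thread (regression bisected to 9.4.0, fixes in 9.6.2, 9.7.4 and 9.8.1) can be encoded as a small check. This is only a sketch: `AffectedVersions` is a hypothetical helper, it assumes that 9.4.x and 9.5.x received no fix release because none is listed here, and it assumes that 9.9 and later include the fix. How the version numbers of the rocksdbjni artifact are obtained is left to the caller.

```java
// Hypothetical helper derived from the versions mentioned in this thread.
final class AffectedVersions {

    /** Returns true if major.minor.patch falls in the range this thread identifies as leaking. */
    static boolean isAffected(int major, int minor, int patch) {
        if (major != 9) {
            return false; // 8.11.x was reported clean; later majors are assumed to carry the fix
        }
        switch (minor) {
            case 4:
            case 5:
                return true;      // assumption: no fix release is listed for these lines
            case 6:
                return patch < 2; // fixed in 9.6.2
            case 7:
                return patch < 4; // fixed in 9.7.4
            case 8:
                return patch < 1; // fixed in 9.8.1
            default:
                return false;     // before 9.4 predates the regression; 9.9+ assumed fixed
        }
    }

    public static void main(String[] args) {
        System.out.println("9.6.1 affected: " + isAffected(9, 6, 1));
        System.out.println("9.6.2 affected: " + isAffected(9, 6, 2));
    }
}
```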