dain/snappy

Framed IO doesn't work with DataInputStream/DataOutputStream

lwhite1 opened this issue · 2 comments

Hi @dain

Sorry if I'm doing something stupid, but I can't see what it is.

I'm trying to write compressed longs to a file using java.io.DataOutputStream. The write seems to work, but the read fails without an exception. For example, in the test below, the snappy version produces no output when reading, but the plain buffered IO version does:

  @Test 
  public void testSnappy() throws Exception {

    System.out.println("With snappy:");

    String fileName = "foob.bar";
    try (FileOutputStream fos = new FileOutputStream(fileName);
         SnappyFramedOutputStream sos = new SnappyFramedOutputStream(fos);
         DataOutputStream dos = new DataOutputStream(sos)) {
      for(long i = 0; i < 3; i++) {
        dos.writeLong(i);
        System.out.println(i);
      }
      dos.flush();
    }

    try (FileInputStream fis = new FileInputStream(fileName);
         SnappyFramedInputStream sis = new SnappyFramedInputStream(fis, true);
         DataInputStream dis = new DataInputStream(sis)) {
      while(dis.available() > 0) {
        long cell = dis.readLong();
        System.out.println(cell);
      }
    }

    System.out.println("Now with bufferedIO and no snappy:");
    fileName = "foob.bar1";
    try (FileOutputStream fos = new FileOutputStream(fileName);
         BufferedOutputStream sos = new BufferedOutputStream(fos);
         DataOutputStream dos = new DataOutputStream(sos)) {
      for(long i = 0; i < 3; i++) {
        dos.writeLong(i);
        System.out.println(i);
      }
      dos.flush();
    }

    try (FileInputStream fis = new FileInputStream(fileName);
         BufferedInputStream sis = new BufferedInputStream(fis);
         DataInputStream dis = new DataInputStream(sis)) {
      while(dis.available() > 0) {
        long cell = dis.readLong();
        System.out.println(cell);
      }
    }
  }

The Snappy version produces no output because the very first call to available() returns 0: both position and valid are 0 in AbstractSnappyInputStream, so the loop body never runs.

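available() is not a suitable loop condition. The javadoc[1] is pretty vague about what it means: it only promises an estimate of the number of bytes that can be read without blocking, so a return value of 0 does not signal end of stream. The framed implementation here returns how much data is available without having to perform additional decompression.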

[1] - https://docs.oracle.com/javase/8/docs/api/java/io/InputStream.html#available--
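A minimal sketch of a read loop that does not depend on available(): keep calling readLong() until DataInputStream throws EOFException. To stay self-contained it uses only java.io; ZeroAvailableStream is a hypothetical stand-in (not part of snappy) for any stream that still has data but reports 0 from available(), and the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadUntilEof {

    // Hypothetical stand-in for a stream that has data to deliver
    // but always reports 0 from available().
    static final class ZeroAvailableStream extends FilterInputStream {
        ZeroAvailableStream(InputStream in) {
            super(in);
        }

        @Override
        public int available() {
            return 0;
        }
    }

    // Reads longs until the stream ends. readLong() throws EOFException
    // when fewer than 8 bytes remain, so the exception ends the loop.
    // Note that a truncated trailing value is silently dropped here.
    static List<Long> readAllLongs(InputStream in) throws IOException {
        List<Long> values = new ArrayList<>();
        try (DataInputStream dis = new DataInputStream(in)) {
            while (true) {
                values.add(dis.readLong());
            }
        } catch (EOFException endOfStream) {
            // Expected: this is how DataInputStream reports end of stream.
        }
        return values;
    }

    // Writes 0, 1, 2 to an in-memory buffer and reads them back through
    // the zero-available wrapper.
    static List<Long> demo() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream dos = new DataOutputStream(bytes)) {
                for (long i = 0; i < 3; i++) {
                    dos.writeLong(i);
                }
            }
            InputStream in = new ZeroAvailableStream(
                    new ByteArrayInputStream(bytes.toByteArray()));
            return readAllLongs(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [0, 1, 2]
    }
}
```

In the test from the report, the same loop should work with the DataInputStream wrapped around SnappyFramedInputStream in place of the stand-in. If catching an exception for control flow is unwelcome, writing a count before the values is another option.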

Thanks for the clear explanation and the quick reply.