When copying from S3 to S3, an empty file is created for each folder copied
Spritefarmer opened this issue · 11 comments
/example/file01.txt
/example/subfolder/file02.txt
example is a folder that contains a file named file01.txt and a folder named subfolder that contains a file named file02.txt
To copy the contents of example to newfolder, run this:
# s3cp "s3://bucket/example/" "s3://bucket/newfolder/" -r
newfolder will now contain two files (file01.txt and a 0-byte file named subfolder) and a folder called subfolder that contains a file named file02.txt
Not sure why an empty file is created with the name of the folder. It doesn't seem to affect anything; it just looks out of place.
Sorry, I can't reproduce this.
Here's a sample session I just ran,
boisvert@honeybrown:~$ s3ls bizo-dev:alex/example/
s3://bizo-dev/alex/example/file01.txt
s3://bizo-dev/alex/example/subfolder/file02.txt
boisvert@honeybrown:~$ s3cp -r bizo-dev:alex/example/ bizo-dev:alex/example2
Copy s3://bizo-dev/alex/example/file01.txt to s3://bizo-dev/alex/example2/file01.txt
Copy s3://bizo-dev/alex/example/subfolder/file02.txt to s3://bizo-dev/alex/example2/subfolder/file02.txt
boisvert@honeybrown:~$ s3ls bizo-dev:alex/example2
s3://bizo-dev/alex/example2/file01.txt
s3://bizo-dev/alex/example2/subfolder/file02.txt
Here is the screen grab from the AWS Console showing the empty file that was created with the same name as the folder.
Sorry, the screen grab doesn't help.
How can I reproduce this using s3cp tools? Can you include the output of s3cp? (s3cp outputs one line for every "copy" it makes so it should output a line for the empty file as well...)
It looks like these files can only be seen in the AWS Console ( https://console.aws.amazon.com ). They are not listed with s3ls tool. BUT maybe they are?
Below is a test I did: I created a bunch of files and subfolders in a "from" folder, listed them with the s3ls tool, copied them to a "to" folder, and then listed the contents to see what the differences would be. It looks like the subfolders in the copied "to" directory don't have a trailing forward slash. Are they seen as files instead of folders?
robbinsn-osx:~ nrobbins$ s3ls "s3://s3cp_example/from/"
s3://s3cp_example/from/
s3://s3cp_example/from/file.txt
s3://s3cp_example/from/test01/
s3://s3cp_example/from/test01/file.txt
s3://s3cp_example/from/test01/test02/
s3://s3cp_example/from/test01/test02/file.txt
s3://s3cp_example/from/test01/test02/test03/
s3://s3cp_example/from/test01/test02/test03/file.txt
robbinsn-osx:~ nrobbins$ s3cp "s3://s3cp_example/from/" "s3://s3cp_example/to/" -r
Copy s3://s3cp_example/from/ to s3://s3cp_example/to/
Copy s3://s3cp_example/from/file.txt to s3://s3cp_example/to/file.txt
Copy s3://s3cp_example/from/test01/ to s3://s3cp_example/to/test01
Copy s3://s3cp_example/from/test01/file.txt to s3://s3cp_example/to/test01/file.txt
Copy s3://s3cp_example/from/test01/test02/ to s3://s3cp_example/to/test01/test02
Copy s3://s3cp_example/from/test01/test02/file.txt to s3://s3cp_example/to/test01/test02/file.txt
Copy s3://s3cp_example/from/test01/test02/test03/ to s3://s3cp_example/to/test01/test02/test03
Copy s3://s3cp_example/from/test01/test02/test03/file.txt to s3://s3cp_example/to/test01/test02/test03/file.txt
robbinsn-osx:~ nrobbins$ s3ls "s3://s3cp_example/to/"
s3://s3cp_example/to/
s3://s3cp_example/to/file.txt
s3://s3cp_example/to/test01
s3://s3cp_example/to/test01/file.txt
s3://s3cp_example/to/test01/test02
s3://s3cp_example/to/test01/test02/file.txt
s3://s3cp_example/to/test01/test02/test03
s3://s3cp_example/to/test01/test02/test03/file.txt
I'm guessing that the following means test01 is a folder that contains a file called file.txt:
s3://s3cp_example/from/test01/
s3://s3cp_example/from/test01/file.txt
...and this means test01 is a file AND test01 is a folder that contains a file called file.txt. So this is actually three objects, not two.
s3://s3cp_example/to/test01
s3://s3cp_example/to/test01/file.txt
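The difference between the two listings can be checked mechanically. Below is a minimal Python sketch, not part of s3cp, that takes the keys from the s3ls output above (with the s3://s3cp_example/ prefix stripped) and reports every key that exists both as a plain object and as a prefix of other keys. The function name file_folder_collisions is made up for this illustration.

```python
def file_folder_collisions(keys):
    """Return keys that are plain objects (no trailing slash) and are
    also used as a 'folder' prefix by at least one other key."""
    key_set = set(keys)
    return sorted(
        k for k in key_set
        if not k.endswith("/")
        and any(other.startswith(k + "/") for other in key_set)
    )


# Keys copied from the "from" listing in this thread.
from_keys = [
    "from/",
    "from/file.txt",
    "from/test01/",
    "from/test01/file.txt",
    "from/test01/test02/",
    "from/test01/test02/file.txt",
    "from/test01/test02/test03/",
    "from/test01/test02/test03/file.txt",
]

# Keys copied from the "to" listing in this thread.
to_keys = [
    "to/",
    "to/file.txt",
    "to/test01",
    "to/test01/file.txt",
    "to/test01/test02",
    "to/test01/test02/file.txt",
    "to/test01/test02/test03",
    "to/test01/test02/test03/file.txt",
]

if __name__ == "__main__":
    print("from:", file_folder_collisions(from_keys))
    print("to:  ", file_folder_collisions(to_keys))
```

For the "from" listing nothing is reported, while for the "to" listing the three keys to/test01, to/test01/test02 and to/test01/test02/test03 are flagged. Those are the folder markers that lost their trailing slash during the copy.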
I seem to have a similar issue when I do:
s3cp -r s3://test.beta/uploads .
Copy s3://test.beta/uploads/ to /Users/busker/tmp/uploads
uploads: 100% |oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo| 0B 0B/s Time: 0:00:00
s3cp: [RuntimeError] Destination path is not a directory: /Users/busker/tmp/uploads
Basically, s3cp tries to download a file instead of creating a directory.
Gerd
I think a file named /Users/busker/tmp/uploads already existed before you ran this command -- it may have been created by a previous command you ran? Can you remove this file and try running "s3cp -r s3://test.beta/uploads ." again?
Nah, it's actually s3cp which creates the file that should be a directory
It's even better when I run (with a slash at the end):
s3cp -r s3://test.beta/uploads/ .
Then it creates a file called "uploads" and a file with the name of the first dir in my remote uploads dir
I'm happy to have a good dig in the code this weekend but don't have the time right now.
1.1.26 is the latest one right? I admit I'm using the gem and haven't yet pulled it from github.
I'm using this on OSX 10.8.1
Gerd.
Ok, I'll take another look today. Can you confirm you are using the latest release? (1.1.26)
I'm using the latest 1.1.26 gem (confirmed with s3cp --version)
Hmmmm... can't reproduce. Is it possible you have a fake/zero-length/marker file indicating a directory on S3? Maybe you can provide the output of:
% s3ls -l s3://test.beta/uploads
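If such a marker object does exist, a recursive download would need to turn it into a directory rather than a zero-byte file. The following is a rough Python sketch of that idea, not s3cp's actual code; the function plan_download and the example keys are hypothetical, since the real contents of s3://test.beta/uploads were never posted.

```python
import posixpath


def plan_download(keys, dest):
    """Map S3 keys to local actions without touching the filesystem.

    Keys ending in '/' are assumed to be folder markers and become
    'mkdir' actions; every other key becomes a 'file' action.
    """
    plan = []
    for key in keys:
        local_path = posixpath.join(dest, key.rstrip("/"))
        action = "mkdir" if key.endswith("/") else "file"
        plan.append((action, local_path))
    return plan


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    example_keys = ["uploads/", "uploads/photos/", "uploads/photos/a.jpg"]
    for action, path in plan_download(example_keys, "."):
        print(action, path)
```

With this approach the marker key uploads/ results in a local directory, so the files below it have somewhere to go and the "Destination path is not a directory" error would not arise from the marker itself.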
Please reopen if you have a reproducible case.