MichaelAquilina/S4

S4 Syncing Error

Hi,
I am not able to sync with s4. Initially it uploads all folders to S3, and then it starts throwing an error. Details are below:

#s4 sync
Syncing testfold01 [/root/testfold/ <=> s3://s4synctest/testfold01/]
There was an error syncing 'testfold01': ('Unknown content type for index', 'application/octet-stream')

#s4 version
0.2.11

#pip3 -V
pip 1.5.4 from /usr/lib/python3/dist-packages (python 3.4)

#python3
Python 3.4.3 (default, Nov 28 2017, 16:41:13)

OS: Ubuntu 14.04.5 LTS

s4 does not upgrade with "pip3 install s4 -U"; pip says all requirements are already up to date. It worked on Ubuntu 16, but my production servers run Ubuntu 14. I have tested on 3 different Ubuntu 14 servers with the same result.

Any suggestions to make it work? I appreciate your help. Thanks.

Hi @saqib01 could you run the following command and paste the output to this ticket?

s4 --log-level=DEBUG sync
DEBUG:local:159 Detected gzip encoding for reading index
INFO:sync_command:97 Syncing testfold01 [/root/testfold/ <=> s3://s4synctest/testfold01/]
DEBUG:local:74 Locking /root/testfold/.s4lock
DEBUG:sync:81 Generating deferred calls based on client states
DEBUG:local:40 Ignoring <DirEntry '.index'>
DEBUG:local:40 Ignoring <DirEntry '.s4lock'>
DEBUG:local:85 Releasing lock /root/testfold/.s4lock
ERROR:sync_command:105 ('Unknown content type for index', 'application/octet-stream')
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/s4/commands/sync_command.py", line 101, in run
    dry_run=self.args.dry_run,
  File "/usr/local/lib/python3.4/dist-packages/s4/sync.py", line 37, in sync
    resolutions, unhandled_events = self.get_sync_states(keys)
  File "/usr/local/lib/python3.4/dist-packages/s4/sync.py", line 82, in get_sync_states
    for key, state_1, state_2 in self.get_states(keys):
  File "/usr/local/lib/python3.4/dist-packages/s4/sync.py", line 239, in get_states
    client_2_actions = self.client_2.get_all_actions()
  File "/usr/local/lib/python3.4/dist-packages/s4/clients/__init__.py", line 169, in get_all_actions
    index_local_timestamps = self.get_all_index_local_timestamps()
  File "/usr/local/lib/python3.4/dist-packages/s4/clients/s3.py", line 238, in get_all_index_local_timestamps
    return {key: value.get('local_timestamp') for key, value in self.index.items()}
  File "/usr/local/lib/python3.4/dist-packages/s4/clients/s3.py", line 92, in index
    self._index = self.load_index()
  File "/usr/local/lib/python3.4/dist-packages/s4/clients/s3.py", line 149, in load_index
    raise ValueError('Unknown content type for index', content_type)
ValueError: ('Unknown content type for index', 'application/octet-stream')

Could you output the following from that folder?

file .index
zcat .index

Also, you mentioned you can't update s4. What is the output of s4 version?

#file .index
.index: gzip compressed data, was "tmpvgifms80", last modified: Wed Apr 11 08:23:10 2018, max compression

#zcat .index
{"abc": {"remote_timestamp": 1523434926, "local_timestamp": 1523434926.2841802}}

#s4 version
0.2.11

So your s4 version is definitely the latest.

Potentially, the index stored on S3 is somehow corrupted.

Maybe I can add code that offers to clean the corrupted index if detected. Would you mind downloading the .index from your S3 bucket and running the same commands again on it? (file and zcat)

It is not showing proper output for the downloaded .index file.

#file .index
.index: zlib compressed data

#zcat .index
gzip: .index: not in gzip format

That's basically the problem :)

Somehow you have an index which is in the incorrect format. If it's not important to you, I would just delete it and see if syncing works after that ;)
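
The `file` and `zcat` results above come down to the first bytes of the file: a gzip stream starts with the bytes 1f 8b, while a zlib stream starts with a two-byte header defined in RFC 1950. A minimal sketch of the same check in Python, for anyone without those tools at hand. The helper name `sniff_compression` is hypothetical and not part of s4:

```python
import gzip
import zlib


def sniff_compression(data):
    """Guess the container format of `data` from its first two bytes."""
    # gzip members always begin with the magic bytes 1f 8b.
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    # A zlib header is two bytes: the low nibble of the first byte is the
    # compression method (8 = deflate), and the two bytes read as a
    # big-endian number must be a multiple of 31.
    if len(data) >= 2 and data[0] & 0x0F == 8 and (data[0] * 256 + data[1]) % 31 == 0:
        return "zlib"
    return "unknown"


if __name__ == "__main__":
    payload = b'{"abc": {"remote_timestamp": 1523434926}}'
    print(sniff_compression(gzip.compress(payload)))
    print(sniff_compression(zlib.compress(payload)))
    print(sniff_compression(payload))
```

To inspect a downloaded index, read it in binary mode, for example `sniff_compression(open('.index', 'rb').read())`. The zlib test is a heuristic: arbitrary data can pass the header check by chance.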

Okay, I just deleted .index from S3 and reran 's4 sync'. The first time it worked fine and uploaded files to S3 with a new .index file.
But after that it has the same issue again: running 's4 sync' again gives me the same error.

Should both .index files be the same? If I upload the local .index file to S3 manually, should it work?

Yes, but revisiting the code, it seems like zlib is actually the expected format.

Could you try running this on the S3 index file?

cat .index | openssl zlib -d

openssl:Error: 'zlib' is an invalid command.

Do I have to install any specific package/library for zlib on Ubuntu?

Seems like Ubuntu doesn't have that installed for some odd reason.

Try this in a Python console instead:

In [1]: import zlib
In [2]: zlib.decompress(open('.index', 'rb').read())

I hope I did it right. It's an error again. Please see below:

#python3
Python 3.4.3 (default, Nov 28 2017, 16:41:13)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import zlib
>>> zlib.decompress(open('.index', 'rb').read())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
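
The "incorrect header check" error means the data does not begin with a valid zlib header. As a rough follow-up check, Python's zlib module can be told to accept either a zlib or a gzip header by adding 32 to the window size. The helper name `decompress_any` is made up for illustration, and this will not recover a file that is genuinely corrupted:

```python
import gzip
import zlib


def decompress_any(data):
    """Decompress `data` whether it carries a zlib or a gzip header."""
    # MAX_WBITS | 32 enables automatic zlib/gzip header detection.
    # wbits is passed positionally so this also runs on older Python 3.
    return zlib.decompress(data, zlib.MAX_WBITS | 32)


if __name__ == "__main__":
    payload = b'{"abc": {"remote_timestamp": 1523434926}}'
    # Both containers decode back to the same bytes.
    assert decompress_any(zlib.compress(payload)) == payload
    assert decompress_any(gzip.compress(payload)) == payload
```

If this also raises zlib.error on the downloaded file, the data is neither a plain zlib nor a plain gzip stream.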

Seems like the index is corrupted then. Seeing as this is a test folder, could you try setting up another test folder from scratch and syncing it?
Each time you do, please run with s4 --log-level=DEBUG sync

Okay, now everything is from scratch with a new S3 bucket. The first sync was successful, but the second shows errors. See below:

#s4 targets
test002: [/root/test002 <=> s3://s4synctest002/test002]

#s4 --log-level=DEBUG sync
INFO:sync_command:97 Syncing test002 [/root/test002/ <=> s3://s4synctest002/test002/]
DEBUG:local:74 Locking /root/test002/.s4lock
DEBUG:sync:81 Generating deferred calls based on client states
DEBUG:local:40 Ignoring <DirEntry '.s4lock'>
DEBUG:sync:246 1 keys in total (1 for /root/test002/ and 0 for s3://s4synctest002/test002/)
DEBUG:sync:83 abc: SyncState<CREATED, local=2018-04-11 15:24:56, remote=None> SyncState<DOESNOTEXIST, local=None, remote=None>
DEBUG:sync:191 Action=Resolution<action=CREATE, to=s3://s4synctest002/test002/, from=/root/test002/, key=abc, timestamp=1523460296>
DEBUG:sync:40 There are 0 unhandled events for the user to solve
DEBUG:sync:43 There are 1 automatically resolved calls
DEBUG:sync:197 There are 1 total deferred calls
INFO:sync_command:125 Creating abc (/root/test002/ => s3://s4synctest002/test002/)
INFO:sync:229 Flushing Index to Storage
DEBUG:local:170 Using gzip encoding for writing index
DEBUG:s3:159 Using zlib encoding for writing index
DEBUG:local:85 Releasing lock /root/test002/.s4lock

#s4 --log-level=DEBUG sync
DEBUG:local:159 Detected gzip encoding for reading index
INFO:sync_command:97 Syncing test002 [/root/test002/ <=> s3://s4synctest002/test002/]
DEBUG:local:74 Locking /root/test002/.s4lock
DEBUG:sync:81 Generating deferred calls based on client states
DEBUG:local:40 Ignoring <DirEntry '.index'>
DEBUG:local:40 Ignoring <DirEntry '.s4lock'>
DEBUG:local:85 Releasing lock /root/test002/.s4lock
ERROR:sync_command:105 ('Unknown content type for index', 'application/octet-stream')
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/s4/commands/sync_command.py", line 101, in run
    dry_run=self.args.dry_run,
  File "/usr/local/lib/python3.4/dist-packages/s4/sync.py", line 37, in sync
    resolutions, unhandled_events = self.get_sync_states(keys)
  File "/usr/local/lib/python3.4/dist-packages/s4/sync.py", line 82, in get_sync_states
    for key, state_1, state_2 in self.get_states(keys):
  File "/usr/local/lib/python3.4/dist-packages/s4/sync.py", line 239, in get_states
    client_2_actions = self.client_2.get_all_actions()
  File "/usr/local/lib/python3.4/dist-packages/s4/clients/__init__.py", line 169, in get_all_actions
    index_local_timestamps = self.get_all_index_local_timestamps()
  File "/usr/local/lib/python3.4/dist-packages/s4/clients/s3.py", line 238, in get_all_index_local_timestamps
    return {key: value.get('local_timestamp') for key, value in self.index.items()}
  File "/usr/local/lib/python3.4/dist-packages/s4/clients/s3.py", line 92, in index
    self._index = self.load_index()
  File "/usr/local/lib/python3.4/dist-packages/s4/clients/s3.py", line 149, in load_index
    raise ValueError('Unknown content type for index', content_type)
ValueError: ('Unknown content type for index', 'application/octet-stream')

Could you upload the .index file from S3 here? A simple fix would be to just use gzip compression on both the local and S3 index, which I will work on as soon as I get the chance (I have a relatively busy week though, I'm afraid).

I suspect that the fact you don't have the openssl zlib command installed means that your OS (image, I guess?) does not support zlib compression.
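
The gzip-on-both-sides idea could look roughly like the sketch below. This only illustrates the approach and is not the actual s4 change; `dump_index` and `load_index` here are hypothetical standalone helpers, and the index is assumed to be a JSON object like the one printed by zcat earlier in the thread:

```python
import gzip
import json


def dump_index(index):
    """Serialize an index dict to gzip-compressed JSON bytes."""
    return gzip.compress(json.dumps(index).encode("utf-8"))


def load_index(data):
    """Inverse of dump_index: gzip-compressed JSON bytes back to a dict."""
    return json.loads(gzip.decompress(data).decode("utf-8"))


if __name__ == "__main__":
    index = {"abc": {"remote_timestamp": 1523434926,
                     "local_timestamp": 1523434926.2841802}}
    blob = dump_index(index)
    # The same bytes could be written to the local .index and uploaded to
    # S3, so both sides are read back with one code path.
    assert load_index(blob) == index
```

Using one format on both ends means the reader never has to guess the encoding from the object's content type.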

It's in the attached zipped folder. This is the S3 version of the .index file.

index.zip

cat index | openssl zlib -d
{"abc": {"remote_timestamp": 1523460296, "local_timestamp": 1523460716.0}}

Seems like your Ubuntu image does not support zlib.

I will update the code to stick to gzip.

Makes sense as it is working fine on Ubuntu 16. My version is Ubuntu 14.04.
Thanks for your time.

@saqib01 please test the latest version of s4 (0.2.12) and see if this now works for you.

I have verified it on all my Ubuntu 14 servers and it's working fine now. Thanks for your efforts.