kevinzg/facebook-scraper

I don't know what this issue is

NguyenDrasp opened this issue · 1 comment

in extract_comment_replies
    data = json.loads(response.text[prefix_length:]) # Strip 'for (;;);'
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 30442 (char 30441)

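This error comes from line 1139 of facebook_scraper/extractors.py, which passes the response text straight to json.loads. Sometimes the response contains two JSON objects written back to back instead of being wrapped in an array, so json.loads parses the first object and then fails with "Extra data" at the position where the second one begins.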
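The failure can be reproduced with the standard library alone. The payload below is made up for illustration (it is not a real Facebook response): two objects concatenated with nothing between them. json.loads decodes the first object, then raises JSONDecodeError with the message "Extra data", and the error position points at the first character of the second object.

```python
import json

# Hypothetical response body: two JSON objects back to back, no enclosing array.
body = '{"payload": {"actions": [1]}}{"payload": {"actions": [2]}}'

error = None
try:
    json.loads(body)
except json.JSONDecodeError as exc:
    error = exc
    # exc.msg is "Extra data"; exc.pos is the index where the second object starts.
    print(exc.msg, exc.pos)
```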

I made a hotfix that rewrites the string as a JSON array whenever the response contains multiple concatenated objects, then merges their actions into the first object:

Line 1138
            json_str = response.text[prefix_length:].strip()  # Strip 'for (;;);'

            if "}{" in json_str:
                # Multiple JSON objects can arrive back to back without being
                # wrapped in an array, so wrap them in one to make the text valid JSON
                json_str = f"[{json_str.replace('}{', '},{')}]"

            data = json.loads(json_str)

            if isinstance(data, list):
                # Merge the actions of every extra object into the first one
                for subdata in data[1:]:
                    data[0]['payload']['actions'].extend(subdata['payload']['actions'])
                data = data[0]
Line 1159
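One caveat with replacing '}{' is that it also rewrites those two characters when they appear inside a string value (for example, in comment text), and it misses objects separated by whitespace such as a newline. A possible alternative, sketched below and not part of facebook-scraper, is to parse the values one at a time with json.JSONDecoder.raw_decode, which returns each decoded value together with the index where it ended. The helper names load_concatenated_json and merge_payload_actions are hypothetical, and the merge step assumes the same payload/actions structure that the hotfix above relies on.

```python
import json


def load_concatenated_json(text):
    """Parse one or more JSON values written back to back (hypothetical helper).

    Unlike replacing '}{' with '},{', this never looks inside string values,
    and it tolerates whitespace between consecutive values.
    """
    decoder = json.JSONDecoder()
    values = []
    rest = text.strip()
    while rest:
        # raw_decode parses one value from the start of `rest` and returns
        # (value, index just past that value).
        value, end = decoder.raw_decode(rest)
        values.append(value)
        # Drop the parsed value plus any whitespace before the next one.
        rest = rest[end:].lstrip()
    return values


def merge_payload_actions(values):
    """Fold the 'actions' of every extra object into the first, as the hotfix does."""
    merged = values[0]
    for extra in values[1:]:
        merged["payload"]["actions"].extend(extra["payload"]["actions"])
    return merged


# Usage: the second object contains "}{" inside a string and is separated by a newline.
body = '{"payload": {"actions": ["a"]}}\n{"payload": {"actions": ["}{"]}}'
data = merge_payload_actions(load_concatenated_json(body))
print(data["payload"]["actions"])  # prints ['a', '}{']
```

With a single-object response, load_concatenated_json returns a one-element list and merge_payload_actions returns that object unchanged, so both cases go through the same code path.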


It would be helpful if you renamed your issue to "JSONDecodeError 'Extra data' in extract_comment_replies".
@NguyenDrasp