BaseMax/GooglePlayWebServiceAPI

Fetch reviews of an app

BaseMax opened this issue · 13 comments

Hi there,

Gift

I'm here to say I will give a gift to anyone who can add this feature to the project and library.

Description

The feature is to fetch the list of reviews of an application. For example, we would fetch the reviews page by page, handling pagination until all reviews have been retrieved.

Sample page: https://play.google.com/store/apps/details?id=com.king.crash&hl=en&gl=US

Best,
M.

Well, as you pinged me on that and even assigned it to me… I had this on my list already indeed. Just need some time to implement. Where to get the data from is already marked in a note I have here, back from May 2022 (never got to that as I myself didn't need it).

Are you in a hurry with that, so that I should name the hints here? Or is it "whenever time permits", so it would have time at least until, say, May 2023?

Just to save time: https://github.com/Ne-Lexa/google-play-scraper may be useful

Thanks, but why? As pointed out, I've already marked it here, just needed to implement (PR on its way). Thanks though, maybe that link can fill some gaps.

> Well, as you pinged me on that and even assigned it to me… I had this on my list already indeed. Just need some time to implement. Where to get the data from is already marked in a note I have here, back from May 2022 (never got to that as I myself didn't need it).
>
> Are you in a hurry with that, so that I should name the hints here? Or is it "whenever time permits", so it would have time at least until, say, May 2023?

Great. Do it when you can. It's okay with me, unless someone else does it faster :)

> unless someone else does it faster :)

Too late for that now 🤣 Your turn, please test!

OK, checked your reference. Looks like my implementation has more details already 😉 Theirs:

        return new Review(
            $reviewId,
//            $reviewUrl,
            $userName,
            $text,
            $avatar,
            $date,
            $score,
            $likeCount,
            $reply,
            $appVersion
        );

Mine puts the reviewer's data into its own array (including userID, name, avatar, background image – ID+BG are missing "over there"). If you want, the fields could still be renamed in my implementation: thumbs=>likeCount for example, or naming the id field in the user record userId, user_id, reviewer_id, or whatever you like. Currently, for easy comparison:

[
   "review_id",
   "reviewed_version",
   "review_date",
   "text",
   "stars",
   "thumbs",
   "reviewer" => [
          "id",
          "name",
          "avatar",
          "bg_image"
   ]
]

So maybe text=>review_text as well…
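
A minimal sketch of how such a rename could be done after the fact, without touching the parser. The helper name `renameReviewFields` and the sample values are made up for illustration; only the field names come from the list above.

```php
<?php
// Hypothetical helper: renames the top-level keys of one review record.
// Keys not listed in $map are kept as they are; nested arrays are not touched.
function renameReviewFields(array $review, array $map): array
{
    $renamed = [];
    foreach ($review as $key => $value) {
        $renamed[$map[$key] ?? $key] = $value;
    }
    return $renamed;
}

// Sample record with made-up values, using the field names listed above.
$review = [
    'review_id'        => 'example-id',
    'reviewed_version' => '1.0',
    'review_date'      => '2022-12-01',
    'text'             => 'Nice game',
    'stars'            => 5,
    'thumbs'           => 3,
    'reviewer'         => ['id' => 'u1', 'name' => 'Jane', 'avatar' => '', 'bg_image' => ''],
];

$renamed = renameReviewFields($review, ['thumbs' => 'likeCount', 'text' => 'review_text']);
print_r(array_keys($renamed));
```

The key order is preserved, so a side-by-side comparison with the other scraper's fields stays easy.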

Hm, $reply? That one I don't have. Might need to check what they put in there…

Wow, nice.

I wonder if we can implement a function to get all reviews, either fetching them one by one or via pagination.

I have not forgotten what I said.
As I said, I want to pay for a gift. Please show me possible ways to do that.

As it seems you did it, you win it.

I am going on a trip to several countries soon. It seems you are in Munich. I was invited to give a talk at a university in Belgium. It seems Munich is nearby; if it is, I would like to meet you and give you a special gift from my country.
If that does not work, I will do it online.

Best,
M.

> I wonder if we can implement a function to get all reviews, either fetching them one by one or via pagination.

For that I'd first need to figure where that XHR goes to (e.g. by watching network traffic in the browser console while triggering such an XHR), and next how to parametrize it (how much can it pull? how often do I need to loop? what other request parameters (e.g. referrer) would be needed?). Just tried that, only see requests to load images (JPG, PNG) – later followed by some POST requests where Google wants to log stuff (haha, failing as I blocked XHR to gstatic, LOL – and yikes, seems like those were not even encrypted – or the lock icon is missing because the connection was not established).

As it's loaded dynamically into the very same page I expect the returned structures to be similar to what's already there, so the logic implemented could be reused. So I'd move the parsing part to a separate (protected?) method to be called in both places, leaving the "initial set" with the general app data while having the "full list" retrieved by a separate method.
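
Roughly what that split could look like, as a sketch only. The class name, the method names, and the raw entry layout (ID at index 0, text at index 1) are placeholders and not the real positions in Google's response.

```php
<?php
// Sketch only: all names and the raw entry layout are hypothetical.
class ReviewSource
{
    // Shared parser, called from both places.
    protected function parseReviews(array $rawEntries): array
    {
        $reviews = [];
        foreach ($rawEntries as $entry) {
            $reviews[] = [
                'review_id' => $entry[0] ?? null, // placeholder index
                'text'      => $entry[1] ?? null, // placeholder index
            ];
        }
        return $reviews;
    }

    // "Initial set": reviews that come along with the general app data.
    public function initialReviews(array $pageData): array
    {
        return $this->parseReviews($pageData['reviews'] ?? []);
    }

    // "Full list": reviews from a separately fetched response.
    public function moreReviews(array $responseData): array
    {
        return $this->parseReviews($responseData);
    }
}

$source = new ReviewSource();
print_r($source->initialReviews(['reviews' => [['id1', 'Great']]]));
```

Keeping the parser protected leaves it reusable by subclasses while the public surface stays at the two entry points.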

That said and combined: there seem to be more than the 40 reviews initially collected, but I do not yet have an idea on how to fetch those. Maybe that's the reason the other scraper has no such reference either (or I missed it). But wait:

$nextToken = $json[1][1] ?? null;
return [$reviews, $nextToken];

That looks like it would return a pointer on where to get more. Let's see what $json[1][1] looks like:

Array
(
    [0] => 
    [1] => CmgKZgpkMCwxMDAxMDAwLjQ2MTM5NzI2MDQsMTg4Nzg1MTU3NzEyLCJodHRwOi8vbWFya2V0LmFuZHJvaWQuY29tL2RldGFpbHM_aWQ9djI6Y29tLmtpbmcuY3Jhc2g6MSIsMSxmYWxzZQ
)

So there is something.
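
To peek inside, the token can be decoded for inspection. This is a sketch under the assumption that the token is URL-safe Base64 without padding, which is only a guess based on its alphabet; the decoded bytes may be partly binary, so non-printable bytes are shown as dots.

```php
<?php
// Token exactly as printed above.
$token = 'CmgKZgpkMCwxMDAxMDAwLjQ2MTM5NzI2MDQsMTg4Nzg1MTU3NzEyLCJodHRwOi8vbWFya2V0LmFuZHJvaWQuY29tL2RldGFpbHM_aWQ9djI6Y29tLmtpbmcuY3Jhc2g6MSIsMSxmYWxzZQ';

// Assumption: URL-safe Base64 ("-" and "_" instead of "+" and "/"), unpadded.
$b64 = strtr($token, '-_', '+/');
$b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4); // restore the padding

$raw = base64_decode($b64, true);
if ($raw === false) {
    exit("Token is not valid Base64 after all\n");
}

// Replace non-printable bytes with dots so the result can be read in a terminal.
echo preg_replace('/[^\x20-\x7E]/', '.', $raw), PHP_EOL;
```

There is no need to understand the contents in order to use the token; it only has to be passed back unchanged with the next request.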

        $reviews = $this->gplay->getReviews(
            $appId,
            $limit = 555,
            SortEnum::NEWEST()
        );

Do you think the same when seeing $limit there? So here's the entry-point – and here's where we can find the needed query parameters and the call. Would still be some work to cobble that together… This part looks familiar to me, though:

        $formParams = [
            'f.req' => '[[["' . self::RPC_ID_REVIEWS . '","[null,null,[2,' . $sort->value(
                ) . ',[' . $limit . ',null,' . ($token === null ? 'null' : '\\"' . $token . '\\"')
                . ']],[\\"' . $requestApp->getId() . '\\",7]]",null,"generic"]]]',
        ];

I remember having used that f.req in some other context as well…
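
For what it's worth, the same payload could be assembled with `json_encode` instead of hand-escaped string concatenation. A sketch, with assumptions: `RPC_ID_PLACEHOLDER` stands in for the real `RPC_ID_REVIEWS` value (not shown here), the sort value `2` is an arbitrary stand-in for `$sort->value()`, and `buildReviewsRequest` is a made-up name.

```php
<?php
// Placeholder: the real value of RPC_ID_REVIEWS is not shown in this thread.
const RPC_ID_REVIEWS = 'RPC_ID_PLACEHOLDER';

// Hypothetical helper building the value of the 'f.req' form parameter.
function buildReviewsRequest(string $appId, int $sort, int $limit, ?string $token = null): string
{
    // The inner request is itself a JSON document ...
    $inner = json_encode(
        [null, null, [2, $sort, [$limit, null, $token]], [$appId, 7]],
        JSON_UNESCAPED_SLASHES
    );
    // ... which is embedded as a string in the outer envelope, so its quotes get escaped.
    return json_encode([[[RPC_ID_REVIEWS, $inner, null, 'generic']]], JSON_UNESCAPED_SLASHES);
}

$formParams = ['f.req' => buildReviewsRequest('com.king.crash', 2, 40)];
echo $formParams['f.req'], PHP_EOL;
```

For plain ASCII app IDs and tokens this yields the same string as the concatenation above; values containing quotes, backslashes, or non-ASCII characters would additionally be escaped properly.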

> As it seems you did it, you win it.

Uh? Wow, thanks!

> I am going on a trip to several countries soon. It seems you are in Munich. I was invited to give a talk at a university in Belgium. It seems Munich is nearby; if it is, I would like to meet you and give you a special gift from my country.
> If that does not work, I will do it online.

Munich is correct – but "near" depends on your point of view. It's about 700 km. I'd be happy to meet you, though, if you'll be in Munich!

If you're still looking for a full list, I've just stumbled upon this: https://gist.github.com/kamoo1/af655f05700eb76bb29aec876493ed90 (which is Python, but might fit your use case).

Yeaaaaaaaah! Nice. It would be good if we could have this one here in PHP, to make it possible to iterate and get all reviews on most pages.

I unfortunately lack the time to implement this (at least currently™). I can hardly keep up with my existing queue. But the current code already contains the "token" needed to fetch more results, so our code could probably use that. With a separate method, though, it might be more performant (and avoid unneeded traffic) to start at a dedicated place, without obtaining the "base data" for the app.
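
In case someone wants to pick this up, the token loop itself could look roughly like this. It is a sketch, not an implementation: `fetchAllReviews` and `$fetchPage` are hypothetical names, and `$fetchPage` stands for whatever ends up doing the real request, assumed to return `[$reviews, $nextToken]` like the snippet quoted earlier. A fake page source is used here so the loop can be run without network access.

```php
<?php
// Collects reviews page by page until no token is returned or $max is reached.
// $fetchPage is a hypothetical callable: (?string $token) => [array $reviews, ?string $nextToken].
function fetchAllReviews(callable $fetchPage, int $max = 1000): array
{
    $all = [];
    $token = null; // null requests the first page
    do {
        [$reviews, $token] = $fetchPage($token);
        foreach ($reviews as $review) {
            $all[] = $review;
            if (count($all) >= $max) {
                return $all;
            }
        }
        // An empty page also ends the loop, so a stuck token cannot loop forever.
    } while ($token !== null && $reviews !== []);

    return $all;
}

// Stand-in for the real request: three fake pages keyed by token.
$pages = [
    ''       => [['r1', 'r2'], 'tokenA'],
    'tokenA' => [['r3', 'r4'], 'tokenB'],
    'tokenB' => [['r5'], null],
];
$fake = function (?string $token) use ($pages): array {
    return $pages[$token ?? ''];
};

print_r(fetchAllReviews($fake));
```

The `$max` cap is there because an app can have a very large number of reviews, and each page costs one request.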