ControlNet/wt-data-project.data

Bypass Thunderskill

Opened this issue · 24 comments

There are a lot of complaints about the data on Thunderskill being selective and not representative of actual general performance.
Would you be interested in bypassing Thunderskill and collecting the data directly?
That way all games could be parsed and we could avoid arguments against the validity of the data.

Hi Bearddyy, is there any way we can do that legally?

@ControlNet We can extract scores etc. from replay files downloaded from their website.
That's no less legal than doing the same thing from Thunderskill, right?

Kind of interested in this. Any update?
My opinion is to find a way to extract the data from the original API. For example, in the game client we can see someone's profile; that data must come from somewhere.
I once obtained the real API URL for the game data through packet capture, but I don't know how to use it.

@axiangcoding Actually, someone has tried that, but it's not feasible here. If you want to download all the replay files from WT's official replay website, you need around 2~4 Gbit/s of bandwidth, and they will ban your IP if you download too much. So it's not a good way.

From that person's analysis, the data from the official replay website and the data collected from Thunderskill are strongly correlated, so the current data is still fine for some analysis.
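For what it's worth, that kind of agreement between two sources is easy to sanity-check yourself with a Pearson correlation. A minimal sketch in Python; the win-rate numbers below are made-up placeholders, not real data from either site:

```python
# Sanity-check how strongly two data sources agree using Pearson correlation.
# The per-vehicle win rates below are made-up placeholders, NOT real WT data.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical win rates for the same five vehicles from the two sources.
official = [52.1, 48.3, 61.0, 45.5, 57.2]
thunderskill = [51.4, 49.0, 60.2, 46.1, 56.8]

r = pearson(official, thunderskill)
print(f"r = {r:.3f}")  # a value close to 1.0 means the sources agree
```

An r close to 1.0 across many vehicles would support using the Thunderskill numbers as a proxy for the official data.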

I believe you replied to the wrong guy...

No... I guess the person who contacted me via email was Bearddyy, so I shared the information here to let you know.

@ControlNet Okay then. I remember that the replay files are binary or encrypted. Is there a way to decrypt them now?

@axiangcoding I see Bearddyy's repository can handle it. Please have a look: https://github.com/Bearddyy/wtparser

Thanks. He has really made some progress on this.

I have found a way to get full player data without even needing an auth header, buuutttt it uses protobuf, and I need to transform compiled definitions to a file to be able to use it.
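For anyone stuck on the protobuf part: even without the compiled definitions you can walk the wire format by hand, since every field is just a varint tag (field number plus wire type) followed by a payload. A minimal sketch that handles the two most common wire types; the sample bytes are the classic protobuf documentation example, not real WT data:

```python
# Decode protobuf wire format without a .proto schema. Each field starts with
# a varint tag = (field_number << 3) | wire_type. This sketch handles
# wire type 0 (varint) and wire type 2 (length-delimited: strings, bytes,
# nested messages); fixed32/fixed64 are left out for brevity.

def read_varint(buf, pos):
    """Read one base-128 varint from buf starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def decode_fields(buf):
    """Yield (field_number, wire_type, value) triples from a serialized message."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wtype = tag >> 3, tag & 0x7
        if wtype == 0:                       # varint
            value, pos = read_varint(buf, pos)
        elif wtype == 2:                     # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unhandled wire type {wtype}")
        yield field, wtype, value

# Example message: field 1 = varint 150, field 2 = string "wt"
sample = bytes([0x08, 0x96, 0x01, 0x12, 0x02]) + b"wt"
for field, wtype, value in decode_fields(sample):
    print(field, wtype, value)
```

`protoc --decode_raw < file` does the same thing from the shell (with field numbers in place of names), which is handy for a first look at an unknown response body.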

I've also found some very useful endpoints, though, such as searching for player names (I've scraped for them too; I'll add the link soon) and fetching news.
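For context on what probing an endpoint like that looks like in practice, here is a sketch that only builds the query URL; the host and parameter names are invented placeholders, not the real API:

```python
# Build a query URL for a hypothetical player-search endpoint.
# HOST and the "name"/"limit" parameters are placeholders, NOT the real API.
from urllib.parse import urlencode

HOST = "https://example-wt-api.invalid"  # placeholder host

def search_url(nick, limit=10):
    """Return the GET URL that would search players by nickname."""
    query = urlencode({"name": nick, "limit": limit})
    return f"{HOST}/players/search?{query}"

print(search_url("SomePilot"))
# A real client would then issue the request (e.g. with urllib.request)
# and decode the response body, which may be protobuf rather than JSON.
```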

Mind sharing what that API is and how to use it? I tried to capture network packets, but it's a CDN URL based on AWS, so I'm not sure I can use it.

Looking forward to seeing those links!

@axiangcoding
It's from the assistant app

I'll make a public postman workspace and link it o7

@RaidFourms Thanks in advance for sharing! It helps a lot.

Thanks for sharing. Looking forward to your works.

I didn't email you; it must have been someone else.
As for the data rate limit, it could potentially be circumvented by distributing the scripts across VMs so that each one downloads less. Also, I have found that each replay has 2 types of files that alternate, so I suspect the data rate is further reduced. But again, it could just be blocked by Gaijin; it would need a fair amount of automation.

I would imagine proxies would be much more efficient
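A round-robin proxy pool like that is only a few lines to sketch, so each download goes out through a different exit IP; the proxy addresses below are placeholders:

```python
# Round-robin proxy rotation so no single IP hammers the replay server.
# The proxy addresses are placeholders, not real endpoints.
from itertools import cycle

PROXIES = cycle([
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
])

def proxy_for_next_request():
    """Return the proxy dict for the next download (urllib/requests style)."""
    addr = next(PROXIES)
    return {"http": addr, "https": addr}

# Each replay download would use the next proxy in the pool:
for replay_id in range(5):
    print(replay_id, proxy_for_next_request()["http"])
```

In practice you would still want a per-proxy rate limit on top of this, since the ban is presumably tracked per IP.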

Also, here is the username/userid scraper. Very inefficient, though 😭

Thanks for this, I didn't even know the app existed.
I think there's potentially a fair amount of data that could be scraped from that endpoint, but the specific data I was interested in, like per-vehicle or per-nation performance, doesn't appear to be available from navigating around the app. It looks like similar data to the user page.

wdym?

@axiangcoding Any update on it yet?

Sorry, I'm not very familiar with protobuf, so I need some time to try it out. I don't have much time recently, but I will share any progress with you.

I don't know much either lol

i'll still keep you updated o7

Hi guys, I have started a repo, https://github.com/axiangcoding/wt-profile-tool, to create a WT profile parsing library based on @RaidFourms's information and help. For now this is just the beginning.

I think I'll be able to parse out the player vehicle data in the near future.

Thank you all for sharing the information, it will greatly advance this work.


In fact, I am good at web server development, but I know nothing about parsing APKs and things like that. Special thanks to @RaidFourms for the hard work on this.