dividedby/General-URL-Cleaner-Revived

Regex does not match all Amazon TLDs

Environment

  • Browser: Google Chrome v120.0.6099.130 (Official Build) (64-bit)
  • Tampermonkey BETA 5.0.6192
  • General URL Cleaner Revived v4.1.9

Issue

The regex in @include and const amazon (and possibly elsewhere) does not match all Amazon domains/URLs, so unmatched URLs are not cleaned. In particular, the two-part TLD alternative [a-z]{2}\.[a-z]{2} covers co.jp but not com.au.

In @include

// @include     /^https:\/\/[a-z]+\.amazon\.(?:[a-z]{2,3}|[a-z]{2}\.[a-z]{2})\/.*$/

In const amazon

const amazon = /^[a-z]+\.amazon\.(?:[a-z]{2,3}|[a-z]{2}\.[a-z]{2})$/;

Example URLs provided below. There could be others.
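
As a quick illustration, the sketch below runs the const amazon pattern quoted above against the hostnames of the four example URLs in this issue. The hosts array and results object are names made up for this sketch and are not part of the userscript.

```javascript
// Original host pattern, copied from the issue text above.
const amazon = /^[a-z]+\.amazon\.(?:[a-z]{2,3}|[a-z]{2}\.[a-z]{2})$/;

// Hostnames taken from the example URLs in this issue (illustrative list).
const hosts = [
  'www.amazon.com',
  'www.amazon.com.au',
  'www.amazon.co.jp',
  'www.amazon.sg',
];

// Map each hostname to whether the original pattern accepts it.
const results = Object.fromEntries(hosts.map((h) => [h, amazon.test(h)]));
console.log(results);
```

Of these four, only www.amazon.com.au is rejected: "com" satisfies [a-z]{2,3} but is then followed by ".au" rather than the end of the string, and the second alternative allows only two letters before the dot.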

To Reproduce

Open the following example product URLs and note that some are not cleaned:

// Amazon US
www.amazon.com/Anker-2-Pack-Charging-MacBook-Samsung/dp/B09LCJPZ1P/ref=sr_1_3?keywords=cable&qid=1703823446&sr=8-3

// Amazon AU
www.amazon.com.au/Cable-Matters-Certified-Thunderbolt-Supporting/dp/B01AS8U9KE/ref=rvi_d_sccl_4/358-0955261-5175331?pd_rd_w=jE5Pz&content-id=amzn1.sym.27aa7904-0762-4d7d-89ab-5c11fd177d32&pf_rd_p=27aa7904-0762-4d7d-89ab-5c11fd177d32&pf_rd_r=82SKANWYE8YDYM4KRMF1&pd_rd_wg=0L3ff&pd_rd_r=8bdbcfdc-f69d-4f51-a22b-fbc95ea61ffb&pd_rd_i=B01AS8U9KE&psc=1

// Amazon JP
www.amazon.co.jp/-/en/SALONIA-Speedy-Negative-Lightweight-Foldable/dp/B07VCP8JXS/?_encoding=UTF8&pd_rd_w=MWKUe&content-id=amzn1.sym.b173a2df-735b-4d3d-8f1b-9b9a5b29af69&pf_rd_p=b173a2df-735b-4d3d-8f1b-9b9a5b29af69&pf_rd_r=AEWWR0THENKYHHTZ2084&pd_rd_wg=ee1Af&pd_rd_r=762d014e-9c47-4a03-893e-8d416893eb45&ref_=pd_gw_exports_top_sellers_unrec_jp

// Amazon SG
https://www.amazon.sg/deal/808f2618/?_encoding=UTF8&_encoding=UTF8&ref_=dlx_gate_sd_dcl_tlt_808f2618_dt_pd_gw_unk&pd_rd_w=7rJxH&content-id=amzn1.sym.34d6404b-2180-4a12-87df-dc40292e4443&pf_rd_p=34d6404b-2180-4a12-87df-dc40292e4443&pf_rd_r=CZ34YWCXG59WCKTR9974&pd_rd_wg=V9v7k&pd_rd_r=b13a0d66-09d9-43d9-ad32-1491342d6021

Expected Behavior

All the above URLs should be cleaned.

Attempted Fixes/Workarounds

Explicitly adding the following to "User includes" in Tampermonkey matches Amazon AU only:

/^https:\/\/[a-z]+\.amazon\.com\.au\/.*$/

The following example regex matches the US, JP, SG, and AU URLs above. A variant of it might work in const amazon with the scheme (i.e. https) and path portions removed.

^https:\/\/[a-z]+\.amazon\.(?:[a-z]{2,3}|[a-z]{2,3}\.[a-z]{2})\/.*$
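
A minimal sketch of how the proposed pattern and a host-only variant could be checked side by side. proposedHost is a hypothetical adaptation for const amazon, not necessarily what the linked commit uses, and the URLs are shortened to scheme plus host for readability.

```javascript
// Proposed URL pattern from this issue.
const proposedUrl =
  /^https:\/\/[a-z]+\.amazon\.(?:[a-z]{2,3}|[a-z]{2,3}\.[a-z]{2})\/.*$/;

// Hypothetical host-only variant: the same pattern with the scheme
// and path portions removed (an assumption, not the merged fix).
const proposedHost =
  /^[a-z]+\.amazon\.(?:[a-z]{2,3}|[a-z]{2,3}\.[a-z]{2})$/;

// Example URLs from this issue, shortened to scheme + host.
const urls = [
  'https://www.amazon.com/',
  'https://www.amazon.com.au/',
  'https://www.amazon.co.jp/',
  'https://www.amazon.sg/',
];

// Test the full URL against proposedUrl and its hostname against proposedHost.
const checks = urls.map((u) => ({
  url: u,
  urlMatch: proposedUrl.test(u),
  hostMatch: proposedHost.test(new URL(u).hostname),
}));

// True only if every URL passes both patterns.
const allMatch = checks.every((c) => c.urlMatch && c.hostMatch);
console.log(checks, allMatch);
```

Note that proposedUrl requires the https:// prefix, so the scheme-less example links listed under To Reproduce would need the scheme added before being tested against it.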

Seemingly fixed by commit nestedfunction@974fc23

PR #13

Tested on the following URLs:

// Amazon US
https://www.amazon.com/Bulletproof-Ketogenic-Friendly-Responsibly-Coconuts/dp/B00R7FFYO8/?_encoding=UTF8&pd_rd_w=gYmjE&content-id=amzn1.sym.0e739659-7c9d-4d82-868c-90015618ffcc&pf_rd_p=0e739659-7c9d-4d82-868c-90015618ffcc&pf_rd_r=Q5AC3GCV7WW0GS64WVHH&pd_rd_wg=Hqnx8&pd_rd_r=0dd0c997-0d17-4e6f-bc63-97c1513ceb33&ref_=pd_gw_exports_top_sellers_unrec&th=1

// Amazon AU
https://www.amazon.com.au/Abireiv-Cable-Waterproof-Buried-able-Compatible/dp/B09M5XYXHQ/ref=sr_1_1_sspa?crid=18TIC6JJWNTLF&keywords=cable&qid=1703829614&sprefix=cab%2Caps%2C294&sr=8-1-spons&sp_csd=d2lkZ2V0TmFtZT1zcF9hdGY&th=1

// Amazon JP
https://www.amazon.co.jp/-/en/Panasonic-WH4415P-Better-Small-White/dp/B008IDMFEU/?_encoding=UTF8&pd_rd_w=5qVAk&content-id=amzn1.sym.1ff70ab5-f36d-477f-9aa1-2ee545862c19&pf_rd_p=1ff70ab5-f36d-477f-9aa1-2ee545862c19&pf_rd_r=N7ABHTTGYVF9SD209XYT&pd_rd_wg=SIe9J&pd_rd_r=45fa7af7-505c-4634-9c8f-73056dd9a15d&ref_=pd_gw_exports_top_sellers_unrec_jp&th=1

// Amazon SG
https://www.amazon.sg/2-Pack-Bluetooth-Tracker-Water-Resistant-Compatible/dp/B09B2XXBFR?ref_=Oct_DLandingS_D_cc1e5ed0_1&th=1

Merged your fix, thanks again!