Cookies don't seem to be working...
Closed this issue · 15 comments
I'm trying to grab the contents of a private Google group we've been using as a group inbox, and create an mbox file so we can import the messages back into an IMAP account.
I've followed the instructions, and even when I grab the cookies in multiple ways (Firefox with a cookie exporter, Chrome with the cookies.txt plugin) and then set my wget options, I always get the same response from wget:
: Creating './devs//threads/t.3' with 'forum/devs'
:: Fetching data from 'https://groups.google.com/a/mycompany.com/d/__FRAGMENT__?_escaped_fragment_=forum/devs'...
--2018-03-22 19:40:27-- https://groups.google.com/a/mycompany.com/d/__FRAGMENT__?_escaped_fragment_=forum/devs
Resolving groups.google.com (groups.google.com)... 108.177.112.113, 108.177.112.139, 108.177.112.102, ...
Connecting to groups.google.com (groups.google.com)|108.177.112.113|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://accounts.google.com/AccountChooser?continue=https://groups.google.com/a/mycompany.com/d/__FRAGMENT__?_escaped_fragment_%3Dforum/devs&hl=en&service=groups2&hd=mycompany.com [following]
...
It gets stuck in this loop because it's not authenticating and keeps getting redirected to the AccountChooser page.
I can access the https://groups.google.com/a/mycompany.com/d/__FRAGMENT__?_escaped_fragment_=forum/devs URL in my browser, but I can't with wget, even directly on the command line (same error).
Any ideas would be appreciated!
BTW, I used this Firefox extension: https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/?src=search
And this Chrome one: https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg
And this one: http://www.editthiscookie.com/
No dice...
Hi @rhukster,
I'm sorry for any inconvenience. Did you use the _GROUP variable to specify your company information? (e.g., export _GROUP=mycompany.com).
I will run some tests with a private group in an organization today.
Thanks
No, I used _ORG for that:
export _GROUP="devs"
export _ORG="mycompany.com"
export _WGET_OPTIONS="--load-cookies /my/path/to/cookies.txt --keep-session-cookies --verbose"
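For anyone debugging the same thing, a standalone wget call can show whether the cookie file authenticates at all, independently of the script. This is only a rough sketch, assuming the same cookie file path and the group URL from the log above; a 200 response suggests the cookies are accepted, while a 302 to accounts.google.com (as in the log above) means they are not:
# Diagnostic only: fetch one group page with the exported cookies and print
# the response headers without following redirects. Adjust paths/URL as needed.
wget --load-cookies /my/path/to/cookies.txt \
     --keep-session-cookies \
     --max-redirect=0 \
     --server-response \
     -O /dev/null \
     "https://groups.google.com/a/mycompany.com/d/__FRAGMENT__?_escaped_fragment_=forum/devs"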
I can reproduce the problem now (_ORG's value is lowercase). I am taking a further look at this issue. Thanks for your patience.
I'm pretty sure the script will not work with (new) organization groups: they are written in a new web framework (single-page application). This is similar to the issue reported in #14. Let me see if there is any workaround.
@rhukster Good news for you. The add-on https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/?src=search generates some weird output. You can fix it as below:
- Generate the cookie file by using that add-on (cookies-txt).
- Open the file, and remove all #HttpOnly_ strings.
- Remove any temporary directories (the script would create a devs directory in your working directory), and try again (a rough sketch of these steps follows this list).
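For example, something along these lines should work, assuming cookies.txt and the devs directory are both in your current working directory (adjust paths to your setup):
# 1. Strip the #HttpOnly_ prefixes written by the cookies-txt add-on
#    (a backup copy is kept as cookies.txt.bak).
sed -i.bak 's/#HttpOnly_//g' cookies.txt

# 2. Remove the temporary directory left over from the previous failed run.
rm -rf ./devs

# 3. Re-export the variables and re-run the crawler script as before.
export _GROUP="devs"
export _ORG="mycompany.com"
export _WGET_OPTIONS="--load-cookies $(pwd)/cookies.txt --keep-session-cookies"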
I have tested it and it's working very well on my side. Hope this also helps you :)
Changes:
- Detect looping issue
- Improved documentation (remove #HttpOnly_ strings from the cookie file)
Feel free to reopen the ticket if there is any looping issue. Thanks a lot.
I seem to be encountering this issue as well as of this morning, which is strange since I was able to get this to work without error on 10/2/18.
It seems like this was just an issue with my cookies.txt file: I was missing the groupsloginpref cookie for some reason, and that seemed to be the source of my issue (which was more or less identical to the first code block in this issue).
It might be worth mentioning in the README the exact cookies that are needed for private group scraping to work; according to here, these are SID, HSID, SSID, and groupsloginpref.
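A quick sanity check along these lines could also help (just a sketch, assuming cookies.txt is in the current directory) to confirm those cookies are present before running the crawler:
# Report which of the required cookie names appear in the exported file;
# any name reported as MISSING needs to be re-exported from the browser.
for name in SID HSID SSID groupsloginpref; do
    grep -q -w "$name" cookies.txt && echo "found: $name" || echo "MISSING: $name"
done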
Thanks a lot for your very useful feedback @jpellman. I will update the README accordingly.
Ach, I finally figured out what this was. Basically, my issue was that I wasn't reading the instructions properly. I somehow misconstrued "When you have the file, please open it and remove all #HttpOnly_ strings." in the README to mean "remove all lines starting with #HttpOnly_", when it actually meant "find all instances of #HttpOnly_ and replace them with an empty string". It might be worth adding a sed command under that sentence to reinforce that you're doing string replacement and not line removal. Maybe something like:
sed -i -e 's/#HttpOnly_//g' cookies.txt
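And perhaps a follow-up check (assuming the same file name) to confirm nothing was left behind:
# Should print 0 once every #HttpOnly_ prefix has been replaced.
grep -c '#HttpOnly_' cookies.txt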
Sorry for any noise.
Never mind, @jpellman. English is not my primary language and I may confuse people at times ;) I've updated the README as you suggested :) Thx again.
Cookies don't seem to be working... Google has now denied access to the crawler lolz