jarun/googler

TODO list

zmwangx opened this issue · 65 comments

Rolling TODO list thread No. 2. Previous thread is #69.


Possible improvements we'd like to see:

And moonshots:

  • Support DDG (possibly in a separate project?)

PRs welcome!


Archive:

@jarun I would like to add the ability to open Google's cached results. Pretty useful when you encounter things like this:

screen shot 2016-05-17 at 1 38 22 pm

which honestly appears quite often, or when you hit servers located god-knows-where that just won't serve the page.

The parsing strategy is to watch for li.action-menu-item inside div.s (abstract scope) and, for each a inside such an li, test whether href begins with https://webcache.googleusercontent.com/search?q=cache:. The parsing load is minimal. Result will receive one additional kwarg as a consequence.
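A minimal sketch of that strategy, using the standard library's html.parser rather than googler's own parser. The class name CacheLinkFinder and the sample markup are hypothetical, and the div.s scoping is left out for brevity:

```python
from html.parser import HTMLParser

CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"


class CacheLinkFinder(HTMLParser):
    """Collect cache URLs from <a> tags nested inside li.action-menu-item.

    Hypothetical helper for illustration; not googler's actual parser.
    """

    def __init__(self):
        super().__init__()
        # One flag per currently open <li>: is it an action-menu-item?
        self.li_stack = []
        self.cache_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            classes = (attrs.get("class") or "").split()
            self.li_stack.append("action-menu-item" in classes)
        elif tag == "a" and any(self.li_stack):
            href = attrs.get("href") or ""
            if href.startswith(CACHE_PREFIX):
                self.cache_urls.append(href)

    def handle_endtag(self, tag):
        if tag == "li" and self.li_stack:
            self.li_stack.pop()


def find_cache_urls(html):
    """Return every cache URL found in the given HTML fragment."""
    finder = CacheLinkFinder()
    finder.feed(html)
    return finder.cache_urls
```

Only links that sit inside an action-menu-item and start with the cache prefix are kept, so the extra work per result is a single string comparison.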

The CLI will be designed to be c index, with c for cache.

Thoughts/objections?

It may be too niche of a feature and we might as well leave it to o. Not sure.

jarun commented

I'm fine with c. However, we need to indicate if cache is available or not, maybe with a (C) along with the rest of the result?

It may be too niche of a feature and we might as well leave it to o.

I mean, cache will be available if you open the search in Google. What I wasn't sure about is whether we should add this feature at all.

However, we need to indicate if cache is available or not, maybe with a (C) along with the rest of the result?

Just print an error message if cache is not available.

jarun commented

Please gimme some time to think on this feature.

jarun commented

I believe we can skip the google cache feature. It's an add-on feature that doesn't add much value.

I would love to see a cached page option; it would be useful for viewing pages that I may not trust.

Another suggestion would be an option to allow omitted results, particularly handy in site searches or filetype searches.

An example of this is the search

site:blog.checkpoint.com/wp-content/uploads/ filetype:pdf

which returns 1 result or 17 results.

Re cache: @jarun Thoughts on fulfilling user request?

Re omitted: the latest parser logic doesn't omit any potentially valid result. I've seen the same query returning different numbers of results across runs too, but didn't look into it. Could be a bug (although I can't see any race condition), or could be Google returning different results. I'm not at the computer now, but I'll look deeper asap.

jarun commented

@Gilepo

it would be useful for viewing pages that I may not trust

I didn't think from this perspective earlier, but you can always use the o key at the omniprompt to open the search in the browser. It's one additional step in the workflow, but what is the percentage of pages you do not trust? I'd say it's a rare case and we already have a way.

@zmwangx

the latest parser logic doesn't omit any potentially valid result

You are probably chasing another issue. @Gilepo is talking about

In order to show you the most relevant results, we have omitted some entries very similar to the 46 already displayed.
If you like, you can repeat the search with the omitted results included.

We can support this at the omniprompt if our parser finds it. We can show a short message and provide an omniprompt key.

Sorry, I'm new and may be imprecise.
By omitted results I was referring to results that Google omits with the following message.

In order to show you the most relevant results, we have omitted some entries very similar to the 1 already displayed. If you like, you can repeat the search with the omitted results included.

The link to the additional results included the parameter
&filter=0
which I believe returns the full set.

I had not experienced results changing with googler until after you mentioned it.

googler -w 'blog.checkpoint.com/wp-content/uploads' filetype:pdf
returns 1 result
If I then input 2 into the omniprompt, it returns 10 results (one of which is not a PDF)

Perhaps I am misunderstanding this.

@jarun
That is easy enough to use as a solution; my initial mention was meant as a use case, not really a request. I have yet to actually need the feature.

You are probably chasing another issue.

Yes I was. I was on my phone, and I knew a symptom that did match the description, so I didn't check.

We can support this at the omniprompt if our parser finds it.

TL;DR: I need more samples.

It's definitely in #extrares, but #extrares is also used for "searches related to..." (sample query: google) so that's not definitive. In the case above, the "omitted search" is also in #ofr, but I need more samples to decide if that's the key.

Turning filtering off is trivial, it's a matter of adding a filter=0 query.
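As a rough illustration of how small that change is, here is a sketch that builds a search URL and appends filter=0 on request. The function name build_search_url and the include_omitted parameter are made up for this example and are not part of googler:

```python
from urllib.parse import urlencode


def build_search_url(keywords, include_omitted=False):
    """Build a Google search URL, optionally turning result filtering off.

    Hypothetical helper; googler assembles its URLs differently.
    """
    params = {"q": " ".join(keywords)}
    if include_omitted:
        # filter=0 asks Google to include the entries it would otherwise omit.
        params["filter"] = "0"
    return "https://www.google.com/search?" + urlencode(params)
```

For example, build_search_url(["site:example.com", "filetype:pdf"], include_omitted=True) differs from the default URL only by the trailing &filter=0.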

But this brings us to something bigger: should we allow dynamic option changes within a single invocation (e.g., dynamically enabling/changing time limit in the prompt), just like Google's search tools? The UI could be as simple as entering the options as is at the prompt, and replacing - with + to turn off an option. It would be great in a sense, but it might or might not be confusing to the user, and in either case it will be quite some work for us.

I believe #ofr is the correct selector. I reached the conclusion by searching for very long sentences from Wikipedia. The last page usually has this "omitted results" thing, always enclosed by #ofr.
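A small sketch of what detecting that notice could look like, assuming the only signal needed is the presence of an element with id="ofr". It uses the standard library's html.parser, and the names below are hypothetical rather than taken from googler:

```python
from html.parser import HTMLParser


class OmittedNoticeDetector(HTMLParser):
    """Set a flag when any element with id="ofr" is seen (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == "ofr":
            self.found = True


def has_omitted_results(html):
    """Return True if the page contains the "omitted results" container."""
    detector = OmittedNoticeDetector()
    detector.feed(html)
    return detector.found
```

Checking for #ofr rather than #extrares avoids false positives from the "searches related to..." block, which shares the outer container.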

Interestingly, I also discovered that results taken down by DMCA are enclosed by #mfr. Try this: https://goo.gl/DYcHD2

jarun commented

should we allow dynamic option changes within a single invocation

No, only for this scenario. We will skip the command-line option and only offer a key at the omniprompt if we find that omitted results are available. We would have to support a toggle and discard it on a new search (new keywords).

I also discovered that results taken down by DMCA

There's a reason for the takedown. We wouldn't touch DMCA.

There's a reason for the takedown. We wouldn't touch DMCA.

That's just a curious discovery. You can't show those even if you want to.

No, only for this scenario.

Then I doubt its value. It's another solved-by-o scenario. Most of the time hidden-due-to-repetition results are genuinely worthless.

jarun commented

Then I doubt its value.

Thanks! I needed your opinion on whether to do it or not. I have no intention of rewriting the complete Google search engine for the cmdline. Between you and me, we can't handle that much. Let's save some time for ourselves. 👯

@Gilepo

If I then input 2 into the omniprompt, it returns 10 results (one of which is not a PDF)

Do you mean If I then input n ...?

@zmwangx it sounds from your inputs (yours and @Gilepo's) that we might be skipping results (unintentionally) or showing wrong results in some cases.

Do you mean If I then input n ...?

He literally entered 2 into the omniprompt, which triggered an unrelated query for 2 in the same site. He might be confusing 2 for n.

we might be skipping results (unintentionally) or showing wrong results in some cases.

There could be a bug that occasionally leads to skipped results, but I don't believe there could be "wrong results". Anyway, not in this case; everything is correct. Again, I'll keep an eye on the curious behavior I described in #83 (comment). Please let me know if you see it. We can't do anything without samples.

Actually, IIRC I've seen only one or two results with googler -N google when I was rewriting and testing the parser, but now I can't reproduce. Maybe just transient Google problems.

jarun commented

He literally entered 2 into the omniprompt

We should consider bringing back the wrong index warning in this case. Numbers can always be searched with the g keyword. Entering wrong index is a common thing (I do it myself). But the problems are serious:

  • It'll be awkward for new users just testing out googler and unaware of all features.
  • He can't go back to his earlier search easily. I know that's how googler works but this is annoying.
  • Who searches numbers every day?

He literally entered 2 into the omniprompt

Correct.

which triggered an unrelated query for 2 in the same site.

Ahh okay so it did a second search without filetype:pdf but used 2 as a search term.

This makes sense, and is interesting on Google's part as just including 2 in the search (on the website) removes the need for filter=0...

Entering wrong index is a common thing (I do it myself).

Usually there are ten results, and usually you're entering a single digit index. If you enter the wrong digit, you're most likely opening the wrong result, instead of being out of bound.

In all other cases, entering a wrong index (say, 11 instead of 10) is as likely as entering 1p instead of 10, and you can't do anything about the latter, so it makes little sense to make an exception for the former.

jarun commented

This makes sense

Only when explained. That's a problem. Earlier we used to show a message (wrong index).

and is interesting on Google's part as just including 2 in the search (on the website) removes the need for filter=0...

Please explain.

jarun commented

Usually there are ten results

Because of terminal size, all my query aliases are set to 5.

Because of terminal size, all my query aliases are set to 5.

Then again, mistyping 5 as 6 is as likely as t (t is more likely if you're a touch typist), so it wouldn't solve the typo problem entirely.

As a compromise, we could print "new search term: blah blah" when the search term changes. It could be helpful, especially for new users who don't bother to read and understand the help.

so it wouldn't solve the typo problem entirely.

By the way, if you're in favor of ad-hoc partial solutions, then sure, I'm happy to trap single or double-digit numbers that are out of bound. It should be clear that there's never going to be a three-digit index.

I would also propose another ad-hoc thing that might help with usability: enter twice to quit. After removing enter to quit I actually kinda miss it, and enter twice should be a nice middle ground.

These ad-hoc things need not be documented, by the way, which would save us a lot of trouble.

and is interesting on Google's part as just including 2 in the search (on the website) removes the need for filter=0...

Err, sorry, it does not remove the need; it's that Google decides that with no search term most of the PDF results should be omitted. But if we provide a search term that is presumably in each document (the number 1 or the letter a), Google does not omit them.

This is all using Google's webpage.

jarun commented

It should be clear that there's never going to be a three-digit index.

One of our very early users uses this tool to dump Google results. So please don't assume anything.

How frequently do you search numbers?

After removing enter to quit I actually kinda miss it

you stole my thought from last night. I miss it too. It was damn convenient. :(
2 Enters would be great. We can document briefly, these won't change much now that we know the bliss of having it.

One of our very early users uses this tool to dump Google results.

Dump hundreds or thousands of results interactively, find the index, then enter it at the prompt? That's definitely safe to ignore. Result dumping should be done through --noprompt, probably with --json too for better tooling.
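To illustrate the tooling side, here is a sketch of a consumer for such a dump. It assumes the JSON output is an array of objects that each carry a "url" key; that shape is an assumption for this example and should be checked against the googler version in use. The function name extract_urls is hypothetical:

```python
import json


def extract_urls(json_text):
    """Return the URL of every result in a JSON array of result objects.

    Assumes each result is an object with a "url" key; entries without
    one are skipped rather than treated as errors.
    """
    results = json.loads(json_text)
    return [item["url"] for item in results if "url" in item]
```

A script built around this could read the output of googler --noprompt --json from standard input and never touch the omniprompt at all.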

jarun commented

One of our very early users uses this tool to dump Google results.

Ahh sorry! It wouldn't hurt him. He uses a direct search.

you stole my thought from last night.

To be fair, I got the idea almost immediately after it was removed 😉

We can document briefly

Documentation is a pain to write (especially if you want to be precise), and it gets outdated all the time, especially if you're precise. I would leave edge cases to the actual program.

jarun commented

We can safely assume there can't be a 3-digit index.

Documentation is a pain to write

I'll do it. Don't worry. Just bring it back to life with 2 touches.

An ad-hoc patch to bring it back:

From 95b3c1609513e61119d288b067cc7bd8e2866de0 Mon Sep 17 00:00:00 2001
From: Zhiming Wang <zmwangx@gmail.com>
Date: Thu, 19 May 2016 23:30:50 -0700
Subject: [PATCH] googler: Enter twice to quit

---
 googler | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/googler b/googler
index 44d5c39..f5ff29f 100755
--- a/googler
+++ b/googler
@@ -1255,6 +1255,15 @@ while True:
     except EOFError:
         break

+    if not nav:
+        try:
+            nav = show_omniprompt()
+        except EOFError:
+            break
+        if not nav:
+            # Two consecutive enters
+            break
+
     if nav == "n":
         if len(results) == 0:
             nav = ""
-- 
2.8.2
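The same control flow can be sketched as a standalone loop that is easy to exercise without a terminal. Here read_line and handle are hypothetical stand-ins for show_omniprompt() and the rest of the dispatch code:

```python
def run_prompt_loop(read_line, handle):
    """Run a prompt loop that quits on ^D or on two consecutive empty inputs.

    read_line() returns the next line typed at the prompt and raises
    EOFError on ^D; handle(nav) processes a non-empty command. Both are
    placeholders for illustration, not googler functions.
    """
    while True:
        try:
            nav = read_line()
        except EOFError:
            break

        if not nav:
            # First Enter: ask once more before deciding to quit.
            try:
                nav = read_line()
            except EOFError:
                break
            if not nav:
                # Two consecutive Enters
                break

        handle(nav)
```

A single stray Enter costs nothing: whatever is typed at the second prompt is handled as usual.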

By the way, just noticed the condition of the penultimate branch:

elif len(nav):

In Python

elif nav:

is enough.
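A quick check of why the two conditions agree for strings: an empty str is falsy and any non-empty str is truthy, so testing nav directly gives the same branch as testing len(nav). The helper name below is only for illustration:

```python
def is_nonempty(nav):
    """Truthiness test on a string; equivalent to len(nav) > 0 for str."""
    return bool(nav)


# The two forms agree for every string, including whitespace-only input.
for sample in ["", " ", "n", "site:github.com buku"]:
    assert is_nonempty(sample) == (len(sample) > 0)
```

The forms only diverge for non-string values such as None, where len() raises TypeError while the truthiness test simply evaluates to False.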

jarun commented

is enough

Please feel free to change.

Regarding the ad-hoc patch, does ^D work fine still? Just test it out.

Regarding the ad-hoc, does ^D work fine still? Just test it out.

Of course, ^D -> EOFError -> break.

jarun commented

๐Ÿ‘ forgot that we handle it in EOF except.

jarun commented

I'm happy to trap single or double-digit numbers that are out of bound. It should be clear that there's never going to be a three-digit index

Please push this. I have incorporated the Enter twice snippet already.

How about a release this weekend? Can be a bit difficult for me (sat morn here) but we already have 89 pending changes so I'll manage.

Please push this.

How about

diff --git a/googler b/googler
index 2788d7b..fd7e3da 100755
--- a/googler
+++ b/googler
@@ -1314,6 +1314,8 @@ while True:
         start = basestart
     elif nav in urlindex:
         open_url(urlindex[nav])
+    elif nav.isdigit() and int(nav) < 100:
+        printerr("Index out of bound. Use g to search for the number.")
     elif nav:
         trimsearch = nav.strip().replace(" ", "+")
         if trimsearch == "" or trimsearch == "g":
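The branch order matters here, so a standalone sketch may help. It is a variation on the patch rather than a copy: it matches one or two ASCII digits with a regular expression instead of nav.isdigit() and int(nav) < 100, and classify_input and its return labels are hypothetical names:

```python
import re


def classify_input(nav, urlindex):
    """Decide what an omniprompt entry means (illustrative only).

    urlindex maps index strings such as "1" or "1a" to result URLs.
    """
    if nav in urlindex:
        return "open"
    if re.fullmatch(r"[0-9]{1,2}", nav):
        # A one- or two-digit number that is not a known index.
        return "out-of-bound"
    if nav:
        return "search"
    return "empty"
```

Because valid indices are checked first, only numbers that miss the index reach the out-of-bound branch, and anything with three or more digits still falls through to a normal search.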

How about a release this weekend?

You'll take care of all the work, so why would I object? 😉

jarun commented

You'll take care of all the work, so why would I object?

In case there are any questions on stability or any features in mind.

Will add the change.

I want to show you my "script" before making the recording. Let me know if you're happy with it. Specifically, I'm showing off some new features, e.g., color and site links.

googler --count 3 --exact googler

1 Urban Dictionary: Googler
http://www.urbandictionary.com/define.php?term=Googler
Googler. An employee working at Google. Employee benefits include free massages, gourmet food, no set working hours, constant
talks from presidential ...

2 Googler - Wiktionary
https://en.wiktionary.org/wiki/Googler
Noun[edit]. Googler   (plural Googlers). A full-time Google corporation employee. A regular or habitual user of the Google search
engine.

3 GitHub - jarun/googler: Google from the command-line
https://github.com/jarun/googler
Asciicast. googler is a power tool to Google (Web & News) and Google Site Search from the terminal. It shows the title, URL and
text context for each result, ...

googler (? for help) 3
googler (? for help) 4
Index out of bound. To search for the number, use g.
googler (? for help) n

1 Googler dictionary definition | Googler defined
http://www.yourdictionary.com/googler
(1) A person who is an employee of Google. See Xoogler. (2) A person who uses Google to search the Internet. See Google. Computer
Desktop Encyclopedia ...

2 Googler | Article about Googler by The Free Dictionary
http://encyclopedia2.thefreedictionary.com/Googler
Looking for Googler? Find out information about Googler. A person who is an employee of Google. See Xoogler. A person who uses
Google to search the ...

3 Hello, Googler - Google Careers
http://www.google.com/intl/en/about/careers/lifeatgoogle/hello-googler.html
Hello, Googler. ELLE Magazine takes a look at what it's like to work for Google China, calling out that Google's creative and
inclusive work environment is built ...

googler (? for help) site:github.com buku

1 GitHub - jarun/Buku: Powerful command-line bookmark manager ...
https://github.com/jarun/Buku
Asciicast. buku is a powerful cmdline bookmark management utility written in Python3 and SQLite3. When I started writing it, I
couldn't find a flexible cmdline ...

2 linuxbrew/buku.rb at master · Linuxbrew/linuxbrew · GitHub
https://github.com/Linuxbrew/linuxbrew/blob/master/Library/Formula/buku.rb
class Buku < Formula. desc "Command-line bookmark manager". homepage "https://github.com/jarun/Buku". url
"https://github.com/jarun/Buku/archive/1.8.tar.gz".

3 Buku/buku.1 at master · jarun/Buku · GitHub
https://github.com/jarun/Buku/blob/master/buku.1
Powerful cmdline bookmark manager. Your mini web! Contribute to Buku development by creating an account on GitHub.

googler (? for help) q
export GOOGLER_COLORS=bjdxxy
googler google

1 Google
https://www.google.com/
Search the world's information, including webpages, images, videos and more. Google has many special features to help you find
exactly what you're looking ...

    1a Google Maps
    https://maps.google.com/
    Find local businesses, view maps and get driving directions in ...

    1b Google Earth
    https://earth.google.com/
    Google Earth lets you fly anywhere on Earth to view satellite ...

    1c Google News
    https://news.google.com/
    Comprehensive up-to-date news coverage, aggregated from ...

    1d Google Accounts
    https://accounts.google.com/
    Sign in with your Google Account ... One Google Account for ...

    1e Gmail
    https://mail.google.com/
    Google-owned, web-based email service provides details of ...

    1f Google Images
    https://images.google.com/
    Google Images. The most comprehensive image search ...

2 Google (@google) | Twitter
https://twitter.com/google?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor

3 Google - Facebook
https://www.facebook.com/Google/
Today in our Mountain View backyard we kicked off Google I/O, our annual developer conference. Catch up on all the announcements
from this morning's ...

4 Google - YouTube
https://www.youtube.com/google
Experience the world of Google on our official YouTube channel. Watch videos about our products, technology, company happenings
and more. Subscribe to ...

5 Google - Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Google
Google is an American multinational technology company specializing in Internet-related services and products. These include
online advertising technologies, ...

6 What Google Learned From Its Quest to Build the Perfect Team - The ...
http://www.nytimes.com/2016/02/28/magazine/what-google-learned-from-its-quest-to-build-the-perfect-team.html
Five years ago, Google — one of the most public proselytizers of how studying workers can transform productivity — became focused
on ...

7 Is Google Making Us Stupid? - The Atlantic
http://www.theatlantic.com/magazine/archive/2008/07/is-google-making-us-stupid/306868/
A few Google searches, some quick clicks on hyperlinks, and I've got the telltale fact or pithy quote I was after. Even when I'm
not working, I'm as likely as not to ...

8 The 10 biggest announcements from Google I/O 2016 | The Verge
http://www.theverge.com/2016/5/18/11701030/google-io-2016-keynote-highlights-announcements-recap
At I/O this year, Google displayed its vision for a more ubiquitous and conversational way of interacting with technology. Its
Assistant is chattier, ...

googler (? for help) q
googler --site tuxdiary.com --count 3 --time y1 cmdline utility

1 Find fastest Ubuntu mirror from cmdline – TuxDiary
https://tuxdiary.com/2015/06/13/find-fastest-ubuntu-mirror-cmdline/
Jun 13, 2015 - A neat hack using Debian tool netselect to find the fastest Ubuntu mirrors. ... select it, but here's a cool trick
to do it faster and with more control from the cmdline.

2 megatools: Mega.nz from cmdline – TuxDiary
https://tuxdiary.com/2016/02/02/megatools-mega-nz-from-cmdline/
Feb 2, 2016 - Recently the team has released a number of cmdline utilities to manage Mega cloud storage including direct media
streaming support. The utilities released so ...

3 Capture, upload screenshots from cmdline – TuxDiary
https://tuxdiary.com/2015/06/14/capture-upload-screenshots-cmdline/
Jun 14, 2015 - Cmdline utilities to capture and upload screenshots to image hosting services like imgur.

googler (? for help) linux encoder

1 Unlock Linux.Encoder.1 encrypted files – TuxDiary
https://tuxdiary.com/2015/11/18/unlock-linux-encoder-encrypted-files/
Nov 18, 2015 - We wrote about Linux.Encoder.1, the first known Linux ransomware a week back. Luckily for those who are affected,
BitDefender has published a Python script ...

2 Zenefits: free payroll management – TuxDiary
https://tuxdiary.com/2015/11/17/zenefits/
Nov 17, 2015 - Linux, open source, cmdline, leisure. Menu and ... "HACKS/UTILS". Linux distros for businessIn "HACKS/UTILS" ...
post: Unlock Linux.Encoder.1 encrypted files.

3 Archives – TuxDiary
https://tuxdiary.com/archives/
Apr 11, 2016 - Monthly archives All posts Latest 32 Most popular 32 Most beautiful Linux distros yavide: modern C C++ IDE over
vim wifiphisher: automated WPA phishing ...

googler (? for help) q

I'm going to use Tango Dark again for the recording, by the way. Let me know if you want something else. Here's a list of themes supported by asciinema web:

screen shot 2016-05-21 at 10 41 07 pm

jarun commented

I use Tango too. I'm good!

I'm showing off some new features

Exactly what I had in mind. I have these in the release notes too.

I use Solarized Dark, but the default color scheme looks pretty horrible in Solarized Dark. It looks reasonable in Tango Dark.

Anyway, there you go. https://asciinema.org/a/46340

The custom color scheme I showed off there was designed for Solarized Dark. It actually looks pretty reasonable in Tango Dark locally, but for whatever reason red is too red in asciinema web. It doesn't matter much.

It's also too wide and too tall to be comfortable, but I need to fit that much stuff into the static screenshot.

diff --git a/README.md b/README.md
index 9e3a5ea..fc30a4b 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@
 </p>

 <p align="center">
-<a href="https://asciinema.org/a/43222"><img src="https://asciinema.org/a/43222.png" alt="Asciicast" width="600"/></a>
+<a href="https://asciinema.org/a/46340"><img src="https://asciinema.org/a/46340.png" alt="Asciicast" width="734"/></a>
 </p>

 `googler` is a power tool to Google (Web & News) and Google Site Search from the terminal. It shows the title, URL and text context for each result, which can be directly opened in a browser from the terminal. Results are fetched in pages (with page navigation). Supports sequential searches in a single `googler` instance.

In hindsight I should have used slightly fewer than 100 columns.

jarun commented

Looks great!

jarun commented

Can you please raise a PR or push directly?

Could you please fill CHANGELOG to 80 columns, squash the fixup commit, and rebase? I can then push the README commit.

I unfilled the notice in https://github.com/jarun/googler/releases/tag/v2.4 because this is HTML, not plain text.

And good to see automatic deb package upload working as expected: https://github.com/jarun/googler/releases/download/v2.4/googler_2.4-1_all.deb.

jarun commented

Thanks! :)

jarun commented

👯

I found this bullet point a bit troublesome:

  • Basic support for ConEmu terminal on Windows

It's not just ConEmu; any terminal emulator with ANSI escape sequence support should work. (Now that you mentioned it, I think I forgot to test open in browser.) Anyway, I don't really care about Windows, so whatever.

jarun commented

I am changing the rel-note. But it's also in the changelog, which I'll leave untouched.

Sure.

jarun commented

Done! Nice to see the deb package. :)

jarun commented

I'll roll the ToDo list. It's a happy ending for this one! ;)

Cool.

And moonshots:

Support DDG (possibly in a separate project?)

If there is someone here interested in this feature, two months ago I created a Python program to run DuckDuckGo searches from the command line. It's not exactly like googler, but I would like to add an interactive mode similar to how googler works. Here is the project repository: https://notabug.org/Ducker/ducker

There is also a pip package (pip install ducker), a website and simple documentation http://www.freakspot.net/programas/ducker/docs/ (including PDF, HTML, man and info pages).

If anyone would like to help me add the interactive mode, it would be great. :)

Cool! I can now add the interactive mode, which is the feature I missed for my program, using the code of ddgr. @zmwangx, thanks.

jarun commented

Cool! I can now add the interactive mode, which is the feature I missed for my program, using the code of ddgr. @zmwangx, thanks.

@jorgesumle Be my guest! Also, from a quick look into ducker it seems you are borrowing some aspects of googler already (like the suppress-browser-output option). ddgr has a great framework because of the pieces from googler. Please consider contributing to ddgr directly. I can add you as a collaborator.

Also, from a quick look into ducker it seems you are borrowing some aspects of googler

Yes, I got the function open_url from googler, it's exactly the same. Apart from that they are different I think, but the idea is very similar. Well, mine is simpler: I just build a URL and open it with the browser.

With Ducker, though, I can open multiple DuckDuckGo searches at the same time using the -m flag. When googler shows the results, it would be great if you could open various results in different tabs. googler (? for help) 1 2 3 would open the first three results in three different tabs of the web browser. What do you think about this idea?

Please consider contributing to ddgr directly. I can add you as a collaborator.

ddgr is great, and if I had known about it before I would probably have forked it to add the functionality of Ducker, but two months ago I added Ducker to PyPI and I am very comfortable upgrading and installing it with pip on different computers. Also, I don't like using GitHub.

As the code I will add is really big and important, I will add you in the program header (# Copyright (C) 2016 Arun Prakash Jana <engineerarun@gmail.com>). We both are using the GPLv3 so there is no problem.

When I make any improvement to the code taken from ddgr, I'll make a pull request to the ddgr repository. ;)

jarun commented

When googler shows the results, it would be great if you could open various results in different tabs.

This is available in the latest master. You can open multiple results from the omniprompt in one go.
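For readers curious how such input could be told apart from a search phrase, here is one possible sketch. It is not the code in master: urls_for_indices is a hypothetical helper, and the rule "every token must be a known index" is an assumption made for this example:

```python
def urls_for_indices(nav, urlindex):
    """Map input such as "1 2 3" to the matching result URLs.

    Returns the list of URLs when every whitespace-separated token is a
    known index, otherwise None so the caller can treat the input as a
    new search. urlindex maps index strings to URLs.
    """
    tokens = nav.split()
    if tokens and all(token in urlindex for token in tokens):
        return [urlindex[token] for token in tokens]
    return None
```

Each returned URL could then be handed to something like webbrowser.open_new_tab from the standard library, while a None result leaves the input to the normal search path.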

Regarding contributing to ddgr:

I just left the note as an option for consideration. In any case, I can't work on it right away myself so your approach is quite reasonable.