jarun/googler

TODO list

jarun opened this issue · 42 comments

jarun commented

Rolling TODO list thread No. 4. Previous thread is #87.


Possible improvements we'd like to see:

  • Google Scholar search [ref: #213]
  • Support all options at omniprompt
  • Ubuntu PPA
  • Show YouTube-specific result abstracts (more details)
  • Document APIs in NumPy format [owner @jarun]

Suggestions and PRs welcome!

I dropped my old personal wishlist from the pinned list. I doubt anyone wants to invest that much into those items at this point.

Can I get the stock price using Google Finance?

jarun commented

No, googler doesn't have support for Google Finance.

I'd like to see Scholar support. What would be involved?

A patch would do. You just fetch the poorly structured HTML and parse out structured data. Existing code should give you a clue. Beware that (1) we stick to the PSL (Python Standard Library) and nothing more (no third-party deps); (2) it could be a pretty substantial patch.
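To give a rough idea of what "parse out structured data" with only the PSL can look like, here is a minimal sketch built on `html.parser`. The `result-title` class and the `parse_results` helper are hypothetical: Scholar's real markup is different and changes over time, and this is not googler's code.

```python
from html.parser import HTMLParser


class ResultParser(HTMLParser):
    """Collect title/url pairs from <a class="result-title" href=...> tags.

    The "result-title" class is a made-up example, not real Scholar markup.
    """

    def __init__(self):
        super().__init__()
        self.results = []   # list of {"title": ..., "url": ...} dicts
        self._href = None   # href of the matching <a> currently open
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "result-title" in classes:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize whitespace, including text split by nested tags.
            title = " ".join("".join(self._text).split())
            self.results.append({"title": title, "url": self._href})
            self._href = None


def parse_results(html):
    parser = ResultParser()
    parser.feed(html)
    parser.close()
    return parser.results
```

A real patch would also need secondary links and metadata per entry, which is where most of the work lies.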

Hello -- Thank you for this powerful tool. Question: is it possible to exclude words from a search (as you would during a regular Google search by typing e.g. "searchTerm -excludedTerm")? -- Steven

jarun commented

It has to do with Python's argparse. Try this:

googler hello " -world"

note: the space before - is necessary.
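The underlying rule is that argparse treats an argument starting with `-` as an option string. A minimal sketch with a toy parser (hypothetical, not googler's real one) shows the difference the leading space makes:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for a search CLI: every positional word is a
    # keyword. This is not googler's actual parser.
    parser = argparse.ArgumentParser(prog="search")
    parser.add_argument("keywords", nargs="*")
    return parser


parser = build_parser()

# " -world" starts with a space, not '-', so argparse keeps it as a keyword.
args = parser.parse_args(["hello", " -world"])
print(args.keywords)  # ['hello', ' -world']

# "-world" looks like an option string. parse_known_args reports it as an
# unrecognized extra (plain parse_args would exit with a usage error).
args, extras = parser.parse_known_args(["hello", "-world"])
print(args.keywords, extras)  # ['hello'] ['-world']
```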

Arun, fantastic, thank you for the quick reply! -- Steven

I've run into a problem: when I try to google something, for example

googler hello

I get a "No results" message. I've looked into the code but can't understand why there would be a connection error (I have no restrictions) or something like that.

jarun commented

You are facing #249. Did you try master?

I use the version installed through apt from the Ubuntu repository. Yes, the problem can be solved using the --noua option, so it's something to do with googler's user agent.

jarun commented

It's fixed on master and in release 3.7.1.

Can this be used to retrieve Google Image results?

Short answer: no.

Is it possible to show the chosen result directly in the terminal instead of opening it into browser (whether that's a graphical one or one in the terminal)?

jarun commented

Sorry, googler doesn't render web pages; it depends on the browser to do that. Basically, we don't think we can do it better than terminal or graphical web browsers without serious effort in that area.

I fail to see why we should reinvent a textual web browser. There are even terminal web browsers that can render JavaScript and graphics, e.g. browsh.

Of course, you could also use a third-party content extraction library, or maybe a Readability-like web service to extract content from URLs. There are quite a few such libraries available in Python (can't think of names right off the top of my head, just Google), and probably even more in other languages. Since BROWSER and --url-handler are flexible enough, you can easily supply your own script to "show the chosen result directly in the terminal."
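As a rough illustration of such a script, here is a stdlib-only sketch that fetches a URL and prints its visible text. It is not the script from the wiki page mentioned in this thread and does no Readability-style content detection. It assumes the handler receives the URL as its first argument, and the `extract_text` helper is hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical url-handler: print the visible text of a page."""
import sys
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    SKIP = {"script", "style", "noscript", "template"}  # never shown
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a SKIP element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse whitespace within each line and drop empty lines.
        lines = "".join(self._chunks).splitlines()
        return "\n".join(" ".join(line.split()) for line in lines if line.strip())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def main(argv):
    if len(argv) != 2:
        sys.exit("usage: print-page URL")
    req = urllib.request.Request(argv[1], headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=15) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    print(extract_text(html))


if __name__ == "__main__":
    main(sys.argv)
```

Saved as an executable under some name such as `print-page` (hypothetical), it could be passed to googler with `--url-handler`.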

Actually I just coded up a quick example: https://github.com/jarun/googler/wiki/Directly-print-content-of-results-to-terminal. See if this is what you need. @ifohancroft

This is exactly what I need! Thank you so much @zmwangx !

jarun commented

Someone bring me a gun, I wanna shoot @zmwangx. ;)

jarun commented

Added to the Curiosity nails it section. ;)

Works almost perfectly.

Just like with reader mode, an alias can also be used with dump-content to make things shorter:

alias search='googler --url-handler dump-content google' (search can be replaced with any name you like that isn't already taken by another executable or by the system and isn't commonly used, so there's no chance of it being claimed by other software).
Then you can just run search [your search term] in the terminal to get the results, and type the number of the result you want to open.
Currently, the only thing that seems to be missing is a way to return to the results index without having to search for the term again.
Occasionally, the content of some pages doesn't show up well. For https://en.wikipedia.org/wiki/Index, I only get:
6.4

Other uses in science and technology

P.S. I added an alias section to the wiki entry for dump-content. (Wouldn't content-dump be a better name?)

jarun commented

@ifohancroft we can omit the instructions on adding the alias. People who use googler would know about aliases by this time.

Any chance we'll see an option to open the search directly in the browser? Just to be clear,
googler --some-option my query
should open https://www.google.com/search?q=my+query in the browser.

I know this can be easily done externally from googler, but it'd be nice to have everything in one place.

I'm against this. There was a PR for this that turned sour so I wouldn't link it. Basically, it's pointless to load tons of useless Python modules just to percent encode a query; a trivial script should do. Also, this is already possible in googler with two extra keystrokes (O, <enter>) and two more to quit (q, <enter>); an option wouldn't save much (well, you can add a shell alias, but you can also write the aforementioned script in roughly the time you add the shell alias). Or, if you want to do everything in the browser, why not start in the browser in the first place... Piling on yet another feature that's disconnected from every other feature (with the possible exception of country and language selection, but even that is a tenuous connection, and you didn't mention that explicitly so I wouldn't assume you need it) doesn't feel right.

However, if there's strong demand (say, we get two more independent requests), maybe we can add it.
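For reference, the "trivial script" alternative mentioned above could be as small as the sketch below. The script and its `search_url` helper are hypothetical and not part of googler; any language with a URL-encoding routine would do just as well.

```python
import sys
import urllib.parse
import webbrowser


def search_url(query, base="https://www.google.com/search"):
    # urlencode percent-encodes with quote_plus, so spaces become '+'.
    return base + "?" + urllib.parse.urlencode({"q": query})


if __name__ == "__main__":
    url = search_url(" ".join(sys.argv[1:]))
    # webbrowser tries $BROWSER first if it is set, else platform defaults.
    webbrowser.open(url)
```

Run with `my query` as arguments, it would open https://www.google.com/search?q=my+query.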

jarun commented

@renyhp no plans to implement this. Defeats the purpose of having googler.

@zmwangx I mailed you a day or two back regarding meta info; please take a look.

I'm looking to finally close this issue. Thoughts:

  1. Google Scholar search

    I checked it out; it doesn't fit into our model.

    [Screenshot: Google Scholar results for the query "ads-cft"]

    Note the "[PDF] arxiv.org" link to the right. It's different from the primary link, but the primary link is often a paywalled journal link (to people without an institutional subscription) and most people would go for the arXiv link instead[1]. This is obviously field-specific, but when a secondary link is available, which is very welcome, googler isn't equipped to handle multiple links for a single entry.

    In addition, the "cited by" count may be useful, but googler isn't equipped with a secondary metadata field either. Limiting date (year) range works completely differently, too.

    So, in order to support a niche feature we need to devise an expanded schema, (!) an expanded control set, and a different year limiting syntax. Apparently not worth the trouble.

    Finally, I personally use inspirehep.net instead of Google Scholar so the incentive for myself is even lower.

    [1] For non-academics, or academics not familiar with arXiv: it's a preprint server where researchers in math, physics, theoretical computer science, etc. post preprints and published manuscripts for open access. Anyone can access any paper for free. Traditional journals, controlled by goliaths like Elsevier, on the other hand, cost $$$. Either you or your institution has to pay for them, and like cable TV, content is often bundled, so your institution needs to pay for a lot of extra crap to get the good stuff. To make matters worse, bundle prices are usually negotiated behind closed doors and institutions have to sign NDAs on the pricing. See Tim Gowers's blog https://gowers.wordpress.com/category/elsevier/ on the evil practices and how mathematicians fought back.

  2. Support all options at omniprompt

    Not gonna happen. No one asked for it, it was always a completeness thing, and it's certainly not a "good first issue" kind of issue where you just accept someone else's patch right away.
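To make the "expanded schema" point in item 1 concrete, a Scholar entry would need at least something like the hypothetical dataclasses below: a list of secondary links next to the primary one, plus an optional citation count. None of these names exist in googler; the `prefer_domain` default just mirrors the arXiv example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Link:
    label: str   # e.g. "[PDF] arxiv.org"
    url: str


@dataclass
class ScholarResult:
    """Hypothetical schema for a Scholar entry; googler has nothing like it."""
    title: str
    url: str                                          # primary (often journal) link
    abstract: str = ""
    secondary_links: List[Link] = field(default_factory=list)
    cited_by: Optional[int] = None                    # "Cited by N", if shown

    def preferred_url(self, prefer_domain="arxiv.org"):
        # Prefer an open-access secondary link when one matches, else primary.
        for link in self.secondary_links:
            if prefer_domain in link.url:
                return link.url
        return self.url
```

Every consumer of results (display, omniprompt controls, JSON output) would have to learn about the extra fields, which is the cost item 1 describes.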

I think the TODO list has served its purpose. Nothing actually happened (old item implemented, or new item added and implemented) here in the past two years. This issue could be closed instead of attracting more feature requests that make searching harder.

jarun commented

inspirehep.net is cool!

Yes, I agree we can close this. No one has missed these proposed features for a while now.

In the future if a TODO list is ever needed, we could use https://github.com/jarun/googler/projects instead. It's not 2015 anymore so issues don't need to be organized in a meta-issue.

jarun commented

Yes, but that's overkill for small work items. I would prefer raising a defect and leaving it open till someone picks it up.

jarun commented

Those issues could be labeled feature requests.

jarun commented

or enhancements...

Of course, that's what I was saying. Separate issues make searching easier than comments on random threads that happen to be open. A project board is for organizing issues in some way, and I doubt we'll ever have so many items that they need organization beyond a plain listing under /issues, hence the adverb "ever" before "needed".

jarun commented

I see, I thought the proposal was to use it the next time we need a ToDo list...

And we'll never need it ;)

jarun commented

Hehe... BTW, I checked the possibility of using CircleCI for nnn and I realized what you meant earlier. One account can have either Linux or macOS env. Earlier I was thinking one can have either Linux or macOS in a project.

Oh crap, just realized I forgot to reply there.

Since this thread is still active...
Suggestion: make the number of results shown per page depend on the terminal height, unless it is specified with -n.

This is a won't fix. See #207, #218.

EDIT: Actually, we can review a patch if someone submits one.
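If someone does take this on, the core computation might look like the hypothetical helper below. The `lines_per_result` and `reserved_lines` defaults are guesses, not measured from googler's output; real entries vary in height with abstract length and terminal width, so a fixed ratio is only an approximation.

```python
import shutil
from typing import Optional


def results_per_page(explicit_n: Optional[int] = None,
                     rows: Optional[int] = None,
                     lines_per_result: int = 4,
                     reserved_lines: int = 3) -> int:
    """Hypothetical: derive a page size from the terminal height.

    explicit_n mirrors -n and always wins; rows lets callers (and tests)
    bypass the terminal query; the two line counts are guesses.
    """
    if explicit_n is not None:
        return explicit_n
    if rows is None:
        # Honors the LINES env var, then the real terminal, else 24 rows.
        rows = shutil.get_terminal_size(fallback=(80, 24)).lines
    usable = max(rows - reserved_lines, 0)
    # Always show at least one result, even in a tiny terminal.
    return max(usable // lines_per_result, 1)
```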

How can we export the search results to a JSON file? Is it possible? Could you add this feature in a future version of googler?

jarun commented

This option is available already: see --json.
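In the simplest case shell redirection is enough, e.g. `googler --json hello > results.json`. To post-process the results first, a sketch like the one below could call googler and filter the JSON. It assumes googler is on PATH and that each result is an object with a `url` key; `fetch_results`, `keep_domain`, and `save_results` are hypothetical helpers.

```python
import json
import subprocess
from urllib.parse import urlparse


def fetch_results(query, count=10):
    """Run googler and decode its JSON output (assumes googler is on PATH)."""
    proc = subprocess.run(
        ["googler", "--json", "-n", str(count), query],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


def keep_domain(results, domain):
    """Keep results whose host is domain or a subdomain of it.

    Assumes each result is a dict with a "url" key.
    """
    kept = []
    for result in results:
        host = urlparse(result.get("url", "")).netloc
        if host == domain or host.endswith("." + domain):
            kept.append(result)
    return kept


def save_results(results, path):
    """Write results to path as pretty-printed UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(results, fh, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    hits = fetch_results("python argparse", count=20)
    save_results(keep_domain(hits, "python.org"), "results.json")
```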