hafen/htmlwidgetsgallery

discussion/feedback

Opened this issue · 56 comments

This is great. Let me know how I can help. Only 9 for me? on week 26 :)

hafen commented

@timelyportfolio, you're on top of things! I pushed this out earlier today and didn't have time to comment on it.

I had mentioned this approach a while back, here ramnathv/htmlwidgets#63, building on work by @jpmarindiaz, and finally found a few minutes to play around with the idea yesterday. And then of course as with any diversion I got sucked into trying to get something presentable together today :). Hopefully I'm not duplicating effort with anyone else right now, but I had some ideas and wanted to get things to a state where we can discuss. At this point, @timelyportfolio, @ramnathv, @jjallaire, @jcheng5, @jpmarindiaz, your feedback / help would be great.

Take a look here http://hafen.github.io/htmlwidgetsgallery to see what it's like.

The way I've worked it out here, to submit a widget, the idea would be to simply add the appropriate entry in _config.yml and add a screenshot and send a PR. Whoever the gallery curators are can merge and it's done. This should keep work of curators to a minimum while still allowing for quality control.

In no particular order, here are some things that could use some help and feedback. Would be good to deal with many of these before it's ready for the public:

  • any comments on the meta data format (see below)
  • add any missing htmlwidgets (I don't know how old the yaml file was that I started from but it didn't have, for example, d3heatmap - and @timelyportfolio, I'm sure a good number of yours are missing :), and I just saw, for example, rflot)
  • update any missing / incorrect meta data for current widgets - for example I chose the screenshots and descriptions - maybe some of the authors would prefer something else
  • optimize images (small as possible while looking good on retina)
  • any feedback or work on the layout of each widget's entry would be great
  • What are the criteria for a widget to be admitted to the gallery? Installs without errors? Fully documented? Has examples? Adheres to sizing policies? etc. We should determine what this is and publish it in the README, along with the meta data expectations.
  • Should we consider populating any of the meta data automatically from things like DESCRIPTION? I'd really like to stick to a plugin-less jekyll format as it is now - the idea of having to run some daily script on some machine to keep things up to date or to always have to merge and run something locally with each PR doesn't sound like a good idea.
  • One of the most tricky issues I've been dealing with is how to dynamically get github stars. I tried several things with different badge services, but they always embed in an iframe and you can't access the data inside the iframe. I also tried a way that gets the stars dynamically with javascript using the github api, but you get rate limited on your first view of the page. Any brilliant ideas here would be great. There's got to be some jekyll hack to grab from the github api but probably not without a plugin.
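To make the rate-limiting problem concrete, here is a rough back-of-the-envelope sketch. It assumes the page issues one unauthenticated GitHub API request per widget on every load, and that GitHub's documented unauthenticated limit is 60 requests per hour per IP address. The gallery sizes are hypothetical, not counts taken from the actual _config.yml.

```python
# Assumption: unauthenticated GitHub API calls are limited to 60 per hour per IP.
UNAUTHENTICATED_LIMIT_PER_HOUR = 60


def full_page_loads_per_hour(n_widgets, limit=UNAUTHENTICATED_LIMIT_PER_HOUR):
    """Complete page loads a visitor gets per hour before being rate limited,
    if every page load issues one API request per widget."""
    if n_widgets <= 0:
        raise ValueError("n_widgets must be positive")
    return limit // n_widgets


# Hypothetical gallery sizes, for illustration only.
for n in (20, 50, 80):
    print(n, "widgets ->", full_page_loads_per_hour(n), "full page loads per hour")
```

With more widgets than the hourly limit, even the first page view cannot finish, which matches the behaviour described above.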

If you have issues with this approach in general, let me know. If this approach is the way to go, here are some questions:

  • Is this an htmlwidgets sanctioned site or 3rd party?
  • Who maintains it?
  • Where will it ultimately reside?

Finally, here are the meta data entries and what they are supposed to contain, as it is now:

  • name: the actual name of the R package (required)
  • thumbnail: location of the thumbnail (standard is images/ghuser-ghrepo.png) - this should be required but defaults to a blank thumbnail
  • url: url to the desired landing page you'd like people to first see for the widget (the widget's home page, a vignette, or as a final resort, if not specified, the widget's github page)
  • jslibs: a comma separated list of javascript library names that the widget depends on, with markdown links to the home pages of the libraries
  • ghuser: the github user/org where the github repository for the widget resides (required)
  • ghrepo: the github repository name where the widget resides (required)
  • stars: the number of github stars (need to move away from hard coding this)
  • tags: comma separated list (with no spaces) of tags that describe the widget
  • cran_url: if on cran, the cran url
  • release: this came from the previous yaml - stable/alpha
  • examples: url or list of urls of examples (blog posts, gists, vignettes)
  • ghauthor: the github handle for the primary author of the widget
  • short: a short (preferably one sentence) description of the package that will be displayed in limited space under the widget thumbnail in the gallery - ideally should be more than "An htmlwidget interface to library x" as that is obvious from jslib, etc. - instead, should describe what you can do with the widget using library x
  • description: a longer form description

What else should be included, or is some of this unnecessary? JS lib authors? Author's full name?
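As an illustration of the meta data rules listed above, here is a minimal sketch in Python of how a submitted entry could be checked. The function name check_entry, the expected_thumbnail key, and the example entry are all hypothetical. The gallery itself is a plugin-less jekyll site, so this is not how the site works, only a way to spell out the rules.

```python
REQUIRED_FIELDS = ("name", "ghuser", "ghrepo")


def check_entry(entry):
    """Validate one gallery entry (a dict) and fill in the documented fallbacks."""
    missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))

    checked = dict(entry)
    # url falls back to the widget's GitHub page when not specified.
    if not checked.get("url"):
        checked["url"] = "https://github.com/{ghuser}/{ghrepo}".format(**checked)
    # tags are a comma separated list with no spaces; split into a Python list.
    checked["tags"] = [t for t in checked.get("tags", "").split(",") if t]
    # The standard thumbnail location is images/ghuser-ghrepo.png.
    checked["expected_thumbnail"] = "images/{ghuser}-{ghrepo}.png".format(**checked)
    return checked


# Hypothetical entry; the names below are illustrative, not a real widget.
example = {
    "name": "examplewidget",
    "ghuser": "exampleuser",
    "ghrepo": "examplewidget",
    "tags": "network,d3",
    "short": "Draw interactive network diagrams.",
}
result = check_entry(example)
print(result["url"])
print(result["tags"])
print(result["expected_thumbnail"])
```

A check like this could run on a PR before a curator looks at it, which would keep the curators' work to the quality questions rather than the formatting ones.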

hafen commented

Oh and I should mention that the purpose of this particular site is a gallery of htmlwidget packages, for widget discovery, etc. Perhaps gallery isn't the right term. The idea of a gallery of plots, etc. created with htmlwidgets, that's a cool idea too, but not what I'm doing here.

Fantastic!!!!!

Some quick feedback:

  1. I think we should make the thumbnail itself be a click target to
    navigate to the web page for the widget (the link we have now is too
    subtle). The extra metadata could be in a "More info" or other chevron
    style link in the footer.

  2. The biggest concern I have about an unfiltered list of every widget
    we've ever heard of is that many, many widgets represent a few days work
    and then are abandoned. We need to find a way to distinguish between
    "production" widgets and ones "in the lab".

Some suggested attributes to look for to promote widgets to higher
prominence: on CRAN, actively maintained (recent commits, resolved issues,
etc.), pass R CMD check --as-cran, have a website documenting their use,
handle dynamic sizing correctly, etc.

We could probably get away with just two categories ("labs" being for less
complete work) whereby a few of us informally agree that if 3 of us give
the thumbs up something can go into the "production" bin.

@hafen Awesome!

Some comments:

  • Show summary like how many total widgets there are.
  • The production/lab comment from @jjallaire is totally necessary; that was the idea behind stable/alpha. I think users should be able to filter by this meta info as well in the top search/filters. This "status" should be shown in the box for each widget.
  • It would be great to add multiple thumbnails to a widget. Some widgets simply do one thing and others might do a lot of different graphs or other things. So looking at different thumbnails might give a broad view of the capabilities of the widget without having to go deep into each js lib's documentation.

Great work, happy to be a curator/maintainer

Perhaps this registry/gallery for biojs http://biojs.io/ will give us some ideas. @hafen, I believe it automatically pulls in forks/stars from GitHub.

This is looking great @hafen ! I would think about going beyond the gallery by adding the following components:

  1. A gallery of plots created using htmlwidgets with reproducible code.
  2. A blog that allows widget authors to write about their widgets.
  3. A google spreadsheet or similar mechanism where authors can report widgets they are working on.

I have some suggestions on the gallery itself and will post them shortly.

hafen commented

Thanks for the feedback. @jjallaire @jpmarindiaz, good idea about the production / lab categorization. I didn't realize that was the purpose of "status" so that makes sense now. Instead of status: stable/alpha, it should probably be production: true/false or incubator: true/false. The initial view of the page would omit the non-production widgets.

Metrics for giving prominence are great. I wonder if to capture all of these in an automated way we will have to move to something more customized than jekyll updating the page for us with each PR.

@jjallaire, I wonder if we could come up with a quantitative approach for what is required for moving to the production bin - if we can agree on some rules and spell it out, we can avoid having to deal with people thinking we are playing favorites or something.

@hafen I think being on CRAN and having a well documented website should be enough to start with. The simpler we keep the tagging guidelines, the easier it is for us to avoid controversies.

I agree. On CRAN + dedicated website with enough simple examples to get
users started should be the criteria.

hafen commented

@ramnathv, adding a gallery of reproducible plots would be a great complementary addition - and would be a good piece of data to display as a link in the widget registry (how many examples the widget has and a link to them in the gallery). This might address @jpmarindiaz's issue with one thumbnail not being sufficient.

To do this, something along the lines of bl.ocks.org would be good. I recall @timelyportfolio has done some stuff with this with htmlwidgets before. We might even be able to use bl.ocks.org itself. In general, publishing a gist of R code along with all the html/js necessary to make the plot would be sufficient. An additional htmlwidgets R function (probably in a different package) to publish to such a gallery would make this very easy and make it easier for the gallery to be quickly populated.

By the way, putting something like this together would probably require an offline build system / server for the gallery.

hafen commented

@ramnathv @jjallaire how many htmlwidgets are currently on CRAN? I'd assume it's not a large number.

@hafen Looks like 12 or 13 (reverse imports/suggests): http://cran.r-project.org/web/packages/htmlwidgets/index.html

Also, I think it would be nice to use some of the htmlwidgets to draw networks of htmlwidgets using meta-information such as JavaScript dependencies, co-authorship, tags that @hafen has included.

@hafen perhaps you are thinking about loryR (http://www.buildingwidgets.com/blog/2015/5/14/week-19-loryr-slider) for multi-image carousels or something similar as the htmlwidget thumbnail.

Selfishly, I'm not crazy about the CRAN requirement, but I can't think of any other way to filter/qualify. I currently view all htmlwidgets as experimental and would say few are production quality in the traditional sense (hopefully I'm not offending anyone).

@hafen I had put together a bl.ocks equivalent for rCharts along with a custom publishing function that used a git backend. I could revive that in the htmlwidgets context. It is important to have a custom viewer, since our focus is the R code and not the index.html. I will send a PR to htmlwidgets that wraps up this feature.

I think the CRAN requirement is a helpful filter because it indicates
commitment (of both time and ongoing support/enhancement) and quality (all
the things required to pass R CMD check). We don't have anything else
nearly as objective and cut and dried and clearly we need some criteria to
avoid htmlwidgets gaining the reputation of being incomplete, buggy, etc.

I would agree with @jjallaire. While all htmlwidgets are at some level experimental, passing R CMD CHECK ensures that the package author has spent time taking care of some basic stuff at the very least.

Yes, R CMD CHECK helps filter out some junk, but it is fairly easy to pass R CMD CHECK and still be junk on the JavaScript side. Very little of the code/effort in many of my htmlwidgets is R. Perhaps we can come up with a checklist, such as this wiki page "A Good htmlwidget", that could help ensure quality.

I see your point @klr. R CMD CHECK only ensures that the R code works and has been documented appropriately. It is quite possible for the JS code to be buggy and I think we should be careful on that front.

One way to circumvent this issue is to only tag the gallery with facts like On CRAN, 30+ Stars etc. This way, we will let the end-user be the judge of what widgets they want to use. While we could come up with a complicated checklist of what the widget should satisfy, I think it is very hard for us to systematically and objectively evaluate each widget against it.

Hence, I suggest that we restrict gallery tags to fact based ones and let the user be the judge. In my mind the gallery is mainly for widget discovery and shared code and NOT an endorsement of any of the widgets.

I much prefer @ramnathv's suggestion of a fact-based system (e.g. On CRAN) to arbitrary, potentially opinionated labels such as production, experimental, or dead. I believe much of the quality, attentiveness of the developer, functionality, etc. will shine through rather quickly in the gallery.

hafen commented

Fact based sounds good. But then perhaps to avoid htmlwidgets getting a reputation of being sloppy, there could be a default filtering when the page is viewed that shows the widgets with the "best" facts. An example of this is what I did with sorting by github stars by default. This is something I did not want to do, but otherwise some of the less complete widgets showed up at the top which isn't good.

If we go fact based, we could still do QC on the end of whether a package even makes it into the registry. I think at a minimum passing R CMD CHECK should be a requirement there (to make sure documentation is there, etc.).

Based on this discussion I think we should do the following:

  1. Two categories, CRAN and "Under Development" (or whatever other term
    seems appropriate)

  2. Order both categories by GitHub stars.

  3. Whoever is running the registry reserves the right to exclude a widget
    from either category if it's really shoddy. Obviously this would only occur
    for situations of really poor quality widgets (whether on CRAN or not).

#3 might be controversial, but is really implicit in any public list of
widgets (whoever manages the list can include/exclude whatever they wish).
As long as we explain to a widget author why they are excluded I think it
can still be a process that is fair to all.

hafen commented

@ramnathv, your custom bl.ocks.org feature would be awesome. If you could point me to the branch for this feature, that would be great. To serve these, do you need a custom web server? Just thinking about how to integrate it with the widget gallery.

hafen commented

@jjallaire I just pushed a few changes that add a "CRAN only" switch that is turned on by default when the page is loaded, and also widgets are sorted by github stars by default as well. That should cover (1) and (2). For (3), it would be great if someone could take a pass right now and propose widgets that should not be there right now.
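The default view described here (CRAN only, sorted by stars) can be sketched as follows. This is Python rather than the page's actual jekyll/JavaScript code, the function name default_view is hypothetical, and it assumes a widget counts as "on CRAN" when its cran_url field is filled in. The sample widgets and star counts are made up.

```python
def default_view(widgets, cran_only=True):
    """Return widgets as shown on first load: optionally CRAN only, most stars first."""
    if cran_only:
        shown = [w for w in widgets if w.get("cran_url")]
    else:
        shown = list(widgets)
    return sorted(shown, key=lambda w: w.get("stars", 0), reverse=True)


# Made-up entries for illustration; not real star counts.
widgets = [
    {"name": "widgetA", "stars": 12, "cran_url": "https://cran.r-project.org/package=widgetA"},
    {"name": "widgetB", "stars": 40},
    {"name": "widgetC", "stars": 25, "cran_url": "https://cran.r-project.org/package=widgetC"},
]

print([w["name"] for w in default_view(widgets)])
print([w["name"] for w in default_view(widgets, cran_only=False)])
```

Turning the switch off only widens the list; the ordering by stars stays the same in both views.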

Also, I know I am missing several widgets. For example, I found pairsD3 and visNetwork on CRAN that aren't in _config.yml. Could someone please go through and add any widgets you think should be there?

I also updated the thumbnail and widget name links to point to the widget home page instead of opening up the detail. Now only clicking on the 3 vertical dots activates the detail view.

This is looking fantastic!

I don't have direct experience with rhandsontable or svgPanZoom. Does
anyone else?

Yes, @hafen this is really nice. I'll volunteer to add a couple that I know are missing.

@jjallaire I have experience with both.

  • svgPanZoom is mine and was accepted to CRAN today. I'll let y'all decide on its merit. Here is the post. For those wondering, I promise not to flood CRAN with all my widgets of varying, but certainly dubious quality. My policy is only if requested. This one svgPanZoom is being considered for use in tmap.
  • rhandsontable is, in my opinion, certainly good enough to be on the list. It is very active, and I know there are more than 3 people using it in various Shiny projects.

Also, what are thoughts on releasing/publicizing this? I would love for it to get all the attention it deserves plus more.

hafen commented

I'd love to release as soon as possible, but first do the following:

  • find a home for it (what is a good domain that everyone likes, etc.)
  • get feedback on how each individual widget is displayed - anything to tweak here?
  • finalize widget meta data - is it good as it stands?
  • figure out how to get github stars dynamically
  • make sure it's okay on mobile

Would it not just go on http://htmlwidgets.org?

hafen commented

I'm all for putting it on htmlwidgets.org. Is that served with github pages? Or would you just do some DNS stuff to point htmlwidgets.org/gallery (or should we be using registry?) at it? Some style work with the header would have to be done to match. Wherever it goes, I'm completely happy with the idea of it being owned by someone else (e.g. the htmlwidgets team) and I would just contribute with PRs.

hafen commented

@timelyportfolio, thanks for the link on getting github metadata. I tried a JS approach and unfortunately those count as unauthenticated api requests from the client side and you get rate limited before the page finishes loading.

About the only other approach I can think of other than moving to a dedicated non-gh-pages server is to set up some REST endpoint on something cheap like digitalocean that periodically pulls the _config.yml, iterates through the packages, grabs the github meta data using a github auth token to avoid rate limiting, and makes these available via REST calls from the page. Then we can get stuff like # issues, # closed issues, etc. This can all be put together pretty easily with R. There's got to be a simpler way though. But I'd really like to have this meta data.
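The server-side harvesting step could look roughly like the sketch below. It is written in Python rather than R, and the function names build_repo_request and extract_meta are hypothetical. It relies on the GitHub repository endpoint (https://api.github.com/repos/OWNER/REPO) and its stargazers_count, open_issues_count and pushed_at fields. The request is only built here, not sent, and the sample payload is made up.

```python
import urllib.request


def build_repo_request(ghuser, ghrepo, token):
    """Build (but do not send) an authenticated request for one repository."""
    url = "https://api.github.com/repos/{}/{}".format(ghuser, ghrepo)
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + token})


def extract_meta(repo_json):
    """Keep only the fields the gallery would display."""
    return {
        "stars": repo_json["stargazers_count"],
        # Note: GitHub counts open pull requests in this number as well.
        "open_issues": repo_json["open_issues_count"],
        "last_push": repo_json["pushed_at"],
    }


# The token value is a placeholder; a real one must stay on the server.
request = build_repo_request("hafen", "htmlwidgetsgallery", "PLACEHOLDER_TOKEN")
print(request.full_url)

# Made-up payload with the same field names as the API response.
sample = {"stargazers_count": 42, "open_issues_count": 3, "pushed_at": "2015-07-17T00:00:00Z"}
print(extract_meta(sample))
```

Sending the request would be a matter of passing it to urllib.request.urlopen and parsing the JSON body. A count of closed issues is not part of this response, so that metric would probably need a separate query.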

Perhaps @gaborcsardi has some ideas or experience here with his work on metacran.

@hafen AFAIK there is no trivial way to use the github api with authentication on the client side without exposing the secret and key. One solution is to periodically grab the metadata and cache it as json on the github pages site. This way, you can use it as a fallback when the API limit gets hit. This should give a near-real-time experience. I would be happy to prototype what I am talking about and push it. Let me know.
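The fallback logic described here is small enough to sketch. This is a Python illustration of the idea only; on the real page it would be browser JavaScript. The names stars_with_fallback, fetch_live and cached are hypothetical, and the two fake fetchers stand in for a live API call that either succeeds or hits the rate limit.

```python
def stars_with_fallback(repo, fetch_live, cached):
    """Try the live API first; fall back to the cached snapshot on any failure."""
    try:
        return fetch_live(repo), "live"
    except Exception:
        # For example an HTTP 403 once the unauthenticated rate limit is exhausted.
        return cached.get(repo), "cached"


# Cached snapshot, as it might be read from a JSON file served with the site.
cached = {"exampleuser/examplewidget": 40}


def working_fetch(repo):
    return 42  # pretend the live API answered


def rate_limited_fetch(repo):
    raise RuntimeError("API rate limit exceeded")


print(stars_with_fallback("exampleuser/examplewidget", working_fetch, cached))
print(stars_with_fallback("exampleuser/examplewidget", rate_limited_fetch, cached))
```

The second element of the returned pair records where the number came from, so the page could mark cached values as possibly stale if that turns out to matter.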

hafen commented

@ramnathv that's a great idea. I had started down this path, working on an R script to grab the meta data, but stopped at the first problem. I just committed a script I was working on to populate this: https://github.com/hafen/htmlwidgetsgallery/blob/gh-pages/scripts/github_meta.R. If you could prototype your idea that would be awesome.

Yes, htmlwidgets.org is indeed served with github pages. We could also
serve it on any server/backend we like and use gallery.htmlwidgets.org
(that's what we do with the Rcpp Gallery).

Wow, this looks great!

As for the GitHub API issue, I don't have much to add. Essentially you either do it "offline", independently of the client JS, and regularly push it to GH, or you can set up a simple proxy on digitalocean for $5 per month. The proxy is super simple, it would probably take me an hour to set up and write in node, and I am not a very proficient node programmer. :) You could cache things in redis, so that it is smoother, and you can make requests in parallel, at least from node, but probably also from the browser.

A slightly different approach is to make a server that holds the metadata in some DB, and periodically updates it. This is almost the same, but you also need some proper DB. The gain is that you never need to go to GH (slow), only to your server (fast).

Of course with the DO server there is a small maintenance cost, and maybe something like Heroku or Redhat Openshift, or any other PaaS is better. OpenShift is free for the first three small machines. (I am not affiliated with them in any way.)

Let me know if you need help with this.

Btw. if you want to go the "offline updater" way, RedHat Openshift is also excellent for that. They have a simple CRON service, and you can run a script every minute. That's how MetaCran's crandb is updated.

hafen commented

@gaborcsardi thanks for the pointers. I had looked at digitalocean and heroku looking for something completely free, so I'll check out OpenShift for sure. Does your cron job push to a github repo or expose a database? I was thinking that in the bigger scheme of things a broader meta data repository for all github R packages would be something we could make use of here.

@ramnathv along these lines, I've pushed an R script that gets the meta data and a github_meta.json file. So we just need to update the page to read this file and populate the appropriate field. Let me know if you'd like to do that and if you can do it soon, otherwise I'll take a stab at it and may ask you to check my code. I think for now the plan will be for me to set up an automated periodic update of the json file.

@hafen This is looking good. I will be able to get to this only next week. So if you want it done before that, go ahead and take a crack and I can add any comments/feedback. If next week is good, let me know and I will do the needful.

I also think it makes sense to explore the route that @gaborcsardi is proposing since it is more automated and avoids the need to run any scripts locally.

hafen commented

@ramnathv thanks - I actually just updated it - was easier than I thought 3f9bb24. The idea is not to run the script locally in an ad hoc manner but to set up a cron job on a server that runs the R script every hour or so and pushes the updated json. Does that sound reasonable? That's the only part remaining to do.

Yes. That sounds reasonable @hafen.

@hafen My cron job pushes to a DB. You can set a secret environment variable on openshift for the DB password. It also has a nice heroku-like git-based workflow. Here is a script: https://github.com/metacran/cron/blob/master/.openshift/cron/minutely/update-crandb.r

hafen commented

Cool! Thanks @gaborcsardi. What does your javascript look like where you are pulling from couchdb? Are you doing this in node or in the browser?

@hafen It is node. E.g. https://github.com/metacran/metacranweb/blob/master/lib/recent.js#L9 (Somewhat complicated by the caching I do in Redis.)

But couchdb has an HTTP API, so you can use it from any language without a client lib.

I should warn you that it is also extremely simple, queries are one round of map-reduce, and this sucks. If you want a proper DB, Mongo is a better choice probably. If you just want to use it as a "cache", that is fine.

hafen commented

Thanks @gaborcsardi.

Since at this point it's a simple problem and a small amount of data, I decided to take the approach of a CRON job running a script to update a json file in the repo every hour or so.
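For the "update a json file every hour" step, one detail worth sketching is skipping the write (and therefore the commit) when nothing changed. The sketch below is Python, not the R script in the repo. The function name write_snapshot is hypothetical, and so is the "ghuser/ghrepo" key layout, since the structure of github_meta.json is not described in this thread. Scheduling is left to cron.

```python
import json
import os
import tempfile


def write_snapshot(path, meta):
    """Write metadata as stable, sorted JSON; return True if the file changed."""
    new_text = json.dumps(meta, indent=2, sort_keys=True) + "\n"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            if handle.read() == new_text:
                return False  # identical content, nothing to commit this hour
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(new_text)
    return True


# Demonstration in a temporary directory with made-up numbers.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "github_meta.json")
    meta = {"exampleuser/examplewidget": {"stars": 42}}
    print(write_snapshot(target, meta))  # first run writes the file
    print(write_snapshot(target, meta))  # second run finds no change
```

Sorting the keys keeps the file byte-for-byte stable between runs, so the repo history only gets a new commit when a number actually moved.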

Thanks so much @hafen. Now that you have collected all of this we can use htmlwidgets to analyze it :)

http://bl.ocks.org/timelyportfolio/e591b7c5360633e136d7

hafen commented

@timelyportfolio - that's awesome!

I looked at the stars increase across all packages over the weekend after it kind of got out on twitter and it looks like a lot of stars were added in the course of a day or two. So it looks like it's already serving its purpose (assuming the increase was due to gallery views). Star increases were mainly for CRAN packages, probably due to the bias of only showing them on load.

hafen commented

As far as "release" is concerned, even though it's already out, I suppose the outstanding issues from what I listed before are:

  • find a home for it
  • get feedback on how each individual widget is displayed - anything to tweak here?

For the second one, for now I think I'll leave it as is and remove the vertical dots that show the blank card where more meta data is supposed to be. Can revisit that later. But I'm happy to accept thoughts or PRs that deal with this issue.

For the first one, if people are happy with it on something like gallery.htmlwidgets.org, let me know what's needed to get the blessing to go forward with that. Probably good at a minimum for others to go through the yml and decide whether everything there is worthy of being there.

hafen commented

@timelyportfolio by the way, I like your use of bl.ocks.org - how did you get only the R script to show up? Since the infrastructure is already there, perhaps we should just use this for adding a plot gallery to the page. I suppose the only issue with this is I don't think bl.ocks.org can selectively show certain gists - it's all or nothing, right?

To show R code in bl.ocks, one will have to make it a part of the README.md. Note that the R code does not get syntax highlighted. I implemented a viewer for rCharts, which you can see here

http://rcharts.io/viewer

It includes syntax highlighting of R code along with a setup for disqus comments.

To get gallery.htmlwidgets.org up and running I think a few things are
required:

  1. You need to add the appropriate CNAME file to your gh-pages repo

  2. I need to point the DNS entry of gallery.htmlwidgets.org to the
    appropriate github address

In case you haven't done this before here are the details:
https://help.github.com/articles/setting-up-a-custom-domain-with-github-pages/

Once this is up and running I'll also add links from the main
www.htmlwidgets.org pages to the Gallery.

hafen commented

@jjallaire I got sidetracked from this for a while - I just added the CNAME file.

I think the file you added was CMAKE (it needs to be CNAME).

I added the DNS record so once you add the CNAME file the URL should start working fairly soon.

hafen commented

haha yeah that would be a problem :)

@hafen two other bits of metadata that might be useful in terms of measuring how updated packages are could be "Last activity" and "Open issues". Can you capture these at the same time you get the github stars?
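A "Last activity" figure could be derived from the timestamp GitHub reports for a repository's most recent push. A minimal sketch in Python, assuming the pushed_at field in its usual UTC form (for example 2015-07-17T12:00:00Z); the function name days_since and the dates are made up for illustration.

```python
from datetime import datetime, timezone


def days_since(pushed_at, now):
    """Whole days between a GitHub-style UTC timestamp and `now`."""
    pushed = datetime.strptime(pushed_at, "%Y-%m-%dT%H:%M:%SZ")
    pushed = pushed.replace(tzinfo=timezone.utc)
    return (now - pushed).days


# Fixed "now" so the example is reproducible; real code would use the current time.
now = datetime(2015, 8, 1, tzinfo=timezone.utc)
print(days_since("2015-07-17T12:00:00Z", now))
```

Whether the last push is the right notion of "activity" is a judgment call; issue or comment activity would need other fields.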

hafen commented

This would be nice to have and I think I could harvest it pretty easily. Last activity is good. However, metrics like open issues could be controversial.

I suggested open issues as a metric to guard against the following scenario. I'm looking for help on an htmlwidget. I go to the github page, and it says it's adapted from XXX. I go to XXX's page and THEY have stopped supporting that JS lib. At this point, the R library needs a massive overhaul... but nothing is happening. That's why I was thinking open tickets. Why is it controversial? It's just data, no?