#======= # __ __ # ____ ____ _____ ____ ____ _/ /_ __/ /_ # / __ \/ __ `/ __ `/ _ \/ __ `/ / / / / __/ # / /_/ / /_/ / /_/ / __/ /_/ / / /_/ / /_ # / .___/\__,_/\__, /\___/\__, /_/\__,_/\__/ #/_/ /____/ /____/ # # This file is part of Page Glut. # # Page Glut is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # Page Glut is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with Page Glut. If not, see <http://www.gnu.org/licenses/gpl-3.0.txt>. # # name: README # date: 2018DEC31 # 2016DEC23 # prog: pr # desc: Page Gluts: read docs/ABOUT.txt #====== I wrote this quick hack to work out how heavy a page was. It works as shown below. Only two options, the -v option (verbosity) tells you everything that is goint on, the -u option (URL) lets you examine the index page found at this URL. Requirements: ============= Python3 is used here, though I use nothing (that I'm aware of) that could not be used in py2X. # ------- start requirements ------------------------------------------------------------------- import bs4 # source: https://www.crummy.com/software/BeautifulSoup # docs: http://beautiful-soup-4.readthedocs.io import requests # source: https://github.com/kennethreitz/requests # docs: http://docs.python-requests.org/en/latest/ import html5lib # source: https://github.com/html5lib/html5lib-python # docs: https://html5lib.readthedocs.io/en/latest/ import validators # source: https://github.com/kvesteri/validators # docs: https://validators.readthedocs.io/en/latest/ # ------- end requirements -------------------------------------------------------------------- Example use: ============ $ python3.5 pageglut.py -v -u http://news.ycombinator.com page resource total url (b) (b) (Kb) ---------------------------------------------------------- request <http://news.ycombinator.com> url ok status 200 get img extract ex 'image' 'img' 'src' warning: problem url, needs fixing <y18.gif> warning: problem url, needs fixing <s.gif> extract image <['y18.gif', 's.gif']> get css extract ex 'css' 'link' 'href' warning: problem url, needs fixing <news.css?AMTnWwphWLGKuspwrcEL> warning: problem url, needs fixing <favicon.ico> warning: problem url, needs fixing <rss> extract css <['news.css?AMTnWwphWLGKuspwrcEL', 'favicon.ico', 'rss']> get js extract ex 'javascript' 'script' 'src' warning: problem url, needs fixing <hn.js?AMTnWwphWLGKuspwrcEL> extract javascript <['hn.js?AMTnWwphWLGKuspwrcEL']> url <y18.gif> warning: fixed url <http://news.ycombinator.com/y18.gif> status 200 warning: content type not text size 100 url <s.gif> warning: fixed url <http://news.ycombinator.com/s.gif> status 200 warning: content type not text size 43 url <news.css?AMTnWwphWLGKuspwrcEL> warning: fixed url <http://news.ycombinator.com/news.css?AMTnWwphWLGKuspwrcEL> status 200 warning: content type not text warning: header.'content-length' no content size found url <favicon.ico> warning: fixed url <http://news.ycombinator.com/favicon.ico> status 200 warning: content type not text warning: header.'content-length' no content size found url <rss> warning: fixed url <http://news.ycombinator.com/rss> status 200 warning: content type not text warning: header.'content-length' no content size found url <hn.js?AMTnWwphWLGKuspwrcEL> warning: fixed url <http://news.ycombinator.com/hn.js?AMTnWwphWLGKuspwrcEL> status 200 warning: content type not text warning: header.'content-length' no content size found 34184 143 34.327 <http://news.ycombinator.com>... ----------------------------------------------------------- # vim: ff=unix:ts=4:sw=4:tw=78:noai:expandtab
peterrenshaw/pageglut
I wrote this quick hack to work out how heavy a page is. It works as shown below. Only two options, the -v option (verbosity) tells you everything, the -u option (URL) lets you examine the index page found at this URL.
Python