GRobot

GRobot is still under development and I love to refactor.So,control your risks.

GRobot is a powerful web robot based on gevent. This project comes from Ghost.py,which I have rewrote most of the code inside, and changed its name to GRobot.

import gevent
from grobot import GRobot

def test():
    robot = GRobot()
    robot.open("http://www.yahoo.com")
    assert 'yahoo' in robot.content

gevent.spawn(test).join()

What it can and can't do

##Can do##

Can set up the socks5/http{s} proxy.
Can simulate ALL the operations of hunman beings.
Can run webkit plugin.
Can evaluate javascript.
Can grab the web page as image.
Can run on a GUI-Less server(by install xvfb).

##Can't do##

Can't operate Flash.
Can't run without PyQt and gevent
Can't get back http status code.
Can't transport human to the Mars.

Installation

First you need to install PyQt.

In Ubuntu

sudo apt-get install python-qt4

Install gevent from the development version.

pip install http://gevent.googlecode.com/files/gevent-1.0b3.tar.gz#egg=gevent

Install Flask for unittest(optional).

pip install Flask

Install GRobot using pip.

pip install git+https://github.com/DYFeng/GRobot.git

How to use

##Quick start##

First of all, you need a instance of GRobot in greenlet:

import gevent
from grobot import GRobot

def test():
    robot = GRobot()
    #do something

gevent.spawn(test).join()

##Element selector##

Element selector tell GRobot which HTML element a command refers to. The format of a selector is:

selectorType=argument

We support the following strategies for locating elements:

identifier = id : Select the element with the specified @id attribute. If no match is found, select the first element whose @name attribute is id. (This is normally the default; see below.)
id = id : Select the element with the specified @id attribute.
name = name : Select the first element with the specified @name attribute.

xpath = xpathExpression : Locate an element using an XPath expression.

  xpath=//img[@alt='The image alt text']
  xpath=//table[@id='table1']//tr[4]/td[2]
  xpath=//a[contains(@href,'#id1')]
  xpath=//a[contains(@href,'#id1')]/@class
  xpath=(//table[@class='stylee'])//th[text()='theHeaderText']/../td
  xpath=//input[@name='name2' and @value='yes']
  xpath=//*[text()="right"]

link = textPattern : Select the link (anchor) element which contains text matching the specified pattern.
```
  link=The link text
```
css = cssSelectorSyntax : Select the element using css selectors. Please refer to CSS2 selectors for more information.
```
  css=a[href="#id3"]
  css=span#firstChild + span
```

##Option Selector##

When you using GRobot.select to deal with select tag,option selector tell GRobot which option you want. The format of a selector is:

selectorType=argument

We support the following strategies for locating elements:

text = text : Match options based on their the visible text.(This is normally the default.)
id = id : Match options based on their @id attribute.
name = name : Match options based on their @name attribute.
value = value : Match options based on their values.

##Open a web page##

GRobot provide a method that open web page the following way:

robot.open('http://my.web.page')

##Execute javascript##

Executing javascripts inside webkit frame is one of the most interesting features provided by GRobot:

result = robot.evaluate( "document.getElementById('my-input').getAttribute('value');")

As many other GRobot methods, you can pass an extra parameter that tells GRobot you expect a page loading:

robot.evaluate( "document.getElementById('link').click();", expect_loading=True)

###Type the text### Simulating human typing.

robot.type('id=blog_content',u'Hello,world.I'm Tom.\n你好,世界.我是无名氏\n')

Type key by key,only ASCII character allowed.

robot.key_clicks('id=blog_content','Hello,world.\rToday is good for sleep.')

###Select options###

Select options those match selector from select tag.

Single selectbox.

robot.select('name=sex','text=Male')

Multiple selectbox.

robot.select('id=like',[
    ('apple',True),
    ('orange',False),
    ('banana',True),
])

###Set the checkbox###

You can specify the state of checkbox.

Check it.

robot.check('id=agree')

Uncheck it.

robot.check('id=agree',False)

###Click something###

Click the first element which selected by selector.

robot.click("xpath=//input[@type='submit']")

Click a point of the absolute position (1500,36).

robot.click_at(1500,36)

###Move the mouse###

Move your mouse to first element which selected by selector.

robot.move_to('css=#button')

Move your mouse to the absolute position (500,300).

robot.move_at(500,300)

###Setup input file###

Selenium can't access the <input type='file'/> tag.You can't use selenium to set up a file type input.

robot.choose_file('id=file-upload', '/tmp/file')

##Waiters##

GRobot provides several methods for waiting for specific things before the script continue execution:

###wait_for_page_loaded()###

That wait until a new page is loaded.

robot.wait_for_page_loaded()

###wait_for_selector(selector)###

That wait until a element match the given css selector.

result = robot.wait_for_selector("ul.results")

###wait_for_text(text)###

That wait until the given text exists inside the frame.

result = robot.wait_for_text("My result")

Sample use case

###Post a twitter###

import gevent
import logging
from grobot import GRobot

USERNAME = 'your twitter username'
PASSWORD = 'your twitter password'

def main():

    robot = GRobot(display=True, log_level=logging.DEBUG, develop=False)
    robot.set_proxy('socks5://127.0.0.1:7070')
    robot.open('https://twitter.com')

    #Login
    robot.key_clicks('id=signin-email',USERNAME)
    robot.key_clicks('id=signin-password',PASSWORD)

    robot.click("xpath=//td/button[contains(text(),'Sign in')]",expect_loading=True)

    #Post a twitter
    robot.key_clicks("id=tweet-box-mini-home-profile","GRobot is too powerful.https://github.com/DYFeng/GRobot")

    #Wait for post success
    while 1:
        robot.click("xpath=//div[@class='module mini-profile']//button[text()='Tweet']")
        try:
            robot.wait_for_text('Your Tweet was posted')
            break
        except :
            #Something go wrong,refresh page.
            if 'refresh the page' in robot.content:
                robot.reload()

    # Wait forever.
    robot.wait_forever()

if __name__ == '__main__':
    gevent.spawn(main).join()

###Browsing Google.com and find GRobot project###

import gevent
import logging
from grobot import GRobot

def main():

    # Show the browser window.Open the webkit inspector.
    robot = GRobot(display=True, develop=False, log_level=logging.DEBUG, loading_timeout=10, operate_timeout=10)

    # In China,people can only using proxy to access google.
    robot.set_proxy('socks5://127.0.0.1:7070')

    #Open google
    robot.open('http://www.google.com/')

    #Type out project and search.
    robot.type('name=q','GRobot github')
    robot.click('name=btnK', expect_loading=True)

    for i in xrange(1, 10):
        # Waiting for the ajax page loading.

        robot.wait_for_xpath("//tr/td[@class='cur' and text()='%s']" % i)

        if u'https://github.com/DYFeng/GRobot' in robot.content:
            print 'The porject in page', i
            break

        # Click the Next link.We don't use expect_loading.Because it's ajax loading,not page loading.
        robot.click("xpath=//span[text()='Next']")

    else:
        print "Can not found.Make a promotion for it."

    # Wait forever.
    robot.wait_forever()

if __name__ == '__main__':
    gevent.spawn(main).join()

sky395/GRobot

GRobot

What it can and can't do

Installation

How to use

Sample use case