HARPy

One of my current clients is seeing a bit of wackiness when running scripts against their automation environment: pages load at an acceptable speed when driven manually, but slooooooow down when run through automation. Our suspicion is that something loads slowly when the WebDriver stack is involved. But how to debug?

Enter the HAR (HTTP Archive) file, which gives you the timing information for every request that makes up a page load.

Step 1 – Generate the file

The easiest way to generate a HAR file with WebDriver is to run your session through the BrowserMob Proxy. This is a Python post so we’ll use David’s browsermob-proxy module.

easy_install browsermobproxy

Once the egg is installed you can either start the server manually or in your script. (For the record, I prefer doing it outside of the script and controlling it with something like Puppet.)

from browsermobproxy import Server
server = Server("path/to/browsermob-proxy")
server.start()
 
# do stuff
 
server.stop()
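
If you do start the server from inside the script, it is worth making sure it gets stopped even when the "do stuff" part raises. A minimal sketch, assuming only that the server object has the start() and stop() methods used above; the helper name running is made up for illustration and is not part of the browsermob-proxy module:

```python
from contextlib import contextmanager


@contextmanager
def running(server):
    """Start `server`, hand it to the caller, and always stop it afterwards."""
    server.start()
    try:
        yield server
    finally:
        # Runs on a normal exit and when the body raises.
        server.stop()
```

With that in place, the snippet above could be written as `with running(Server("path/to/browsermob-proxy")) as server:` followed by the indented test body.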

With the server running you have two ways of creating the WebDriver instance.

Local WebDriver

from browsermobproxy import Client
from selenium import webdriver
 
# Named client (not proxy) so the later client.new_har() call works here too
client = Client("http://url.to.proxy:port")
profile = webdriver.FirefoxProfile()
profile.set_proxy(client.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)

Remote WebDriver

from browsermobproxy import Client
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium import webdriver
 
desired_capabilities = DesiredCapabilities.FIREFOX
 
client = Client("http://url.to.proxy:port")
client.add_to_webdriver_capabilities(desired_capabilities)
 
# webdriver.Remote is the remote driver class exported by the selenium package
driver = webdriver.Remote(desired_capabilities=desired_capabilities)

Now you’re routing all requests through the proxy regardless of the style of WebDriver you are using.

client.new_har("Example")

The new_har method takes an optional name for the page you are about to capture information for. If you don’t give it one, the page will just be called ‘Page 1’. Then you do stuff and get the HAR back from the proxy.

h = client.har
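
What comes back should be the HAR as a plain Python dictionary (an assumption about the client worth checking against your installed version), which means it can be written to disk with the standard json module and opened later in any HAR viewer. A small sketch; the helper names and the file path are hypothetical:

```python
import json


def save_har(har_dict, path):
    """Write a HAR dictionary to `path` as UTF-8 JSON and return the path."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(har_dict, fh, indent=2)
    return path


def load_har(path):
    """Read a HAR file back into a dictionary."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Something like `save_har(h, "example.har")` right after the page load keeps a copy around for comparing a manual run against an automated one.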

Step 2 – Interrogate the file

Yes, at this point you could look at the HAR Specification and call things out specifically, but it’s easier to use HARPy, a library I wrote for working with HAR files in Python.

easy_install harpy

Then wrap the HAR you got from the proxy:

from harpy.har import Har
 
har = Har(h)

This har object follows the spec pretty closely but lets you do the interrogations a little more cleanly than you could otherwise. For instance, checking for 404s:

four_oh_fours = [e for e in har.entries if e.response.status == 404]

or for pages that took longer than 3 seconds to load:

unacceptable_duration = [p for p in har.pages if p.timings[1] > 3000]
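
Once something slow turns up, the per-entry timings object in the HAR spec says where the time went (blocked, dns, connect, send, wait, receive), with -1 meaning a phase did not apply. The sketch below works on the raw dictionary rather than through HARPy, so it relies only on field names from the spec; the function names and the 3000 ms default are illustrative:

```python
def slow_entries(har_dict, threshold_ms=3000):
    """Return (url, total_ms) for every entry slower than the threshold."""
    entries = har_dict["log"]["entries"]
    return [
        (e["request"]["url"], e["time"])
        for e in entries
        if e["time"] > threshold_ms
    ]


def dominant_phase(entry):
    """Return the (phase, ms) pair that took longest for one entry.

    Phases reported as -1 (not applicable) and non-numeric fields such as
    an optional comment are ignored.
    """
    phases = {
        name: ms
        for name, ms in entry["timings"].items()
        if isinstance(ms, (int, float)) and ms >= 0
    }
    return max(phases.items(), key=lambda kv: kv[1], default=(None, 0))
```

A slow entry dominated by wait points at the server, while one dominated by dns or connect points more at the network path the automation environment is using.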

Step 3 – Py.Saunter (optional)

Unsurprisingly, Py.Saunter has grown new support for both the BrowserMob Proxy and for HAR files. If you are using Py.Saunter as your runner, this is how to hook everything up as of 0.43.

saunter.ini

[Proxy]
proxy_url: http://url.to.proxy:port
browsermob: true

The proxy client is created for you automatically and is available in the script as self.client. Here is a full script that ties together creating the HAR file and making a pass/fail decision on the outcome.

@pytest.marks('shallow', 'ebay', 'har')
def test_har_retrieval(self):
    self.client.blacklist("http://www\\.facebook\\.com/.*", 404)
    self.client.blacklist("http://static\\.ak\\.fbcdn\\.com/.*", 404)
    self.client.new_har("shirts")
    s = ShirtPage(self.driver)
    s.go_to_mens_dress_shirts()
    h = self.client.har
    har = Har(h)
    four_oh_fours = [e for e in har.entries if e.response.status == 404]
    assert(len(four_oh_fours) == 1)

The script also shows how to remove irrelevant third-party crap using the Python client. (Normally, though, you would likely want a 200 injected as the HTTP response rather than a 404.)
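
Blacklist patterns are regular expressions, so it is worth checking what they actually catch before handing them to the proxy. A rough check with Python's re module, assuming a pattern has to match the whole URL; BrowserMob Proxy is a Java application, so its regex dialect and matching rules may differ in edge cases. The helper name and the sample URLs are made up:

```python
import re

# Same patterns as passed to self.client.blacklist() in the script above.
BLACKLIST = [
    "http://www\\.facebook\\.com/.*",
    "http://static\\.ak\\.fbcdn\\.com/.*",
]


def is_blacklisted(url, patterns=BLACKLIST):
    """Return True if any pattern matches the entire URL."""
    return any(re.fullmatch(p, url) for p in patterns)
```

For example, `is_blacklisted("http://www.facebook.com/plugins/like.php")` is true, while an https URL on the same host is not caught by these patterns.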

Doing this sort of integration is at least two years old, but it is only now starting to reach mainstream blog posts, mailing lists, etc. I suspect we’ll see steadily more of it over the next while, until it is commonplace in another 18 months or so.
