One of my current clients is seeing a bit of wackiness when running scripts against their automation environment: when run manually, pages load at an acceptable speed, but they slooooooow down when run through automation. Our suspicion is that something is being loaded slowly when the WebDriver stack is involved. But how do you debug that?
Enter the HAR (HTTP Archive) file, which gives you timing information for every request that makes up a page load.
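To make "timing information for every request" concrete: per the HAR 1.2 spec, each request is an entry under log.entries with a total time in milliseconds and a timings object split into phases (blocked, dns, connect, send, wait, receive), where -1 means the phase does not apply. The sketch below uses a tiny hand-made HAR dict (the URLs and numbers are invented for illustration) and reports which phase dominates each request, which is the kind of question you end up asking when hunting for the slow-loading thing.

```python
# A tiny hand-written HAR fragment (made-up URLs and timings, for illustration only).
# Field names follow the HAR 1.2 spec: entries[].time and .timings are in ms,
# and a phase value of -1 means "does not apply to this request".
sample_har = {
    "log": {
        "version": "1.2",
        "entries": [
            {
                "request": {"method": "GET", "url": "http://example.com/"},
                "response": {"status": 200},
                "time": 120,
                "timings": {"blocked": 0, "dns": 10, "connect": 60,
                            "send": 1, "wait": 40, "receive": 9},
            },
            {
                "request": {"method": "GET", "url": "http://example.com/slow.js"},
                "response": {"status": 200},
                "time": 4200,
                "timings": {"blocked": -1, "dns": -1, "connect": -1,
                            "send": 1, "wait": 4100, "receive": 99},
            },
        ],
    }
}

PHASES = ("blocked", "dns", "connect", "send", "wait", "receive")


def slowest_phase(entry):
    """Return (phase_name, ms) for the phase that took longest in one entry."""
    timings = entry["timings"]
    # Skip phases reported as -1 (not applicable) or missing entirely.
    measured = {p: timings[p] for p in PHASES if timings.get(p, -1) >= 0}
    if not measured:
        return None, 0
    name = max(measured, key=measured.get)
    return name, measured[name]


def timing_report(har):
    """List (url, total_ms, slowest_phase, phase_ms), slowest request first."""
    rows = []
    for entry in har["log"]["entries"]:
        phase, ms = slowest_phase(entry)
        rows.append((entry["request"]["url"], entry["time"], phase, ms))
    return sorted(rows, key=lambda row: row[1], reverse=True)


for url, total, phase, ms in timing_report(sample_har):
    print(f"{total:>6} ms  {url}  (mostly {phase}: {ms} ms)")
```

A request that is mostly "wait" points at the server (or whatever sits between you and it), while one dominated by "dns" or "connect" points at the network setup, which is exactly the difference you would want to see between a manual and an automated run.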
Step 1 – Generate the file
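To generate the file, route the browser's traffic through BrowserMob Proxy. Once the browsermob-proxy Python egg is installed, you can either start the proxy server manually or from within your script. (For the record, I prefer doing it outside of the script and controlling it with something like Puppet.)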
```python
from browsermobproxy import Server

server = Server("path/to/browsermob-proxy")
server.start()
# do stuff
server.stop()
```
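If you do start the server from your script, it is worth making sure stop() runs even when the "do stuff" part throws; otherwise the proxy process can be left running. Below is a minimal sketch, assuming only that the object has start() and stop() methods, as the Server above does. The running() helper is a hypothetical name of my own, not part of browsermobproxy.

```python
from contextlib import contextmanager


@contextmanager
def running(server):
    """Start `server`, hand it to the with-block, and always stop it afterwards.

    Works with anything exposing start() and stop(), such as browsermobproxy's Server.
    """
    server.start()
    try:
        yield server
    finally:
        # Runs on a normal exit *and* when the with-block raises.
        server.stop()


# Usage (commented out because it needs the real proxy installed):
# from browsermobproxy import Server
# with running(Server("path/to/browsermob-proxy")) as server:
#     ...  # do stuff
```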
With the server running, you have two ways of creating the WebDriver instance.
Local WebDriver
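```python
from browsermobproxy import Client
from selenium import webdriver

# the proxy client, named the same as in the remote example below
client = Client("http://url.to.proxy:port")

# tell a local Firefox to send its traffic through the proxy
profile = webdriver.FirefoxProfile()
profile.set_proxy(client.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
```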
Remote WebDriver
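```python
from browsermobproxy import Client
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium import webdriver

# copy so the shared FIREFOX dict isn't modified for every other driver in the run
desired_capabilities = DesiredCapabilities.FIREFOX.copy()

client = Client("http://url.to.proxy:port")
client.add_to_webdriver_capabilities(desired_capabilities)

# command_executor defaults to http://127.0.0.1:4444/wd/hub; pass it if your server lives elsewhere
driver = webdriver.Remote(desired_capabilities=desired_capabilities)
```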
Now you’re routing all requests through the proxy regardless of the style of WebDriver you are using.
To start capturing, call new_har on the client. It takes an optional name for the page you are about to trap information for; if you don’t give it one, the page will just be called ‘Page 1’. Then you do stuff, and get the HAR from the proxy.
```python
client.new_har("shirts")  # then drive the browser through the page(s) you care about
h = client.har
```
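client.har hands back the HAR as a plain Python dict (the proxy's JSON, already parsed), so it is easy to keep a copy on disk next to the test results and open it in a HAR viewer later. A sketch using only the standard library; the save_har and load_har names and the output path are illustrative, not part of any library.

```python
import json
from pathlib import Path


def save_har(har_dict, path):
    """Write a HAR dict (e.g. what client.har returns) to `path` as JSON."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        json.dump(har_dict, fh, indent=2)
    return path


def load_har(path):
    """Read a saved .har file back into a dict."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


# Usage, with `h` from `h = client.har`:
# save_har(h, "results/shirts.har")
```

Saving one file per run makes it possible to diff a manual session's HAR against an automated one after the fact.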
Step 2 – Interrogate the file
Yes, at this point you could read the HAR Specification and pull things out of the dict by hand, but it’s easier to use HARPy, a library I wrote for working with HAR files in Python.
```python
from harpy.har import Har

har = Har(h)
```
This har object follows the spec pretty closely, but lets you do the interrogations a little more cleanly than you otherwise could. For instance, checking for 404s:
```python
four_oh_fours = [e for e in har.entries if e.response.status == 404]
```
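For comparison, here is roughly the same 404 check against the raw dict from client.har, without HARPy, using the HAR 1.2 field names (log.entries[].response.status). It is more bracket-heavy, which is the clutter attribute access saves you from. The entries_with_status helper and the sample data are made up for illustration.

```python
def entries_with_status(har, status):
    """Return the raw HAR entries whose response status equals `status`."""
    return [e for e in har["log"]["entries"] if e["response"]["status"] == status]


# Made-up HAR with one missing image.
h = {
    "log": {
        "entries": [
            {"request": {"url": "http://example.com/"}, "response": {"status": 200}},
            {"request": {"url": "http://example.com/missing.png"}, "response": {"status": 404}},
        ]
    }
}

four_oh_fours = entries_with_status(h, 404)
for e in four_oh_fours:
    print(e["request"]["url"])  # prints http://example.com/missing.png
```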
or for pages that took longer than 3 seconds (3000 ms) to load:
```python
unacceptable_duration = [p for p in har.pages if p.timings > 3000]
```
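The HAR spec records time at two levels: each entry’s time is the total milliseconds for that single request, and each page’s pageTimings.onLoad is when the page’s load event fired (-1 if not available). A raw-dict sketch of both 3-second checks follows; the helper names are hypothetical, the 3000 ms threshold matches the example above, and the sample data is invented.

```python
THRESHOLD_MS = 3000  # 3 seconds, in the milliseconds HAR uses


def slow_requests(har, limit_ms=THRESHOLD_MS):
    """Entries (single requests) whose total time exceeds limit_ms."""
    return [e for e in har["log"]["entries"] if e["time"] > limit_ms]


def slow_pages(har, limit_ms=THRESHOLD_MS):
    """Pages whose onLoad time exceeds limit_ms; -1 (not available) never counts."""
    return [
        p for p in har["log"].get("pages", [])
        if p.get("pageTimings", {}).get("onLoad", -1) > limit_ms
    ]


h = {
    "log": {
        "pages": [
            {"id": "shirts", "pageTimings": {"onContentLoad": 1800, "onLoad": 5200}},
        ],
        "entries": [
            {"pageref": "shirts", "request": {"url": "http://example.com/"}, "time": 450},
            {"pageref": "shirts", "request": {"url": "http://example.com/tracker.js"}, "time": 4700},
        ],
    }
}

for e in slow_requests(h):
    print(e["time"], e["request"]["url"])  # 4700 http://example.com/tracker.js
print([p["id"] for p in slow_pages(h)])    # ['shirts']
```

Checking both levels is useful when debugging: a slow page with no single slow request suggests many medium-sized delays, while one slow entry usually names the culprit directly.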
Step 3 – Py.Saunter (optional)
Unsurprisingly, Py.Saunter has grown support for both BrowserMob Proxy and HAR files. If you are using Py.Saunter as your runner, this is how to hook everything up as of 0.43:
```ini
[Proxy]
proxy_url: http://url.to.proxy:port
browsermob: true
```
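The fragment is plain INI, so if you want to sanity-check it outside Py.Saunter, Python’s configparser reads it as-is (it accepts both `:` and `=` as separators, and splits on the first one, so the colons inside the URL survive). This only parses the text; how Py.Saunter itself consumes these keys is not shown here.

```python
import configparser

INI_TEXT = """
[Proxy]
proxy_url: http://url.to.proxy:port
browsermob: true
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

proxy_url = config.get("Proxy", "proxy_url")
use_browsermob = config.getboolean("Proxy", "browsermob")  # "true" -> True
print(proxy_url, use_browsermob)  # http://url.to.proxy:port True
```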
The proxy client is created for you automatically and is available in the script as self.client. Here is a full script that ties together creating the HAR file and making a pass/fail decision on the outcome.
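```python
import pytest
from harpy.har import Har
# ShirtPage is the project's own page object; import it from wherever it lives

# a method on a Py.Saunter test class
@pytest.marks('shallow', 'ebay', 'har')
def test_har_retrieval(self):
    # answer requests to these 3rd-party hosts with a 404 instead of fetching them
    self.client.blacklist("http://www\\.facebook\\.com/.*", 404)
    self.client.blacklist("http://static\\.ak\\.fbcdn\\.com/.*", 404)

    # start a fresh HAR whose first page is named "shirts"
    self.client.new_har("shirts")

    s = ShirtPage(self.driver)
    s.go_to_mens_dress_shirts()

    # pull the captured traffic back from the proxy and wrap it with HARPy
    h = self.client.har
    har = Har(h)

    four_oh_fours = [e for e in har.entries if e.response.status == 404]
    assert len(four_oh_fours) == 1
```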
It also shows how to strip out non-relevant third-party crap using the Python client’s blacklist. (Normally, though, you would likely want a 200 injected as the HTTP response rather than a 404.)
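The blacklist patterns are regular expressions, which is why the dots are escaped (`\\.` inside a normal Python string is the regex `\.`). Before wiring a pattern into the proxy, it can be worth checking what it actually catches. A sketch using Python’s re module, under the assumption that the proxy requires the whole URL to match (mimicked here with re.fullmatch); the is_blacklisted helper and the sample URLs are hypothetical.

```python
import re

# Same patterns as the test above, written as raw strings
# (r"www\.facebook" is identical to "www\\.facebook").
BLACKLIST = [
    r"http://www\.facebook\.com/.*",
    r"http://static\.ak\.fbcdn\.com/.*",
]


def is_blacklisted(url, patterns=BLACKLIST):
    """True if any pattern matches the entire URL (assumed proxy semantics)."""
    return any(re.fullmatch(p, url) for p in patterns)


for url in [
    "http://www.facebook.com/plugins/like.php",
    "http://static.ak.fbcdn.com/rsrc.php/foo.js",
    "https://www.facebook.com/plugins/like.php",  # https is NOT covered by these patterns
    "http://www.example.com/shirts",
]:
    print(is_blacklisted(url), url)
```

The https case is the one to watch: a pattern that hard-codes `http://` silently lets secure requests through to the real host.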
This sort of integration has been possible for at least two years, but it is only now starting to reach mainstream blog posts, mailing lists, etc. I suspect we’ll be seeing more of it over the next while, until it becomes commonplace in another 18 months or so.