validate() — your HTML

For a couple months now I have been telling people that their Page Objects should have, at minimum, three methods: open, wait_until_loaded and validate. The first two are easy: open navigates to the page the object represents, and wait_until_loaded synchronizes. But validate was there because of a conversation on IRC … that I can’t remember anything about, other than that I think it happened.

That was until yesterday, when I still didn’t remember the conversation but came up with a decent idea for what to put in there: non-functional static checks, like whether the page is ‘valid’ HTML and whether the elements that should have accessibility attributes do. A couple hours later I had a bit of an implementation.
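
The implementation further down only covers the ‘valid’ HTML half. For the accessibility half, here is a rough sketch of what one static check could look like, using only the standard library. The names MissingAltFinder and images_missing_alt are made up for illustration, and "every img needs an alt attribute" is just one assumed rule, not a complete accessibility audit.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the positions of <img> tags that have no alt attribute."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.missing = []  # (line, column) of each offending <img>

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        # An empty alt="" is accepted here; only a missing attribute is flagged.
        if tag == "img" and "alt" not in dict(attrs):
            self.missing.append(self.getpos())


def images_missing_alt(page_source):
    """Return the positions of <img> tags in page_source that lack alt."""
    finder = MissingAltFinder()
    finder.feed(page_source)
    finder.close()
    return finder.missing
```

In a Page Object this could be fed self.driver.page_source from inside validate(), next to the HTML check.
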

At first I tried a couple of existing modules that interact with the W3C validator, but they either didn’t work, did too much for my needs (like parsing XML when JSON is an output format option) and/or used urllib2 to talk to the server. (I now consider not using requests a code smell.)

Again, this is only a couple hours of thinking and is not perfect. It will need some tweaks, for instance…

  • The W3C provides the validator free of charge, but if you are hitting it 10 000 times a day, that’s kinda a jerk move. It’s just an Apache/CGI/Perl application that is Open Source, so run a copy on your local network. It might even speed things up since you are not in the queue with everyone else.
  • If you chain your Page Object creation methods the way the test below does, you will validate the HTML every single time. That could indeed find problems if there are a lot of things being injected or removed, especially in a CMS context where users are also adding markup to their content. But if you don’t want to do that, maybe some sort of global ‘did I validate this page’ counter? Which itself might get tricky if you are running in parallel.
  • The doctype is ‘supposed’ to be discoverable by the validator, but I’d rather be specific about which one I am trying to validate against. You can see the entire list of types available to you by inspecting the type select element on the web interface to the validator, but I think the important ones are:
    • HTML5
    • XHTML 1.0 Strict
    • XHTML 1.0 Transitional
    • HTML 4.01 Strict
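
The tweaks in the list above could be sketched roughly like this. Everything named here is an assumption for illustration: LOCAL_VALIDATOR_URL is a made-up hostname for a locally hosted validator, and build_post_data and ValidatedPages are hypothetical helpers. The lock only protects threads inside one process; pytest-xdist workers are separate processes, so each worker would still validate a page once.

```python
import threading

# Assumption: a copy of the validator running on the local network.
# This hostname is invented; substitute wherever yours actually lives.
LOCAL_VALIDATOR_URL = "http://validator.example.internal/check"

# The doctypes from the list above.
KNOWN_DOCTYPES = (
    "HTML5",
    "XHTML 1.0 Strict",
    "XHTML 1.0 Transitional",
    "HTML 4.01 Strict",
)


def build_post_data(page_source, doctype):
    """Build the form data for the validator with an explicit doctype."""
    if doctype not in KNOWN_DOCTYPES:
        raise ValueError("unexpected doctype: %r" % (doctype,))
    return {"fragment": page_source, "output": "json", "doctype": doctype}


class ValidatedPages(object):
    """Remembers which URLs have already been validated in this process."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seen = set()

    def should_validate(self, url):
        """Return True the first time a URL is seen, False afterwards."""
        with self._lock:
            if url in self._seen:
                return False
            self._seen.add(url)
            return True
```

A validate() method could then check should_validate(self.driver.current_url) first and only post build_post_data(...) to LOCAL_VALIDATOR_URL when it returns True.
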

The test, validation_test.py:

from selenium.webdriver import Firefox
from po import Element34
 
class TestValidation(object):
    def setup_method(self, method):
        self.f = Firefox()
 
    def teardown_method(self, method):
        self.f.quit()
 
    def test_validation(self):
        e = Element34(self.f).open().wait_until_loaded().validate()

And the Page Object module, po.py:

import requests
import json
 
class ValidationException(Exception):
    pass
 
class Page(object):
    def _validate_html(self):
        post_data = {
            "fragment": self.driver.page_source,
            "output": "json",
            "doctype": "XHTML 1.0 Transitional"
        }
        r = requests.post('http://validator.w3.org/check', data=post_data)
 
        j = json.loads(r.text)
 
        validation_errors = []
        if int(r.headers['x-w3c-validator-errors']) != 0:  # header values are strings
            for m in j['messages']:
                if m['type'] == 'error':
                    validation_errors.append(m)
 
        validation_warnings = []
        if int(r.headers['x-w3c-validator-warnings']) != 0:  # header values are strings
            for m in j['messages']:
                if m['type'] == 'info':
                    validation_warnings.append(m)
 
        if len(validation_errors) != 0 or len(validation_warnings) != 0:
            raise ValidationException('There were %d validation errors and %d validation warnings' % (len(validation_errors), len(validation_warnings)))
 
class Element34(Page):
    def __init__(self, driver):
        self.driver = driver
 
    def open(self):
        self.driver.get('http://element34.ca')
        return self
 
    def wait_until_loaded(self):
        return self
 
    def validate(self):
        self._validate_html()
        return self

Running it:

puppet:unicorn adam$ py.test validation_test.py -s
======================================== test session starts ========================================
platform darwin -- Python 2.7.2 -- pytest-2.3.2
plugins: marks, xdist
collected 1 items 
 
validation_test.py F
 
============================================= FAILURES ==============================================
__________________________________ TestValidation.test_validation ___________________________________
 
self = <validation_test.TestValidation object at 0x101a82310>
 
    def test_validation(self):
>       e = Element34(self.f).open().wait_until_loaded().validate()
 
validation_test.py:12: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 
self = <po.Element34 object at 0x1015cd410>
 
    def validate(self):
>       self._validate_html()
 
po.py:45: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 
self = <po.Element34 object at 0x1015cd410>
 
    def _validate_html(self):
        post_data = {
            "fragment": self.driver.page_source,
            "output": "json",
            "doctype": "XHTML 1.0 Transitional"
        }
        r = requests.post('http://validator.w3.org/check', data=post_data)
 
        j = json.loads(r.text)
 
        validation_errors = []
        if r.headers['x-w3c-validator-errors'] != 0:
            for m in j['messages']:
                if m['type'] == 'error':
                    validation_errors.append(m)
 
        validation_warnings = []
        if r.headers['x-w3c-validator-warnings'] != 0:
            for m in j['messages']:
                if m['type'] == 'info':
                    validation_warnings.append(m)
 
        if len(validation_errors) != 0 or len(validation_warnings) != 0:
>           raise ValidationException('There were %d validation errors and %d validation warnings' % (len(validation_errors), len(validation_warnings)))
E           ValidationException: There were 4 validation errors and 1 validation warnings
 
po.py:31: ValidationException
===================================== 1 failed in 20.47 seconds =====================================
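
The filtering inside _validate_html could also be pulled out into plain functions, so it can be unit tested without a browser or a round trip to the validator. This is only a sketch: split_messages and summarize are hypothetical names, and the sample messages in the usage note are made up. The only thing taken from the code above is that each message has a 'type' of 'error' or 'info'.

```python
def split_messages(messages):
    """Split validator messages into (errors, warnings) by their 'type'."""
    errors = [m for m in messages if m.get("type") == "error"]
    # As in _validate_html, 'info' messages are treated as warnings.
    warnings = [m for m in messages if m.get("type") == "info"]
    return errors, warnings


def summarize(messages):
    """Build the same summary string that ValidationException carries."""
    errors, warnings = split_messages(messages)
    return "There were %d validation errors and %d validation warnings" % (
        len(errors),
        len(warnings),
    )
```

For example, summarize([{"type": "error"}, {"type": "info"}, {"type": "error"}]) reports 2 errors and 1 warning, and _validate_html would only need to raise when either list is non-empty.
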
