get_cell_at(x, y)

I kinda think that developers in the nebulous past are trying to troll me with code that is later expected to be automated. (I know this is not really the case, and that automatable-code techniques are only starting to become mainstream-ish but really folks…) A recent client does all sorts of crazy price modelling and displays that information in a table — which makes sense since it is tabular data.

Unfortunately…

  • There are no [visible] column headings since they are in a different table which allows for scrolling of large tables whilst having the headings remain on the screen
  • There are no id’s or class attributes that one can glom onto
  • The dimensions of the table changes based on a number of factors
  • The data cells are actually tables themselves!
  • The content of the cells changes based on a number of factors

If fact, it looks suspiciously like this.

  A B BAD C
Prod Group 117
Monkey
787.159
Cats:


749.675

Items: 1
Revenue:
$14,169


9.7%
Impact: $0

0.0%
Margin:
19.4%


19.4%
Monkey
637.744
Cats:


607.375

Items: 8
Revenue:
$52,295


35.7%
Impact: ($0)

(0.0%)
Margin:
20.1%


20.1%
Monkey
799.219
Cats:


761.161

Items: 2
Revenue:
$13,587


9.3%
Impact: $0

0.0%
Margin:
21.8%


21.8%
Monkey
661.240
Cats:


629.752

Items: 26
Revenue:
$27,772


18.9%
Impact: ($0)

(0.0%)
Margin:
22.7%


22.7%
Prod Group 142
Monkey
0
Cats:


0

Items: 0
Revenue:
$0


0.0%
Impact: $0

0.0%
Margin:
0.0%


0.0%
Monkey
374.310
Cats:


356.486

Items: 7
Revenue:
$26,202


17.9%
Impact: ($0)

(0.0%)
Margin:
19.7%


19.7%
Monkey
0
Cats:


0

Items: 0
Revenue:
$0


0.0%
Impact: $0

0.0%
Margin:
0.0%


0.0%
Monkey
370.155
Cats:


352.529

Items: 14
Revenue:
$8,884


6.1%
Impact: ($0)

(0.0%)
Margin:
20.9%


20.9%
Prod Group 144
Monkey
0
Cats:


0

Items: 0
Revenue:
$0


0.0%
Impact: $0

0.0%
Margin:
0.0%


0.0%
Monkey
0
Cats:


0

Items: 0
Revenue:
$0


0.0%
Impact: $0

0.0%
Margin:
0.0%


0.0%
Monkey
0
Cats:


0

Items: 0
Revenue:
$0


0.0%
Impact: $0

0.0%
Margin:
0.0%


0.0%
Monkey
534.086
Cats:


508.653

Items: 1
Revenue:
$3,739


2.5%
Impact: $0

0.0%
Margin:
20.5%


20.5%

Long story made short, we don’t care what the actual numbers are, just that they are consistent throughout the application. Which of course means we need to be able to get a specific number, from a specific cell. Which of course would be fairly easy-peasy if there were meaningful id’s to hang off of…

Here is the non-annotated version of my get_cell_at(x, y) function.

def get_cell_at(x, y):
    # get the column elements
    column_elements = driver.find_elements_by_css_selector('th')
    # get the text of them, and do some html cleanup
    column_names = [driver.execute_script('return arguments[0].textContent', e).strip() for e in column_elements]
    # find which column is the one we want
    which_column = column_names.index(x)
 
    # get all the rows in the body (we'll use them a couple times)
    rows = driver.find_elements_by_xpath('//table[not(@cellspacing)]/tbody/tr')
 
    row_names = []
    for row in rows:
        name = row.find_element_by_xpath('./td[starts-with(@id, "RevenueLabel")]')
        row_names.append(name.text)
    which_row = row_names.index(y)
 
    cell = row.find_element_by_xpath('//table[not(@cellspacing)]/tbody/tr[%d]/td[not(@style)][%d]' % ((which_row + 1), (which_column + 2)))
 
    # cell contents (there are different 'types' of cell contents)
    data = {}
    things = cell.find_elements_by_xpath('./div/table/tbody/tr')
    for thing in things:
        first_inner_thing = thing.find_element_by_css_selector('td')
        if first_inner_thing.text == "Monkey":
            data["Monkey"] = thing.find_element_by_xpath('./td[2]').text
 
    return data

And now for the annotations.

The first thing we need to do is get the names of the columns. Thankfully they are in a <thead> section in the main table. As mentioned above, the user visible strings are outside of this table to allow for scrolling of the table while keeping the column identifiers visible.

Also, recall how WebDriver doesn’t let you get the text of a non-visible element? Ya … JS Executor to the rescue.

Oh. And just for extra fun, some column headings have embedded <br /> rather than using word wrapping capabilities of CSS.

def get_cell_at(x, y):
    # get the column elements
    column_elements = driver.find_elements_by_css_selector('th')
    # get the text of them, and do some html cleanup
    column_names = [driver.execute_script('return arguments[0].textContent', e).strip() for e in column_elements]
    # find which column is the one we want
    which_column = column_names.index(x)

Now we fetch all the rows in the table body.

    # get all the rows in the body (we'll use them a couple times)
    rows = driver.find_elements_by_xpath('//table[not(@cellspacing)]/tbody/tr')

Finding the row want is very similar to finding the column, but because it is visible we don’t need the JS Executor. The hitch here is that when you are doing an xpath based search using a previously found element you need to start it with “./” rather than “//”. To me this is a bit counter intuitive since I think holding the element should reset where ‘the top’ of the document is for the search, but I can also see how this way is correct.

    row_names = []
    for row in rows:
        name = row.find_element_by_xpath('./td[starts-with(@id, "RevenueLabel")]')
        row_names.append(name.text)
    which_row = row_names.index(y)

This would be so much easier (and less brittle) if there was id’s, but it does get the specific cell we wanted. The which_[column|row] munging happens due to the difference between python’s counting at 0 and xpath’s at 1. Hurray for consistency!

    cell = row.find_element_by_xpath('//table[not(@cellspacing)]/tbody/tr[%d]/td[not(@style)][%d]' % ((which_row + 1), (which_column + 2)))

The last thing to deal with is the actual contents of the cell, which as mentioned is itself a table. In an ideal world there would be hints in the elements itself through something like a data-* attribute which point we could just loop over things and build a dictionary automagically. But there isn’t so we’ll do it by hand. (I’m only doing the first one for this example…)

    # cell contents (there are different 'types' of cell contents)
    data = {}
    things = cell.find_elements_by_xpath('./div/table/tbody/tr')
    for thing in things:
        first_inner_thing = thing.find_element_by_css_selector('td')
        if first_inner_thing.text == "Monkey":
            data["Monkey"] = thing.find_element_by_xpath('./td[2]').text
        # check the other characteristics of the data...
 
    return data

With that in place, you can refer to individual bits like so.

driver = Remote(desired_capabilities=DesiredCapabilities().FIREFOX)
try:
    driver.get('file:///Users/adam/tmp/tables/cells.html')
    cell = get_cell_at("C", "Prod Group 142")
    assert(cell["Monkey"] == "370.155")
finally:
    driver.quit()

Phew! The moral of this story? Be nice to your future automators and put things that identify columns and rows and cell data. Oh, and that even a solution as peppered with brittle locators but does the job is still better than one that doesn’t.

Post a Comment

Your email is never published nor shared. Required fields are marked *