Dealing with File Downloads with Selenium

Selenium is fantastic for things that are in the browser, but much less so for things that are of the browser, like, say, file download dialogs. Those sit outside the application content and are managed by the browser itself. So how do you deal with file downloads in Selenium?

The simplest solution is to not deal with them in Selenium at all. All the languages that Se-RC supports have a way of downloading a resource from a server, so use that. Remember that at its heart Se is just a library for controlling a browser. Nothing more. So don't use it if you don't need a browser. Do something like this instead:

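import unittest, urllib2, tempfile, os
from selenium import selenium
 
class FileDownload(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*chrome", "http://test.example.com/")
        self.selenium.start()
 
    def tearDown(self):
        self.selenium.stop()
 
    # the test_ prefix is what lets unittest discover and run the method
    def test_smart_download(self):
        file_on_server = "http://test.example.com/file.csv"
 
        se = self.selenium
        se.open("/")
        # check that the link on the page points at the file we expect
        self.assertEqual(se.get_attribute("download@href"), file_on_server)
 
        # then fetch the file directly, outside of the browser
        fhandle, path = tempfile.mkstemp(text=True)
        try:
            with os.fdopen(fhandle, "w") as local_file:
                f = urllib2.urlopen(file_on_server)
                local_file.write(f.read())
                f.close()
            # now process the contents of your file for accuracy
        finally:
            # always clean up, even if the checks above fail
            os.remove(path)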

This lets Se do what it does best, which is driving the browser and inspecting the returned response; in this case, making sure that the href attribute of the anchor with an id (or name) of download equals the value of the file_on_server variable. If it does, we switch over to pure Python and download the file to a temp file. What is not shown is parsing the CSV contents, but that isn't important here. What is important is that you close and remove the file when you are done. If you don't, and your script runs for a long time, you can run out of file handles (or disk space) on the machine (trust me).
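For anyone on Python 3, where urllib2 became urllib.request, the same download-then-clean-up idea might look like the sketch below. The helper name fetch_and_check_csv and the choice to return the parsed rows are my own illustration, not anything Se provides; the point is the try/finally, which guarantees the temp file is closed and removed even when the download or the checks fail.

```python
import csv
import os
import tempfile
import urllib.request


def fetch_and_check_csv(url):
    """Download url to a temp file, parse it as CSV, and always delete the file.

    Hypothetical helper for illustration; returns the parsed rows so the
    caller can assert on them.
    """
    fhandle, path = tempfile.mkstemp(suffix=".csv")
    try:
        # Open the local file first so its handle is closed even if urlopen fails.
        with os.fdopen(fhandle, "wb") as local_file:
            with urllib.request.urlopen(url) as response:
                local_file.write(response.read())
        # Parse the contents so the test can check them for accuracy.
        with open(path, newline="") as f:
            return list(csv.reader(f))
    finally:
        # Runs on success and failure alike, so temp files never pile up.
        os.remove(path)
```

Inside the test above, rows = fetch_and_check_csv(file_on_server) would replace the urllib2/tempfile lines, followed by whatever assertions on rows make sense for your file.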

That is, of course, the ideal situation, where you don't have some fancy JavaScript in the way that absolutely forces you to use a browser to download the file. What then? Well, then you are forced into a variation of the above, but this time, instead of fetching the file with something like urllib2, we need something like AutoIt to drive the browser's save dialog.

Danger! By using AutoIt you are now making Windows-only scripts. Be forewarned.

AutoIt is a handy scripting tool that uses a BASIC-like syntax (with Perl-ish $variables) to interact with windows. I'm pretty sure that it is just a wrapper around COM, but judging from the examples it can be ridiculously powerful. Thankfully we need only a very simple bit of its functionality: finding the Save window and filling in a text box. Here is the Internet Explorer 'Save As' script.

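; match window titles on any substring
AutoItSetOption("WinTitleMatchMode", "2")
 
; IE's File Download dialog: press Save
WinWait("File Download")
$title = WinGetTitle("File Download")
WinActivate($title)
WinWaitActive($title)
Send("s")
 
; Save As dialog: prefix the proposed filename with the target
; directory ($CmdLine[1]) and a unique prefix ($CmdLine[2])
WinWait("Save As")
$title = WinGetTitle("Save As")
WinActivate($title)
WinWaitActive($title)
$filename = ControlGetText($title, "", "Edit1")
$fullpath = $CmdLine[1] & "\" & $CmdLine[2] & "-" & $filename
ControlSetText($title, "", "Edit1", $fullpath)
Send("{ENTER}")
 
; report the saved path to the caller on stdout
ConsoleWrite($fullpath)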

And the Firefox one.

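; match window titles on any substring
AutoItSetOption("WinTitleMatchMode", "2")
 
; Firefox's "Opening ..." dialog: pick Save File (Alt+S) and confirm
WinWait("Opening")
$title = WinGetTitle("Opening")
WinActivate($title)
WinWaitActive($title)
Send("!s")
Send("{ENTER}")
 
; "Enter name of file to save to" dialog: prefix the proposed filename
; with the target directory ($CmdLine[1]) and a unique prefix ($CmdLine[2])
WinWait("Enter name of file")
$title = WinGetTitle("Enter name of file")
WinActivate($title)
WinWaitActive($title)
$filename = ControlGetText($title, "", "Edit1")
$fullpath = $CmdLine[1] & "\" & $CmdLine[2] & "-" & $filename
ControlSetText($title, "", "Edit1", $fullpath)
Send("{ENTER}")
 
; report the saved path to the caller on stdout
ConsoleWrite($fullpath)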
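Both scripts take two command-line arguments, a directory ($CmdLine[1]) and a prefix ($CmdLine[2]), and glue them onto whatever filename the browser proposes in the Edit1 box. If you want the Python side to know where the file should land without relying only on the script's console output, a sketch like the one below mirrors that concatenation. Both helper names are hypothetical, and filename_from_url assumes the browser proposes the last segment of the URL path, which will not hold if the server sends a Content-Disposition header with a different name.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url):
    """Guess the name the browser will propose: the last segment of the URL path.

    Assumption: the server does not override it via Content-Disposition.
    """
    return unquote(posixpath.basename(urlparse(url).path))


def expected_save_path(save_location, file_prefix, filename):
    """Mirror the .au3 scripts: $CmdLine[1] & "\\" & $CmdLine[2] & "-" & $filename."""
    # Plain concatenation with a backslash, exactly as the AutoIt scripts do it,
    # so the result is the Windows path even if this runs on another OS.
    return save_location + "\\" + file_prefix + "-" + filename
```

For the first example, expected_save_path(save_location, file_prefix, filename_from_url(file_on_server)) gives the path the script is about to print, which you can compare against its stdout as a sanity check.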

One thing I have seen a lot is these scripts being compiled to .exe files, which are then run instead of the scripts. I recommend you don't do this. By definition a compiled file is a binary one, which means it cannot be meaningfully diffed between versions.

So how do you run these scripts? Well, here is an IronPython function I wrote to do just that.

from System.Diagnostics import Process
import time, os, os.path, random, string
 
# adjust these for your machine
AUTOIT_EXE = r"e:\AutoIt3\AutoIt3.exe"
FF_SCRIPT = "support/au3/ff_save.au3"
IE_SCRIPT = "support/au3/ie_save.au3"
 
def download(o):
  save_location = os.path.join(os.getcwd(), "tmp")
  if not os.path.isdir(save_location):
    os.makedirs(save_location)
  # random prefix so repeated downloads of the same file don't collide
  file_prefix = "".join(random.sample(string.ascii_letters, 5))
 
  if o.browser in ["*chrome", "*firefox"]:
    script = FF_SCRIPT
  elif o.browser in ["*iexplore", "*iehta"]:
    script = IE_SCRIPT
  else:
    raise ValueError("unsupported downloadable browser: %s" % o.browser)
 
  # http://www.ironpython.info/index.php/Launching_Sub-Processes
  p = Process()
  p.StartInfo.UseShellExecute = False
  p.StartInfo.RedirectStandardOutput = True
  p.StartInfo.RedirectStandardError = True
  p.StartInfo.FileName = AUTOIT_EXE
  # quote the arguments in case the paths contain spaces
  p.StartInfo.Arguments = '"%s" "%s" "%s"' % (script, save_location, file_prefix)
 
  p.Start()
  # read the output before waiting, so a full pipe can't stall the child
  stdout = p.StandardOutput.ReadToEnd().strip()
  stderr = p.StandardError.ReadToEnd()
  p.WaitForExit()
 
  # the AutoIt script prints the full path it saved to; wait for it to appear
  exist_counter = 0
  while not os.path.exists(stdout):
    exist_counter += 1
    if exist_counter == 20:
      raise IOError("File download timeout")
    time.sleep(3)
 
  # then wait until the file stops growing
  previous_size = 0
  current_size = 1
  while current_size != previous_size:
    time.sleep(3)
    previous_size = current_size
    current_size = os.path.getsize(stdout)
 
  return stdout

You can use this pretty much as is; the only changes necessary are the locations of AutoIt3.exe, ff_save.au3 and ie_save.au3.
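The function above is IronPython because it leans on .NET's System.Diagnostics.Process. If your tests run under regular CPython instead, the standard subprocess module can do the same launch-and-poll; the sketch below is my substitution, not the original code. It assumes the command is passed as a list (for example [r"e:\AutoIt3\AutoIt3.exe", "support/au3/ff_save.au3"], with the same paths you would adjust above), that the script prints the saved path on stdout as the .au3 scripts do, and that the timeout and poll values suit your downloads.

```python
import os
import random
import string
import subprocess
import time


def run_save_script(command, save_location, timeout=60.0, poll=3.0):
    """Run a save-dialog script and wait for the file it reports to finish writing.

    command is a list such as [autoit_exe, "support/au3/ff_save.au3"] (assumed
    paths). Returns the path the script printed on stdout.
    """
    os.makedirs(save_location, exist_ok=True)
    # Random prefix so repeated downloads of the same file don't collide.
    file_prefix = "".join(random.sample(string.ascii_letters, 5))

    # Passing a list avoids quoting problems with spaces in the paths.
    result = subprocess.run(
        command + [save_location, file_prefix],
        capture_output=True, text=True, check=True,
    )
    path = result.stdout.strip()
    if not path:
        raise RuntimeError("save script printed no path: %r" % result.stderr)

    # Wait for the file to appear...
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() > deadline:
            raise TimeoutError("File download timeout: %r" % path)
        time.sleep(poll)

    # ...then until its size is unchanged between two polls.
    previous_size = -1
    current_size = os.path.getsize(path)
    while current_size != previous_size:
        if time.monotonic() > deadline:
            raise TimeoutError("File still growing: %r" % path)
        time.sleep(poll)
        previous_size, current_size = current_size, os.path.getsize(path)

    return path
```

One deadline covers both waits, so a download that appears late leaves less time for the size check; splitting it into two budgets is an easy change if that matters for your files.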

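And because saving files is not enough of a pain already, you may have to start your Se-RC server with a custom Firefox profile that has the following line in its prefs.js. It tells Firefox to ask where to save each file instead of silently saving it to the default download directory, and that prompt is the dialog the Firefox script above expects.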

user_pref("browser.download.useDownloadDir", false);
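If you keep the custom profile as a template directory (the Se-RC server's -firefoxProfileTemplate option points it at one), a small helper can make sure that line is present before the server starts. The function below is a hypothetical convenience, not part of Selenium; it assumes the profile directory already exists and uses json.dumps to render the name and value, which produces the false, numbers and double-quoted strings that prefs.js uses.

```python
import json
import os


def ensure_pref(profile_dir, name, value):
    """Append user_pref(name, value); to profile_dir/prefs.js unless already present.

    Hypothetical helper; assumes profile_dir is an existing Firefox profile
    template. Returns True if the line was added, False if it was already there.
    """
    line = "user_pref(%s, %s);" % (json.dumps(name), json.dumps(value))
    prefs_path = os.path.join(profile_dir, "prefs.js")
    existing = ""
    if os.path.exists(prefs_path):
        with open(prefs_path, encoding="utf-8") as f:
            existing = f.read()
    # Only an identical line counts as present; other values are not inspected.
    if line in existing.splitlines():
        return False
    with open(prefs_path, "a", encoding="utf-8") as f:
        # Start on a fresh line if the file doesn't end with a newline.
        if existing and not existing.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return True
```

Calling ensure_pref(template_dir, "browser.download.useDownloadDir", False) writes exactly the line shown above.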

When I’m talking to people about Se, I usually suggest that they skip automating the download portion of their application as it is often more efficient to just do it by hand. But if they really, really want to automate it I go straight to the first technique. Using AutoIt is a yes-it-works-but-should-be-a-last-resort way of doing things.
