Selenium is a web automation tool. It can be used to crawl and scrape websites. It needs an installed web browser to drive: we’re going to use Firefox. We’ll also need to trick Firefox into thinking there is a display attached: we’ll use Xvfb for that.
Once set up, you’ll be able to run Selenium scripts at your pleasure.
Prerequisites
- A virgin Alpine Linux container
- A working knowledge of Linux
Add Firefox repository to APK
As of this blog post, Firefox is not available in the main Alpine repository. You can search the Alpine repository to find out where you can install Firefox from. In my case, it is available in the community repository, branch v3.3.
With that information, I can add the following line to /etc/apk/repositories:
http://dl-2.alpinelinux.org/alpine/v3.3/community
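If you would rather script this step than edit the file by hand, a minimal sketch might look like the following. The add_repo helper name is my own invention, not an APK feature. It appends the line only when that exact line is not already present, so it should be safe to run more than once.

```shell
#!/bin/sh
# Hypothetical helper: append a repository line to an APK repositories file,
# but only if that exact line is not already there.
add_repo() {
    repo_line="$1"
    repo_file="${2:-/etc/apk/repositories}"
    # -x matches the whole line, -F treats the URL as a fixed string
    grep -qxF "$repo_line" "$repo_file" 2>/dev/null || echo "$repo_line" >> "$repo_file"
}

# Usage (as root), with the v3.3 community repository line:
# add_repo "http://dl-2.alpinelinux.org/alpine/v3.3/community"
```

The second argument is optional and only there so the helper can be tried against a scratch file before touching the real one.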
Update your cache to pull from the new repository. Note that apk upgrade also upgrades any packages that are already installed:
apk upgrade --update-cache --available
Install Selenium dependencies
Now you should be able to install Firefox and all other Selenium dependencies without issue:
apk add xvfb firefox dbus py-pip ttf-dejavu
Here is what we just installed:
Package | Purpose |
xvfb | A virtual display server. We're building a headless system, so we need a virtual display for our web browser to run in. |
firefox | The browser that Selenium drives. This is what we will use for scraping web sites. |
dbus | Provides dbus-uuidgen, which generates /etc/machine-id if Xvfb complains that it is missing. |
py-pip | Used to install the Selenium bindings for Python. |
ttf-dejavu | Fonts! These are required for Firefox to render pages. |
Set up Xvfb virtual display server
Test that Xvfb is set up properly
We should be able to start the Xvfb virtual display server by running the command below, where :99 is the display number and -ac disables access control:
Xvfb :99 -ac &
Check to make sure your display server is running using the top command.
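If you prefer a check you can script rather than eyeballing top, pgrep is one option. This is only a sketch: is_running is a made-up helper name, and it assumes your pgrep supports -x (match the whole process name), which the BusyBox and procps versions both do.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a process with exactly this name is running.
is_running() {
    pgrep -x "$1" > /dev/null 2>&1
}

if is_running Xvfb; then
    echo "Xvfb is running"
else
    echo "Xvfb is not running"
fi
```

Because the helper only returns an exit status, it can also be reused in other scripts, for example to avoid starting a second Xvfb on the same display.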
Note: If you get an error complaining about a machine-id, make sure the dbus package is installed (it was included in the apk add command above) and run dbus-uuidgen > /etc/machine-id. You can then uninstall dbus.
Set up Xvfb to start on system boot
We will use the local service to create a basic script that will start our service at boot time. We’ll start by enabling the local service to run at boot:
rc-update add local default
Now we need to create our start script. When the local service is enabled, it runs all executable scripts in /etc/local.d/ that end with .start at boot time. It runs all .stop scripts when the local service is stopped.
In our case, we just need a single script, /etc/local.d/Xvfb.start, with the following lines:
#!/bin/sh
/usr/bin/Xvfb :99 -ac &
Now make it executable:
chmod +x /etc/local.d/Xvfb.start
Note: Now is a good time to test that everything is working smoothly. Stop and start your container, then check that the Xvfb process is running.
Note: Creating a script to shut down Xvfb is left as an exercise for the reader.
Install Selenium with Python
We’re going to use the Python Selenium bindings because they make it really easy to get started.
pip install --upgrade pip # ensure pip is upgraded to latest version
pip install selenium requests
That’s it. You can check out the Selenium Python Getting Started Guide.
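One detail worth spelling out before you run your first script: Firefox, like any X client, finds its display through the DISPLAY environment variable. Assuming Xvfb is listening on :99 as configured earlier, exporting the variable in the shell that launches your script should be enough. The script name in the comment is purely hypothetical.

```shell
#!/bin/sh
# Assumption: Xvfb was started on display :99, as in the start script earlier.
export DISPLAY=:99

# Any X client started from this shell, Firefox included, now targets Xvfb.
echo "Using display $DISPLAY"

# Then run your Selenium script, e.g. (hypothetical file name):
# python my_scraper.py
```

If your scripts are started some other way (cron, another service), the same variable has to be set in that environment instead.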