Breaking Down the Working of Appium 2.0

Japneet Sachdeva
4 min read · Apr 14, 2024

If your goal is mobile automation, whether for tablets, smartphones, or even desktop apps, then Appium is the go-to library. Appium 1.x, as many users will remember, was quite unstable and required a lot of integrations to set up. With the 2.x releases, Appium has improved a lot. I have started using it to automate my iOS application.

Appium Breakdown

What is Appium?

Appium is ultimately a Node.js program, so using it could have meant importing Appium and its drivers as libraries into your own Node.js programs. But that wouldn’t meet Appium’s goal of providing automation capabilities to people using any popular programming language.

You see, the WebDriver specification is actually an HTTP-based protocol, meaning it is designed to be used over a network rather than within the memory of a single program.

How does Appium work?

One of the main benefits of this “client-server” architecture is that it allows the automation implementer (the thing doing the automation, in this case the ‘server’) to be completely distinct from the automation runner (the thing defining what automation should be done, in what steps, and so on, in this case the ‘client’).

Client libraries can be written in any programming language; each one simply encodes HTTP requests to the server in a language-appropriate way.

There are a few important takeaways here for you, the Appium user:

Appium is an HTTP server. It must run as a process on some computer for as long as you want to be able to use it for automation. It must be accessible on the network to whichever computer you want to use to run the automation from (whether that is the same machine or one across the world).

Using Appium for automation involves an Appium client in the language of your choice. The goal of each of these clients is to encapsulate the WebDriver protocol so that, rather than worrying about the protocol itself, you can work with objects and methods that feel natural in your language.

The Appium server and the Appium client do not need to be running on the same computer. You simply need to be able to send HTTP requests from the client to the server over some network. This greatly facilitates the use of cloud providers for Appium, since they can host the Appium server and any related drivers and devices, and all you need to do is point your client script to their secure endpoints.
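As a minimal sketch of that last point: the only thing that changes between a local server and a hosted one is the base URL the client targets. The snippet below builds (but does not send) a request to the WebDriver `/status` endpoint using only Python's standard library. The local address assumes Appium's default port 4723, and the cloud hostname is a made-up placeholder, not a real provider endpoint.

```python
from urllib.request import Request

# Assumption: a local Appium 2.x server listening on its default port.
LOCAL_SERVER = "http://127.0.0.1:4723"
# Hypothetical hosted endpoint, for illustration only.
CLOUD_SERVER = "https://appium.cloud-provider.example"


def status_request(server_url: str) -> Request:
    """Build a GET request for the server's /status endpoint."""
    return Request(server_url.rstrip("/") + "/status", method="GET")


for server in (LOCAL_SERVER, CLOUD_SERVER):
    req = status_request(server)
    print(req.get_method(), req.full_url)
```

Passing one of these requests to `urllib.request.urlopen` would actually send it, which of course requires a running server at that address.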

As you might know already, I am using Appium for iOS automation, so below is a breakdown of how it works on that platform.

What’s going on with iOS app testing using Appium in the background?

Your test code (in its programming language) — owned by you

The Appium client library — owned by Appium

The Selenium client library — owned by Selenium

The network (local or Internet)

The Appium server — owned by Appium

The Appium XCUITest driver — owned by Appium

WebDriverAgent — owned by Appium

Xcode — owned by Apple

XCUITest — owned by Apple

iOS itself — owned by Apple

macOS (where Xcode and iOS simulators run) — owned by Apple

It’s a pretty deep stack!

What exactly is Appium doing?

Appium implements a client-server architecture. The server (consisting of Appium itself along with any drivers or plugins you are using for automation) is connected to the devices under test, and is actually responsible for making automation happen on those devices. The client (driven by you, the Appium test author) is responsible for sending commands to the server over the network, and receiving responses from the server as a result.
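To make the request/response exchange concrete, here is a rough sketch of the very first command a client sends: Create Session. It encodes the command as an HTTP POST to `/session` and reads the session ID out of the reply, following the W3C WebDriver message shape. The server address, the capability values, and the canned response are all assumptions for illustration; nothing is sent over the network.

```python
import json
from urllib.request import Request

SERVER = "http://127.0.0.1:4723"  # assumption: local Appium 2.x default


def new_session_request(capabilities: dict) -> Request:
    """Encode a Create Session command as an HTTP POST to /session."""
    body = json.dumps({"capabilities": {"alwaysMatch": capabilities}})
    return Request(
        SERVER + "/session",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def session_id_from(response_body: str) -> str:
    """Extract the session ID from a W3C-style Create Session response."""
    return json.loads(response_body)["value"]["sessionId"]


caps = {
    "platformName": "iOS",
    "appium:automationName": "XCUITest",
    "appium:deviceName": "iPhone 15",    # illustrative value
    "appium:app": "/path/to/MyApp.app",  # illustrative value
}
req = new_session_request(caps)
print(req.get_method(), req.full_url)

# A canned reply in the shape a server would return (the ID is invented).
canned = '{"value": {"sessionId": "abc123", "capabilities": {}}}'
print(session_id_from(canned))
```

Every later command in the session carries that ID in its URL, which is how the server knows which device session the command belongs to.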

What sorts of automation commands are available?

That is up to the particular driver and plugins that you are using in any given session. A standard set of commands would include, for example, the following:

Find Element

Click Element

Get Page Source

Take Screenshot

These are not Java commands, or JavaScript commands, or Python commands. Instead, they form part of an HTTP API which can be accessed from within any programming language (or none! You could just use cURL if you want).
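One way to picture this HTTP API is as a lookup table from command name to HTTP method and path. The sketch below lists the four commands above with the routes the W3C WebDriver specification assigns to them; the `resolve` helper and the placeholder IDs are my own, purely for illustration.

```python
# (HTTP method, path template) for the commands listed above,
# following the W3C WebDriver specification.
COMMANDS = {
    "Find Element": ("POST", "/session/{session_id}/element"),
    "Click Element": ("POST", "/session/{session_id}/element/{element_id}/click"),
    "Get Page Source": ("GET", "/session/{session_id}/source"),
    "Take Screenshot": ("GET", "/session/{session_id}/screenshot"),
}


def resolve(command: str, **ids: str) -> tuple[str, str]:
    """Return the HTTP method and concrete path for a command."""
    method, template = COMMANDS[command]
    return method, template.format(**ids)


# The IDs below are invented placeholders.
print(resolve("Get Page Source", session_id="abc123"))
print(resolve("Click Element", session_id="abc123", element_id="el-1"))
```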

For example, the Find Element command corresponds to an HTTP POST request sent to the endpoint /session/:sessionid/element (where :sessionid is a placeholder for the unique session ID generated by the server in a previous call to Create Session).
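Here is a hedged sketch of what that Find Element request might look like when built by hand. The JSON body names a locator strategy (`using`) and a selector (`value`), and the reply carries the element's ID under a fixed key defined by the WebDriver specification. The server address, session ID, selector, and canned response are assumptions; the request is constructed but not sent.

```python
import json
from urllib.request import Request

SERVER = "http://127.0.0.1:4723"  # assumption: local Appium 2.x default
# Key under which the WebDriver spec returns an element reference.
W3C_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def find_element_request(session_id: str, using: str, value: str) -> Request:
    """Encode Find Element as POST /session/<session_id>/element."""
    body = json.dumps({"using": using, "value": value}).encode("utf-8")
    return Request(
        f"{SERVER}/session/{session_id}/element",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def element_id_from(response_body: str) -> str:
    """Pull the element ID out of a Find Element response."""
    return json.loads(response_body)["value"][W3C_ELEMENT_KEY]


# Session ID and selector are invented for the example.
find_req = find_element_request("abc123", "accessibility id", "Login")
print(find_req.get_method(), find_req.full_url)
print(find_req.data.decode("utf-8"))

canned_reply = json.dumps({"value": {W3C_ELEMENT_KEY: "el-1"}})
print(element_id_from(canned_reply))
```

The returned element ID is then used in the path of follow-up commands such as Click Element.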

There is a set of Appium client libraries that take on the responsibility of speaking HTTP to the Appium server. Rather than making you write HTTP calls yourself, they expose a set of “native” commands for a particular programming language, so that, to the test author, it just feels like you’re writing Python, or JavaScript, or Java.
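To illustrate the idea (and only the idea), here is a toy client that hides the HTTP details behind ordinary Python methods. It is not the real Appium Python client; the class names, the transport function, and the IDs are all hypothetical. The transport is faked so the example runs without a server and simply records which HTTP calls the "native" methods would have made.

```python
from typing import Callable, Optional

# Key under which the WebDriver spec returns an element reference.
W3C_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"

# A transport takes (method, path, payload) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[dict]], dict]


class ToySession:
    """Hypothetical wrapper that turns method calls into WebDriver requests."""

    def __init__(self, transport: Transport, session_id: str):
        self._transport = transport
        self._id = session_id

    def _send(self, method: str, suffix: str, payload: Optional[dict] = None):
        reply = self._transport(method, f"/session/{self._id}{suffix}", payload)
        return reply["value"]

    def find_element(self, using: str, value: str) -> "ToyElement":
        found = self._send("POST", "/element", {"using": using, "value": value})
        return ToyElement(self, found[W3C_ELEMENT_KEY])


class ToyElement:
    def __init__(self, session: ToySession, element_id: str):
        self._session = session
        self._id = element_id

    def click(self) -> None:
        self._session._send("POST", f"/element/{self._id}/click", {})


calls = []


def fake_transport(method: str, path: str, payload: Optional[dict]) -> dict:
    """Stand-in for real HTTP: record the call and return a canned reply."""
    calls.append((method, path))
    if path.endswith("/element"):
        return {"value": {W3C_ELEMENT_KEY: "el-1"}}
    return {"value": None}


session = ToySession(fake_transport, "abc123")
session.find_element("accessibility id", "Login").click()
print(calls)
```

The test author only ever writes `find_element(...).click()`; swapping the fake transport for one that performs real HTTP requests would not change that line at all.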

