Building a concurrent network pinger part 1 {engineering@intility}

In this article we'll explore how to build a concurrent network pinger using Elixir and some core OTP behaviours.

In the previous post we discussed some core concepts of the Elixir programming language and the OTP framework. Let's see if we can make a fun little program by using some of those building blocks.

The task at hand is to build a program that can concurrently ping subnet ranges on a network, and report back which hosts are up or down. We already have a basic understanding of how processes in Elixir work, and now we're going to put them to the test. We will take a look at supervision trees, processes that can keep state, task management and a simple interface to communicate with the application.

If you would like to browse the source code or just download the application and run it on your computer, it is available on GitHub.

Application architecture

In Elixir applications, we structure processes by having top-level Supervisor processes. Supervisors are responsible for running other processes and implement different kinds of policies for how to run them. The supervised processes can in turn be other Supervisors with their own children and policies. This structure unfolds like branches on a tree, and is usually referred to as the "supervision tree".
Supervision policies define what happens to child processes when they die, time out or otherwise exit unexpectedly. For example, we might have a Supervisor running a set of processes that rely heavily on each other. If one of the children unexpectedly crashes, the other processes will be unable to perform their job. In such cases we can tell the Supervisor to restart all of its children if one of them dies. In other cases it might be sufficient to only restart the failing process.

This way of structuring processes into supervision trees is what gives Elixir its fault-tolerance. A failure can be isolated to one part of the system, where the Supervisor can either attempt to restart its children, or fail itself (if that's desirable) and let its parent Supervisor decide what to do next.
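To make the difference between restart policies a bit more concrete, here is a minimal sketch that can be run as a standalone script. The two workers are plain Agents and their names (:worker_a and :worker_b) are made up for this illustration; they are not part of the application we are building.

```elixir
# Two hypothetical workers (plain Agents) that we pretend depend on each other.
children = [
  %{id: :worker_a, start: {Agent, :start_link, [fn -> :a end, [name: :worker_a]]}},
  %{id: :worker_b, start: {Agent, :start_link, [fn -> :b end, [name: :worker_b]]}}
]

# :one_for_all restarts every child when any one of them dies.
# With :one_for_one, only the crashed child would be restarted.
{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_all)

pid_b_before = Process.whereis(:worker_b)

# Simulate an unexpected crash of worker_a.
:worker_a |> Process.whereis() |> Process.exit(:kill)

# Give the supervisor a moment to restart its children.
Process.sleep(100)

pid_b_after = Process.whereis(:worker_b)
IO.inspect(pid_b_before != pid_b_after, label: "worker_b was restarted as well")
```

Changing the strategy to :one_for_one should leave worker_b untouched, since only the crashed child is restarted in that case.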

We can visualize the application supervision tree like this.

Let's quickly go through how we want to structure our application.

Applications are the idiomatic way to package software in Erlang/OTP and come with a standardized directory structure, configuration and life-cycle. They have their own environment, can be loaded, started and stopped, can have dependencies, and so on.
For example, if we would like to interface with a PostgreSQL database, we would use the postgrex driver for handling all the nitty-gritty communication details with our PostgreSQL server. This would run as a separate Application in the Beam, as would a lot of other dependencies we'd pull into our application.
At the very root we have the Application callback module. This module is the entry point of our application, and is where we start the application supervision tree.

Next we have a Supervisor. This will act as the application root supervisor, and its job is to keep three other processes alive at all times: a DynamicSupervisor, a Registry, and a Task.Supervisor. The Registry is a local key-value process store, and we will use it to register unique "Subnet Managers". This ensures that we cannot run multiple ping scans against the same subnet simultaneously.
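As a small, standalone sketch of why a unique Registry gives us this guarantee: registering the same key twice is rejected. The registry name Demo.SubnetRegistry is made up for this example, and here the calling process registers itself rather than a real "Subnet Manager".

```elixir
# Start a Registry with unique keys.
{:ok, _registry} = Registry.start_link(keys: :unique, name: Demo.SubnetRegistry)

# The current process registers itself under a subnet key.
{:ok, _owner} = Registry.register(Demo.SubnetRegistry, "192.168.1.0/24", nil)

# A second registration under the same key is rejected,
# and we get back the pid that already owns the key.
second_attempt = Registry.register(Demo.SubnetRegistry, "192.168.1.0/24", nil)
IO.inspect(second_attempt)

# Looking up the key returns the registered {pid, value} pairs.
IO.inspect(Registry.lookup(Demo.SubnetRegistry, "192.168.1.0/24"))
```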

Then we have a DynamicSupervisor. We have already talked a bit about the purpose of Supervisors, and the DynamicSupervisor resembles a normal Supervisor in many ways.
While a Supervisor requires us to define all child processes when the application starts, the DynamicSupervisor allows us to start child processes after it has started. This is good for us, because we want to run ping jobs for IP ranges in supervised processes, and we don't want to hard-code which IP ranges to ping beforehand!
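Starting children on demand could look roughly like the sketch below. It is a standalone script, and the children are simple Agents that just hold a subnet string; they stand in for the real "Subnet Manager" processes, which we have not written yet.

```elixir
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# The supervisor starts out without any children.
IO.inspect(DynamicSupervisor.count_children(sup).active, label: "children at start")

# Hypothetical helper: start a supervised Agent holding the given subnet.
start_job = fn subnet ->
  DynamicSupervisor.start_child(sup, %{
    id: Agent,
    start: {Agent, :start_link, [fn -> subnet end]}
  })
end

# Children are added at runtime, one per requested subnet.
{:ok, job_a} = start_job.("192.168.1.0/24")
{:ok, _job_b} = start_job.("10.0.0.0/24")

IO.inspect(DynamicSupervisor.count_children(sup).active, label: "children now")
IO.inspect(Agent.get(job_a, fn subnet -> subnet end), label: "job_a is responsible for")
```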

Which leads us to the "Subnet Manager" boxes in the drawing. For these we will use something called GenServers. Many processes in Elixir are short-lived and keep no state: they carry out the work they're assigned to do, then they die. For our pet project we want to keep track of all the IP addresses we have pinged, so that we can report back which hosts responded and which didn't. GenServers are perfect for this. They are regular Elixir processes that keep running and carry their state from one message to the next. They have a standard set of callback functions we can use for communication, and are super easy to run in a supervision tree. We'll use the "Subnet Manager" to keep track of all the IPs in the given subnet range.
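To give a feel for what "keeping state" means, here is a minimal GenServer sketch. The module name Demo.HostTracker and its functions are invented for this illustration; the real "Subnet Manager" may end up looking quite different.

```elixir
defmodule Demo.HostTracker do
  use GenServer

  # Client API (runs in the caller's process)
  def start_link(subnet), do: GenServer.start_link(__MODULE__, subnet)
  def record(pid, ip, status), do: GenServer.cast(pid, {:record, ip, status})
  def hosts(pid, status), do: GenServer.call(pid, {:hosts, status})

  # Server callbacks (run inside the GenServer process)
  @impl true
  def init(subnet), do: {:ok, %{subnet: subnet, results: %{}}}

  @impl true
  def handle_cast({:record, ip, status}, state) do
    # The returned state is what the next callback invocation will receive.
    {:noreply, put_in(state.results[ip], status)}
  end

  @impl true
  def handle_call({:hosts, status}, _from, state) do
    hosts = for {ip, ^status} <- state.results, do: ip
    {:reply, Enum.sort(hosts), state}
  end
end

{:ok, tracker} = Demo.HostTracker.start_link("192.168.1.0/24")
Demo.HostTracker.record(tracker, "192.168.1.1", :up)
Demo.HostTracker.record(tracker, "192.168.1.2", :down)
IO.inspect(Demo.HostTracker.hosts(tracker, :up))
# Prints ["192.168.1.1"]
```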

The final part of the puzzle is to actually do something! It's all fun and games to build architecture and supervision trees, but the most important thing for our application is to actually run some ping jobs. For this crucial task we'll use Tasks. Tasks are processes meant to execute one particular action throughout their lifetime, then terminate. This is perfect for spawning a process that sends a ping request and reports its outcome back to its caller. Once it has completed its purpose in life, we don't care about it anymore :) But what if the "Ping Worker" crashes in a way we didn't expect? In such cases we probably want it to restart and try again. This is where the Task.Supervisor comes into play. Whenever we want to execute a ping request, we spawn a new "Ping Worker" task and hand it over to the Task.Supervisor, which in turn will execute it, restart it if necessary, and report the outcome of the job back to its calling process (in our case the "Subnet Manager" GenServer).
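A rough, standalone sketch of handing work to a Task.Supervisor is shown below. The fake_ping function is purely an assumption for the example (it does not touch the network), so the script can run anywhere.

```elixir
{:ok, sup} = Task.Supervisor.start_link()

# Stand-in for a real ping (an assumption for this sketch):
# pretend that only addresses with an even last octet respond.
fake_ping = fn ip ->
  last_octet = ip |> String.split(".") |> List.last() |> String.to_integer()
  if rem(last_octet, 2) == 0, do: {:up, ip}, else: {:down, ip}
end

results =
  ["192.168.1.1", "192.168.1.2", "192.168.1.3"]
  # Start one supervised task per address; they run concurrently.
  |> Enum.map(fn ip -> Task.Supervisor.async_nolink(sup, fn -> fake_ping.(ip) end) end)
  # Collect each task's result, waiting at most 5 seconds per task.
  |> Enum.map(fn task -> Task.await(task, 5_000) end)

IO.inspect(results)
```

In this sketch a crashed task is not retried. Whether and how to restart is a policy choice; for fire-and-forget tasks, Task.Supervisor.start_child/3 accepts a :restart option.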

Phew, that was a stretch! Don't worry if this seems a bit overwhelming, we'll take it nice and slow from the top!

Installing Elixir

If you would like to follow along with the tutorial and build the application yourself, you need to install Elixir on your computer. Elixir can be installed in many ways, but the easiest is probably to consult the official documentation on how to install Elixir for your platform.

Creating a new Elixir project

Open a terminal and create a new mix project using the command <highlight-mono>mix new ping_machine --sup<highlight-mono>.

$ mix new ping_machine --sup
* creating README.md
* creating .formatter.exs
* creating .gitignore
* creating mix.exs
* creating lib
* creating lib/ping_machine.ex
* creating lib/ping_machine/application.ex
* creating test
* creating test/test_helper.exs
* creating test/ping_machine_test.exs

Your Mix project was created successfully.
You can use "mix" to compile it, test it, and more:

    cd ping_machine
    mix test

Run "mix help" for more commands.

Mix is Elixir's build tool, and can be used to create projects, compile code, run test suites, manage dependencies and so on.

The <highlight-mono>mix new<highlight-mono> generator will create a new standard Elixir project for us. It will create a README.md file, a .gitignore file, a .formatter.exs file, a mix.exs file and lib and test directories. Most of these are fairly self-explanatory, but let's take a quick detour and go over them anyway.

README.md - The README file
.gitignore - The git ignore file :)
.formatter.exs - Contains code formatting rules for the project
mix.exs - This file contains all project dependencies, project config, main module for the application as well as some other useful information
lib - directory for project source code
test - directory for project test files

Add some code

We'll start by creating the public API for our application. A common pattern for structuring Elixir applications is to have a context module that exposes and groups related functionality. In our case we need to do four things: start new ping jobs, stop already running ping jobs, report successful hosts, and report failed hosts. A context module was already generated for us when we created the project, so go ahead and open the <highlight-mono>lib/ping_machine.ex<highlight-mono> file in your favorite text editor.

<info>For all you VSCode people out there, the ElixirLS plugin provides excellent support for Elixir! You can search for it in the VSCode Extensions pane or get it from the VSCode marketplace.<info>

Delete all code in the file and write/paste the following:

defmodule PingMachine do
  @moduledoc false

  def start_ping(subnet) do
    IO.puts("Start pinging #{subnet} subnet.")
  end

  def stop_ping(subnet) do
    IO.puts("Stop pinging #{subnet} subnet.")
  end
  
  def get_successful_hosts() do
    IO.puts("I will report successful hosts")
  end

  def get_failed_hosts() do
    IO.puts("I will report failed hosts")
  end

  defp start_worker() do
    IO.puts("I will eventually start new ping jobs!")
  end
end

This tutorial will not focus much on Elixir syntax, but a keen eye will notice that we use both the <highlight-mono>def<highlight-mono> and <highlight-mono>defp<highlight-mono> keywords to define functions. Functions defined with def are public, meaning we can call them from outside the module using a <highlight-mono>ModuleName.function()<highlight-mono> call, while functions defined with defp are private and can only be called from within the module they are defined in. You may also wonder why there is no <highlight-mono>return<highlight-mono> keyword: an Elixir function returns the value of its last evaluated expression.
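A tiny standalone example of public versus private functions and the implicit return value (the module Demo.Greeter is made up for this illustration):

```elixir
defmodule Demo.Greeter do
  # Public: callable from anywhere as Demo.Greeter.greet/1.
  def greet(name) do
    # No `return` keyword: the value of the last expression is the result.
    "Hello, " <> shout(name)
  end

  # Private: only callable from inside Demo.Greeter.
  defp shout(name), do: String.upcase(name) <> "!"
end

IO.puts(Demo.Greeter.greet("world"))
# Prints: Hello, WORLD!

# Calling Demo.Greeter.shout("world") from out here would raise an
# UndefinedFunctionError, because shout/1 is private.
```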

Now we have set up all the functions we need for the "public" API of our application. For now they do nothing but print some text to stdout, but that will change :)

<info>Pro tip! Running "mix format" from the project root will format all the files in your project according to the formatting rules defined in the .formatter.exs file.<info>

By supplying the <highlight-mono>-S mix<highlight-mono> flag to the iex command, we can run the interactive Elixir shell with the current mix project. Make sure your current working directory is the project root (where the mix.exs file is), and run the <highlight-mono>iex -S mix<highlight-mono> command.

$ iex -S mix
Erlang/OTP 24 [erts-12.1] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]

Compiling 2 files (.ex)
Generated ping_machine app
Interactive Elixir (1.12.3) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> PingMachine.start_ping("192.168.1.0/24")
Start pinging 192.168.1.0/24 subnet.
:ok
iex(2)> PingMachine.stop_ping("192.168.1.0/24")
Stop pinging 192.168.1.0/24 subnet.
:ok

That worked great! Press <highlight-mono>Ctrl-c Ctrl-c<highlight-mono> to exit the iex shell.

However, we should probably verify that the subnet input value is a proper subnet, and return an error on invalid user input.
In order to verify that the input value is in fact a valid subnet, we need to do a whole lot of geeky computer science math (which I don't know how to do), so let's find a third-party library that can help us. There's a library called net_address that seems to cover our needs, so let's try using that.

Open the mix.exs file, and enter the following inside the deps function.

defmodule PingMachine.MixProject do
  use Mix.Project
	
  # Rest of file is not visible here

  defp deps do
    [
      {:net_address, "~> 0.3.0"}
    ]
  end
end

Save the file and install dependencies using the <highlight-mono>mix deps.get<highlight-mono> command from the project root.

$ mix deps.get
Resolving Hex dependencies...
Dependency resolution completed:
Unchanged:
  net_address 0.3.0
All dependencies are up to date

Let's refactor the <highlight-mono>start_ping/1<highlight-mono> function in the lib/ping_machine.ex file to be a little smarter.

defmodule PingMachine do
  @moduledoc false

  require Logger
  require IP.Subnet

  def start_ping(subnet) when is_binary(subnet) do
    case IP.Subnet.from_string(subnet) do
      {:ok, subnet} ->
        Logger.info("Started pinging all hosts in range #{IP.Subnet.to_string(subnet)}")
        {:ok, subnet}

      {:error, _reason} ->
        {:error, :invalid_subnet}
    end
  end


  def stop_ping(subnet) do
    IO.puts("Stop pinging #{subnet} subnet.")
  end
end

A few things are going on here. Near the top we see a new keyword, require. Elixir uses macros as a mechanism for meta-programming. Public functions in modules are globally available, but if you want to use macros defined in a module, you need to opt in by requiring the module they are defined in. Both the Logger module and the IP.Subnet module contain macros we want to use, so we need to require them first.
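The same opt-in rule can be seen with a macro from the standard library. In the standalone sketch below, Integer.is_even/1 is a macro, so the (made-up) module Demo.Macros has to require Integer before calling it.

```elixir
defmodule Demo.Macros do
  # Integer.is_even/1 is a macro, so the Integer module must be required first.
  # Without this line the module would fail to compile.
  require Integer

  def even_octet?(n), do: Integer.is_even(n)
end

IO.inspect(Demo.Macros.even_octet?(24))
# Prints: true
```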

Next, our <highlight-mono>start_ping/1<highlight-mono> function definition has changed slightly. By adding an is_binary/1 guard to the function definition, it will only match if the given argument is a string. Strings in Elixir are represented internally as contiguous sequences of bytes known as binaries, hence the name is_binary/1.
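Here is a small standalone sketch of how a guard selects which function clause runs. The module Demo.Guard and its fallback clause are invented for this example; our start_ping/1 has no such fallback, so calling it with a non-string raises a FunctionClauseError.

```elixir
defmodule Demo.Guard do
  # This clause only matches when the argument is a binary (an Elixir string).
  def describe(value) when is_binary(value), do: {:string, byte_size(value)}

  # Fallback clause for every other kind of value.
  def describe(_other), do: :not_a_string
end

IO.inspect(Demo.Guard.describe("10.0.0.0/8"))
# Prints: {:string, 10}

IO.inspect(Demo.Guard.describe(42))
# Prints: :not_a_string
```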

<info>While working with Elixir, you'll see notation like function_name/1 or function_name/2. The number after the slash is the function's arity, which is a fancy way of saying how many arguments the function takes. In Elixir, functions in the same module can share a name but differ in arity, and they are then treated as distinct functions. So whenever you see a reference to, for example, function_name/3, you know that this is the version that takes 3 arguments.<info>

In the function body we pass the return value of the <highlight-mono>IP.Subnet.from_string/1<highlight-mono> function to a case statement, and pattern match on whatever we receive inside the case .. do block. Since we know that the <highlight-mono>IP.Subnet.from_string/1<highlight-mono> function returns either an :ok or an :error tuple, it is sufficient to match on these, but we can add as many clauses as we wish.
In Elixir it is common for functions to return values that represent either success or error. These return values often take the form of a tuple where the first element is either an :ok or :error atom and the second element contains the success or error value. An atom is just a literal constant; its name is its value.

Once a condition is met inside the case statement, we either log an info message to stdout and return an :ok tuple, or we return an :error tuple.
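The same :ok / :error pattern can be tried out without any dependencies. The standalone sketch below wraps Integer.parse/1 from the standard library; the module Demo.Octet and its error atoms are made up for this illustration.

```elixir
defmodule Demo.Octet do
  # Integer.parse/1 returns {integer, rest_of_string} or :error.
  # We normalize that into :ok / :error tuples using case and pattern matching.
  def parse(text) when is_binary(text) do
    case Integer.parse(text) do
      {n, ""} when n in 0..255 -> {:ok, n}
      {_n, _rest} -> {:error, :invalid_octet}
      :error -> {:error, :not_a_number}
    end
  end
end

IO.inspect(Demo.Octet.parse("192"))
# Prints: {:ok, 192}

IO.inspect(Demo.Octet.parse("300"))
# Prints: {:error, :invalid_octet}

IO.inspect(Demo.Octet.parse("abc"))
# Prints: {:error, :not_a_number}
```

Callers can then match on the first element of the tuple, exactly like start_ping/1 does with the result of the subnet parser.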

Root supervisor

We have looked at how we wish to structure the application, so let's set up the root supervisor. Open the <highlight-mono>lib/ping_machine/application.ex<highlight-mono> file in your text editor and add a DynamicSupervisor, a Registry and a Task.Supervisor to the children list in the <highlight-mono>start/2<highlight-mono> function.

defmodule PingMachine.Application do
  # See https://hexdocs.pm/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    children = [
      {DynamicSupervisor, strategy: :one_for_one, name: PingMachine.PingSupervisor}, # Add this line
      {Registry, keys: :unique, name: PingMachine.Registry},                         # Add this line
      {Task.Supervisor, name: PingMachine.TaskSupervisor}                            # Add this line

    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: PingMachine.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

Great! This is a good time to fire up the Beam observer to see if our new supervision tree works as expected.
From the project root (where the mix.exs file is), run the same command as we did earlier, <highlight-mono>iex -S mix<highlight-mono>, and once you are in the iex shell, run <highlight-mono>:observer.start<highlight-mono>.

$ iex -S mix
Erlang/OTP 24 [erts-12.1] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]

Interactive Elixir (1.13.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> :observer.start
:ok

This will open the Beam observer application. Feel free to click around and explore it a bit (it's awesome), and when you're done click on the "Applications" tab.
In the left pane we have a list of all the applications running in the Beam instance at the moment. You should see our "ping_machine" somewhere on the list. Click on it and we should see a diagram that resembles the drawing above.

Beam observer visualizing the application supervision tree.
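If the observer GUI is not available on your machine, the tree can also be inspected from code. The standalone sketch below starts the same three kinds of children under a throwaway supervisor (the Demo.* names are made up so they don't clash with the real application) and lists them.

```elixir
children = [
  {DynamicSupervisor, strategy: :one_for_one, name: Demo.PingSupervisor},
  {Registry, keys: :unique, name: Demo.Registry},
  {Task.Supervisor, name: Demo.TaskSupervisor}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Each entry is {id, pid, type, modules}; we print the id and pid only.
for {id, pid, _type, _modules} <- Supervisor.which_children(sup) do
  IO.inspect({id, pid})
end

IO.inspect(Supervisor.count_children(sup).active, label: "active children")
```

Inside the iex session for our project, calling Supervisor.which_children(PingMachine.Supervisor) should give a similar listing for the real root supervisor.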

Summary

In this part we have talked a little about the overall goals for the application, how we will interface with it, its architecture and the supervision tree. We dipped our toes into a little bit of coding, and looked at how to start and stop the application and how to open the Beam observer to peek into the application at runtime. In the next part we will take a look at how to implement the "Subnet Manager" GenServer and refactor the public API functions in the context module.

If you have any comments or questions, please reach out to me at rolf.havard.blindheim@intility.no.