Intility employs many systems and practices to help ensure a stable user experience, from configuring devices and making sure management agents are functioning and up to date, to addressing maintenance and security needs. However, because each device and its user's workflow is unique, we often see certain devices fall behind on maintenance. Some lack security updates, while others have problems installing applications or logging in, to name a few. This can cause underlying issues that degrade the overall user experience. The reasons behind these underlying issues are often difficult to predict or prevent, but it is not impossible to do something about them. After taking inspiration from a project by Anders Rødland, we decided to take on the challenge of securing the health of all our managed devices.
Intility Client Health (ICH) started out as an extensive PowerShell script that aimed to troubleshoot and address client scenarios we already knew from past experience with other devices. It runs silently in the background on all Intility managed devices and was designed for low local resource consumption. We chose PowerShell because its native integration with Windows let us ensure that ICH would run as expected without worrying about any prerequisites. The syntax was also intuitive, making it highly accessible for technicians to learn, adopt and collaborate on.
Drawing on our internal knowledge database, previous support tickets and other data points from our management systems, we had all we needed to get started. As this data contained detailed information on problems and deviations for specific devices, we started creating automation tasks to identify and remedy the very same problems and deviations on all other devices.
After a few years of development, the ICH script now consists of over 11,000 lines of code containing over 140 automation tasks covering a wide range of topics:
When Intility Client Health is triggered, it starts by running an initialization function that performs a couple of different tasks:
To trigger the automation tasks, we created a "runbook" which considers your device configuration and decides the order of tasks to execute. The runbook provides a valuable overview of how and when tasks are executed, with the flexibility to change which tasks to execute and their respective order. After initialization, the runbook is triggered and tests are run based on the global variables that were set, as shown in this snippet from the runbook:
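As a rough illustration only (this is not the actual ICH runbook), the idea of an ordered task list filtered by global settings could be sketched as follows. The `$Global:ICH` hashtable, the task names and the `Get-RunbookPlan` helper are all hypothetical.

```powershell
# Hypothetical settings that an initialization step might populate.
$Global:ICH = @{
    IsLaptop     = $true
    Manufacturer = 'Dell'
    ManagedBy    = 'Intune'
}

# Each entry names a task and a condition that decides whether it applies.
# The array order is the execution order.
$Runbook = @(
    @{ Name = 'Test-SecureBoot';      Condition = { $true } }
    @{ Name = 'Test-BatteryHealth';   Condition = { $Global:ICH.IsLaptop } }
    @{ Name = 'Test-ConfigMgrClient'; Condition = { $Global:ICH.ManagedBy -eq 'ConfigMgr' } }
)

function Get-RunbookPlan {
    param([object[]]$Runbook)
    # Keep the declared order, but drop tasks whose condition is not met.
    foreach ($entry in $Runbook) {
        if (& $entry.Condition) { $entry.Name }
    }
}

$plan = Get-RunbookPlan -Runbook $Runbook
```

Keeping the order and the conditions in one data structure means that reordering or disabling a task is a one-line change, which matches the flexibility described above.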
When running ICH everything is logged to a file in JSON format. A typical detection log event will look like this:
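As a minimal sketch, one JSON object per log line could be produced like this. The field names and the `New-LogEvent` helper are assumptions for illustration, not the real ICH log schema.

```powershell
function New-LogEvent {
    param(
        [Parameter(Mandatory)][string]$Task,
        [Parameter(Mandatory)][ValidateSet('Detection', 'Remediation')][string]$Type,
        [Parameter(Mandatory)][string]$Result,
        [string]$Message = ''
    )
    # Field names are illustrative only.
    [ordered]@{
        Timestamp = (Get-Date).ToUniversalTime().ToString('o')
        Computer  = [Environment]::MachineName
        Task      = $Task
        Type      = $Type
        Result    = $Result
        Message   = $Message
    } | ConvertTo-Json -Compress
}

$line = New-LogEvent -Task 'Test-SecureBoot' -Type Detection -Result 'Detected' -Message 'Secure Boot is disabled'

# Append one JSON object per line so the file is easy to ingest by a log collector.
$logPath = Join-Path ([IO.Path]::GetTempPath()) 'ich-example.log'
Add-Content -Path $logPath -Value $line
```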
In order to maintain an organized project structure, we decided to categorize the automation tasks into two main groups with tailored design templates as base code: Detections and Remediations.
A detection task, also referred to as a "Test", looks for specific information, deviations, or problems. This can be anything from error codes, specific log events, missing files, faulty configurations, BIOS setup or Windows Updates, to name a few.
Below is an example of what our detection of SecureBoot events looks like:
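For readers without access to the ICH source, a hypothetical detection along these lines could be built on the built-in `Confirm-SecureBootUEFI` cmdlet. The function name, the output shape and the injectable `-Probe` parameter are assumptions made for this sketch; it checks the current Secure Boot state rather than event log entries.

```powershell
function Test-SecureBootState {
    param(
        # The probe is injectable so the logic can be exercised without firmware access.
        # Confirm-SecureBootUEFI needs an elevated session on a UEFI device.
        [scriptblock]$Probe = { Confirm-SecureBootUEFI -ErrorAction Stop }
    )
    try {
        if (& $Probe) {
            [pscustomobject]@{ Test = 'SecureBoot'; Detected = $false; Detail = 'Secure Boot is enabled' }
        } else {
            [pscustomobject]@{ Test = 'SecureBoot'; Detected = $true; Detail = 'Secure Boot is disabled' }
        }
    } catch {
        # Legacy BIOS, missing privileges or an unsupported platform: report rather than guess.
        [pscustomobject]@{ Test = 'SecureBoot'; Detected = $false; Detail = "Unable to determine: $($_.Exception.Message)" }
    }
}

# Simulate a device where Secure Boot is turned off.
$detection = Test-SecureBootState -Probe { $false }
```

Returning an object instead of writing text keeps the result easy to log and easy to hand over to a matching remediation.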
A remediation task, also referred to as a "Resolve", consists of one or more ways to solve a detection. These tasks are usually specific to each detected event, but can in some cases be reused by multiple detection events, such as remediation of any Windows service through our Resolve-Service remediation task. After attempting to resolve the detected event, each task returns a result: success, failed, or pending if a reboot or replication time is required. This gives us insight into how effective our remediations are at resolving specific detections, allowing us to upgrade or change remediations to increase the rate of success.
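The three possible outcomes can be illustrated with a small wrapper. This is a sketch under assumptions: `Invoke-Remediation` and its parameters are hypothetical and do not reflect how Resolve-Service is actually implemented.

```powershell
function Invoke-Remediation {
    param(
        [Parameter(Mandatory)][string]$Name,
        [Parameter(Mandatory)][scriptblock]$Action,   # attempts the fix
        [Parameter(Mandatory)][scriptblock]$Verify,   # returns $true when the problem is gone
        [switch]$RequiresReboot                       # the fix only takes effect after a restart
    )
    $result = 'Failed'
    try {
        & $Action | Out-Null
        if ($RequiresReboot) { $result = 'Pending' }
        elseif (& $Verify)   { $result = 'Success' }
    } catch {
        # A terminating error in the action or the verification counts as a failure.
        $result = 'Failed'
    }
    [pscustomobject]@{ Resolve = $Name; Result = $result }
}

# Example (Windows only): start a service and confirm that it is running afterwards.
# Invoke-Remediation -Name 'Resolve-Service' `
#     -Action { Start-Service -Name 'wuauserv' -ErrorAction Stop } `
#     -Verify { (Get-Service -Name 'wuauserv').Status -eq 'Running' }
```

Separating the action from the verification is what makes the success rate measurable: a remediation only counts as successful when the original problem can no longer be observed.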
Below is an example of what our SecureBoot remediation for Dell looks like:
<info>The entire process of automatically detecting and remediating underlying errors runs silently in the background without disturbing the user's workflow.<info>
To assist in maintaining and developing the project, we utilize CI/CD through GitLab to:
Spellcheck: A custom job that runs every log line through a check for spelling mistakes
Analyze: PSScriptAnalyzer checks the quality of PowerShell code
Test: Pester test framework for PowerShell runs custom tests on all of our code
Our Pester setup includes custom one-to-one tests (a test for one automation task), as well as general one-to-many tests (a test for all automation tasks). We primarily use one-to-many tests, as they allow us to create any test or rule that every PowerShell function in the project must adhere to. We leverage this for quality assurance, making sure that code and documentation are up to standard and that the modules in the project remain compatible with each other.
Below is an example of a test running on all our functions:
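A one-to-many test of this kind could look roughly like the following sketch, which assumes Pester 5 and a hypothetical project layout with one function per file in a `Functions` folder next to the test file. The three rules shown are examples chosen for illustration, not the project's actual rules.

```powershell
# Requires Pester 5. The .\Functions folder layout is a hypothetical example.
BeforeDiscovery {
    $cases = Get-ChildItem -Path (Join-Path $PSScriptRoot 'Functions') -Filter '*.ps1' |
        ForEach-Object { @{ Name = $_.BaseName; Path = $_.FullName } }
}

Describe 'Project rules for <Name>' -ForEach $cases {
    BeforeAll {
        # Parse the file instead of executing it, so the test has no side effects.
        $ast = [System.Management.Automation.Language.Parser]::ParseFile($Path, [ref]$null, [ref]$null)
        $function = $ast.Find(
            { param($node) $node -is [System.Management.Automation.Language.FunctionDefinitionAst] },
            $true)
    }

    It 'defines a function named after its file' {
        $function.Name | Should -Be $Name
    }

    It 'uses an approved PowerShell verb' {
        ($Name -split '-')[0] | Should -BeIn (Get-Verb).Verb
    }

    It 'has comment-based help with a synopsis' {
        $function.GetHelpContent().Synopsis | Should -Not -BeNullOrEmpty
    }
}
```

Because the test cases are generated during discovery, a newly added function file is covered by every rule without anyone having to write a test for it.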
Additionally, we have another CI/CD pipeline, running on a mirrored instance of the GitLab repository in a separate secure environment. This environment requires special privileges to access and is used to trigger the pipeline responsible for:
Build:
CodeSign: Signs the large single-file PowerShell script with our CodeSign certificate
We encountered several challenges in our journey developing Intility Client Health. <highlight-mono>Firstly<highlight-mono>, we realized that using Windows Task Scheduler to run the script, a primary component in our initial strategy, did not fulfill all our requirements and often failed to perform as anticipated. This led to instances where ICH was not executed on time, causing delays and inefficiencies. Furthermore, these scheduled tasks lacked comprehensive logging, making it challenging to pinpoint reasons for non-execution.
<highlight-mono>Secondly<highlight-mono>, our initial deployment approach presented a considerable obstacle. We used group policy for Configuration Manager and Win32Lob applications for Intune to deploy the script and a scheduled task to trigger it. This required us to maintain two separate installation procedures that had to behave as one. Upgrading and downgrading versions of ICH thus demanded substantial amounts of manual work, which was not only time-consuming, but also increased the risk of human error. Additionally, it reduced our overall capacity to address other development matters.
<highlight-mono>Lastly<highlight-mono>, testing new versions of ICH was challenging, as it was difficult to determine what a sufficient level of testing was. During the early development stages, we reached out to a select few customers to test new features, but quickly realized the disadvantages of this approach. Given the diversity and complexity in configuration and setup among our customers, it was difficult to determine when a feature had been tested enough to be deemed reliable for deployment to all devices. This led to a lot of manual testing, usually requiring more time than developing the features themselves.
These challenges served as learning opportunities and guided us towards refining our development and deployment processes for ICH. They helped us innovate and adopt more efficient and reliable approaches that would ultimately push the project in a new direction.
Having identified the hurdles we needed to overcome, we knew what the solution had to address. We settled on building the application Intility Client Health Service. It consists of a Windows Background Service that ensures that the ICH script runs at predetermined intervals, keeps the ICH script updated and ensures that certificates and signatures are in order.
<highlight-mono>How does it work? Let's explain.<highlight-mono>
The application consists of an installation file that is used to install the service for all machines, regardless of differences in brands, models or management systems. Our management systems push an application package to all Windows devices, installing our background service which takes over from there.
When the service starts up, it checks whether a new version of the ICH script is available in Azure for the current update channel, downloads it if necessary and runs it once it has passed security checks such as file hash and digital signature verification. From there, the service checks for updates of ICH every hour and keeps track of when to trigger the next run, currently set to once every 24 hours.
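The timing logic can be sketched in a few lines. The source does not describe how the service itself is implemented, so the PowerShell function below is purely illustrative; `Get-DueAction` and its parameter names are hypothetical, and only the one-hour and 24-hour intervals come from the text.

```powershell
function Get-DueAction {
    param(
        [Parameter(Mandatory)][datetime]$Now,
        [Parameter(Mandatory)][datetime]$LastUpdateCheck,
        [Parameter(Mandatory)][datetime]$LastRun,
        [timespan]$UpdateInterval = [timespan]::FromHours(1),
        [timespan]$RunInterval    = [timespan]::FromHours(24)
    )
    # Emit zero, one or two action names depending on what is overdue.
    if (($Now - $LastUpdateCheck) -ge $UpdateInterval) { 'CheckForUpdate' }
    if (($Now - $LastRun) -ge $RunInterval) { 'RunScript' }
}

# Last update check 30 minutes ago, last run exactly 24 hours ago.
$due = Get-DueAction -Now ([datetime]'2024-01-02T00:00:00') `
    -LastUpdateCheck ([datetime]'2024-01-01T23:30:00') `
    -LastRun ([datetime]'2024-01-01T00:00:00')
```

Passing the current time in as a parameter, instead of calling `Get-Date` inside the function, keeps the decision deterministic and easy to test.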
After a long period of internal testing, we discovered how reliable the service was in executing ICH on time compared to our old scheduled task approach. Additionally, the service strengthened our control with security checks that run before initiating ICH. These checks ensure that the script content hash matches our remote versions of the script in Azure, that it's signed with our certificate and that the correct version of the script is in place before executing it.
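Checks of this kind can be expressed with built-in cmdlets. The sketch below is an assumption about how such a check might look, not the service's actual code; `Test-ScriptIntegrity` and its parameters are hypothetical, and the expected hash and thumbprint would come from a trusted remote source.

```powershell
function Test-ScriptIntegrity {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$ExpectedSha256,
        # Thumbprint of the trusted signing certificate; the signature check is skipped when empty.
        [string]$ExpectedThumbprint
    )
    # String comparison with -ne is case-insensitive, so hex casing does not matter.
    $actual = (Get-FileHash -Path $Path -Algorithm SHA256).Hash
    if ($actual -ne $ExpectedSha256) { return $false }

    if ($ExpectedThumbprint) {
        # Get-AuthenticodeSignature is only available on Windows.
        $signature = Get-AuthenticodeSignature -FilePath $Path
        if ($signature.Status -ne 'Valid') { return $false }
        if ($signature.SignerCertificate.Thumbprint -ne $ExpectedThumbprint) { return $false }
    }
    return $true
}

# Usage: refuse to run anything that does not match what was published.
# if (Test-ScriptIntegrity -Path $scriptPath -ExpectedSha256 $remoteHash -ExpectedThumbprint $thumbprint) {
#     & $scriptPath
# }
```

Checking the hash first is cheap and catches corrupted or partial downloads; the signature check then confirms that the file was signed by the expected certificate.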
In parallel to developing Intility Client Health Service, we created a framework to be used when testing new features, called "Waves". Waves creates test groups, like Windows Update rings, evenly dividing devices into groups with custom percentage sizes. This allows for gradual and controlled testing of new features in as many unique environments as possible, while minimizing the impact on end-users' experience.
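One common way to split devices into percentage-based groups is to hash a stable device identifier into a bucket from 0 to 99. The source does not say how Waves assigns devices, so the sketch below is an assumed approach; `Get-Wave`, the wave names and the percentages are hypothetical, and the resulting group sizes only approximate the configured percentages.

```powershell
function Get-Wave {
    param(
        [Parameter(Mandatory)][string]$DeviceId,
        # Ordered wave names with percentage sizes that sum to 100 (example values).
        [System.Collections.Specialized.OrderedDictionary]$Waves = ([ordered]@{ Wave1 = 5; Wave2 = 20; Wave3 = 75 })
    )
    $sha = [System.Security.Cryptography.SHA256]::Create()
    try {
        # Normalize casing so the same device always lands in the same wave.
        $bytes = $sha.ComputeHash([System.Text.Encoding]::UTF8.GetBytes($DeviceId.ToUpperInvariant()))
    } finally {
        $sha.Dispose()
    }
    # Map the first four hash bytes to a bucket between 0 and 99.
    $bucket = [System.BitConverter]::ToUInt32($bytes, 0) % 100

    # Walk the cumulative percentage ranges until the bucket fits.
    $upper = 0
    foreach ($name in $Waves.Keys) {
        $upper += $Waves[$name]
        if ($bucket -lt $upper) { return $name }
    }
}

$wave = Get-Wave -DeviceId 'PC-001'
```

Because the assignment depends only on the identifier, a device stays in the same wave between runs without any central bookkeeping, and the hash spreads devices across customers and hardware models.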
We used Waves to create and later update the "update channel" for ICH. The update channels are divided into General, Targeted and Insider and are configured in the Windows registry via policies. Using this simple concept of update channels to decide which version of ICH a device should run, we can deploy new versions and bugfixes, or roll back, quickly and seamlessly.
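Reading a channel from the registry and mapping it to a target version could look roughly like this. The registry path, the value name, the version numbers and the fallback to General are all assumptions made for this sketch; only the three channel names come from the text.

```powershell
function Get-UpdateChannel {
    param(
        # Hypothetical policy location; the real path is not given in the source.
        [string]$RegistryPath = 'HKLM:\SOFTWARE\Policies\Contoso\ClientHealth'
    )
    $valid = 'General', 'Targeted', 'Insider'
    try {
        $value = (Get-ItemProperty -Path $RegistryPath -Name 'UpdateChannel' -ErrorAction Stop).UpdateChannel
    } catch {
        $value = $null
    }
    # Assumption: fall back to the most conservative channel when nothing valid is configured.
    if ($value -in $valid) { $value } else { 'General' }
}

function Resolve-TargetVersion {
    param(
        [Parameter(Mandatory)][string]$Channel,
        [Parameter(Mandatory)][hashtable]$Manifest
    )
    $Manifest[$Channel]
}

# Example manifest with made-up version numbers; rolling back is a matter of editing one entry.
$manifest = @{ General = '2.4.0'; Targeted = '2.5.0'; Insider = '2.6.0-beta' }
$target   = Resolve-TargetVersion -Channel (Get-UpdateChannel) -Manifest $manifest
```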
<info>New versions of Intility Client Health are packaged and automatically distributed through AppPackBot, which you can read more about here.<info>
Both ICH and ICH Service send their logs to us via Splunk, where they are used for monitoring. We create dashboards using this data to drive our own development forward by:
Our journey developing Intility Client Health has been methodical. We wanted to start simple and iterate block by block, rigorously testing every change. This method has been crucial to avoid the pitfalls of overcomplexity, while making sure the project stays easy to maintain and develop.
As we progressed, we realized that in order to overcome certain core challenges we had to evolve our existing ICH solution. With Intility Client Health Service we are able to address real challenges such as ensuring correct configuration, maintenance, troubleshooting and compliance needs, and since its launch it has become clear how well the service fits into our device ecosystem.
Increasing complexity is emerging on all fronts. We see the potential for services such as Intility Client Health to abstract these many needs and translate them into a well-functioning user layer in the years ahead. One remediation at a time.