The cyber resilience challenge for critical infrastructures at scale
As cybersecurity threats continue to grow at a rapid pace around the world, their financial impact is expected to reach USD 10.5 trillion by 2025. Given this concerning statistic, the challenges for critical infrastructure owners and operators remain just as relevant today. Sectors such as energy, telecoms, transport and defence increasingly face sophisticated cyber threats from various actors, including state-sponsored groups, hacktivists and cybercriminals.
Buffer overflows have been known about for over 40 years and yet they still remain a safety problem for industry. They occur when a program writes data to a memory address beyond the buffer intended for it, overwriting the adjacent data, which is characteristically unsafe behaviour. Recent overflows disclosed by Microsoft, specifically affecting CODESYS, underscore the continuing risk posed by memory-related vulnerabilities in automated systems and infrastructure. CODESYS is a commercial integrated development environment (IDE), one that aids in simulating and communicating with programmable logic controllers (PLCs) widely used in energy, process, and factory automation. In their research, Microsoft exploited a dozen buffer overflows to allow remote code execution, which could itself permit backdoors within PLCs to allow exfiltration or persistent access. Given the persistence, resources, and competence of nation-state threat groups, the overflows would be one potential vector available to them if they were to target infrastructure operation and development.
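To make the mechanism concrete, the following is a minimal sketch in C, not taken from CODESYS or from Microsoft's research. The `frame` structure, its field names and the `frame_set_payload` helper are hypothetical, chosen only to show how an unchecked copy could reach adjacent data and how an explicit length check avoids it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_CAPACITY 16

/* Hypothetical record: the flag is stored right after the buffer. */
struct frame {
    char payload[FRAME_CAPACITY];
    int  is_privileged;   /* adjacent data an overflow could corrupt */
};

/*
 * Unsafe pattern, shown only as a comment and never executed:
 *     memcpy(f->payload, src, len);
 * Without a check that len <= FRAME_CAPACITY, a larger len writes past
 * payload. That is undefined behaviour and may overwrite is_privileged.
 */

/* Checked copy: returns 0 on success, -1 if the input would not fit. */
int frame_set_payload(struct frame *f, const char *src, size_t len)
{
    if (f == NULL || src == NULL || len > sizeof f->payload) {
        return -1;   /* reject the input instead of overflowing */
    }
    memcpy(f->payload, src, len);
    return 0;
}

/* Usage: an oversized input is refused and the neighbouring field is untouched. */
void frame_demo(void)
{
    struct frame f = { .payload = {0}, .is_privileged = 0 };
    const char oversized[32] = {0};

    assert(frame_set_payload(&f, "hello", 5) == 0);
    assert(frame_set_payload(&f, oversized, sizeof oversized) == -1);
    assert(f.is_privileged == 0);
}
```

The check is a single line, which is part of why such bugs persist: C gives no warning at run time when a programmer forgets it.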
Legacy systems and equipment are also a major concern: many critical infrastructure systems were implemented before modern cybersecurity standards were established and may contain legacy code, or lack critical security patches or built-in security features. The ongoing digital transformation, which connects and interconnects systems, creates new attack vectors in critical infrastructures. This means that cybercriminals can exploit weaknesses in one system to gain access to others, leading to cascading effects and widespread disruption.
The growing interaction between established businesses in different regulated sectors, such as telecoms and energy, is amplifying the risk coming from the supply chain. This is particularly the case when established businesses interact with new and largely unregulated businesses, and even with consumers or prosumers, such as those providing microgeneration, electric vehicle charging or ‘connected places’ capabilities.
Finally, critical infrastructure owners and operators face a cultural challenge when it comes to the disclosure of information after a cybersecurity incident. The rapid sharing of information is one of the best ways to mitigate malicious activities: it allows a prompt response across interconnected organisations and encourages others to learn and implement appropriate actions to prevent cyberattacks elsewhere. Some entities are, however, reluctant to share sensitive information due to concerns about loss of reputation and stock market value.
The cyber attack surface. How exposed are current critical infrastructures?
The cyber-attack surface refers to all the possible entry points or vulnerabilities that could be exploited by bad actors to gain unauthorised access and cause harm to an organisation.
This attack surface can be particularly wide for critical infrastructure operators due to the complexity and scale of their operations and interconnected systems. For example, Internet-facing systems and cloud-based infrastructures add potentially new entry points for attackers. Increased dependency on third-party suppliers of managed services, which often have privileged access to the IT systems of clients, is also creating new risks.
While the attack surface is inevitably growing in size, it is at a deeper level where the greatest vulnerability lies. Most applications, particularly in safety-critical industries, are written in C and C++, programming languages that lack safeguards to prevent highly damaging, unsafe behaviour from passing through unchecked and undetected.
C and C++ are still widely used, as demonstrated by the TIOBE Index for August 2023 where they are respectively the second and third most popular programming languages worldwide, right behind Python. They are particularly prevalent in industrial applications due to the age and slow rate of change for critical and reliable systems.
Both languages have been established for decades, proving to be efficient, high-performing and capable of handling the necessary low-level system tasks, and today they enjoy a huge support base of resources, training and libraries. However, features created for simpler systems and efficient computation, such as ‘pointers’, which hold the addresses of other data or code in memory, increase the risk of programming mistakes. These mistakes introduce bugs that can cause memory safety issues and security vulnerabilities, and current CPU architectures are unable to limit their exploitation.
While the technology landscape is evolving, and newer and safer languages like Rust are being adopted by the industry, the dominance of C and C++ in industrial applications, and among the developers who build them, remains significant. C and C++ are likely to continue being used in industrial domains for the foreseeable future.
Secure-by-design. Why now?
In the current scenario, and given the time it would take to rewrite all existing C and C++ software in memory-safe languages, in addition to the challenge of constant software patching, security-by-design was introduced by the UK Government as a radical approach to solving the issue. Secure-by-default development and deployment practices aim to reduce the attack surface and the potential for vulnerabilities. Security-by-design refers to the practice of integrating security-focused architectures, measures and considerations into the procurement and design of systems, products and services before development starts. Adopting this approach then works to block, by design, the undesirable consequences of any vulnerabilities remaining in the attack surface. In the case of the CHERI (Capability Hardware Enhanced RISC Instructions) CPU (central processing unit) architectural extensions, supported by the Digital Security by Design (DSbD) programme, security would be built in at the hardware level in the CPU, offering fundamental protection around the way software is executed.
Although the memory safety limitations and vulnerabilities of current processor architectures have been known since the 1970s, there was no route for any one organisation to solve the problem. This was in part because new hardware needed software to justify its existence, and new software needed hardware to run on.
Introducing Digital Security by Design
To address this challenge, UK Research and Innovation (UKRI) inaugurated the Digital Security by Design (DSbD) programme, a five-year initiative worth £80 million, matched by £200 million of private investment, to facilitate experimentation on fundamentally secure CPU architecture. One way to access DSbD technologies is through the Technology Access Programme (TAP), which provides successful applicants with the Morello hardware prototype, designed and supplied by Arm, together with up to £15,000 in project funding.
This prototype is an integrated circuit board comprising a modified Armv8.2-A chipset, extended to be compatible with CHERI, a set of cutting-edge CPU architecture extensions developed by SRI International and the University of Cambridge. Morello and CHERI are RISC-based, with a wide array of applications, ranging from microcontrollers to crypto-processors and mobile devices to desktops, cloud servers, and high-performance computers. Over the last eighteen months, participants in DSbD have taken that opportunity to experiment with the groundbreaking technology, tailoring their own applications to better harness the new engineering paradigm that underlies the architecture.
Creating resilient architecture
These architectures both implement the concept of a “memory capability”, an unforgeable token that prevents vulnerabilities that have long plagued software development, with the ultimate aim of making C/C++ safe. Capabilities replace C/C++ pointers; they extend the 64-bit address with a further 64 bits of metadata, holding bounds and permissions, to form a 128-bit capability. These bounds help defeat spatial memory vulnerabilities, such as buffer overflows, which might otherwise have provided an attacker with the means to access whatever data was adjacent to the buffer in use.
For example, in an edge router, the point-of-access device or platform for an enterprise, the greatest risk is the exposure of a single function call to a buffer overflow exploit. This could facilitate remote code execution or denial of service, either of which could compromise an enterprise or its users. Preventing spatial vulnerabilities reflects the paradigm shift with CHERI. Fundamental changes to the resilience of the architecture can be witnessed throughout the software stack, hardening processes and systems alike.
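The idea of a pointer that carries its own bounds and permissions can be sketched in ordinary C as a software model. This is an illustration only and makes several assumptions: the `soft_cap` structure and the `cap_*` functions are hypothetical names, the fields are stored uncompressed rather than in the real 128-bit CHERI encoding, and the checks that hardware would perform on every access are written here as explicit function calls that return `false` where the hardware would raise an exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { CAP_PERM_LOAD = 1, CAP_PERM_STORE = 2 };

/* Illustrative, uncompressed model of a capability (not the CHERI format). */
struct soft_cap {
    uint8_t *base;     /* lowest address the holder may touch */
    size_t   length;   /* number of bytes covered, starting at base */
    unsigned perms;    /* which operations are allowed */
    bool     valid;    /* stands in for the validity tag */
};

struct soft_cap cap_make(uint8_t *buf, size_t len, unsigned perms)
{
    struct soft_cap c = { buf, len, perms, buf != NULL };
    return c;
}

/* The checks a capability-aware CPU would apply to each access. */
static bool cap_allows(const struct soft_cap *c, size_t index, unsigned perm)
{
    return c->valid && (c->perms & perm) == perm && index < c->length;
}

bool cap_store(const struct soft_cap *c, size_t index, uint8_t value)
{
    if (!cap_allows(c, index, CAP_PERM_STORE)) {
        return false;   /* out of bounds or no store permission */
    }
    c->base[index] = value;
    return true;
}

bool cap_load(const struct soft_cap *c, size_t index, uint8_t *out)
{
    if (!cap_allows(c, index, CAP_PERM_LOAD)) {
        return false;   /* out of bounds or no load permission */
    }
    *out = c->base[index];
    return true;
}

/* Usage: a write one byte past the bounds is refused, so the neighbour survives. */
void cap_demo(void)
{
    uint8_t memory[8] = {0};
    struct soft_cap rw = cap_make(memory, 4, CAP_PERM_LOAD | CAP_PERM_STORE);

    assert(cap_store(&rw, 3, 0xAB));     /* last byte inside the bounds */
    assert(!cap_store(&rw, 4, 0xFF));    /* first byte outside the bounds */
    assert(memory[4] == 0);              /* adjacent data is unchanged */
}
```

In this model the bounds travel with the reference itself, so the check does not depend on the programmer remembering to write it at each call site.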
That paradigm complements a granular sandboxing technique, known as compartmentalisation, which isolates processes and libraries, and limits shared resources to reduce the application’s overall exposure to attack. Key operations within the application, such as calling functions via third-party libraries or inter-process communication, are susceptible to manipulation, hence they have cause to be mutually distrustful.
Compartments resolve this tension with a fine-grained separation of the distrustful components, which constrains anomalous behaviour and minimises the impact of a vulnerability being successfully exploited. Data compression libraries, zlib for example, are widely used and are potential sources of vulnerability, so they would be ideal candidates for compartmentalisation to mitigate their risk to dependent applications. That gain in resilience can result in a range of improvements downstream, with a reduced dependence on hypervisors, more scalable application designs, and system-wide product reliability for end-users.
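The principle behind a compartment, handing an untrusted component only the memory it needs and nothing more, can be sketched as a software model in C. Everything here is hypothetical and simplified: `view`, `view_narrow` and `untrusted_copy` are illustrative names, the "library" is a stand-in routine with a deliberate bug rather than zlib, and real CHERI compartments are enforced by hardware and the operating system rather than by helper functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { VIEW_READ = 1, VIEW_WRITE = 2 };

/* A bounded, permission-limited window onto memory. */
struct view { uint8_t *base; size_t length; unsigned perms; };

/* Derive a smaller view. Bounds can only shrink and permissions can only
 * be dropped; any other request yields an empty, unusable view. */
struct view view_narrow(struct view parent, size_t start, size_t length,
                        unsigned perms)
{
    struct view empty = { NULL, 0, 0 };
    if (parent.base == NULL) return empty;
    if (start > parent.length || length > parent.length - start) return empty;
    if ((perms & ~parent.perms) != 0) return empty;
    struct view child = { parent.base + start, length, perms };
    return child;
}

bool view_read(struct view v, size_t i, uint8_t *out)
{
    if (!(v.perms & VIEW_READ) || i >= v.length) return false;
    *out = v.base[i];
    return true;
}

bool view_write(struct view v, size_t i, uint8_t value)
{
    if (!(v.perms & VIEW_WRITE) || i >= v.length) return false;
    v.base[i] = value;
    return true;
}

/* Stand-in for a buggy third-party routine: it trusts claimed_len.
 * Returns the number of bytes copied before an access was denied. */
size_t untrusted_copy(struct view in, struct view out, size_t claimed_len)
{
    size_t copied = 0;
    for (size_t i = 0; i < claimed_len; i++) {
        uint8_t byte;
        if (!view_read(in, i, &byte) || !view_write(out, i, byte)) break;
        copied++;
    }
    return copied;
}

/* Usage: the host shares 4 input bytes; the secret next to them stays out of reach. */
void compartment_demo(void)
{
    uint8_t host[16] = { 'd', 'a', 't', 'a', 'S', 'E', 'C', 'R', 'E', 'T' };
    uint8_t result[16] = {0};
    struct view all_in  = { host, sizeof host, VIEW_READ | VIEW_WRITE };
    struct view all_out = { result, sizeof result, VIEW_READ | VIEW_WRITE };

    struct view in  = view_narrow(all_in, 0, 4, VIEW_READ);
    struct view out = view_narrow(all_out, 0, sizeof result, VIEW_WRITE);

    assert(untrusted_copy(in, out, 10) == 4);   /* over-read stops at the bound */
    assert(result[4] == 0);                     /* no secret byte leaked */
}
```

The design choice worth noting is that narrowing is one-way: the routine cannot widen the view it was given, so a bug inside it is contained to the four bytes the host chose to share.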
Engineering real change
Both of CHERI’s protection mechanisms, memory capabilities and the compartments they enable, are well suited to improving the safety and reliability of applications that rely on “legacy code”. That term often applies to older implementations of C/C++.
Legacy issues such as these can be rectified by refactoring the code or rewriting it for a modern standard; CHERI C/C++ is a language dialect that facilitates this for capability-enhanced hardware like Morello. Research into the porting effort required by CHERI, undertaken in 2021, suggested that adoption has a realistic and minor overhead: 0.026% of lines of code were changed in a stack containing six million lines, which is roughly 1,500 lines. That effort is also consistent with DSbD’s TAP outcomes over the last eighteen months, which found that enterprises unfamiliar with the dialect, toolchain or concepts can still quickly deliver ports that are resilient by design.
Adapting C/C++ to CHERI is cost-effective, but the calculation does bleed into a continuing debate around the value of simply rewriting applications in Rust, another systems-oriented programming language. Rust is still a relatively new language, but it nonetheless enforces memory-safe compilation by default and requires programmers to make deliberate and explicit choices regarding the use of any unsafe calls. That implies some degree of competition between the two, one that requires developers to evaluate the relative costs and benefits of porting small amounts of code to CHERI C/C++ or rewriting more in Rust. Yet CHERI has the potential to complement any systems language adapted to it, and there is ongoing research to integrate its features into Rust, to grant it the same capabilities as those now available to C/C++. This complementary approach pairs a resilient architecture with a memory-safe language, so that those who have already begun their rewrite have the option to swap Rust’s own unsafe features for capabilities later.
Benefits of DSbD technologies for critical infrastructure operators
The adoption of DSbD technologies such as CHERI translates into a number of benefits for critical infrastructure operators. Fewer vulnerabilities and a more balanced approach to risk, shared with technology manufacturers and suppliers, mean lower insurance and legal costs. In a highly regulated environment where compliance with cybersecurity regulations and standards can be a complex and resource-intensive task, sharing responsibility with suppliers will help ease the burden.
In industries with tight operating margins, such as telecoms and utilities, DSbD will save considerable time on patching, mitigating and fixing, thus freeing scarce and expensive IT resources that could be employed in revenue-generating activities such as product development.
In terms of direct financial benefits, it will drastically reduce the cost of patching software. According to a recent survey, organisations spend 321 hours a week on average (the equivalent of about eight full-time employees) managing the vulnerability response process. DSbD will also increase operational resilience by reducing downtime, with cost savings of up to £80K per hour of avoided outage.
Ultimately, DSbD technologies will help mitigate PR crises and their detrimental impact on brand and stock market value. According to research by IBM, the global average cost of a data breach in 2023 rose to USD 4.45 million, a 2.3% increase from the previous year. Some 82% of breaches involved data stored in the cloud, whether in public, private or multiple environments.
What critical infrastructure operators ought to do
Critical infrastructure organisations should make secure-by-design a priority and collaborate with their industry peers to understand which safety challenges are the most pressing, and contribute towards the design and development of such technologies. At the same time, these companies should start coordinating their requests to the supply chain and prioritise the adoption of secure-by-design technologies. By doing so, they will accelerate the transition to a safer and more secure digital world.