Harden embedded systems part 1: Fault-resilient C programming

Tuesday, Jun 16, 2020

Scope of these HowTos

This HowTo is the first in a series of blog posts on embedded systems hardening. These posts discuss how to harden embedded software and how to use efficient mechanisms such as SSP, ROP protection and memory protection, even in very small embedded systems.

This series targets embedded systems, including small ones (ARMv7-M MPU-based, such as the STM32 family) or even smaller ones, such as AVR cores. These devices are more widespread than one might think: IoT devices, various USB devices, home automation, industrial control units, smart city equipment, etc. Nevertheless, for the sake of simplicity and reproducibility, code examples are written as POSIX code instead of bare-metal code. This does not affect the rationale behind the examples, which are provided without loss of generality.

NOTE: Though this post targets embedded systems, some of the hardening mechanisms described here can also be used in bigger ones, such as Linux-based systems. Only bare-metal related content is specific to small embedded systems, and this blog post contains none.

About Fault injection attacks

Basics

FIA (Fault Injection Attacks) are a class of hardware attacks which aim at corrupting the execution of a given running program in order to change its behavior in a way which helps the attacker.

The target of such an attack is usually the processor core. The software asset to be corrupted is:

an instruction (usually a conditional instruction)
a register (holding a temporary copy of a given variable on which a conditional instruction is executed)

A FIA can also target a firmware-based hardware security feature [¹,²,³] on a peripheral device (typically embedded flash memory devices).

NOTE: Here, we only talk about attacks targeting non-secure processor cores (i.e. not including Secure Elements). Some methods exist to harden devices against attacks such as the corruption of hardware readout protection checks, but these are out of the scope of the current blog post.

Note that even when the code is free of runtime errors (RTE, such as buffer overflows), FIA opens attack paths that could not be reached with pure software techniques.

The Fault injection attack family

Fault injection attacks can be carried out with various technologies. Some are expensive, others can be reproduced at low cost [⁴][⁵]. Low-cost attacks are (almost) non-invasive. They usually require very little hardware modification and preparation of the targeted chip. This is the case of:

Glitch attacks. These attacks sharply raise (or lower) an input of the target during a very short time (a few nanoseconds), corrupting the execution pipeline. Such attacks can target the Vcc power line, or the core clock when an external clock is used.
Electromagnetic attacks. These attacks inject, during a very short time, an EM pulse on a predefined and controlled part of the chip, impacting the electronic behavior during a very short moment. They can be seen as a kind of widespread glitch over a portion of the CPU die.

Other hardware attacks require heavy preparation, such as chemical modification of the chip (decapping and so on) in order to access the various IPs directly. This is the case of:

Laser fault attacks, which generate a corruption by directing a laser beam at a targeted component (typically memory buses, cells or registers). Such attacks are expensive but far more accurate (in terms of space and timing) than the cheaper ones described above. They are no longer limited to state actors and ITSEFs, and may challenge the security of secure chips [³].

Impact on the embedded software

A successful fault injection attack on embedded software changes the execution behavior in a way that was not considered by the developer. It may lead to a direct bypass of a security feature, or allow a hybrid attack that chains FIA and software exploitation [²] when targeting hardened devices.

Here is a typical short example:

#include <stdlib.h>
#include <inttypes.h>
#include <stdbool.h>

static int test = 42;

static bool check_test(void)
{
    if (test == 42) {
        return true;
    }
    return false;
}

int main(void)
{
    bool res = check_test();
    return (int)res;
}

This (useless) code should always return 1, as the test variable is never updated. As test is not const, the compiler still considers it is a variable and reads it back from the memory (we suppose here that the code is compiled without optimizations using -O0). Without any options, the associated ARM assembly code generated by Debian stable gcc 8.3.0 for the check_test function is the following:

push    {fp}                  ; 0x00, preparing the frame
add     fp, sp, #0            ; 0x04
ldr     r3, [pc, #40]         ; 0x08, get back the test variable
add     r3, pc, r3            ; 0x0c, using pc-relative instructions
ldr     r3, [r3]              ; 0x10, now test is in r3
cmp     r3, #42 ; 0x2a        ; 0x14, compare test to '42'
bne     568 <check_test+0x24> ; 0x18, if not equal, go to offset 0x24
mov     r3, #1                ; 0x1c, set 1 into r3
b       56c <check_test+0x28> ; 0x20, go to offset 0x28
mov     r3, #0                ; 0x24, set 0 into r3
mov     r0, r3                ; 0x28, set return value
add     sp, fp, #0            ; 0x2c, cleaning the frame
pop     {fp}                  ; 0x30,
bx      lr                    ; 0x34, return back to main

NOTE: The generated assembly code may vary between compilers and compiler versions. Here, we give the exact gcc version, but the result can change even between packages of the same version in different GNU/Linux distributions.

Reading the C and assembly code, one would consider part of the code dead, as the return false is never executed. The same holds in the generated assembly code, where the instruction at offset 0x24 should never be executed. But what if, during the execution of cmp r3, #42, a power glitch makes the processor core invert the check? In such a case, the bne (branch if not equal) may be taken, and execution continues on the branch that was supposed to be dead.

Writing FIA resilient code

Glitch fault injection on embedded systems is not that hard and can be done with little equipment [⁵,²]. There are nevertheless various ways to make the attacker’s path to an effective exploitation harder. First of all, what are the weaknesses of FIA?

Triggering one fault is not that hard, but triggering multiple successive faults is much harder, especially when they must be triggered in a very short amount of time (which is the case when multiple instructions must be defeated).
When corrupting data (typically in a register), the way the content is corrupted is usually difficult to predetermine. It is possible to corrupt a single check (typically a comparison), but it is harder to corrupt a double, inverted check (an equality check and an inequality check on the same variable).
When preparing a FIA, an execution profile must be measured to determine when the fault must be triggered. Making the timing of the execution sequence slightly unpredictable makes this profile much harder to establish and greatly reduces the probability of hitting the targeted instruction. This is usually called adding jitter to instruction execution.
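The jitter idea from the last point can be sketched as a random delay inserted before a critical check. This is only an illustration under stated assumptions: random_delay() and jitter_entropy() are hypothetical names, and rand() merely stands in for the hardware random number generator a real target would need, since a predictable delay gives no protection.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entropy source. On a real target this would read a hardware
 * TRNG; rand() is only a placeholder so that the sketch runs on POSIX. */
static uint32_t jitter_entropy(void)
{
    return (uint32_t)rand();
}

/* Spin for a random number of iterations in [0, max_iter) and return the
 * number of iterations spent. The loop counter is volatile so that the
 * compiler does not remove the empty loop. */
uint32_t random_delay(uint32_t max_iter)
{
    if (max_iter == 0) {
        return 0;
    }
    uint32_t n = jitter_entropy() % max_iter;
    volatile uint32_t i;
    for (i = 0; i < n; i++) {
        /* busy wait */
    }
    return n;
}
```

Calling random_delay(64) right before a sensitive comparison shifts that comparison by a different amount on each run, which is what blurs the attacker's execution profile.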

Now that we are aware of the weaknesses of FIA, let’s start hardening our code.

Enforce critical control instructions

Basically, branches are typical instructions that are targeted by FIA. They define the program execution flow, and are generated by the compiler for each if, switch/case and even loop statement of the initial C code. This is also the case in other languages.

Let’s harden our code a little:

#include <stdlib.h>
#include <inttypes.h>
#include <stdbool.h>

uint32_t critical_val = 42;

bool check_test(void)
{
    if ((critical_val == 42) &&
        !(critical_val != 42)) {
        goto res_true;
    }
    return false;
res_true:
    return true;
}

int main(void)
{
    bool res = check_test();
    return (int)res;
}

Here, we consider the global variable critical_val. We want to securely check that its value is 42. Compared to the previous piece of C code, the if statement now performs two tests:

testing that critical_val value is 42
testing that the boolean expression (critical_val not equal to 42) is false

This would normally be considered a redundant check, as the two boolean expressions have the same truth table. Here, we successively check an equality and an inequality. The assembly code generated by gcc without any options for check_test is the following:

push    {fp}
add     fp, sp, #0
ldr     r3, [pc, #64]
add     r3, pc, r3
ldr     r3, [r3]
cmp     r3, #42                ; first comparison
bne     574 <check_test+0x30>
ldr     r3, [pc, #48]
add     r3, pc, r3
ldr     r3, [r3]
cmp     r3, #42                ; second comparison
beq     57c <check_test+0x38>
mov     r3, #0
b       584 <check_test+0x40>
nop
mov     r3, #1
mov     r0, r3
add     sp, fp, #0
pop     {fp}
bx      lr
andeq   r0, r1, ip, ror #21    ; literal pool data, shown as an instruction
ldrdeq  r0, [r1], -r8          ; literal pool data, shown as an instruction

We can see that the compiler makes two comparisons on the variable value, the first followed by a branch if not equal and the second by a branch if equal. If the goto is replaced by a direct return statement, gcc 8.3 with no option will generate two bne instructions.

Let’s now add a basic optimization flag: -O1. This flag enables some basic optimizations of the code, not including complex ones such as advanced loop unrolling. The generated assembly code for check_test is then the following:

ldr     r3, [pc, #20]   ; 560 <check_test+0x1c>
add     r3, pc, r3
ldr     r0, [r3]
cmp     r0, #42 ; 0x2a
movne   r0, #0
moveq   r0, #1
bx      lr

Oops! No more double checks! The if statement is compiled to a single cmp instruction, followed by two conditional mov instructions. We will come back to this below.
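One common way to keep both comparisons at higher optimization levels, which is not among the approaches discussed in this post, is to read the variable through a volatile-qualified object. The compiler must then perform each read and cannot assume that the two reads return the same value. The sketch below is a hypothetical variant of check_test() (the name check_test_volatile is an assumption); the generated assembly should still be inspected for each compiler and flag set.

```c
#include <stdbool.h>
#include <stdint.h>

/* volatile: every access in the source must be performed as written */
volatile uint32_t critical_val = 42;

bool check_test_volatile(void)
{
    uint32_t first = critical_val;    /* first load */
    if (first != 42) {
        return false;
    }
    uint32_t second = critical_val;   /* second, independent load */
    if (second != 42) {
        return false;
    }
    return true;
}
```

Because the two loads are distinct volatile accesses, the second comparison is not redundant from the compiler's point of view, so it has no grounds to merge it with the first one.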

Harden variables

For historical reasons, some parts of the C language are weak against FIA by design. This is the case of boolean handling. In C code, a boolean value is defined as:

false if equal to 0
true for any other value

Clearly, the weak concept here is true for any other value. Regarding fault injection attacks, a boolean variable can be moved from false to true without any control over how the target register is corrupted. Whatever bits are flipped, the result is semantically a true value starting from a false one.

When attacking a storage area (register, etc.) the result is usually a small sequence of uncontrolled (yet reproducible with good probability) bit flips (bit inversion, from 0 to 1 or from 1 to 0).

The best way to harden a set of values (typically the boolean type) against such attacks is to define an enumerated type whose values differ from one another in as many bits as possible. However, they must not be the exact inverse of each other (e.g. 0b1010 and 0b0101): such a pattern is exactly what a full bit flip produces.

The number of bit positions in which two values differ is called their Hamming distance; for a type, the relevant figure is the minimum distance between any two of its valid values. Hamming distance has mostly been used for memory failure detection [⁶], but can also be used against fault injection. The bigger the Hamming distance, the harder it is to turn one valid value into another with a fault.

The below code is typically a bad idea:

int check_flag(bool critical_flag)
{
    if (critical_flag) {
        return RES_FLAG_TRUE;
    }
    return RES_FLAG_FALSE;
}

First of all, the implicit if check considers any non-zero value of critical_flag as valid. In terms of assembly code, the compiler typically inverts the test: it compares the flag with the immediate value #0 and emits a bne instruction. With regard to FIA this test is weak, as it never checks for one specific non-zero value, which makes a successful fault injection easy to reach.

We can harden the check a little:

int check_flag(bool critical_flag)
{
    if (critical_flag == true) {
        return RES_FLAG_TRUE;
    }
    return RES_FLAG_FALSE;
}

Here the comparison to true is explicit. The compiler must compare against the actual true value and can’t rely on a comparison with false. Yet, in the C language, the value of true is 1, making the Hamming distance between true and false equal to 1. This distance is really small, and such a value can be corrupted more easily than more complex enumerated types.
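The difference between the implicit and the explicit check can be illustrated with a small software model of a single-bit fault. This is a simplified sketch, not a real fault model: the flag is modeled as a 32-bit register initially holding false (0), each bit is flipped in turn, and the function names are hypothetical.

```c
#include <stdint.h>

/* Count the single-bit faults on a register holding false (0) that are
 * accepted by an implicit check such as `if (flag)`. */
unsigned implicit_check_hits(void)
{
    unsigned hits = 0;
    for (unsigned bit = 0; bit < 32; bit++) {
        uint32_t faulted = UINT32_C(0) ^ (UINT32_C(1) << bit);
        if (faulted) {
            hits++;
        }
    }
    return hits;
}

/* Same experiment for an explicit check against the value of true (1). */
unsigned explicit_check_hits(void)
{
    unsigned hits = 0;
    for (unsigned bit = 0; bit < 32; bit++) {
        uint32_t faulted = UINT32_C(0) ^ (UINT32_C(1) << bit);
        if (faulted == UINT32_C(1)) {
            hits++;
        }
    }
    return hits;
}
```

In this model every one of the 32 single-bit flips defeats the implicit check, while only the flip of bit 0 defeats the explicit one. A single flipped bit is still enough, which is the consequence of a Hamming distance of 1.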

Let’s harden a little more:

typedef enum {
    SECURE_FALSE = 0xacefaecf,
    SECURE_TRUE =  0xca38c3e8
} secure_bool_t;

int check_flag(secure_bool_t critical_flag)
{
    if (critical_flag == SECURE_TRUE) {
        return RES_FLAG_TRUE;
    }
    return RES_FLAG_FALSE;
}

Here is the binary representation of the secure_bool_t type:

value name	hexadecimal	binary
SECURE_FALSE	0xacefaecf	1010 1100 1110 1111 1010 1110 1100 1111
SECURE_TRUE	0xca38c3e8	1100 1010 0011 1000 1100 0011 1110 1000

What we see here is:

the Hamming distance between the two values is 19
the two values are not the exact inverse of each other
equal bits are spread over all bytes of the type
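These properties can be checked with a few lines of C. The sketch below counts the set bits of the XOR of the two values; hamming_distance() and is_exact_inverse() are illustrative helper names, not part of any library.

```c
#include <stdint.h>

/* Number of bit positions in which a and b differ. */
unsigned hamming_distance(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    unsigned count = 0;
    while (diff != 0) {
        diff &= diff - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Non-zero when b is the bitwise inverse of a (all 32 bits differ). */
int is_exact_inverse(uint32_t a, uint32_t b)
{
    return (a ^ b) == UINT32_MAX;
}
```

For the two values above, the XOR is 0x66d76d27, which has 19 bits set and no byte equal to 0xff, so each byte keeps at least one bit that is identical in both values.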

As this type is an enumerated type with explicit values, and as long as the control structures explicitly check each value, boolean operations based on this type use 32-bit hardened values, avoiding the basic corruption that historical C booleans allow.
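Checking each value explicitly also makes corruption detectable: a value that is neither SECURE_TRUE nor SECURE_FALSE can only come from a fault or a bug. The sketch below is one possible shape for such a check. RES_FLAG_FAULT, the numeric RES_FLAG_* values and check_flag_strict() are assumptions made for the example. Fixed-width constants are used instead of an enum because, before C23, ISO C requires enumeration constants to fit in an int, so the values above rely on a compiler extension.

```c
#include <stdint.h>

typedef uint32_t secure_bool_t;
#define SECURE_FALSE UINT32_C(0xacefaecf)
#define SECURE_TRUE  UINT32_C(0xca38c3e8)

/* Hypothetical result codes for this example */
#define RES_FLAG_TRUE    1
#define RES_FLAG_FALSE   0
#define RES_FLAG_FAULT (-1)

int check_flag_strict(secure_bool_t critical_flag)
{
    if (critical_flag == SECURE_TRUE) {
        return RES_FLAG_TRUE;
    }
    if (critical_flag == SECURE_FALSE) {
        return RES_FLAG_FALSE;
    }
    /* Neither valid value: the flag has been corrupted */
    return RES_FLAG_FAULT;
}
```

On a real target, the RES_FLAG_FAULT path would typically lead to a defensive reaction (reset, erasure of secrets, and so on), which is a policy decision outside the scope of this sketch.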

Fight against the compiler

In the previous sections, we have seen that:

we can add control structure duplication in the C code, to harden critical control flow checks
we can harden critical variables by using our own types with specifically forged values
though, the compiler is our enemy, as:
- small modifications in the C code may generate real differences in the assembly code
- it will not distinguish security oriented code duplication and will try to suppress such features from the generated assembly

There are various ways to avoid compilation problems when adding security-oriented code blocks:

Replace all code that is incorrectly compiled by direct assembly code
Compile the overall project or library with -O0 optimization flag
Write macros/functions for each security feature which handle inline assembly backend
Selectively target security critical functions to locally deactivate optimization flags
Write a compiler module dedicated to secure programming, handling hardened branches when explicitly annotated in the source code (using dedicated annotations or language attributes)

These approaches are not mutually exclusive, and each has its advantages and drawbacks.

Replace code with inline assembly

This is generally not the right choice. Replacing the overall C (or other language) code with assembly code is:

not portable
dangerous (there is absolutely no type checking or any other help, as the code directly uses opcodes)
sub-optimal (even with -O0, the compiler will probably write better assembly than us)
hostile to tooling (static analysis can no longer be done easily on the program)

Compile the overall project with the -O0 flag

This solution works in the sense that the compiler no longer optimizes the code. Yet some transformations may still happen even in this mode, and the assembly code may drop some security features. Besides, do all functions of the program handle critical control flow? Moreover, compiling the whole program without any optimization greatly increases its footprint, which is a real problem in embedded systems.

Write macros/functions for each security feature which handle inline assembly backend

This can be done, though macros may be hard to write. A SECURE_IF() macro can’t be written easily, as preprocessing is not the right stage to handle an if statement. As a consequence, injecting assembly code that includes control opcodes (typically branches) through preprocessing is complex and challenging.

Another way is to provide an API such as secure_if(), secure_case(), and so on in a dedicated compilation unit. This unit would be compiled with no optimization and its API used in the other parts of the program.

We can then imagine a (basic) compilation unit like this:

/* secure_control.h */

#include <stdlib.h>
#include <inttypes.h>
#include <stdbool.h>

typedef enum {
    SECURE_FALSE = 0xacefaecf,
    SECURE_TRUE =  0xca38c3e8
} secure_bool_t;

typedef enum {
    IF_STATE_EQUAL = 0xefaccfae,
    IF_STATE_DIFF = 0x8eca383c  /* 32-bit constant assumed here */
} secure_if_statement_t;

secure_bool_t secure_if(int flag, int val, secure_if_statement_t if_type);

/* secure_control.c */
#include "secure_control.h"

secure_bool_t secure_if(int flag, int val, secure_if_statement_t if_type)
{
    /* default to the safe value: only a fully validated path sets SECURE_TRUE */
    secure_bool_t result = SECURE_FALSE;
    if (if_type == IF_STATE_EQUAL && !(if_type != IF_STATE_EQUAL)) {
        /* equality requested: double, inverted check */
        if (flag == val) {
            if (!(flag != val)) {
                result = SECURE_TRUE;
            }
        }
    } else if (if_type == IF_STATE_DIFF && !(if_type != IF_STATE_DIFF)) {
        /* difference requested: double, inverted check */
        if (flag != val) {
            if (!(flag == val)) {
                result = SECURE_TRUE;
            }
        }
    }
    return result;
}

Such a paradigm only moves the problem: once we get the result of the hardened secure_if() API, what do we do with it? Another comparison is done on it, which also needs to be hardened. Lots of code has been added, but a single fault injection is still effective, leaving a single point of failure.

Target security critical functions to locally deactivate optimization flags

Let’s reduce the scope of the -O0 flag to the smallest unit we can. Modern compilers (GCC, Clang/LLVM) can change compilation flags down to function scope. It is then possible to limit the loss of optimization to the critical functions by switching optimization off around them.

This can be done with this bunch of preprocessing instructions:

/* clang also defines __GNUC__, so it is tested first */
#if defined(__clang__)
#pragma clang optimize off
#elif defined(__GNUC__)
#pragma GCC push_options
#pragma GCC optimize("O0")
#endif

secure_bool_t my_critical_function(void)
{
    // ...
}

#if defined(__clang__)
#pragma clang optimize on
#elif defined(__GNUC__)
#pragma GCC pop_options
#endif

Here we can deactivate optimization options for one (or more) functions in a given compilation unit without impacting the other parts of the program.
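The same effect can be requested per function with attributes instead of pragmas. The sketch below assumes GCC's optimize attribute and Clang's optnone attribute; SECURE_NO_OPT and check_critical() are hypothetical names. GCC's documentation presents the optimize attribute as a debugging aid, so the generated assembly should be checked rather than trusted.

```c
#include <stdint.h>

/* clang also defines __GNUC__, so it is tested first */
#if defined(__clang__)
#define SECURE_NO_OPT __attribute__((optnone, noinline))
#elif defined(__GNUC__)
#define SECURE_NO_OPT __attribute__((optimize("O0"), noinline))
#else
#define SECURE_NO_OPT /* unknown compiler: no per-function control */
#endif

#define SECURE_FALSE UINT32_C(0xacefaecf)
#define SECURE_TRUE  UINT32_C(0xca38c3e8)

/* Double, inverted check compiled without optimization, whatever the
 * flags used for the rest of the compilation unit. */
SECURE_NO_OPT uint32_t check_critical(uint32_t val)
{
    if ((val == 42) && !(val != 42)) {
        return SECURE_TRUE;
    }
    return SECURE_FALSE;
}
```

The noinline attribute is added so that the function is not merged into an optimized caller, where the duplicated check could be simplified again.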

Write a compiler module dedicated to secure programming

You’re welcome! This is the best idea, as the compiler would be able to properly handle the AST and harden the flagged critical conditional instructions by generating supplementary assembly code. Though, this requires a lot of work, which should be easier with LLVM than with GCC if you are ready to start. Such work is a complex academic topic, especially when formal proofs of the generated code’s resistance against FIA fault models are expected.

Conclusion

Hardening embedded code against FIA is not an easy task, but it can reasonably be deployed in the critical parts of secure embedded systems where fault injection attacks are part of the attack surface. These basic mechanisms greatly increase the attacker’s effort, making single-fault attacks far less effective without requiring advanced schemes such as control flow integrity (CFI) mechanisms.

This HowTo stops here. In the next posts, we will discuss how to use finite state automatons (FSA) and control flow integrity (CFI) designs and implementations in embedded systems to detect and react to fault injections.

References

1. Sultan Qasim Khan (2020). Microcontroller Readback Protection: Bypasses and Defenses. NCC-Group Whitepaper.
2. Benadjila, R. et al. (2020). Inter-CESTI: Methodological and Technical Feedbacks on Hardware Devices Evaluations. Symposium pour la Sécurité des Systèmes d’Information.
3. Hériveaux, O. (2020). Black-Box Laser Fault Injection on a SecureMemory. Symposium pour la Sécurité des Systèmes d’Information.
4. Bozzato, C., Focardi, R., & Palmarini, F. (2019). Shaping the Glitch: Optimizing Voltage Fault Injection Attacks. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2019(2), 199-224.
5. Go, H. L. C. W. (2017). Low-Cost Setup for Localized Semi-invasive Optical Fault Injection Attacks. In Constructive Side-Channel Analysis and Secure Design: 8th International Workshop, COSADE 2017, Paris, France, April 13-14, 2017, Revised Selected Papers (Vol. 10348, p. 207). Springer.
6. Qin Minghai (2019). Encoding and decoding of hamming distance-based binary representations of numbers. Google Patent.

Questions?

Any questions or remarks? Contact us on any of our social networks or communication channels!