Chips & Salsa: This Hardware Does Not Exist

IPAS_Security · 02-20-2024

By Brandon Marken Ph.D. and Rowan Hart

Over the past decade, security has become an increasingly important part of the software development process. Fuzzing has become so integral to that process that many large companies have invested heavily in large-scale fuzzing infrastructure, such as OSS-Fuzz [1], which continuously fuzzes open-source projects. In 2021, the National Institute of Standards and Technology (NIST) recommended that fuzzing be included as part of the standard verification process for software development [2].

It is well known that the earlier a bug or design flaw is found in the development cycle, the cheaper it is to fix [3]. Defects found by fuzzing software before release can be fixed without affecting end users. The industry-wide effort to ensure that silicon and critical software stacks are correct before proceeding to the next phase of development is commonly referred to as “shifting left” [4].

As part of Intel’s continuing effort to shift software testing and validation left, we introduce a new software fuzzer, the Target Software Fuzzer for the Intel® Simics® Simulator, or TSFFS (which rhymes with Sisyphus from classical mythology). TSFFS combines the Intel Simics Simulator and its virtual platforms with LibAFL to fuzz software that requires hardware not yet available in silicon.

At a high level, the design of TSFFS is straightforward: TSFFS is an Intel Simics Simulator package containing a single module, which provides fuzzing capabilities and the interfaces to control them through configuration scripts written in Python* or the Intel Simics Simulator scripting language.


Open-Source Participation

TSFFS is an open-source project available at https://github.com/intel/tsffs. The documentation is available at https://intel.github.io/tsffs/. To participate in the project, submit a pull request, file an issue, or contact the authors.

TSFFS

The Intel® Simics® Simulator

The Intel Simics Simulator is a software suite that provides virtual platforms, which allow developers to simulate the hardware for which they wish to develop software. Intel uses the Intel® Simics® Simulator extensively for software development and testing.

How simulated systems running in the Intel Simics Simulator differ from a typical virtual machine is out of scope for this document and is discussed at length elsewhere [5]. Instead, we focus on the features of the Intel® Simics® Simulator that make it suitable for TSFFS. The most important feature is that Intel Simics is the only full-system simulator that supports the entire feature set of current and future generations of Intel® x86_64 processors. In addition, Intel and Intel customers already use the Intel Simics Simulator extensively in pre-silicon design phases, which avoids the steep learning curve associated with introducing unfamiliar tooling into the developer toolkit. The simulator also has extensive introspection facilities, which make it well suited for software security work [6]. It offers debugger-like features to inspect the register and memory state of the virtual platform, as well as deeper inspection of core, uncore, and external device state to debug complex scenarios. It is also programmable in Python, C, C++ [7], and the Intel® Simics® Simulator scripting language. Beyond these introspection tools, the simulator lets target software communicate with the simulator by executing magic instructions, which are similar to the hypercalls used by the Linux* Kernel Virtual Machine (KVM) or Xen* hypervisors.

The standard way to develop hardware models for the Intel® Simics® Simulator is the Device Modeling Language (DML). DML is C-like and open source, and several examples are available online [8]. An example declaration of a register bank in DML is shown in Figure 1. Once a device model has been defined, it can be used in a virtual platform by connecting it to other devices, and firmware that interacts with the simulated device can then be tested.

Figure 1: Example register bank in Simics DML [8]

LibAFL

LibAFL [9] is a library written in Rust that is designed to make developing fuzzers easy, unlike AFL [10] or AFL++ [11], which were not designed as flexible frameworks for implementing fuzzers. In fact, AFL++ is a monolithic C application in which only the mutation components have a modular design. Despite this, before LibAFL was developed, it was customary for new fuzzing research to fork an existing fuzzer, such as AFL or AFL++, and splice the new capabilities into it.

This posed a number of problems for fuzzing research. The new capabilities were often incompatible with each other. For example, a seed scheduler added to a fork of AFL++ by one group may be incompatible with the feedback mechanism in another group’s fork. While any two such components can be adapted to work together and both integrated into AFL++, this approach does not scale to the hundreds or thousands of fuzzing improvements made every year. This fragmented landscape of fuzzers also makes it difficult to evaluate and compare the performance of individual fuzzers.

These issues led to the development of LibAFL, which provides a set of building blocks common to all fuzzers, such as:

Executors execute the code under test.
Mutators mutate the individual test inputs.
Generators generate test inputs from scratch.
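LibAFL itself is a Rust library whose components are traits with richer signatures than shown here. Purely as an illustration of how the three roles above divide the work, the following Python sketch gives each role a minimal implementation; the class names are hypothetical and do not mirror LibAFL's API.

```python
import random
from typing import Callable


class RandomGenerator:
    """Generator role: produce a fresh test input from scratch."""

    def __init__(self, max_len: int, rng: random.Random) -> None:
        self.max_len = max_len
        self.rng = rng

    def generate(self) -> bytes:
        length = self.rng.randint(1, self.max_len)
        return bytes(self.rng.randrange(256) for _ in range(length))


class BitFlipMutator:
    """Mutator role: derive a new input by flipping one random bit of an existing one."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng

    def mutate(self, data: bytes) -> bytes:
        if not data:
            return data
        buf = bytearray(data)
        pos = self.rng.randrange(len(buf))
        buf[pos] ^= 1 << self.rng.randrange(8)
        return bytes(buf)


class InProcessExecutor:
    """Executor role: run the code under test on one input and report a crash."""

    def __init__(self, target: Callable[[bytes], None]) -> None:
        self.target = target

    def run(self, data: bytes) -> bool:
        try:
            self.target(data)
            return False
        except Exception:
            return True  # any uncaught exception counts as a crash in this sketch
```

Because each role sits behind a small interface, a different mutator or executor can be swapped in without touching the others, which is the property the paragraph above describes.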

The developers of LibAFL designed it with the eventual goal that LibAFL will become the standard library that all fuzzing researchers will use when developing and comparing their new techniques.

LibAFL was a perfect fit for TSFFS. It allows us to build the fuzzing capabilities into an Intel Simics module without having to either write our own fuzzing library or reuse and adapt code from existing fuzzers such as AFL++.

Design and goals

When designing TSFFS, the overarching principle was to remove as many as possible of the reasons software goes unfuzzed. This principle led to several design goals:

TSFFS should require no special hardware to run.
TSFFS should have few constraints on the target software.
TSFFS should scale horizontally by coordinating multiple instances of the Intel® Simics® Simulator to allow parallel execution of multiple test cases.


The TSFFS module is an Intel Simics Simulator package that provides fuzzing capabilities and the interface necessary to control them. The fuzzing capabilities are controlled through configuration scripts written in Python or the Intel® Simics® Simulator scripting language. Figure 2 shows the layout of the target software and the TSFFS module, together with the data flows between them.

Figure 2: TSFFS fuzzing a generic target and the information flows between the different segments of the simulator.

TSFFS module

The TSFFS module is a loadable package for the Intel Simics Simulator and is made up of three main components: the driver, the tracer, and the LibAFL engine. All parts of the fuzzer are written in Rust, a language the Intel Simics Simulator did not originally support. As part of developing TSFFS, we created low-level and idiomatic high-level Rust bindings for interacting with the Intel Simics Simulator API, including through the simulator’s Python front-end.

The driver component installs the callbacks required to manage the fuzzing loop and detect configured solution conditions. The TSFFS fuzzing loop is driven entirely by callbacks and events triggered by user scripts. Fuzzing starts either via a callback to the fuzzer when an initial magic instruction executes or via an interface method invocation. The fuzzing loop resets and runs again on either another magic instruction execution or another interface method invocation. Additional callbacks trigger on each instruction to trace target software execution, and on CPU exceptions and other events to detect solution conditions such as page faults, timeouts, and user-configured breakpoints.
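To make the callback-driven structure concrete, here is a small Python model of a driver whose entire loop lives in one event handler. It is a sketch under stated assumptions, not TSFFS's Rust driver: the `Event` names and `HypotheticalDriver` class are invented for illustration, and exceptions are identified by their x86 vector number.

```python
from __future__ import annotations

from enum import Enum, auto
from typing import Optional


class Event(Enum):
    START = auto()       # start condition: magic instruction or interface call
    STOP = auto()        # stop condition: the iteration finished normally
    EXCEPTION = auto()   # CPU exception raised by the target
    TIMEOUT = auto()     # virtual-time budget for the iteration exhausted
    BREAKPOINT = auto()  # user-configured breakpoint hit


class HypotheticalDriver:
    """Callback-driven fuzzing loop: all state changes happen inside on_event()."""

    def __init__(self, solution_exceptions: set[int]) -> None:
        self.solution_exceptions = solution_exceptions
        self.running = False
        self.iterations = 0
        self.solutions: list[str] = []

    def on_event(self, event: Event, detail: Optional[int] = None) -> None:
        if event is Event.START:
            # A real driver would snapshot the simulation here.
            self.running = True
            return
        if not self.running:
            return  # ignore events that arrive before the loop has started
        if event is Event.EXCEPTION and detail not in self.solution_exceptions:
            return  # not configured as a solution: let the target keep running
        if event is not Event.STOP:
            self.solutions.append(f"{event.name}:{detail}")
        # Every remaining event ends the iteration; a real driver would restore
        # the snapshot and inject the next test case at this point.
        self.iterations += 1
```

No component polls for progress: the simulator raises events, and the handler alone decides whether an iteration ends and whether its outcome is saved as a solution.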

The tracer component of TSFFS tracks which code paths have been executed. Most modern fuzzers [11], [12] are feedback driven: every time a test case is executed, the fuzzer updates its code coverage information and uses it to guide generation of the next test case. The goal of a feedback-driven fuzzer is to maximize the amount of the target software’s code exercised by the fuzzer. TSFFS’s tracer uses two methods to determine code coverage for the target software: “hit count” and “once.” The “hit count” method records how many times each code path has been executed. The “once” method records only whether a code path has been executed. After each execution, the tracer passes the updated coverage to the fuzzing engine, which uses it to generate the next test case.
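The difference between the two methods shows up clearly on a loop. The Python sketch below, which assumes coverage is tracked as edges between consecutive basic-block addresses, computes both kinds of coverage for a loop body that runs twice versus three times; the function names are illustrative, not the TSFFS tracer's.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def to_edges(trace: Sequence[int]) -> List[Edge]:
    """Pair consecutive basic-block addresses into (source, destination) edges."""
    return list(zip(trace, trace[1:]))


def hit_count_coverage(trace: Sequence[int]) -> Counter:
    """'Hit count' style: how many times each edge was taken."""
    return Counter(to_edges(trace))


def once_coverage(trace: Sequence[int]) -> Set[Edge]:
    """'Once' style: only whether each edge was taken at all."""
    return set(to_edges(trace))


# A loop body (blocks 0x10 -> 0x20 -> 0x10) running twice versus three times.
two_loops = [0x10, 0x20, 0x10, 0x20, 0x30]
three_loops = [0x10, 0x20, 0x10, 0x20, 0x10, 0x20, 0x30]
```

Under “once” the two traces are indistinguishable, while “hit count” tells them apart. Hit counts therefore reward inputs that change loop behavior, at the cost of keeping more near-duplicate inputs.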

The fuzzing engine is the part of TSFFS that interacts directly with LibAFL and is used to handle typical fuzzing tasks. These tasks include managing the test corpus, generating the next test case for the target software, and determining which code paths passed to the engine by the tracer are interesting and should be further explored.
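One common way to decide whether a test case is “interesting” is to keep it only if it produces coverage never seen before. The Python sketch below does this with coarse logarithmic hit-count buckets, loosely modeled on AFL-style bucketing. It is an assumption for illustration and not LibAFL's exact feedback logic; `Corpus` and `bucket` are hypothetical names.

```python
from collections import Counter
from typing import List, Set, Tuple


def bucket(count: int) -> int:
    """Logarithmic bucket: 1 -> 1, 2-3 -> 2, 4-7 -> 3, 8-15 -> 4, ..."""
    return count.bit_length()


class Corpus:
    """Keeps test cases that reach an (edge, bucket) pair not seen before."""

    def __init__(self) -> None:
        self.entries: List[bytes] = []
        self.seen: Set[Tuple[Tuple[int, int], int]] = set()

    def consider(self, testcase: bytes, hits: Counter) -> bool:
        features = {(edge, bucket(n)) for edge, n in hits.items()}
        new = features - self.seen
        if not new:
            return False  # nothing new: discard the test case
        self.seen |= new
        self.entries.append(testcase)
        return True
```

Bucketing keeps a test case that moves a loop from 1 to 5 iterations but ignores one that moves it from 5 to 6, which limits corpus growth while still rewarding meaningful behavior changes.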

TSFFS Flow

First, the configuration script configures and starts the simulated system and the target software. The portion we call the “fuzzing loop” begins once the target software begins to execute. The target executes normally until it reaches a start harness condition (discussed in more detail in the Compiled-In Harnessing section). Once control flow reaches the start harness condition, execution of the target code is briefly suspended.

The TSFFS module then takes either a snapshot or a micro-checkpoint[1] of the simulation state and records the memory location of the test case buffer. The snapshot is necessary because, each time the stop harness condition is reached, TSFFS suspends execution and restores the simulation state from this initial snapshot. A test case, either randomly generated or read from an existing input corpus, is then written to the test case buffer at that location in the target software’s memory. This buffer stores test cases generated by TSFFS until they are inserted into the target software via the TSFFS driver.

The simulation is then resumed and runs until it reaches the stop harness macro or another stop condition, such as an exception or timeout. While the target code is running, TSFFS records edge coverage and logs comparison operands via the tracer. When the target code reaches the stop macro or another stop condition, its execution is again suspended while TSFFS restores the simulation state from the snapshot or micro-checkpoint. The LibAFL engine then records the output in the output corpus, mutates the test case based on the traces gathered from this and previous executions of the target, and writes the new test case to the test case buffer. Once the test case is injected, TSFFS resumes the simulation, and the process repeats until a configured condition (for example, a specific number of iterations) is reached or the fuzzing campaign is ended manually. A high-level diagram of the fuzzing process is shown in Figure 2.
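The snapshot, inject, run, and restore cycle described above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions: a deep copy of a hypothetical `ToyTarget` object stands in for a simulator snapshot, and test cases are randomly generated rather than mutated by LibAFL. The `calls` counter shows why the restore matters: without it, state from one run would leak into the next.

```python
from __future__ import annotations

import copy
import random


class ToyTarget:
    """Stand-in for simulated machine state: memory plus state the target mutates."""

    def __init__(self) -> None:
        self.memory = bytearray(16)  # the first 8 bytes act as the test case buffer
        self.calls = 0               # would leak between runs without a restore

    def run(self) -> str:
        self.calls += 1
        if self.calls > 1:
            return "state-leak"
        if self.memory[0] == 0x42:
            return "crash"
        return "ok"


def fuzz(iterations: int, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    machine = ToyTarget()
    snapshot = copy.deepcopy(machine)       # taken once, at the start harness
    buf_addr, buf_len = 0, 8                # recorded test case buffer location
    results = []
    for _ in range(iterations):
        machine = copy.deepcopy(snapshot)   # restore the initial state
        testcase = bytes(rng.randrange(256) for _ in range(buf_len))
        machine.memory[buf_addr:buf_addr + buf_len] = testcase  # inject
        results.append(machine.run())       # run until a stop condition
    return results
```

Every iteration therefore starts from the identical machine state captured at the start harness, so any difference in behavior is attributable to the injected test case alone.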

Requirements

To fuzz target software using TSFFS:

The target software must be supported by an Intel Simics Simulator virtual platform.
The target software must run on one of the supported architectures (x86_64, x86, or RISC-V).
The target virtual platform model must support snapshots.

[1] Snapshots were introduced in the Intel Simics simulator in version 6.0.175. Micro-checkpoints are unsupported as of version 7.0.0. If the version of the Intel Simics simulator used is between 6.0.175 and 7.0.0, TSFFS supports both micro-checkpoints and snapshots, but snapshots are recommended.
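The version rules in the note above reduce to two comparisons. The Python sketch below encodes them, assuming plain three-part version strings and assuming that versions before 6.0.175 offer micro-checkpoints only (implied, but not stated outright, by the note). The helper names are hypothetical.

```python
from typing import Set, Tuple

Version = Tuple[int, int, int]

SNAPSHOTS_ADDED: Version = (6, 0, 175)
MICRO_CHECKPOINTS_REMOVED: Version = (7, 0, 0)


def parse_version(text: str) -> Version:
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def supported_mechanisms(version: str) -> Set[str]:
    """State-saving mechanisms available to TSFFS for a given simulator version."""
    v = parse_version(version)
    mechanisms = set()
    if v < MICRO_CHECKPOINTS_REMOVED:
        mechanisms.add("micro-checkpoint")
    if v >= SNAPSHOTS_ADDED:
        mechanisms.add("snapshot")
    return mechanisms


def preferred_mechanism(version: str) -> str:
    # Snapshots are recommended whenever they are available.
    return "snapshot" if "snapshot" in supported_mechanisms(version) else "micro-checkpoint"
```

Tuple comparison handles the ordering correctly, so 6.0.180 sorts after 6.0.175, which would not be true of a plain string comparison of "6.0.180" against "6.0.2".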

These requirements are less stringent than they initially appear. TSFFS handles many traditionally difficult-to-fuzz targets running on publicly available Intel Simics Simulator virtual platform packages, including:

Unified Extensible Firmware Interface (UEFI) integrated firmware image components
Security (SEC), Platform Initialization (PI), Pre-EFI initialization (PEI), and Driver Execution Environment (DXE) stage UEFI drivers, as well as UEFI applications
The Linux and Windows* kernel and kernel drivers
User-space applications in both Linux and Windows

Setup and Configuration

TSFFS is configured via a user script written either in Python, the simulator’s own scripting language, or a combination of the two. These scripts are commonly used as entry-points during normal use of the Intel Simics Simulator, and only minor additions are necessary to load and configure TSFFS.

1. First, load the TSFFS module by running the following command:

2. Once the module is loaded, the tsffs class is available in the simulator and can be instantiated. This is easiest to do via Python, which can be embedded in a script by prefixing a line with the “@” symbol, as shown here:

3. The user script sets fuzzer options via the interface. For example, to set a virtual timeout of 3 seconds and configure general protection faults as “solutions” which will be saved during fuzzing, use the following commands:
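The actual TSFFS configuration commands are not reproduced here. To show what the two options in step 3 mean, the Python sketch below models a configuration holding a virtual-time timeout and a set of exception vectors treated as solutions. `HypotheticalFuzzerConfig` is an illustrative stand-in, not the TSFFS interface; the only fact it relies on is that the x86 general protection fault (#GP) is exception vector 13.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

GENERAL_PROTECTION_FAULT = 13  # x86 #GP exception vector


@dataclass
class HypotheticalFuzzerConfig:
    """Mirrors the two options described above; not the TSFFS interface itself."""

    timeout_seconds: float  # measured in simulated (virtual) time, not wall-clock time
    solution_exceptions: set[int] = field(default_factory=set)

    def solution_reason(self, virtual_elapsed: float,
                        exception: Optional[int] = None) -> Optional[str]:
        """Return why an iteration counts as a solution, or None if it does not."""
        if exception is not None and exception in self.solution_exceptions:
            return f"exception {exception}"
        if virtual_elapsed > self.timeout_seconds:
            return "timeout"
        return None


config = HypotheticalFuzzerConfig(timeout_seconds=3.0,
                                  solution_exceptions={GENERAL_PROTECTION_FAULT})
```

Measuring the timeout in virtual time keeps the result independent of how fast the host happens to run the simulation.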

For testing purposes, these configuration commands are also valid on the simulator command line interface. Fuzzing configurations also require a harness. We describe several ways to implement a harness using TSFFS in the following sections.

Compiled-In Harnessing

The first harnessing method is a compiled-in harness. This is the preferred mode and is suited to open-box fuzzing. In this mode, the target software includes a header file, and the user signals that the fuzzer should begin the fuzzing loop via a start macro and that the fuzzer should stop the current fuzzing iteration via a stop macro. For brevity, we refer to the start and stop macros together as the harness macros. Figure 3 shows a very simple C-language example of a harness using TSFFS’s harness macros.

Figure 3: An example of a C-language fuzzing harness for tsffs.
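The handshake behind the harness macros can be modeled in Python to show the order of operations. This sketch does not reproduce TSFFS's C header: `harness_start` and `harness_stop` are hypothetical stand-ins for the start and stop macros, `parse_record` is a toy target, and the “capacity in, actual size out” convention for the size argument is an assumption made for illustration.

```python
from __future__ import annotations


class HypotheticalFuzzer:
    """Plays the fuzzer's side of the start/stop handshake."""

    def __init__(self, testcases: list[bytes]) -> None:
        self.pending = list(testcases)
        self.finished = 0

    def harness_start(self, buffer: bytearray, size: list[int]) -> None:
        # Assumed convention: size[0] holds the buffer capacity on entry and
        # is overwritten with the number of bytes actually written.
        testcase = self.pending.pop(0)[: size[0]]
        buffer[: len(testcase)] = testcase
        size[0] = len(testcase)

    def harness_stop(self) -> None:
        self.finished += 1


def parse_record(data: bytes) -> int:
    """Toy code under test: a length-prefixed parser missing a bounds check."""
    declared = data[0]
    payload = data[1:]
    return payload[declared - 1]  # IndexError when declared exceeds the payload


def harnessed_main(fuzzer: HypotheticalFuzzer) -> str:
    buffer = bytearray(64)
    size = [len(buffer)]                # stands in for the size pointer
    fuzzer.harness_start(buffer, size)  # corresponds to the start macro
    try:
        parse_record(bytes(buffer[: size[0]]))
        outcome = "ok"
    except IndexError:
        outcome = "crash"
    fuzzer.harness_stop()               # corresponds to the stop macro
    return outcome
```

The target code between the two calls is unchanged; only the start and stop points, and the buffer they share with the fuzzer, need to be added.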

Test Case Injection

The second harnessing mode is closed-box test case injection. This method is suitable for pre-compiled binaries but is more target-specific. One of two interface methods can be used to manually start the fuzzing loop using test case injection. In the first case, two addresses are required: the test case address and an address for the size of the test case. We call these two variables testcase_address and size_address, respectively. We also need the processor core object in the simulator that is running the code (which we call cpu), and a Boolean virt, which is true if the provided addresses are virtual and false otherwise. Typically, this information is retrieved in a script branch that waits for a start condition (for example, a specific address in the target software being executed or a specific event occurring). In Figure 4, a script branch waits for a magic instruction with magic number 1 to execute, which is equivalent to the start harness macro.

Figure 4: Test case injection script branch start

When this call is executed, the fuzzer takes a snapshot and begins the fuzzing loop. Rather than the harnessed target software providing the address of the test case buffer via the harness macro, test cases are written to the test case address provided by the script. This allows targets that cannot be recompiled to be harnessed for fuzzing, for example, a device driver that uses Memory-Mapped Input/Output (MMIO).
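The memory side of test case injection can be sketched by treating target memory as a flat byte array. The width and byte order of the size value (a little-endian 64-bit integer) and the capacity-in, size-out behavior of size_address are assumptions for this sketch; a real target defines its own layout, and address translation (the virt flag) is omitted. `inject` is a hypothetical helper.

```python
import struct

SIZE_FORMAT = "<Q"  # assumption: size stored as a little-endian 64-bit integer


def inject(memory: bytearray, testcase_address: int, size_address: int,
           testcase: bytes) -> int:
    """Write one test case into simulated target memory.

    Assumes the value at size_address holds the buffer capacity before
    injection and is overwritten with the number of bytes actually written.
    """
    (capacity,) = struct.unpack_from(SIZE_FORMAT, memory, size_address)
    data = testcase[:capacity]
    memory[testcase_address:testcase_address + len(data)] = data
    struct.pack_into(SIZE_FORMAT, memory, size_address, len(data))
    return len(data)
```

Writing the actual length back lets unmodified target code read a consistent size value, without ever being told that its input came from a fuzzer.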

In the second case, size_address is replaced by a maximum test case size, which we will call maximum_size. Instead of reading the size of a test case from size_address, the fuzzer simply truncates any test case longer than maximum_size. This is useful when closed-box software must be harnessed that does not include a suitable memory area or does not use any size variable. The approach above then changes to what is shown in Figure 5:

tsffs.iface.tsffs.start_with_maximum_size(cpu, testcase_address, maximum_size, True)

Figure 5: Script branch start with maximum test case size

When using test case injection, a corresponding interface method is called in the user script to stop the current fuzzing iteration. This is equivalent to the stop harness macro and should typically be triggered on a specific breakpoint or other condition. For example, to implement functionality equivalent to the stop harness macro, a callback can be registered on the magic instruction event to call a function that invokes the stop interface method, as shown in Figure 6:

Figure 6: Callback to stop interface method
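The maximum-size variant and the stop callback can both be modeled in Python. In this sketch `MagicEvents` is a hypothetical stand-in for the simulator's magic-instruction event, `FuzzerStub.stop` stands in for the stop interface method, and the stop magic number 2 is an assumption (the text above only identifies magic number 1 as the start).

```python
from __future__ import annotations

from typing import Callable

STOP_MAGIC = 2  # hypothetical magic number the target uses to signal "stop"


def inject_with_maximum_size(memory: bytearray, testcase_address: int,
                             maximum_size: int, testcase: bytes) -> bytes:
    """Truncate the test case to maximum_size and write it; no size is read or written."""
    data = testcase[:maximum_size]
    memory[testcase_address:testcase_address + len(data)] = data
    return data


class MagicEvents:
    """Hypothetical stand-in for the simulator's magic-instruction event."""

    def __init__(self) -> None:
        self.callbacks: list[Callable[[int], None]] = []

    def register(self, callback: Callable[[int], None]) -> None:
        self.callbacks.append(callback)

    def execute_magic(self, number: int) -> None:
        for callback in self.callbacks:
            callback(number)


class FuzzerStub:
    """Counts how often the stop interface method is invoked."""

    def __init__(self) -> None:
        self.stops = 0

    def stop(self) -> None:
        self.stops += 1


def on_magic(fuzzer: FuzzerStub, number: int) -> None:
    # Only the stop magic number ends the iteration; other numbers are ignored.
    if number == STOP_MAGIC:
        fuzzer.stop()
```

Usage: `events.register(lambda n: on_magic(fuzzer, n))` attaches the callback once, after which every magic instruction the target executes is routed through `on_magic`. Filtering on the magic number matters because the same event also fires for the start signal.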

Fully Manual Harnessing

The last option is for cases where the target software offers no opportunity for injecting test cases, for example, when the target is driven by network traffic. In these cases, the user can obtain test cases directly from the fuzzer and dispatch each test case in whatever way is necessary. For example, the test case can be sent via the network or delivered through a separate device model. This method requires only the same cpu argument used in previous examples, which is needed to snapshot the simulation in the correct state when the method is called.

Figure 7: Manually setting cpu argument
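Once a test case has been obtained from the fuzzer, delivering it over a network connection is ordinary socket code. The Python sketch below frames each test case with a 4-byte big-endian length prefix so the receiver knows where it ends. The framing is an assumption for illustration; a real network target defines its own protocol. A local socket pair stands in for the simulated network.

```python
import socket
import struct

HEADER = struct.Struct(">I")  # assumption: 4-byte big-endian length prefix


def send_testcase(sock: socket.socket, testcase: bytes) -> None:
    """Send one test case as a length-prefixed frame."""
    sock.sendall(HEADER.pack(len(testcase)) + testcase)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    data = bytearray()
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("connection closed mid-frame")
        data += chunk
    return bytes(data)


def recv_testcase(sock: socket.socket) -> bytes:
    """Target-side counterpart: read one length-prefixed test case."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


if __name__ == "__main__":
    fuzzer_side, target_side = socket.socketpair()
    with fuzzer_side, target_side:
        send_testcase(fuzzer_side, b"GET / HTTP/1.0\r\n\r\n")
        print(recv_testcase(target_side))
```

`recv_exact` loops because a single `recv` call may return fewer bytes than requested, even on a local connection.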

When fuzzing target software that was previously harnessed for open-box fuzzing with the harness macros, the macros can be disabled by setting:

Figure 8: Manually setting start and stop harnesses

Performance

For smaller virtual platform configurations, TSFFS reaches upwards of 200 executions per second. The larger the simulated model, the slower fuzzing becomes, because the state that must be restored grows with the size of the configuration. This slowdown can be mitigated by horizontal scaling: multiple instances of the Intel Simics Simulator with TSFFS can be started and synchronized via a shared corpus or via LibAFL’s Low Level Message Passing (LLMP), allowing cooperative parallel execution of multiple test cases.
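Synchronizing through a shared corpus can be as simple as a directory that every instance writes to and periodically scans. The Python sketch below shows one such scheme: files are named by the SHA-256 hash of their contents, so duplicates collapse automatically, and a temporary file plus an atomic rename keeps readers from seeing partial writes. This is an assumed design for illustration, not the mechanism TSFFS or LLMP uses.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import List, Set

TMP_SUFFIX = ".tmp"


def publish(shared_dir: Path, testcase: bytes) -> Path:
    """Add an interesting test case to the shared corpus, named by its SHA-256 hash."""
    path = shared_dir / hashlib.sha256(testcase).hexdigest()
    if not path.exists():
        tmp = shared_dir / (path.name + TMP_SUFFIX)
        tmp.write_bytes(testcase)
        os.replace(tmp, path)  # atomic rename so readers never see a partial file
    return path


def import_new(shared_dir: Path, known: Set[str]) -> List[bytes]:
    """Return test cases published by any instance that this instance has not seen yet."""
    fresh = []
    for path in sorted(shared_dir.iterdir()):
        if path.name.endswith(TMP_SUFFIX) or path.name in known:
            continue
        known.add(path.name)
        fresh.append(path.read_bytes())
    return fresh


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        shared = Path(d)
        seen_by_b: Set[str] = set()
        publish(shared, b"found-by-instance-a")
        print(import_new(shared, seen_by_b))  # [b'found-by-instance-a']
        print(import_new(shared, seen_by_b))  # []
```

Each instance still restores its own, possibly large, simulation state. The shared directory only spreads discoveries, so throughput grows with the number of instances rather than being capped by a single simulator.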

Conclusion

TSFFS is a new open-source fuzzer that allows developers to fuzz drivers and firmware much earlier in the software development lifecycle than was previously possible, enabling fuzz testing during the hardware design and manufacturing process. Post-silicon, TSFFS makes traditionally difficult-to-fuzz software (including UEFI drivers, hypervisors, and bare-metal applications) easier to test. Researchers have successfully used this fuzzer to find bugs in open-source projects, and Intel developers are putting the fuzzer to work to secure software for products across the compute stack. Broader adoption of the Intel Simics Simulator and TSFFS has the potential to substantially improve the security of the system software ecosystem.

References

[1] K. Serebryany, “OSS-Fuzz - Google’s continuous fuzzing service for open source software,” 2017.

[2] P. E. Black, B. Guttman, and V. Okun, “Guidelines on minimum standards for developer verification of software,” arXiv preprint arXiv:2107.12850, 2021.

[3] D. Thomas and A. Hunt, The pragmatic programmer. Addison-Wesley Professional, 2000.

[4] M. Greene, “Shifting Left—Building Systems & Software Before Hardware Lands.” Accessed: Oct. 30, 2023. [Online]. Available: https://www.intel.com/content/www/us/en/developer/articles/technical/shifting-left-building-systems-software-before-hardware-lands.html

[5] J. Engblom, “How the Intel® Simics® Simulator Executes Instructions.” Accessed: Dec. 20, 2023. [Online]. Available: https://community.intel.com/t5/Blogs/Products-and-Solutions/Software/How-the-Intel-Simics-Simulator-Executes-Instructions/post/1543049

[6] “12 Low-level Debugging with Simics.” Accessed: Jan. 22, 2023. [Online]. Available: https://intel.github.io/tsffs/simics/simics-user-guide/debug.html

[7] “API reference manual.” Accessed: Jan. 22, 2023. [Online]. Available: https://intel.github.io/tsffs/simics/reference-manual-api/index.html

[8] J. Engblom, “Opening up the Device Modeling Language.” Accessed: Dec. 20, 2023. [Online]. Available: https://community.intel.com/t5/Blogs/Products-and-Solutions/Software/Opening-up-the-Device-Modeling-Language/post/1417739

[9] A. Fioraldi, D. C. Maier, D. Zhang, and D. Balzarotti, “LibAFL: A framework to build modular and reusable fuzzers,” in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022, pp. 1051–1065.

[10] “Google/AFL: American Fuzzy Lop - A Security Oriented Fuzzer.” Accessed: Jan. 22, 2024. [Online]. Available: https://github.com/google/AFL

[11] A. Fioraldi, D. Maier, H. Eißfeldt, and M. Heuse, “AFL++: Combining Incremental Steps of Fuzzing Research,” in 14th USENIX Workshop on Offensive Technologies (WOOT 20), 2020.

[12] “libFuzzer – a library for coverage-guided fuzz testing.” Accessed: Feb. 06, 2024. [Online]. Available: https://llvm.org/docs/LibFuzzer.html

Notices & Disclaimers
Intel technologies may require enabled hardware, software or service activation. No product or component can be absolutely secure. Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.