For EIP-4844, Ethereum clients need the ability to compute and verify KZG commitments. Rather than each client writing its own cryptography, a group of researchers and developers collaborated on c-kzg-4844, a small C library with bindings for higher-level languages. The goal was a reliable and efficient cryptographic library that every client could use. The Ethereum Foundation’s Protocol Security Research team reviewed and improved this library. This blog post discusses some of the techniques we used, which apply to hardening C projects in general.
Fuzzing is a dynamic testing technique that feeds a program randomly generated and mutated inputs in order to uncover bugs. LibFuzzer and afl++ are two popular fuzzing frameworks for C projects. Both are coverage-guided, evolutionary fuzzing engines; LibFuzzer runs inside the program’s own process. We used LibFuzzer for c-kzg-4844 because we were already using other tools from the LLVM project. One of our fuzzers targets the “verify_kzg_proof” function in c-kzg-4844.
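To make the shape of a LibFuzzer target concrete, here is a minimal sketch. It is not the c-kzg-4844 fuzzer: “verify_stub” is a hypothetical stand-in with placeholder logic, and the field sizes (48-byte commitment and proof, 32-byte field elements) are assumptions made for illustration. Only the “LLVMFuzzerTestOneInput” entry point is part of the LibFuzzer interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed serialized sizes, for illustration only. */
enum {
    COMMITMENT_LEN = 48,
    FIELD_LEN = 32,
    PROOF_LEN = 48,
    INPUT_LEN = COMMITMENT_LEN + 2 * FIELD_LEN + PROOF_LEN
};

/* Hypothetical stand-in for the function under test; NOT the c-kzg-4844 API.
 * Returns 0 on success and writes the verdict to *ok. */
int verify_stub(bool *ok, const uint8_t *commitment, const uint8_t *z,
                const uint8_t *y, const uint8_t *proof) {
    /* Placeholder logic so the sketch compiles and runs on its own. */
    *ok = memcmp(commitment, proof, PROOF_LEN) == 0 &&
          memcmp(z, y, FIELD_LEN) == 0;
    return 0;
}

/* LibFuzzer entry point: called once for every generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* Skip inputs that cannot be split into the four fixed-size fields. */
    if (size != INPUT_LEN) return 0;

    const uint8_t *commitment = data;
    const uint8_t *z = commitment + COMMITMENT_LEN;
    const uint8_t *y = z + FIELD_LEN;
    const uint8_t *proof = y + FIELD_LEN;

    bool first = false, second = false;
    int ret1 = verify_stub(&first, commitment, z, y, proof);
    int ret2 = verify_stub(&second, commitment, z, y, proof);

    /* Property check: the same input must always give the same answer.
     * A failed assert aborts, which the fuzzer records as a crash. */
    assert(ret1 == ret2);
    assert(first == second);
    return 0;
}
```

With Clang, a file like this is typically built with `-g -fsanitize=fuzzer,address`, which links in the fuzzing engine and its `main`; the harness itself only needs to avoid crashing on valid behavior.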
Differential fuzzing feeds the same inputs to multiple implementations of one interface and compares their outputs. Any discrepancy indicates that at least one implementation has a problem. Ethereum relies on this technique because it maintains several implementations of the same specifications, and that diversity adds a layer of safety. For KZG libraries, we developed “kzg-fuzz”, which differentially fuzzes c-kzg-4844 (through its Golang bindings) and go-kzg-4844. So far, no differences have been found.
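The idea can be sketched in a few lines of C. This is not how kzg-fuzz is written; the two big-endian decoders below are hypothetical implementations chosen only because they are small enough to compare at a glance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Implementation A (hypothetical): big-endian decode with explicit shifts. */
uint32_t load_be32_shift(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Implementation B (hypothetical): the same decode written as a loop. */
uint32_t load_be32_loop(const uint8_t *p) {
    uint32_t v = 0;
    for (size_t i = 0; i < 4; i++) v = (v << 8) | p[i];
    return v;
}

/* Returns true when both implementations agree on this input. */
bool implementations_agree(const uint8_t *data, size_t size) {
    if (size < 4) return true; /* too short to decode; nothing to compare */
    return load_be32_shift(data) == load_be32_loop(data);
}

/* LibFuzzer entry point: any disagreement aborts and is saved as a crash. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    assert(implementations_agree(data, size));
    return 0;
}
```

The value of the approach grows with the independence of the implementations: two code bases written by different people in different languages are unlikely to share the same bug.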
We used the LLVM tools “llvm-profdata” and “llvm-cov” to generate coverage reports from test runs. These reports show which parts of the code the tests actually execute. The Makefile of c-kzg-4844 contains an example of how to generate such a report. The generated HTML file highlights unexecuted code in red, which makes it easy to spot areas that need more tests. In c-kzg-4844, the unexecuted code is mostly hard-to-test error handling, such as memory allocation failures.
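As a rough illustration of why allocation-failure branches tend to stay red, consider the sketch below. The function, the return codes, and the file names in the comment are all hypothetical; the compiler and tool flags are the standard ones for Clang's source-based coverage.

```c
/* Possible workflow, assuming this file is saved as demo.c and given a
 * main() that calls run_demo_tests():
 *   clang -fprofile-instr-generate -fcoverage-mapping demo.c -o demo
 *   LLVM_PROFILE_FILE=demo.profraw ./demo
 *   llvm-profdata merge -sparse demo.profraw -o demo.profdata
 *   llvm-cov show ./demo -instr-profile=demo.profdata -format=html > coverage.html
 */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { DEMO_OK = 0, DEMO_BADARGS = 1, DEMO_MALLOC = 2 } demo_ret;

/* Copies n bytes from `in` into a new buffer returned through `out`. */
demo_ret demo_copy(uint8_t **out, const uint8_t *in, size_t n) {
    if (out == NULL || in == NULL || n == 0) return DEMO_BADARGS;
    uint8_t *buf = malloc(n);
    /* Ordinary tests almost never reach this branch, so a coverage
     * report would typically mark it as unexecuted. */
    if (buf == NULL) return DEMO_MALLOC;
    memcpy(buf, in, n);
    *out = buf;
    return DEMO_OK;
}

/* A tiny test suite: covers the success and bad-argument paths only. */
void run_demo_tests(void) {
    const uint8_t src[3] = {1, 2, 3};
    uint8_t *dst = NULL;
    assert(demo_copy(&dst, src, sizeof src) == DEMO_OK);
    assert(memcmp(dst, src, sizeof src) == 0);
    free(dst);
    assert(demo_copy(NULL, src, sizeof src) == DEMO_BADARGS);
}
```

Covering the `DEMO_MALLOC` branch would require some way of forcing `malloc` to fail, which is why such lines are often accepted as uncovered.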
Profiling does not apply to every project, but it matters for performance-critical libraries like c-kzg-4844. It helps identify inefficiencies that an attacker could exploit to mount denial-of-service (DoS) attacks on nodes. We used gperftools for profiling because we found it more full-featured and easier to use than llvm-xray. The usual workflow is to start the profiler, run the function of interest (say, “my_function”), and stop the profiler, which writes a profile for later analysis.
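A minimal sketch of that workflow follows. `ProfilerStart` and `ProfilerStop` are the gperftools CPU profiler calls; everything else is an assumption for illustration, including the workload inside “my_function”, the `USE_GPERFTOOLS` macro, and the no-op fallbacks that let the file build on machines without gperftools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef USE_GPERFTOOLS
#include <gperftools/profiler.h> /* ProfilerStart / ProfilerStop */
#else
/* No-op fallbacks (hypothetical) so the sketch builds without gperftools. */
static int ProfilerStart(const char *fname) { (void)fname; return 1; }
static void ProfilerStop(void) {}
#endif

/* Hypothetical workload: sum of squares 1^2 + 2^2 + ... + n^2. */
uint64_t my_function(uint32_t n) {
    uint64_t acc = 0;
    for (uint32_t i = 0; i < n; i++) {
        uint64_t k = (uint64_t)i + 1;
        acc += k * k;
    }
    return acc;
}

/* Runs my_function repeatedly between ProfilerStart and ProfilerStop so
 * that the sampling profiler has a chance to collect enough samples. */
uint64_t profile_my_function(const char *profile_path, uint32_t n, int rounds) {
    assert(profile_path != NULL);
    uint64_t last = 0;
    ProfilerStart(profile_path);
    for (int r = 0; r < rounds; r++) last = my_function(n);
    ProfilerStop();
    return last;
}
```

To collect a real profile, one would build with something like `clang -g -DUSE_GPERFTOOLS demo.c -lprofiler` and then inspect the resulting file with the `pprof` tool that accompanies gperftools.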
To understand how high-level constructs are translated into low-level machine code, we recommend software reverse engineering (SRE) tools like Ghidra or IDA. These tools let you analyze and interpret machine code, which helps you understand compiler optimizations and spot issues they may introduce. Decompiled functions usually lose variable names and complex types, so some manual reverse engineering is needed to make the output readable.
Clang comes with the Clang Static Analyzer, a powerful tool for finding potential problems in code without executing it. It complements the compiler by detecting issues that ordinary compilation does not report. For example, it can detect a memory leak in which an allocated object is never freed.
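The sketch below shows the kind of early-return leak the analyzer is good at finding. The function and the `DEMO_LEAK` macro are hypothetical: by default the code is correct, and defining `DEMO_LEAK` removes the `free` on the rejection path so that the leak can be observed.

```c
/* Possible invocation, assuming this file is saved as demo.c:
 *   clang --analyze -DDEMO_LEAK demo.c
 * With DEMO_LEAK defined, the analyzer should warn about a potential
 * leak of the memory pointed to by `buf`. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of `s` with one trailing newline removed,
 * or NULL if `s` is NULL, allocation fails, or nothing is left after
 * trimming. The caller frees the result. */
char *copy_trimmed(const char *s) {
    if (s == NULL) return NULL;
    size_t n = strlen(s);
    char *buf = malloc(n + 1);
    if (buf == NULL) return NULL;
    memcpy(buf, s, n + 1);
    assert(buf[n] == '\0'); /* the terminator was copied too */

    if (n > 0 && buf[n - 1] == '\n') buf[--n] = '\0';
    if (n == 0) {
        /* Rejection path: without the free below, `buf` is leaked. */
#ifndef DEMO_LEAK
        free(buf);
#endif
        return NULL;
    }
    return buf;
}
```

Bugs like this are easy to miss in review because the happy path is correct; the analyzer explores the early-return path without needing a test that triggers it.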
Sanitizers are dynamic analysis tools that instrument a program with additional instructions in order to detect problems during execution. Clang includes several sanitizers, four of which are particularly useful: AddressSanitizer (ASan) for memory errors such as out-of-bounds accesses and use-after-free, MemorySanitizer (MSan) for reads of uninitialized memory, UndefinedBehaviorSanitizer (UBSan) for undefined behavior such as signed integer overflow, and ThreadSanitizer (TSan) for data races. These tools catch common mistakes, including many memory-handling errors, as the program runs.
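As a small, hedged illustration, the sketch below contains an off-by-one read that is only compiled in when the hypothetical `DEMO_OVERFLOW` macro is defined. The function is made up for this example; the `-fsanitize` flags in the comment are the standard Clang ones.

```c
/* Possible builds, assuming this file is saved as demo.c and given a main():
 *   clang -g -O1 -fsanitize=address,undefined -DDEMO_OVERFLOW demo.c
 *   clang -g -O1 -fsanitize=memory demo.c   (MSan)
 *   clang -g -O1 -fsanitize=thread demo.c   (TSan)
 * ASan, MSan, and TSan cannot be combined in one binary. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Fills a heap buffer with 0, 1, 2, ... and returns the sum of its bytes.
 * With DEMO_OVERFLOW defined, the summing loop reads one byte past the
 * end of the buffer, which ASan should report as a heap-buffer-overflow. */
uint32_t sum_bytes(size_t n) {
    assert(n > 0);
    uint8_t *buf = malloc(n);
    if (buf == NULL) return 0;
    for (size_t i = 0; i < n; i++) buf[i] = (uint8_t)(i & 0xff);

#ifdef DEMO_OVERFLOW
    size_t limit = n + 1; /* deliberate off-by-one */
#else
    size_t limit = n;
#endif

    uint32_t total = 0;
    for (size_t i = 0; i < limit; i++) total += buf[i];
    free(buf);
    return total;
}
```

Without a sanitizer, the out-of-bounds read would most likely go unnoticed and simply return a slightly wrong sum, which is exactly the kind of silent error these tools are meant to surface.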