Master your RISC-V codebase: Improve code quality with built-in static analysis

Written by Rafael Taubinger, Global Product Marketing Manager, IAR

The embedded industry must deliver increasingly complex software at an ever-increasing pace, which raises the risk of software errors that affect product quality and cause security issues. This is even more true for RISC-V codebases, where source code is often reused or migrated from other architectures with the expectation of the same outcome, even though most developers are only getting started with the architecture, its new instruction set, and a handful of extensions.

When we talk about taking control of your RISC-V codebase, there are really two aspects to it. The first is reusing your codebase for future projects. The second is that poor code quality is a widespread problem, and there is considerable evidence that bad coding practices lead directly to vulnerabilities. Every developer and company therefore needs to improve code quality so that the software stands the test of time, meaning that it is defect-free or as close to defect-free as possible.

Reusing code

Boehm’s COCOMO [1] method, shown in picture 1, estimates how strongly the relative cost of writing code depends on how much you modify the reused software. The x-axis is the percentage of the reused code that you modify, while the y-axis is the cost as a percentage of what it would be if you wrote fresh code. Note that for two of the three data samples, you do not have to modify much of the supposedly reused code before the effort jumps to 50% of rewriting the code from scratch. The key point is that if you really want to reuse code, it has to be of very high quality and well designed in order to be cost-effective.

Picture 1 - Boehm’s COCOMO non-linear reuse effects method

Focus on code quality

There are several reasons why code quality is a big issue. First, depending on the maturity of your development organization, you can spend up to 80% of your time debugging. If you can isolate defects quickly, before they make it into a formal build, you have a lower defect injection rate and can meet your organization’s quality metrics much sooner. Second, your code has fewer remaining bugs overall, which makes it a good candidate for reuse, since using the code again has a lower chance of uncovering a previously undetected bug. High-quality code is also easier to maintain because it has fewer defects, and if it follows good software engineering principles it is easier to extend, so reusing it really does give you faster follow-on projects. It is also easier to obtain safety certification if your application requires it. In short, higher code quality means less “technical debt” when you reuse the code.

Available coding standards

Many coding standards are available, but only a few are widely used. MISRA C [2] is a software development standard for the C programming language developed by the Motor Industry Software Reliability Association. It aims to facilitate code safety, portability, and reliability in embedded systems, specifically those programmed in ISO C.

The first edition of the MISRA C standard, "Guidelines for the use of the C language in vehicle-based software," was produced in 1998 and is officially known as MISRA C:1998. It was updated in 2004 and again in 2012 to add more rules. There is also a MISRA C++ 2008 standard based on C++ 2003.

Some good coding standard rules can also be found in the Common Weakness Enumeration (CWE) [3] from MITRE. The list was started when the folks at mitre.org ran a survey to find out what kinds of defects developers accidentally inject into their code. Surprisingly, developers of all stripes, be it web, app, desktop, or embedded, tend to make the same kinds of mistakes. Thus the CWE was born: a list of common pitfalls that developers should avoid. One example is allocations without matching deallocations in C++ code (or even in C code). Another is functions used without a prototype, which is an interesting point about good coding practice. If you don’t prototype your function, you don’t get rigorous type-checking at compile time. You can also end up with less efficient code, because the rules of the C language state that, without a prototype, the default argument promotions apply: small integer types are promoted to int and float is promoted to double. On an MCU without an FPU, that can pull in extra conversions and software floating-point operations. That’s why you should always prototype. The main point of the CWE, however, is that it identifies risky and bad coding behavior.

SEI CERT C and C++ [4] also define common vulnerabilities drawn from case studies, such as checking floats for out-of-bounds conditions and making sure that you don’t modify a const-qualified object. They also prescribe styling conventions to make code more readable and understandable.

Practical examples of MISRA C 2012

MISRA C 2012 is widely used for securing code quality in embedded applications. Let’s explore some rules and directives to understand better how the coding standards affect the source code.

Directive 4.6, for example, says that you should not use the basic numerical types directly. At first, this may seem like an odd thing to ask, but when you understand the reason, it makes a lot of sense. Different compilers treat types like int and char differently: the size of int depends on the target, and whether plain char is signed is implementation-defined. This can make code tricky to review, and it makes a reviewer wonder whether the original author understood how the compiler interprets that code. If you avoid the basic types in favor of types with explicit size and signedness, the code behaves the same across compilers and architectures.

Most of the time, developers use something like uint16_t, which states explicitly in the type itself that the variable is an unsigned 16-bit quantity. These fixed-width types are defined in stdint.h.

Another interesting rule is Rule 13.5, which says that the right-hand operand of a logical AND (&&) or OR (||) operator must not contain persistent side effects. The code snippet from picture 2 might look fully correct, but it isn’t.

Picture 2 – MISRA C 2012 - rule 13 code example

The issue is that the right-hand side of a logical operator is only evaluated conditionally: for OR, only if the expression on the left side is false (for AND, only if it is true). Only then will the pointer p be post-incremented. It is easy to get this behavior wrong when writing code, and everyone who ever reviews, tests, or maintains the code must understand the ramifications of how it was written. Comments in this section of code could help, but in reality such code is seldom well documented.

Another good example to explore is the rule that the body of an if or while statement must be enclosed in curly braces (Rule 15.6 in MISRA C:2012; rules 14.8 and 14.9 in MISRA C:2004). The code snippet from picture 3 shows an example of it.

Picture 3 – MISRA C 2012 - rule 14 code example

It’s difficult to tell whether the z=1 statement is intended to be part of the else block, because it’s indented at the same level as the previous statement. If it was intended to be part of that block, this is a bug, because as written it clearly is not. This rule helps prevent this type of coding error. This is just a small sample of the 200+ rules available in MISRA C to make your code more reliable and portable, thus future-proofing your design.

Fast ways to better code

The fastest way to improve code quality is to use code analysis tools. In fact, if you’re doing a functional-safety-certified application, you’re required to use static analysis. These types of tools help you find the most common sources of defects in your code, but they also help you find problems that developers tend not to think or worry about when they’re trying to write their code, especially when they’re just putting up scaffold code to get something working. These types of tools really help you develop better code because they enforce coding standards.

Depending on the quality of your static analysis solution, it can check for many other potential issues while you’re still desk-checking your code. To see how it works, let’s look at the C-STAT static analysis tool (which is built into IAR Embedded Workbench for RISC-V) in action. C-STAT sources its rules from the MISRA C 2004 and 2012 rulesets, the MISRA C++ 2008 ruleset, the Common Weakness Enumeration (CWE) from MITRE, and SEI CERT C. Picture 4 shows the available rules that can be enabled or disabled to enforce compliance with the coding standards.

Picture 4 - Standards and Rule Selection

It is possible to drill down into categories and select only the rules that are applicable to the project. Additionally, these selections can be overridden at the group, file, function, or even individual line level, giving complete granularity over what is being checked. Once the tool is configured, the project (or a group, or individual source files) can be analyzed. After the analysis completes, it’s possible to drill into each file to inspect the issues that have been triggered:

Picture 6 – Context-sensitive help

From the help window, it’s possible to get a full description of the issue, how certain it is that this is a bug, how severe the consequences are if the bug manifests itself, and all the coding standards it violates. Most importantly, at the bottom, as shown in picture 7, there are one to three code examples that show a bad example and how to correct it so that it will pass the check and make the code more robust. This helps to quickly eradicate the defects that static analysis uncovered in the source code.

Picture 7 - Code Examples that fail and pass checks

Automated workflows

Ensuring code quality is important for developers working at their desks day to day, but it is even more important in modern, scalable build server topologies for CI/CD pipelines, including virtual machines, containers (Docker), and runners. Code analysis tools should scale well, so that automated compliance checking against the programming standards can easily be achieved for bigger teams and for teams spread across different locations around the globe. Picture 8 shows the C-STAT static analysis tool being used from the command line on Ubuntu Linux. For many automated workflows, cross-platform support is a standard requirement that improves efficiency for development teams.

Picture 8 - Automated workflows

Get help from code analysis

One of the major theoretical benefits of static analysis is that it does not impact the performance of a system, since the system is not even running during the analysis. It’s also independent of the quality of the test suites. After all, finding a specific error in running code depends on executing a specific path through the program with a specific data set. By contrast, a static analysis tool can, in theory, examine all possible paths through the code.

By introducing code quality control early in the development cycle or while reusing code and future-proofing the source code, the impact of errors can be minimized. Providing static analysis right at the fingertips of developers working with RISC-V devices with well-defined coding standards can help them find issues in the source code during development, where the cost of errors is smaller than in the released product.

References

[1] https://en.wikipedia.org/wiki/COCOMO

[2] https://www.misra.org.uk/misra-c/

[3] https://cwe.mitre.org/

[4] https://wiki.sei.cmu.edu/confluence/display/seccode/Top+10+Secure+Coding+Practices?focusedCommentId=88044413