Integer Safety and Undefined Behaviour in C/C++ on IBM z/Architecture and z/TPF or z/OS
Abstract
That C/C++ neither guarantees the expected result of an integer arithmetic operation nor even notifies us that something is awry is something many of us try to ignore. Or, if we’ve come from an assembly language programming background, we might simply assume that C/C++ takes care of checking the (e.g.) carry bit for us.
But as the size of our business documents grows, this has become something we can no longer keep ignoring.
Target Audience
C/C++ systems and applications programmers, especially those on z/Architecture machines such as z/TPF and z/OS. COBOL programmers who want a good laugh.
Introduction
“Poor code quality leads to unpredictable behaviour. From a user’s perspective that often manifests itself as poor usability. For an attacker it provides an opportunity to stress the system in unexpected ways.” – Seven Pernicious Kingdoms: A Taxonomy of Software Security Errors (Tsipenyuk, Chess, McGraw)
The ability to perform arithmetic operations safely, by which we mean warning us when an operation overflows or wraps around, is fundamental to computational integrity. A lack of integer safety has real-world consequences, both financial and life-altering:
Between 1985 and 1987, software errors in the Therac-25 radiation therapy machine, including arithmetic overflow, caused massive overdoses of radiation to at least 6 patients, 3 of whom died.
On 4 June 1996, the maiden flight of the Ariane 5 launcher ended in failure. Only about 40 seconds after initiation of the flight sequence, at an altitude of about 3700 m, the launcher veered off its flight path, broke up and exploded. The cause, at $370M the most expensive software bug in history: code without protection against integer overflow.
In Dec 2004 a severe winter storm hit the American midwest resulting in the cancellation or postponement of 91% of Comair’s flights. The crew management system could only handle 32 000 changes per month and crashed, resulting in the cancellation of 3900 flights and the stranding of 200 000 passengers, at a cost of $20M: the entire profit from the previous quarter. The Comair President was ultimately to lose his job.
In 2015 the FAA and the EASA instructed Boeing 787 operators to periodically reset the aircraft’s electrical system to prevent loss of power and ram air turbine deployment as a result of an integer overflow which would otherwise occur every 248 days.
Does C care?
What does the C standard have to say about integer safety?
C11 6.5/5: If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behaviour is undefined.
Translation: The results of overflow/wraparound are undefined. The compiler can choose to do anything it wants. Even make demons fly out of your nose.1
But beware! Unsigned integers are special!
C11 6.2.5/9: The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same. A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
Translation: Unsigned integer types are a special case and perform MODULO arithmetic by design.
Summary
The C standard was built upon the then-existing compiler implementations of Kernighan and Ritchie’s 1970s-era C specification, warts and all, and designed so that most existing variants would be standard-compliant. The policy of non-interference appears to have continued up to the current day.
Unsigned integers: reduced modulo MAX(type) + 1.
Signed integers: undefined.
Nothing is an error.
The situation appears to be getting worse: some compiler developers are optimising based on undefined behaviour, delivering faster code that is, strictly speaking, standards-compliant, but at the expense of expected or usual practice.
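One practical consequence: a post-operation test such as `if (a + b < a)` on signed operands may be deleted outright by the optimiser, because signed overflow "cannot happen". Any defensive check must therefore be phrased so that the overflow never actually occurs. A sketch of the standard precondition idiom (`safe_to_add` is an illustrative name, not part of any standard API):

```c
#include <limits.h>

/* Test whether a + b is representable in an int WITHOUT performing
 * the (potentially undefined) addition. Both comparisons below stay
 * entirely within defined arithmetic, so no optimiser may remove them. */
int safe_to_add(int a, int b) {
    if (b > 0)
        return a <= INT_MAX - b;   /* INT_MAX - b cannot overflow: b > 0 */
    return a >= INT_MIN - b;       /* INT_MIN - b cannot overflow: b <= 0 */
}
```

Usage is then `if (safe_to_add(a, b)) sum = a + b; else ...`, with the addition guaranteed to be performed only when its result is defined.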
Impacts: CVE database
“Common Vulnerabilities and Exposures”: over 3000 recorded cases of integer issues.
- Availability
Undefined behaviour, crashes, infinite loops, DoS
- Integrity
Data corruption
- Confidentiality/Availability/Access Control
Bypass protection
The Common Vulnerabilities and Exposures database, funded by the US government, attempts to categorise problems with software products.
There are a number of flavours of “Numeric Issues (CWE-189)” in the CVE database. This is from CWE-190: “Integer Overflow or Wraparound”:
Availability: “DoS: Crash, Exit, or Restart; DoS: Resource Consumption (CPU); DoS: Resource Consumption (Memory); DoS: Instability”. This weakness will generally lead to undefined behavior and therefore crashes. In the case of overflows involving loop index variables, the likelihood of infinite loops is also high.
Integrity: “Modify Memory”. If the value in question is important to data (as opposed to flow), simple data corruption has occurred. Also, if the wrap around results in other conditions such as buffer overflows, further memory corruption may occur.
C/A/AC: “Execute Unauthorized Code or Commands; Bypass Protection Mechanism”. This weakness can sometimes trigger buffer overflows which can be used to execute arbitrary code. This is usually outside the scope of a program’s implicit security policy.
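The infinite-loop failure mode mentioned under Availability is easy to reproduce. In this sketch (the function name is illustrative, and a guard has been added purely so the demonstration terminates) the 8 bit loop index can never exceed 255, so the exit condition is never met:

```c
/* An unsigned char wraps 255 -> 0, so i <= 300 is always true and,
 * left alone, this loop never exits. The guard counter exists only
 * to stop the demonstration; real-world code has no such guard. */
unsigned long wrapped_loop_iterations(unsigned long guard_limit) {
    unsigned long guard = 0;
    for (unsigned char i = 0; i <= 300; i++) {
        if (++guard >= guard_limit)
            break;   /* would otherwise spin forever */
    }
    return guard;
}
```

However high the guard limit is set, the loop runs until the guard trips, never because the index reached 300.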
Does this affect IBM z Systems?
Yes!
Example: XML/JSON parsing of large B2B documents caused infinite loops and subsequent four months of forced manual processing as a direct result of integer overflow.
So, what happens on z/Architecture machines?
Refresher: Unsigned types
| Bit Width | Minimum Value | Maximum Value |
|-----------|---------------|---------------|
| 8 | 0 | 2^8^-1 = FF~16~ = 255~10~ |
| 16 | 0 | 2^16^-1 = FFFF~16~ = 65 535~10~ |
| 32 | 0 | 2^32^-1 = FFFF FFFF~16~ = 4 294 967 295~10~ |
| 64 | 0 | 2^64^-1 = FFFF FFFF FFFF FFFF~16~ = 18 446 744 073 709 551 615~10~ |
Refresher: Signed types
| Bit Width | Minimum Value | Maximum Value |
|-----------|---------------|---------------|
| 8 | -(2^7^) = 80~16~ = -128~10~ | 2^7^-1 = 7F~16~ = 127~10~ |
| 16 | -(2^15^) = 8000~16~ = -32 768~10~ | 2^15^-1 = 7FFF~16~ = 32 767~10~ |
| 32 | -(2^31^) = 8000 0000~16~ = -2 147 483 648~10~ | 2^31^-1 = 7FFF FFFF~16~ = 2 147 483 647~10~ |
| 64 | -(2^63^) = 8000 0000 0000 0000~16~ = -9 223 372 036 854 775 808~10~ | 2^63^-1 = 7FFF FFFF FFFF FFFF~16~ = 9 223 372 036 854 775 807~10~ |
A demonstration
Let’s run some tests. These were run under z/TPF using the GCC compiler, but the results should generalise.
Let’s define some variables at their limits. We’ll use specific types rather than generic ones so we can see what we’re getting:
```c
// Define some signed integers
__int8_t   s8 = 0x7f;                // char
__int16_t s16 = 0x7fff;              // short
__int32_t s32 = 0x7fffffff;          // int
__int64_t s64 = 0x7fffffffffffffff;  // long

// Define some unsigned integers
__uint8_t   u8 = 0xff;
__uint16_t u16 = 0xffff;
__uint32_t u32 = 0xffffffff;
__uint64_t u64 = 0xffffffffffffffff;
```
And print out their values before and after incrementing them.
```c
sprintf(b, "max s8: %4i, s16: %6hi, s32: %11i, s64: %20li", s8, s16, s32, s64);
wtopc_text(b);

s8++; s16++; s32++; s64++;

sprintf(b, "p1  s8: %4i, s16: %6hi, s32: %11i, s64: %20li", s8, s16, s32, s64);
wtopc_text(b);

// same as above for unsigned
```
Which gives us the results:
Signed integers:

```
max s8:  127, s16:  32767, s32:  2147483647, s64:  9223372036854775807
p1  s8: -128, s16: -32768, s32: -2147483648, s64: -9223372036854775808
```

Unsigned integers:

```
max u8: 255, u16: 65535, u32: 4294967295, u64: 18446744073709551615
p1  u8:   0, u16:     0, u32:          0, u64:                    0
```
As promised: no errors generated! Unsigned integers wrap modulo 2^N^, as demanded by the C specification; signed integers are allowed to do anything they want, and here they wrap from their maximum value to their minimum.
OK, so it’s a problem: Give me a solution
Regrettably, the standard development methodology seems to be to pick a length that seems to work OK in testing and ship that until it breaks, at which point the size is increased in the hope it doesn’t happen again. One has to suspect that there’s a better way.
A number of attempts at solutions have been implemented in various compilers with varying levels of success. There may be something incorporated into the next C2x standard. But we need a solution now and we need that solution to be both robust and fast, not dependent upon slow post-operation checks, which is all I’ve seen to date in the prototypes.
The IBM machine architecture has always provided safe arithmetic operations2, so let’s build a solution around that.
A possible API
We can fashion an alternate checked arithmetic function call around a model such as:
```c
inline extern __attribute__((always_inline))
bool ckd_add_u64( __uint64_t *u64sum,
                  __uint64_t  u64a,
                  __uint64_t  u64b,
                  bool        abort );
```
Some of this is maybe a bit non-intuitive, so let’s walk through it:
The function call name starts ckd followed by the operation followed by the size of the operands. No attempt has been made to complicate the issue by supporting mixed-length arithmetic: if you want to add a 16 bit value to a 64 bit value, just cast the former before invoking the function.
The function takes a pointer to the result, values for the operands, and a boolean value to determine if it should abort if it detects an error or return an indicator.
inline: a suggestion to the compiler that this function should be inlined.
extern: prevent the generation of a callable function.
always_inline: force inlining even if the noinline override is specified.
__uintNN_t: preferred to types such as short and long as the type size is made explicit.
u64a, u64b: addends.
u64sum: sum.
return: TRUE if overflow.
It is then as easy to use as, for example:
```c
#include "checked_integer_arithmetic.inl"

if (ckd_add_u64(&sum, a, b, false)) {
    // handle any overflow
}
// carry on with processing
```
We therefore define 12 function calls:
Addition, Subtraction, Multiplication, combined Division/Remainder
64, 32, 16 bit
We also add in our implementation:
Debug tracing selectable via #define
SERRC type selectable via #define as z/TPF, our target system, supports two types of system error processing. Extension to other z/Architecture operating systems is trivial.
Efficient solutions
The code that we would like the compiler to generate would be, for example:
```
algr ...         add
brc  3,toobig    branch on carry set
xr   ...         flag as successful
...
```
We can’t quite manage that, as the gcc inline assembler code is unstable due to potential optimiser reordering of statements. (This might be possible in gcc v11+, but that is not yet supported by z/TPF.) But what we can do instead is:
```
algr ...         add
ipm  ...         get cc
nilf ...         test carry bit/set successful
jnz  toobig      branch on carry set
```
which we can generate without problem. It’s almost as good: one additional (fast) instruction.
16 bit types do not have hardware detection of overflow, so post-operation checks are implemented instead.
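A sketch of what such a post-operation check might look like (this illustrates the technique only; it is not the actual z/TPF implementation, and the simplified signature omits the `abort` parameter):

```c
#include <stdint.h>
#include <stdbool.h>

/* Checked 16 bit unsigned addition with no hardware carry detection:
 * widen both operands to 32 bits, where the sum cannot overflow,
 * then test the result against the 16 bit limit after the fact. */
static inline bool ckd_add_u16(uint16_t *sum, uint16_t a, uint16_t b) {
    uint32_t wide = (uint32_t)a + (uint32_t)b;  /* at most 0x1FFFE: safe */
    *sum = (uint16_t)wide;                      /* truncated (wrapped) result */
    return wide > 0xFFFFu;                      /* true on overflow */
}
```

This is the "slow" style of check the hardware-assisted 32/64 bit routines avoid, but at 16 bits there is no carry indication to consult.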
Similarly for 8 bit types: we have not implemented support for these, as the use case for such small-range types is different.
For division, the quotient cannot exceed the dividend, so no overflow is possible. Functional support for these operations is then provided for i) consistency, ii) divide-by-zero trapping, iii) future expansion to unmatched types, and iv) obtaining both the quotient and the remainder for the cost of a single division operation.
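A sketch of the shape such a combined division/remainder call could take (the signature is illustrative, not the article’s actual API):

```c
#include <stdint.h>
#include <stdbool.h>

/* Checked 64 bit unsigned division: no overflow is possible, so the
 * only error case is a zero divisor. Quotient and remainder are both
 * returned; a compiler will typically compute them with one division. */
static inline bool ckd_div_u64(uint64_t *quot, uint64_t *rem,
                               uint64_t a, uint64_t b) {
    if (b == 0)
        return true;   /* trap divide by zero instead of faulting */
    *quot = a / b;
    *rem  = a % b;
    return false;
}
```

The caller gets both results for one call, matching rationale point iv) above.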
A full implementation of a checked addition routine for unsigned 64 bit integers follows. Note that, due to those restrictions around how the inline assembler interacts with the optimiser, we specify the minimum possible assembler and code everything else in C. The optimiser then translates that to efficient code, resulting in an efficient inlined routine.
```c
/**
 * Add two 64 bit unsigned integers checking the result for overflow
 * @param *result pointer to the result, 64 bit unsigned
 * @param u64a    addend, unsigned 64 bit
 * @param u64b    addend, unsigned 64 bit
 * @param abort   boolean action on overflow: true for serrc+exit, false for return
 * @return true for overflow detected
 */
inline extern __attribute__((always_inline))
bool ckd_add_u64( __uint64_t *result,
                  __uint64_t  u64a,
                  __uint64_t  u64b,
                  bool        abort) {

    __uint32_t ccpm;
    const __uint32_t carrybit = 0x20000000;
    __uint64_t sum = u64a;

    asm("algr %[r1],%[r2] \n\t"
        "ipm  %[r3]"
        : [r1] "+r" (sum),     // output
          [r3] "=r" (ccpm)
        : [r2] "r"  (u64b)     // input
        : "cc"                 // clobbers
       );

    bool overflow = (ccpm & carrybit) != 0;  // check if carry bit set

#ifdef DEBUG
    char msgbuf[200];
    sprintf( msgbuf,
             "ckd_add_u64 a: %lu, b: %lu, sum: %lu\n ccpm: 0x%08x, overflow: %s",
             u64a, u64b, sum, ccpm,
             overflow ? "true" : "false" );
    wtopc_text( msgbuf );
#endif

    if (abort && overflow) {
        serrc_op( (enum t_serrc)(SERRC_EXIT), 0xDEAD00,
                  "INTEGER OVERFLOW OCCURRED", NULL );
    }

    *result = sum;
    return overflow;
}
```
There are, perhaps, some further optimisations which could be made, but testing of these requires a GCC level higher than 4.6.
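One such avenue worth noting: GCC 5 and later (and Clang) provide generic checked-arithmetic builtins such as `__builtin_add_overflow`, for which the compiler itself emits the add-and-branch-on-carry sequence with no inline assembler at all. These are not available at the GCC 4.6 level discussed here, but offer a natural migration path. A sketch (the wrapper name is illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

/* GCC 5+/Clang: the builtin performs the addition, stores the
 * (possibly wrapped) result through the pointer, and returns true
 * if overflow occurred -- the same contract as ckd_add_u64. */
static inline bool builtin_add_u64(uint64_t *sum, uint64_t a, uint64_t b) {
    return __builtin_add_overflow(a, b, sum);
}
```

On z/Architecture targets this compiles to essentially the hand-written sequence shown earlier, with the optimiser free to fold the branch directly.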
Restrictions
Unsigned integers only: The focus here is on fixing pointer problems, though an extension to support signed integers is easy.
Matching lengths/types only: focus as above. (Non-matching support can be provided, but complicates the interface.)
Not infix. This is a C language restriction, and the major issue with this implementation. The change from being able to code using standard infix arithmetic notation to having to use prefix notation (i.e. function calls) is far from being an easy sell. Given the seeming aversion of the C language standards committee to evolution, there can be no expectation of a solution to this.
There are some alternative behaviours that may be desired of integers; for example, some algorithms depend upon two’s complement wrapping, while others want integers at their limit value to stay there. Rust, for instance, implements four different flavours of overflow behaviour, in addition to allowing checking to be turned on or off in production:

- `wrapping_...` returns the straight two’s complement result,
- `saturating_...` returns the largest/smallest value (as appropriate) of the type when overflow occurs,
- `overflowing_...` returns the two’s complement result along with a boolean indicating if overflow occurred, and
- `checked_...` returns an `Option` that’s `None` when overflow occurs.

No attempt has been made to implement these.
If recompiling in C++ is an option, though, it is trivial to provide class definitions and overload the standard arithmetic operators to provide infix capability and implement as much of this additional functionality as is required.
Further reading
David Svoboda – Towards Integer Safety (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2428.pdf)
Tsipenyuk, Chess, McGraw – Seven Pernicious Kingdoms: A Taxonomy of Software Security Errors (https://samate.nist.gov/SSATTM_Content/papers/Seven%20Pernicious%20Kingdoms%20-%20Taxonomy%20of%20Sw%20Security%20Errors%20-%20Tsipenyuk%20-%20Chess%20-%20McGraw.pdf)
Common Weakness Enumeration: Numeric Errors (https://cwe.mitre.org/data/definitions/189.html)
C11 final draft (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf)
Victor Yodaiken – How ISO C became unusable for operating systems development (https://arxiv.org/pdf/2201.07845.pdf)
JeanHeyd Meneide – Undefined behavior, and the Sledgehammer Principle (https://thephd.dev/c-undefined-behavior-and-the-sledgehammer-guideline)
Presentation deck
A full presentation deck of this article is available here. This deck is best displayed with the free open source Pympress PDF presentation tool, which natively supports notes on a second screen, or something similar.
Thanks
This article is based on work performed for SNCF, and is published with their permission, for which many thanks.
Footnotes
nasal demons: n. Recognized shorthand on the Usenet group comp.std.c for any unexpected behavior of a C compiler on encountering an undefined construct. During a discussion on that group in early 1992, a regular remarked “When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose” (the implication is that the compiler may choose any arbitrarily bizarre way to interpret the code without violating the ANSI C standard). Someone else followed up with a reference to “nasal demons”, which quickly became established. The original post is web-accessible at http://groups.google.com/groups?hl=en&selm=10195%40ksr.com.↩︎
At least for some data lengths for some operations. Good orthogonality has never been a feature of the instruction set.↩︎