Integer Safety and Undefined Behaviour in C/C++ on IBM z/Architecture and z/TPF or z/OS
Abstract
That C/C++ neither guarantees the expected result of an integer arithmetic operation nor even notifies us that something is awry is something many of us try to ignore. Or, if we’ve come from an assembly language programming background, we might simply assume that C/C++ takes care of checking the (e.g.) carry bit for us.
But as the size of our business documents grows, this has become something we can no longer keep ignoring.
Target Audience
C/C++ systems and applications programmers, especially those on z/Architecture machines such as z/TPF and z/OS. COBOL programmers who want a good laugh.
Introduction
“Poor code quality leads to unpredictable behaviour. From a user’s perspective that often manifests itself as poor usability. For an attacker it provides an opportunity to stress the system in unexpected ways.” – Seven Pernicious Kingdoms: A Taxonomy of Software Security Errors (Tsipenyuk, Chess, McGraw)
The ability to perform arithmetic operations safely, by which we mean warning us when an operation overflows or wraps around, is fundamental to computational integrity. A lack of integer safety has real-world consequences, both financial and life-altering:
Between 1985 and 1987, software errors in the Therac-25 radiation therapy machine, including arithmetic overflow, caused massive overdoses of radiation to at least 6 patients, 3 of whom died.
On 4 June 1996, the maiden flight of the Ariane 5 launcher ended in failure. Only about 40 seconds after initiation of the flight sequence, at an altitude of about 3700 m, the launcher veered off its flight path, broke up and exploded. The cause, at $370M the most expensive software bug in history: code without protection against integer overflow.
In Dec 2004 a severe winter storm hit the American midwest resulting in the cancellation or postponement of 91% of Comair’s flights. The crew management system could only handle 32 000 changes per month and crashed, resulting in the cancellation of 3900 flights and the stranding of 200 000 passengers, at a cost of $20M: the entire profit from the previous quarter. The Comair President was ultimately to lose his job.
In 2015 the FAA and the EASA instructed Boeing 787 operators to periodically reset the aircraft’s electrical system to prevent loss of power and ram air turbine deployment as a result of an integer overflow which would otherwise occur every 248 days.
Does C care?
What does the C standard have to say about integer safety?
C11 6.5/5: If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behaviour is undefined.
Translation: The results of overflow/wraparound are undefined. The compiler can choose to do anything it wants. Even make demons fly out of your nose.1
But beware! Unsigned integers are special!
C11 6.2.5/9: The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same. A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
Translation: Unsigned integer types are a special case and perform MODULO arithmetic by design.
Summary
The C standard was built upon the then-existing compiler implementations of Kernighan and Ritchie’s 1970s-era C specification, warts and all, and designed so that most existing variants would be standard-compliant. The policy of non-interference appears to have continued up to the current day.
Unsigned integers: reduced modulo MAX(type) + 1.
Signed integers: undefined.
Nothing is an error.
The situation appears to be getting worse: some compiler developers are optimising based on undefined behaviour, delivering faster code that is, strictly speaking, standards-compliant, but at the expense of expected or usual practice.
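One practical consequence: a post-operation test such as `if (a + b < a)` on signed operands may be deleted outright by the optimiser, because signed overflow "cannot happen". Any defensive check must therefore be phrased so that the overflow never actually occurs. A sketch of the standard precondition idiom (`safe_to_add` is an illustrative name, not part of any standard API):

```c
#include <limits.h>

/* Test whether a + b is representable in an int WITHOUT performing
 * the (potentially undefined) addition. Both comparisons below stay
 * entirely within defined arithmetic, so no optimiser may remove them. */
int safe_to_add(int a, int b) {
    if (b > 0)
        return a <= INT_MAX - b;   /* INT_MAX - b cannot overflow: b > 0 */
    return a >= INT_MIN - b;       /* INT_MIN - b cannot overflow: b <= 0 */
}
```

Usage is then `if (safe_to_add(a, b)) sum = a + b; else ...`, with the addition guaranteed to be performed only when its result is defined.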
Impacts: CVE database
“Common Vulnerabilities and Exposures”: over 3000 recorded cases of integer issues.
- Availability
Undefined behaviour, crashes, infinite loops, DoS
- Integrity
Data corruption
- Confidentiality/Availability/Access Control
Bypass protection
The Common Vulnerabilities and Exposures database, funded by the US government, attempts to categorise problems with software products.
There are a number of flavours of “Numeric Issues (CWE-189)” in the CVE database. This is from CWE-190: “Integer Overflow or Wraparound”:
Availability: “DoS: Crash, Exit, or Restart; DoS: Resource Consumption (CPU); DoS: Resource Consumption (Memory); DoS: Instability”. This weakness will generally lead to undefined behavior and therefore crashes. In the case of overflows involving loop index variables, the likelihood of infinite loops is also high.
Integrity: “Modify Memory”. If the value in question is important to data (as opposed to flow), simple data corruption has occurred. Also, if the wrap around results in other conditions such as buffer overflows, further memory corruption may occur.
C/A/AC: “Execute Unauthorized Code or Commands; Bypass Protection Mechanism”. This weakness can sometimes trigger buffer overflows which can be used to execute arbitrary code. This is usually outside the scope of a program’s implicit security policy.
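The infinite-loop failure mode mentioned under Availability is easy to reproduce. In this sketch (the function name is illustrative, and a guard has been added purely so the demonstration terminates) the 8 bit loop index can never exceed 255, so the exit condition is never met:

```c
/* An unsigned char wraps 255 -> 0, so i <= 300 is always true and,
 * left alone, this loop never exits. The guard counter exists only
 * to stop the demonstration; real-world code has no such guard. */
unsigned long wrapped_loop_iterations(unsigned long guard_limit) {
    unsigned long guard = 0;
    for (unsigned char i = 0; i <= 300; i++) {
        if (++guard >= guard_limit)
            break;   /* would otherwise spin forever */
    }
    return guard;
}
```

However high the guard limit is set, the loop runs until the guard trips, never because the index reached 300.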
Does this affect IBM z Systems?
Yes!
Example: XML/JSON parsing of large B2B documents caused infinite loops and subsequent four months of forced manual processing as a direct result of integer overflow.
So, what happens on z/Architecture machines?
Refresher: Unsigned types
| Bit Width | Minimum Value | Maximum Value |
|-----------|---------------|---------------|
| 8 | 0 | 2^8^-1 = FF~16~ = 255~10~ |
| 16 | 0 | 2^16^-1 = FFFF~16~ = 65 535~10~ |
| 32 | 0 | 2^32^-1 = FFFF FFFF~16~ = 4 294 967 295~10~ |
| 64 | 0 | 2^64^-1 = FFFF FFFF FFFF FFFF~16~ = 18 446 744 073 709 551 615~10~ |
Refresher: Signed types
| Bit Width | Minimum Value | Maximum Value |
|-----------|---------------|---------------|
| 8 | -(2^7^) = 80~16~ = -128~10~ | 2^7^-1 = 7F~16~ = 127~10~ |
| 16 | -(2^15^) = 8000~16~ = -32 768~10~ | 2^15^-1 = 7FFF~16~ = 32 767~10~ |
| 32 | -(2^31^) = 8000 0000~16~ = -2 147 483 648~10~ | 2^31^-1 = 7FFF FFFF~16~ = 2 147 483 647~10~ |
| 64 | -(2^63^) = 8000 0000 0000 0000~16~ = -9 223 372 036 854 775 808~10~ | 2^63^-1 = 7FFF FFFF FFFF FFFF~16~ = 9 223 372 036 854 775 807~10~ |
A demonstration
Let’s run some tests. These were run under z/TPF using the GCC compiler, but the results should generalise.
Let’s define some variables at their limits. We’ll use specific types rather than generic ones so we can see what we’re getting:
```c
// Define some signed integers
__int8_t   s8 = 0x7f;                // char
__int16_t s16 = 0x7fff;              // short
__int32_t s32 = 0x7fffffff;          // int
__int64_t s64 = 0x7fffffffffffffff;  // long

// Define some unsigned integers
__uint8_t   u8 = 0xff;
__uint16_t u16 = 0xffff;
__uint32_t u32 = 0xffffffff;
__uint64_t u64 = 0xffffffffffffffff;
```
And print out their values before and after incrementing them.
```c
sprintf(b, "max s8: %4i, s16: %6hi, s32: %11i, s64: %20li", s8, s16, s32, s64);
wtopc_text(b);

s8++; s16++; s32++; s64++;

sprintf(b, "p1  s8: %4i, s16: %6hi, s32: %11i, s64: %20li", s8, s16, s32, s64);
wtopc_text(b);

// same as above for unsigned
```
Which gives us the results:
Signed integers:

```
max s8:  127, s16:  32767, s32:  2147483647, s64:  9223372036854775807
p1  s8: -128, s16: -32768, s32: -2147483648, s64: -9223372036854775808
```

Unsigned integers:

```
max u8: 255, u16: 65535, u32: 4294967295, u64: 18446744073709551615
p1  u8:   0, u16:     0, u32:          0, u64:                    0
```
As promised: no errors generated! Unsigned integers wrap modulo 2^N^, as demanded by the C specification; signed integers are allowed to do anything they want, and here they wrap from their maximum value to their minimum.
OK, so it’s a problem: Give me a solution
Regrettably, the standard development methodology seems to be to pick a length that seems to work OK in testing and ship that until it breaks, at which point the size is increased in the hope it doesn’t happen again. One has to suspect that there’s a better way.
A number of attempts at solutions have been implemented in various compilers with varying levels of success. There may be something incorporated into the next C2x standard. But we need a solution now and we need that solution to be both robust and fast, not dependent upon slow post-operation checks, which is all I’ve seen to date in the prototypes.
The IBM machine architecture has always provided safe arithmetic operations2, so let’s build a solution around that.
A possible API
We can fashion an alternate checked arithmetic function call around a model such as:
```c
inline extern __attribute__((always_inline))
bool ckd_add_u64( __uint64_t *u64sum,
                  __uint64_t  u64a,
                  __uint64_t  u64b,
                  bool        abort );
```
Some of this is maybe a bit non-intuitive, so let’s walk through it:
The function call name starts ckd followed by the operation followed by the size of the operands. No attempt has been made to complicate the issue by supporting mixed-length arithmetic: if you want to add a 16 bit value to a 64 bit value, just cast the former before invoking the function.
The function takes a pointer to the result, values for the operands, and a boolean value to determine if it should abort if it detects an error or return an indicator.
inline: a suggestion to the compiler that this function should be inlined.
extern: prevent the generation of a callable function.
always_inline: force inlining even if the noinline override is specified.
__uintNN_t: preferred to types such as short and long as the type size is made explicit.
u64a, u64b: addends.
u64sum: sum.
return: TRUE if overflow.
It is then as easy to use as, for example:
```c
#include "checked_integer_arithmetic.inl"

if (ckd_add_u64(&sum, a, b, false)) {
    // handle any overflow
}
// carry on with processing
```
We therefore define 12 function calls:
Addition, Subtraction, Multiplication, combined Division/Remainder
64, 32, 16 bit
We also add in our implementation:
Debug tracing selectable via #define
SERRC type selectable via #define as z/TPF, our target system, supports two types of system error processing. Extension to other z/Architecture operating systems is trivial.
Efficient solutions
The code that we would like the compiler to generate would be, for example:
```
algr ...         add
brc  3,toobig    branch on carry set
xr   ...         flag as successful
...
```
We can’t quite manage that, as the gcc inline assembler code is unstable due to potential optimiser reordering of statements. (This might be possible in gcc v11+, but that is not yet supported by z/TPF.) But what we can do instead is:
```
algr ...         add
ipm  ...         get cc
nilf ...         test carry bit/set successful
jnz  toobig      branch on carry set
```
which we can generate without problem. It’s almost as good: one additional (fast) instruction.
16 bit types do not have hardware detection of overflow, so post-operation checks are implemented instead.
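A sketch of what such a post-operation check might look like (this illustrates the technique only; it is not the actual z/TPF implementation, and the simplified signature omits the `abort` parameter):

```c
#include <stdint.h>
#include <stdbool.h>

/* Checked 16 bit unsigned addition with no hardware carry detection:
 * widen both operands to 32 bits, where the sum cannot overflow,
 * then test the result against the 16 bit limit after the fact. */
static inline bool ckd_add_u16(uint16_t *sum, uint16_t a, uint16_t b) {
    uint32_t wide = (uint32_t)a + (uint32_t)b;  /* at most 0x1FFFE: safe */
    *sum = (uint16_t)wide;                      /* truncated (wrapped) result */
    return wide > 0xFFFFu;                      /* true on overflow */
}
```

This is the "slow" style of check the hardware-assisted 32/64 bit routines avoid, but at 16 bits there is no carry indication to consult.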
Similarly for 8 bit types: we have not implemented support for these, as the use case for such small-range types is different.
For division, the quotient cannot exceed the dividend, so no overflow is possible. Functional support for these operations is then provided for i) consistency, ii) divide-by-zero trapping, iii) future expansion to unmatched types, and iv) obtaining both the quotient and the remainder for the cost of a single division operation.
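A sketch of the shape such a combined division/remainder call could take (the signature is illustrative, not the article’s actual API):

```c
#include <stdint.h>
#include <stdbool.h>

/* Checked 64 bit unsigned division: no overflow is possible, so the
 * only error case is a zero divisor. Quotient and remainder are both
 * returned; a compiler will typically compute them with one division. */
static inline bool ckd_div_u64(uint64_t *quot, uint64_t *rem,
                               uint64_t a, uint64_t b) {
    if (b == 0)
        return true;   /* trap divide by zero instead of faulting */
    *quot = a / b;
    *rem  = a % b;
    return false;
}
```

The caller gets both results for one call, matching rationale point iv) above.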
A full implementation of a checked addition routine for unsigned 64 bit integers follows. Note that, due to those restrictions around how the inline assembler interacts with the optimiser, we specify the minimum possible assembler and code everything else in C. The optimiser then translates that to efficient code, resulting in an efficient inlined routine.
```c
/**
 * Add two 64 bit unsigned integers checking the result for overflow
 * @param *result pointer to the result, 64 bit unsigned
 * @param u64a    addend, unsigned 64 bit
 * @param u64b    addend, unsigned 64 bit
 * @param abort   boolean action on overflow: true for serrc+exit, false for return
 * @return true for overflow detected
 */
inline extern __attribute__((always_inline))
bool ckd_add_u64( __uint64_t *result,
                  __uint64_t  u64a,
                  __uint64_t  u64b,
                  bool        abort) {

    __uint32_t ccpm;
    const __uint32_t carrybit = 0x20000000;
    __uint64_t sum = u64a;

    asm("algr %[r1],%[r2] \n\t"
        "ipm  %[r3]"
        : [r1] "+r" (sum),     // output
          [r3] "=r" (ccpm)
        : [r2] "r"  (u64b)     // input
        : "cc"                 // clobbers
       );

    bool overflow = (ccpm & carrybit) != 0;  // check if carry bit set

#ifdef DEBUG
    char msgbuf[200];
    sprintf( msgbuf,
             "ckd_add_u64 a: %lu, b: %lu, sum: %lu\n ccpm: 0x%08x, overflow: %s",
             u64a, u64b, sum, ccpm,
             overflow ? "true" : "false" );
    wtopc_text( msgbuf );
#endif

    if (abort && overflow) {
        serrc_op( (enum t_serrc)(SERRC_EXIT), 0xDEAD00,
                  "INTEGER OVERFLOW OCCURRED", NULL );
    }

    *result = sum;
    return overflow;
}
```
There are, perhaps, some further optimisations which could be made, but testing of these requires a GCC level higher than 4.6.
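One such avenue worth noting: GCC 5 and later (and Clang) provide generic checked-arithmetic builtins such as `__builtin_add_overflow`, for which the compiler itself emits the add-and-branch-on-carry sequence with no inline assembler at all. These are not available at the GCC 4.6 level discussed here, but offer a natural migration path. A sketch (the wrapper name is illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

/* GCC 5+/Clang: the builtin performs the addition, stores the
 * (possibly wrapped) result through the pointer, and returns true
 * if overflow occurred -- the same contract as ckd_add_u64. */
static inline bool builtin_add_u64(uint64_t *sum, uint64_t a, uint64_t b) {
    return __builtin_add_overflow(a, b, sum);
}
```

On z/Architecture targets this compiles to essentially the hand-written sequence shown earlier, with the optimiser free to fold the branch directly.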
Restrictions
Unsigned integers only: The focus here is on fixing pointer problems, though an extension to support signed integers is easy.
Matching lengths/types only: focus as above. (Non-matching support can be provided, but complicates the interface.)
Not infix. This is a C language restriction, and the major issue with this implementation. The change from being able to code using standard infix arithmetic notation to having to use prefix notation (i.e. function calls) is far from being an easy sell. Given the seeming aversion of the C language standards committee to evolution, there can be no expectation of a solution to this.
There are some alternative behaviours that may be desired of integers; for example, some algorithms depend upon two’s complement wrapping, while others want integers at their limit value to stay there. Rust, for instance, implements four different flavours of overflow behaviour, in addition to allowing checking to be turned on or off in production:

- `wrapping_...` returns the straight two’s complement result,
- `saturating_...` returns the largest/smallest value (as appropriate) of the type when overflow occurs,
- `overflowing_...` returns the two’s complement result along with a boolean indicating if overflow occurred, and
- `checked_...` returns an `Option` that’s `None` when overflow occurs.

No attempt has been made to implement these.
If recompiling in C++ is an option, though, it is trivial to provide class definitions and overload the standard arithmetic operators to provide infix capability and implement as much of this additional functionality as is required.
Further reading
David Svoboda – Towards Integer Safety (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2428.pdf)
Tsipenyuk, Chess, McGraw – Seven Pernicious Kingdoms: A Taxonomy of Software Security Errors (https://samate.nist.gov/SSATTM_Content/papers/Seven%20Pernicious%20Kingdoms%20-%20Taxonomy%20of%20Sw%20Security%20Errors%20-%20Tsipenyuk%20-%20Chess%20-%20McGraw.pdf)
Common Weakness Enumeration: Numeric Errors (https://cwe.mitre.org/data/definitions/189.html)
C11 final draft (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf)
Victor Yodaiken – How ISO C became unusable for operating systems development (https://arxiv.org/pdf/2201.07845.pdf)
JeanHeyd Meneide – Undefined behavior, and the Sledgehammer Principle (https://thephd.dev/c-undefined-behavior-and-the-sledgehammer-guideline)
Presentation deck
A full presentation deck of this article is available here. This deck is best displayed with the free open source Pympress PDF presentation tool, which natively supports notes on a second screen, or something similar.
Thanks
This article is based on work performed for SNCF, and is published with their permission, for which many thanks.
Footnotes
nasal demons: n. Recognized shorthand on the Usenet group comp.std.c for any unexpected behavior of a C compiler on encountering an undefined construct. During a discussion on that group in early 1992, a regular remarked “When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose” (the implication is that the compiler may choose any arbitrarily bizarre way to interpret the code without violating the ANSI C standard). Someone else followed up with a reference to “nasal demons”, which quickly became established. The original post is web-accessible at http://groups.google.com/groups?hl=en&selm=10195%40ksr.com.↩︎
At least for some data lengths for some operations. Good orthogonality has never been a feature of the instruction set.↩︎