Discovery: Integer overflow in the scanf() family of functions in MinGW, Cygwin, Embarcadero C and other environments when loading a number into a char variable

Sat, 11 March 2017


While digging deep into the details of an error that I describe in my series of articles „C Language - Time consuming errors for which programmers are jumping from bridges”, I noticed a very interesting thing: at the time of writing this article, all programs that are compiled under Windows with the MinGW GCC, Cygwin GCC or Embarcadero/Borland C compilers AND that load numeric data into a variable of type char using functions from the scanf() family are vulnerable to an integer overflow bug (!) :) However, this applies only to programs that use one particular scanf format specifier, discussed below.

After a moment of panic, you can go back to the article :) in which I will try to shed some light on what the problem is.

What is this error?

The problem lies not so much in the compiler (well, maybe a little bit, more about that in a moment) as in the Windows MSVCRT library, which contains an implementation of the C standard library and therefore of functions such as scanf. The library is not compatible with the C99 standard (ISO 9899:1999), only with C89 at most, and does not implement all format specifiers of the scanf function (specifically, the specifiers listed in paragraph 11 on page 358 [displayed as p. 370] of the C99 specification).


The essence of this error is that when functions of the scanf family receive format specifiers that they do not support, they do not skip the given element of the format but forcefully try to load it anyway. Whenever the unsupported specifier reduces the size of the type to be loaded (e.g. from int to short int), we get an integer overflow.

MSVCRT does not implement, for example, the "h" specifier, so although we can indeed write "%hhu" (which means reading a 1-byte value of type unsigned char), the scanf function in these environments will still fetch a 4-byte integer (it skips all "h" specifiers in the format and treats it as "%u"). This ends up overwriting an additional 3 bytes of memory adjacent to our single-byte unsigned char variable. Moreover, in the example shown below (which keeps the scanf() format in a separate variable, and this is important) the compiler does not emit any warning that it does not know the "h" specifier (even if you turn on the -Wall -Wextra parameters).
The MinGW compiler can warn you, but you must pass the format string as a literal and compile with -Wall (the Embarcadero/Borland compiler does not warn us in any case, even when compiling with the -w parameter). Interestingly, from my conversations with Embarcadero it appears that their compiler does not use MSVCRT.

When will the compiler warn us that it does not know the "h" specifier?
Compiler name | When will it warn us?
MinGW | Only when the format argument of scanf is a string literal and we turn on "all warnings" with the -Wall parameter
Cygwin | Only when the format argument of scanf is a string literal and we turn on "all warnings" with the -Wall parameter
Borland/Embarcadero C/C++ | Never, even when compiling with the -w parameter

To spice things up, even if you explicitly build in C99 mode (or so one would think):
gcc c:\main.c -std=c99 -o c:\main.exe
the MSVCRT library still will not be compatible with the standard (the library used by MinGW was released in 1998, even before the C99 standard was published) and the vulnerability will still be present. Unfortunately, the compiler does not warn you about this.

All functions of the scanf family are vulnerable: fscanf, sscanf, vscanf, vfscanf and vsscanf, in the MinGW, Cygwin and Embarcadero BCC32C compilers (in fact, all of these functions call vsscanf, which holds all the logic of what scanf and the other functions do for us). Visual Studio environments are not affected.

An additional interesting note: the ISO/IEC 9899:1999 (C99) specification (the very specification that defines the "h" specifiers and others) clearly states that in the case of an invalid format the behaviour of the function is undefined. In my view, however, it should specify that the function skips the arguments corresponding to format elements that are not fully supported.

In my conversation with Gynvael Coldwind, a legend of the Polish hacking scene, he also confirmed that:
scanf should not insist on processing the specifier in this case.

An example exploit

Let's put the following code on the workbench:

    #include <stdio.h>
    #include <stdbool.h>

    typedef volatile unsigned char uint8_t;

    int main()
    {
        printf("Enter the number from range 0-255: ");
        bool allowAccess = false;
        uint8_t userNumber;
        /* the format is kept in a variable, so the compiler cannot check it */
        char format[] = "%hhu";
        scanf(format, &userNumber);
        if (allowAccess)
        {
            printf("Access granted: very secret thing\n");
        }
        printf("Entered number is: %d\n", userNumber);
        return 0;
    }

As we can see, the allowAccess variable is set to false, and nowhere later is its value explicitly changed to true. The secret data should never be displayed (and that is the case as long as we politely enter a value from the specified range). What happens, however, when we "overflow" our userNumber variable so that allowAccess is overwritten with a non-zero number (the processor evaluates a variable as true when it holds any value other than zero)? Let's try to hack into the program to obtain the secret data. To do this, run the program and, when asked, enter any number that does not fit in a single byte, e.g. 256. The program will produce the following output:

Enter the number from range 0-255: 256
Access granted: very secret thing
Entered number is: 0

By suitably manipulating the input data, we obtained access to secret data that we should never have been able to reach.

Other functions of the scanf() family: sscanf()

sscanf differs from scanf only in that it reads data from a text buffer rather than from standard input (the keyboard). Both functions call the vsscanf function mentioned above, so sscanf is vulnerable as well. Let's check:

    #include <stdio.h>
    #include <stdbool.h>

    typedef volatile unsigned char uint8_t;

    int main()
    {
        bool allowAccess = false;
        uint8_t userNumber;
        char format[] = "%hhu";
        char buffer[] = "257\n";
        sscanf(buffer, format, &userNumber);
        if (allowAccess)
        {
            printf("Access granted: very secret thing\n");
        }
        printf("Entered number is: %d\n", userNumber);
        return 0;
    }

Another exploit example: overwriting the null character

UPDATED 11.02.2017

Another interesting application of our bug is the ability to overwrite the zero byte that marks the end of a character string. Assume that we have an application that asks for a PIN (in the range 0-255; calm down, everyone knows that nobody uses PINs that short, but it is just an example, right?) and then asks for a password (which we do not know). By overflowing the single-byte variable that holds the PIN, we can poison the terminator of a string stored next to our variable in memory, so that the poisoned program prints the master password against which it compares the password entered by the user. Let's look:

    #include <stdio.h>
    #include <stdbool.h>
    #include <string.h>

    #define PASSWORD_MAX_LENGTH 64

    typedef volatile unsigned char uint8_t;

    /* Debug helper: dumps 10 bytes before and after the given address */
    void debugPrintMemoryNearVariable(uint8_t* pointer)
    {
        for (long long int i=((long long int)pointer)-10; i<((long long int)pointer)+10; i++)
        {
            printf("%x (%c)\t", *((uint8_t*)i),*((uint8_t*)i) );
        }
        printf("\n");
    }

    int main()
    {
        char strProvidePin[] = "Please provide PIN number [0-255]: ";
        char strProvidePass[] = "Please provide password: ";
        char passPhraseToCompare[] = "very secret password to compare which can't leak\n";
        char lang[1] = "E";
        uint8_t userPIN;
        char userPass[PASSWORD_MAX_LENGTH+1];
        char formatPIN[] = "%hhu";

        /*
        printf("Memory before:\n");
        debugPrintMemoryNearVariable(&userPIN);
        */

        printf("%s", strProvidePin);
        scanf(formatPIN, &userPIN);

        /*
        printf("Memory after:\n");
        debugPrintMemoryNearVariable(&userPIN);
        */

        printf("%s", strProvidePass);
        getchar();
        fgets (userPass, PASSWORD_MAX_LENGTH, stdin);

        printf("\n\nSelected language: %s\n", lang);

        // Password comparing
        if (strncmp(userPass, passPhraseToCompare, PASSWORD_MAX_LENGTH)==0)
        {
            printf("\nYou're logged in.\n");
            // authorized operations
        }else{
            printf("\nWrong password.\n");
        }
        return 0;
    }

When you run the program, enter the following PIN: 1094795520.
We find that in the place where the selected language should be displayed, we get the whole secret password instead (without its first two letters), the very one against which the program was supposed to compare the password entered by the user. This is because the overflow overwrote the zero byte that terminated the lang string, so the function that prints text did not notice the end of the string (it could not) and kept printing characters until the next null character (that of the next string in memory), in this case the secret master password.
There is one interesting fact related to this code. While MinGW actually overwrites all four bytes, Embarcadero/Borland C (the latest version, 7.20, at the time of writing this article) overwrites only two bytes, reaching into the lang text string (luckily for the presented bug, these still include the last, null-terminating character of lang, so the example above works).

Vulnerabilities in various environments

Vulnerable environments
Environment name | Vulnerable?

Windows environments, 32-bit
MinGW GCC 4.4.1 (32-bit) | Yes
MinGW GCC 5.3.0-3 (32-bit) [the newest as of 19.02.2017] |
Cygwin GCC 4.4.1 (32-bit) [the newest as of 19.02.2017] |
Embarcadero C++ 7.20 for Win32 / bcc32c version 3.3.1 | Yes
VisualStudio 2015 (14.0) 32-bit | No
VisualStudio 2005 (8.0) 32-bit | No

Windows environments, 64-bit
MinGW 6.3.0 for i686 [the newest as of 19.02.2017] |
MinGW 6.3.0 for x86_64 (posix-seh) [the newest as of 19.02.2017] |
Cygwin64 GCC 5.3.0 (64-bit) | Yes
VisualStudio 2015 (14.0) 64-bit | No

Unix environments
RedHat GCC 4.8.5 (on Linux) | No

Why is Visual Studio not vulnerable?

At first I thought that they had found this problem and fixed it (when you try to use the scanf functions, you get a warning telling you to use scanf_s, which, however, is not part of the C99 standard). However, the matter looks different. Visual Studio already uses newer run-time libraries (derived from MSVCRT) that implement a much larger part of the C99 standard, including the "h" specifier. That is why the exploit does not work there.
The "_s" suffix of the functions suggested by Visual Studio is about something different: unlike the less secure versions (e.g. sscanf), the functions with the "_s" suffix take an additional parameter, the buffer size, but only when you use the type specifiers c, C, s and S (and not in the case of our "%hhu"). The buffer size is passed as an extra argument immediately after the pointer to the variable. Let's look:

    wchar_t buffer[10]; // buffer size is 10, width specification is 9
    swscanf_s(input_string, L"%9s", buffer, (unsigned)_countof(buffer));

Versions of Visual Studio before 4.0 and from 7.0 to 13.0 use a differently named DLL for each version (MSVCR20.DLL, MSVCR70.DLL, MSVCR71.DLL, MSVCR80.DLL [VS 2005], MSVCR90.DLL [VS 2008], MSVCR100.DLL [VS 2010], MSVCP110.DLL, etc.). With version 14.0 of Visual Studio (2015), the library was moved to a new DLL named UCRTBASE.DLL (but programs are required to link to the specific version of the library named "VCRUNTIME140.DLL", with the number changing in step with future versions; a veritable hell on Earth).
Software installers should take care that the appropriate version of the run-time library is present in the system (the packages named "Visual C++ Redistributable Package" that are installed together with some programs exist exactly for that purpose). Windows itself ships with one version of this library installed by default.

Gynvael Coldwind accurately noted:

"hh" does not appear anywhere in the specification [of Microsoft Visual Studio C/C++; note added by Lukas], so you cannot expect it to work. (...) Something else is interesting: it works correctly on newer versions of the Microsoft C library. And this is interesting because, according to the documentation, "hh" is not supported:

"The hh, j, z, and t length prefixes are not supported."

Nevertheless, whatever one might say, %hhu works properly ^_-

What to do, how to live?

The solution of the problem for MinGW

The best solution would of course be to link against newer libraries. However, even after a few hours of working on the problem, I have not discovered how to force MinGW to use a different MSVCRT library than the one it links to by default (maybe one of you will succeed). UPDATE 07.03.2017: it turns out that GCC/G++ in the MinGW environment (unfortunately this does not work with Cygwin or Embarcadero C) has a special build parameter, -D__USE_MINGW_ANSI_STDIO, which causes the default, old MSVCRT implementation to be replaced by another, newer one.
So, in order to compile a program without the vulnerability, you can use the following command:

gcc main.c -o main.exe -D__USE_MINGW_ANSI_STDIO
Note that this flag alone does not make the compiler warn that a non-literal format string has not been checked; the warning flags described in the next section do that.
So it is best to compile with both sets of flags:
gcc main.c -o main.exe -Wall -Wextra -Wformat=2 -D__USE_MINGW_ANSI_STDIO

Additional warnings for safety

UPDATE 05.03.2017

(Thanks to Andrew Pinski from the GNU GCC Project): You can force the compiler to warn that it cannot check the format argument (which is a pointer to a character buffer) in the code quoted above. There is a build parameter for this, -Wformat-nonliteral, but it is best to use the parameter that enables a whole group of format-related safety warnings, namely -Wformat=2. So we can compile using this command:

gcc main.c -o main.exe -Wall -Wextra -Wformat=2
With this compilation, the compiler will warn us that the format is not a literal and that its correctness has not been checked.
So, it is best to compile with both flags:
gcc main.c -o main.exe -Wall -Wextra -Wformat=2 -D__USE_MINGW_ANSI_STDIO

A possible workaround

There is also an obvious workaround: even if we want to load a value from the range 0-255, which would fit in our 1-byte unsigned char variable, we should read it into a variable of an integer type (int rather than char). Otherwise, as you can see, our application will be vulnerable to the integer overflow error.

Possible errors in applications

Gynvael Coldwind wrote:
I grepped the programs directory on my disk for %hhu and it found some 400 files, most of which were linked with the new msvc*.dll or did not use scanf (only printf). However, some that use msvcrt.dll did turn up, so you could probably have a look (I encourage you to do so).
I will do so when I get some free time, and I will describe any findings.


Gynvael Coldwind very wisely wrote about this type of error:
It is certainly a risk, something worth paying attention to when reviewing application code. The final verdict on whether something is an "error" or a "security error" in such cases (errors in library APIs) always comes down to the specific case of a particular application in which it was misused.
I wish you as few errors of this type as possible. Although, as someone once wisely said:
An expert is a person who has made all possible mistakes in their own field.