Bugs in C and D languages

by leonardo maffi
Version 1.0, Nov 24 2007.

Keywords: bugs, c, d language

This is an interesting page that shows some of the most common bugs in C programs (especially the ones caused by problems in the language itself):
http://www.andromeda.com/people/ddyer/topten.html
http://www.andromeda.com/people/ddyer/top-ten-supplement.html

The best way to avoid most of those bugs isn't to use complex tools, but to use a different language that cares more about avoiding bugs. The D language helps with many of those problems. This is a little C program that shows some problems (I have packed the code horizontally and vertically, but I usually add empty lines and I use a 4-space indent. I compile this C code with MinGW):

#include "stdio.h"
int main(void) {
  int i;
  int a[10];
  for (i = 0; i < 20; i++);
    a[i] = i;
  printf("%f\n", a[3]);
  return 0;
}
Some of the problems of that code:
- You have to take care to initialize 'a', otherwise it contains random stuff (if you move it under the include, then a[3] will be 0).
- The syntax of the 'for' repeats the name of the variable three times, and repeating is bad because repeated information tends to become out of sync (in this tiny program if you use j<20 the compiler catches it because j doesn't exist, but in a more complex program j may be defined). This is useful for complex situations where you need the max flexibility, but often loops are simple, and it may be quite useful to have a simpler syntax too.
- The for has a ';' at the end, a common mistake: the loop body is empty, so a[i] = i runs only once, with i equal to 20, and a[3] contains random stuff.
- The a doesn't know its length, so you can overrun it, causing all kinds of problems.
- printf() doesn't know the types of the data you want to print.

This is a first translation to D:
import std.stdio;
void main() {
  int i;
  int[10] a;
  for(i = 0; i < 20; i++);
    a[i] = i;
  writefln(a[3]);
}
Note that using int a[10]; is fine too.
The compiler catches the stray ';' giving:
bugd.d(5): use '{ }' for an empty statement, not a ';'

Now this compiles:
import std.stdio;
void main() {
  int i;
  int[10] a;
  for(i = 0; i < 20; i++)
    a[i] = i;
  writefln(a[3]);
}
But during the run it gives:
Error: ArrayBoundsError bugd(6)

Because I have compiled it without any flag. Using -release you can disable those checks when you need more speed and you think your program is correct.
The best way, if you don't need max speed is just to use a foreach, and avoid many possible problems:
import std.stdio;
void main() {
  int[10] a;
  foreach(i, ref el; a)
    el = i;
  writefln(a[3]);
}
But note you have to put that 'ref', otherwise the a array isn't modified (this is a possible source of bugs).
import std.stdio;
void main() {
  string[2] arr = ["Less than 50%", "More than 7%"];
  foreach(s; arr)
    writefln(s);
}
But D has its share of bugs too. This is a simple example:
import std.stdio;
void main() {
  auto arr = ["Less than 500$", "More than 7$"];
  writefln(typeid(typeof(arr)));
  foreach(s; arr)
    writefln(s);
}
There's a bug, because arr doesn't become an array of strings (dynamic arrays of char) but a static array of static arrays of chars:
char[14][2]
And the initialization strings have different lengths, this produces trash when you print the second string. To avoid that bug you can use any of the following:
string[] arr = ["Less than 500$", "More than 7$"];
auto arr = ["Less than 500$".dup, "More than 7$"];
auto arr = ["Less than 500$"[], "More than 7$"];
But even the following code has a bug (note the restricted import: in D it's usually a very good idea to specify what names you want to import from a module, to avoid polluting the namespace and to avoid some bugs):
import std.stdio: writefln;
void main() {
  auto arr = ["Less than 50%"[], "More than 7%"];
  foreach(s; arr)
    writefln(s);
}
It gives:
Error: std.format invalid specifier
because % is a special symbol if you use it with writef/writefln. To avoid that, the best way I have found in D 1.x is to write my own printing functions, put/putr (in D v.2.x there are write/writeln that can be used to avoid that bug, because they ignore %):
import d.string: putr;
void main() {
  auto arr = ["Less than 50%", "More than 7%"];
  foreach(s; arr)
    putr(s);
}
This works correctly, but the compilation is a bit slower and the resulting exe (on Windows) is bigger (189 KB instead of 116 KB) because putr() requires some code. Sometimes put/putr are 2 times slower than writefln, but in a lot of situations they manage things much better and more correctly, and give a nicer output.


Here are some comments on the "topten" page about bugs in C, compared to the situation in D:

1. Non-terminated comments:

C code:

a=b; /* this is a bug
c=d; /* c=d will never happen */
In D it's common to use // instead of /* */, and today you usually have an editor that shows comments in a different color, so you can spot them anyway.
a = b; // This is a comment
c = d; // Another comment

2. Accidental assignment/Accidental Booleans

C code:
if(a=b) c; /* a always equals b, but c will be executed if b!=0 */

This is a similar code in D:
import std.stdio: put = writef, putr = writefln;
void main() {
  int a = 10, b = 20, c = 30;
  if (a = b) // compilation error
    putr("hello");
}
But the compiler avoids it giving:
bugs.d(4): Error: '=' does not give a boolean result

The D compiler catches the other couple of problems too:
if (0 < a < 5) c; /* this "boolean" is always true! */
if (a =! b) c; /* this is compiled as (a = !b), an assignment, rather than (a != b) or (a == !b) */

3. Unhygienic macros


This is the problem in C:
#define assign(a,b) a=(char)b
assign(x,y>>8)
In D there aren't C macros. You use a function for that situation (that may become inlined), avoiding the problem:
import std.stdio: writefln;
char char_cast(T)(T x) { return cast(char)x; }
void main() {
  int x = 2000;
  auto y = char_cast(x >> 8);
  writefln(y);
}

4. Mismatched header files


D has no header files, avoiding many problems. And the D "headers" (.di files) are generated by the compiler itself (and you may need them only in larger programs).


5. Phantom returned values

This is the problem in C:
int foo (a)
{ if (a) return(1); } /* buggy, because sometimes no value is returned  */
This is the code in D:
import std.stdio: writefln;
int foo(int a) {
  if (a)
    return 1;
}
void main() {
  writefln(foo(0));
}
But when you call foo() the program stops at run time (if you compile with -release such automatic asserts are removed):
Error: AssertError Failure bugs5.d(5) missing return expression
A better solution is to analyze the code statically, but this solution is much simpler to implement and it's surely faster too at compile time.
Note that function arguments, like 'a' here, must always be typed in D.


6. Unpredictable struct construction

So far D has no bitfields, so this problem is partially avoided (but I hope to see fast bitfields in D in the future).


7. Indefinite order of evaluation

At the moment I think D acts as C/C++ in this regard. The author of D wants to define such expression evaluation order, but it's not easy to implement:
http://www.digitalmars.com/d/archives/digitalmars/D/D_and_expression_evaluation_order._52792.html


8. Easily changed block scope

In C from:
if( ... )
  foo();
else
  bar();
To:
if( ... )
  foo(); /* the importance of this semicolon can't be overstated */
else
  printf( "Calling bar()" ); /* oops! the else stops here */
  bar();                     /* oops! bar is always executed */
D suffers from this kind of error too. It can be avoided by always putting {} even for single lines, making the code longer (reducing code compactness, that sometimes may reduce readability). Python (and derived languages) avoid this whole class of problems:
if ...:
  foo()
else:
  print "Calling bar()"
  bar()
Significant indentation is possible for D too, but probably a lot of people used to {} can't accept it.


9. Permissive compilation

This C code:
functionName,(arg1,arg2,arg3);
Can become the following D code with a bug:
import std.stdio: writefln;
int foo(int a, int b, int c) {
  return a + b + c;
}
void main() {
  auto r = foo,(2+5,1,3+7);
  writefln(r);
}
But the compiler catches it:
bugs.d(6): Identifier expected following comma

This C code:
switch (a) {
int var = 1; /* This initialization typically does not happen. */
             /* The compiler doesn't complain, but it sure screws things up! */
case A: ...
case B: ...
}
I can translate it to D as:
import std.stdio: writefln;
void main(string[] args) {
  if (args.length > 1) {
    switch (args[1]) {
      int x = 1; /* This initialization typically does not happen. */
      case "1": writefln("1! x=", x); break;
      case "2": writefln("2! x=", x); break;
    }
  }
}
Here the results are mixed: x isn't initialized (this looks like a bug of DMD v1.023, because D is supposed to always initialize vars). On the other hand the code shows the usage of strings in the switch (but it has other limitations still) and if you give "3" as input the code gives:
Error: Switch Default bugs(4)
(This error checking is disabled by -release).

This C code:
#define DEVICE_COUNT 4
uint8 *szDevNames[DEVICE_COUNT] = {
         "SelectSet 5000",
         "SelectSet 7000"}; /* table has two entries of junk */
Can become the D code:
import d.string: putr;
const N = 4;
char[][N] names = ["First ",
                   "Second"];
void main() {
  putr(names);
}
In this situation D acts correctly, giving:
["First ", "Second", "", ""]

10. Unsafe returned values

This C code:
char *f() {
  char result[80];
  sprintf(result,"anything will do");
  return(result);    /* Oops! result is allocated on the stack. */
}
int g() {
  char *p;
  p = f();
  printf("f() returns: %s\n",p);
}
Becomes in D:
import d.string: putr;
import std.string: format;
string f() {
  string result = format("anything will do");
  return result;
}
void main() {
  auto p = f();
  putr("f() returns: ", p);
}
Thanks to the GC that's not a problem.


11. Undefined order of side effects
#include "stdio.h"
int foo(int n) {printf("Foo got %d\n", n); return(0);}
int bar(int n) {printf("Bar got %d\n", n); return(0);}
int main(int argc, char *argv[]) {
  int m = 0;
  int (*(fun_array[3]))();
  int i = 1;
  int ii = i/++i;
  printf("\ni/++i = %d, ",ii);
  fun_array[1] = foo; fun_array[2] = bar;
  (fun_array[++m])(++m);
}
The equivalent D code calls the foo() function:
import d.string: putr;
int foo(int n) { putr("foo, n=", n); return 0; }
int bar(int n) { putr("bar, n=", n); return 0; }
void main() {
  int m, i = 1;
  auto fa = [&foo, &bar];
  int ii = i / ++i;
  putr("i / ++i = ", ii);
  fa[++m - 1](++m);
}
Note: i/++i; can't be used in D because /+ is read as the start of a nestable comment.


12. Uninitialized local variables

Short of compiler bugs, D always initializes variables and structs. For the quite uncommon situations where you need max speed you can define a variable as uninitialized with void:
int a = void;

13. Cluttered compile time environment

D's compile-time environment is much cleaner, and using specified imports you can avoid most of the other problems:
import std.stdio: writefln;

14. Under constrained fundamental types

In D all types have defined size. You can also use size_t that's as large as the CPU word. Plus (as in C99 or so) you have std.stdint that gives "exact", "At Least" and "Fast" aliases.


15. Utterly unsafe arrays

This is the C code:
int thisIsNuts[4];
int i;
for ( i = 0; i < 10; ++i )
{
  thisIsNuts[ i ] = 0;
}
In D, if you don't compile with -release, trying to access outside the array bounds throws an ArrayBoundsError.
But using array.length is often a good first step to avoid such errors:
import d.string: putr;
void main() {
  int arr[4];
  for (int i = 0; i < arr.length; i++)
    arr[i] = 0;
  putr(arr);
}
Even better is to use a foreach with a ref:
import d.string: putr;
void main() {
  int arr[4];
  foreach(ref el; arr)
    el = 0;
  putr(arr);
}
But arrays are already initialized to 0, so that's not necessary:
import d.string: putr;
void main() {
  int arr[4];
  putr(arr);
}
To initialize elements of an array to something else you just need []:
import d.string: putr;
void main() {
  int arr[4];
  arr[] = 3;
  putr(arr);
}
And you can avoid that double initialization to 0 and then to 3 with just:
import d.string: putr;
void main() {
  int arr[4] = 3;
  putr(arr);
}

16. Octal numbers

C code:
int nums[] = { 001, 010, 014 };

D and Python have the same silly problem. I think Python 3.0 avoids it.


17. Signed Characters/Unsigned bytes.

D doesn't have signed char, this avoids some of those problems.
But its bytes are signed by default (and it offers ubyte too), and there is no compilation flag to check overflows yet (that other languages like TurboPascal/Delphi have). The automatic casting of signed/unsigned integers is ugly too:
import d.string: putr;
void main() {
  putr("hello".length > -3);
}
That prints "false" because length gives an unsigned int, so the second operand is cast to uint too, becoming a very large number...


18. Fabulously awful "standard libraries"

Using the D std lib gives no problems in this situation:
import d.string: putr;
import std.string: format;
void main() {
  int a = 1, b = 2;
  auto buf = format("%d %d", a, b);
  putr(buf);
}
sformat() is more similar to sprintf, because it takes the char array to write on:
import d.string: putr;
import std.string: sformat;
void main() {
  char[10] buf;
  int a = 1, b = 2;
  sformat(buf, "this is the result: %d %d");
  putr(buf);
}
But that problem is revealed at run time (even if you use -release):
Error: ArrayBoundsError std.string.sformat(0)

With a shorter format string it gives another error (Error: std.format):
import d.string: putr;
import std.string: sformat;
void main() {
  char[10] buf;
  int a = 1, b = 2;
  sformat(buf, "this %d %d");
  putr(buf);
}
This version looks better, but it raises "Error: 4invalid UTF-8 sequence":
import d.string: putr, writefln;
import std.string: sformat;
void main() {
  char[10] buf;
  int a = 1, b = 2;
  sformat(buf, "this %d %d", a, b);
  writefln(buf);
}
This is correct, r is a "light" slice of buf of length = 9:
import d.string: putr, writefln;
import std.string: sformat;
void main() {
  char[10] buf;
  int a = 1, b = 2;
  auto r = sformat(buf, "Hello %d %d", a, b);
  putr(r.length, " ", buf.length, " ", r.sizeof, " ", buf.sizeof);
  putr(r);
  auto r2 = buf[1 .. 6];
  putr(r2);
}
It prints:
9 10 8 10
Hello 1 2

buf.length is 10 and it can't be changed.
r.length is <= buf.length.
r.sizeof is always 8 (on 32 bit machines) because it's just a light slice, that is a reference to a contiguous part of buf (I don't know how it is represented by D).
