Three things that aren’t taught early enough #2: Dynamic Link Libraries

Continuing the series started with Three things that aren’t taught early enough #1: Unicode, I now present thing number two:

Thing Two: Dynamic Link Libraries

Modern operating systems live on dynamically linked libraries (abbreviated to DLLs from here on in, which refers to the general concept, rather than the Windows-specific implementation). Huge amounts of code is provided to applications (there’s over 800MB in my Windows Vista system32 folder) which simply cannot be included directly into any compiled application. Unfortunately, the most a second or third year (or later?) computer science student could tell you about is DLL hell, and they probably won’t get any further than the name.

DLLs are (still) the top step in increasing levels of physical code-separation. Early (very early) programs consisted of a single monolithic source file. (Even earlier ones consisted of a single monolithic punch card…) This gave way to separate files which were compiled separately and then linked (called “object files”, commonly COFF and ELF). Commonly used object files started being collected/archived into static libraries, allowing much greater code reuse than OOP has ever achieved. The big disadvantage of static libraries is all the code gets included into the executable. When 100 applications include the same 1MB of code, that’s 99MB worth of duplication.

Enter DLLs. When an executable starts, the loader determines whether the required DLLs are in memory or not. If they are, the loaded code is used. Otherwise, the DLL is found and loaded from disk. All advances on DLLs have mostly been aimed at reducing dependency issues, including through side-by-side installation of multiple versions (something that has been in *nix forever (or close enough)), tighter interface conventions (for example, ActiveX/COM) and stronger naming conventions (as in .NET assemblies).

<Windows Specific>
(The implementation of DLLs is operating system specific. I don’t have any inside knowledge of *nix style .so files (though I know that they are identical to MacOS .dylib files), so I will only discuss Windows .dll files. Presumably the concepts are applicable to all DLLs, otherwise they would be implementing different things.)

A .dll file is fundamentally identical to a .exe file. In fact, they are completely identical in implementation. An .exe file can be used as a DLL if so desired. All Portable Executable (PE) files include import and export tables. The export table contains a list of function names and addresses which may be called from outside the file. No information on calling conventions or parameters is included as this is meant to be shared separately. Name mangling is used in various forms to better enforce function signatures, though it usually doesn’t include enough information to make shared header files unnecessary. Unfortunately, different compilers perform different name mangling, making DLLs hard to share. For this reason, extern “C” is generally recommended, as this disables name mangling (as well as parameter overloading).

DLLs are used by another program in one of two ways. They are either dynamically loaded, using functions such as LoadLibrary() and GetProcAddress(), or automatically linked. Automatic linking is performed by the loader using the contents of the import table. Information is included in the application about functions that exist in another DLL, generally by use of a small static library. When starting the program, the loader will resolve these references, loading any required DLLs, and fix up function calls to point to the right memory location. If a function or library cannot be resolved, the program will not start. An advantage of dynamically loading a library is that the application can better handle a DLL error.

Dynamic loading also allows for better backwards compatibility. Newer versions of Windows provide API functions that didn’t exist in previous versions. If the loader is asked to resolve one of these functions and it cannot be found, the application will not start. If, however, the function is not essential, it can be dynamically resolved and called if available.
</Windows Specific>

So, what hasn’t been taught about DLLs? I would argue that the bare minimum is missing. The concept of a compiled software project that doesn’t actually run. The concept of exported functions and data (and also the concept of a linker). The issues arising from mismatched interfaces. A brief overview of name mangling and the issues that occur when mixing compilers. Exporting definitions for C++ applications (Microsoft’s dllimport and dllexport specifiers and gcc’s switches and conventions) (the trend is so strongly away from native applications that few other languages offer the ability to make native DLLs). Introduce and play with Dependency Walker and/or a *nix equivalent.

Issues such as redirection, forwarding, specifics about export tables, deployment and registration don’t need to be touched.

DLLs don’t require a huge amount of time dedicated to them, but no time is simply not good enough. Arguably they are not relevant in a pure computer science degree, but since nobody is teaching a pure computer science degree anymore DLLs should definitely get a showing. Along with Jeff Atwood, I also include version control in this sort of category. It needs to be taught, however briefly, before people get a piece of paper saying they can do stuff.

(Now I’ve mentioned version control it should be a big hint that it isn’t thing number 3. Number 3 is a huge one though, might be a couple of weeks off depending how much time I get. It will also include code samples a-plenty. Apologies for the delay, but hopefully it will be worth it.)

Comments (5)

Three things that aren’t taught early enough #1: Unicode

Before we continue, I’ll clarify the title slightly. I’m only talking about programming things and by “early enough” I mean they hadn’t been taught in the first two years of the CS&SE degree I’m doing. The concepts also only really apply to C and C++, though since C and C++ are still the most commonly used programming languages (combined) AND a full understanding of higher level languages can only be obtained through understanding the lower levels, I believe these points are relevant.

(If you disagree, post a comment and we can discuss it. That’s the point of comments. Better yet, write your own blog post and attract readers to yourself. Certainly many of the ByteClub blogs are way too quiet.)

I will post one thing per blog post. Partially through wanting to keep individual discussions on track and also through not wanting to have to write all three before I click “Publish”.

Thing One: Unicode

If I didn’t take an interest in programming outside of university, I would never have heard of Unicode. Blissful ignorance is a great excuse for “beginning” programmers, but the reality is not actually all that complex.

A bit of background. The first major character set (which maps numbers to displayable characters) was 7-bit ASCII (ASCII being the title, 7-bit being relevant information). Hopefully everyone has just worked out that 7-bits equals 128 possible characters. If you consider the English language (which, not surprisingly, the American Standards Association (later ANSI) did) the characters that need to be displayed are upper and lowercase letters A through Z, each digit, and a sprinkling of punctuation. This fits quite well into 7-bits, including some leftovers for control characters.

But then, people in countries where other languages are spoken decided that they deserved to be able to use their own language’s characters. So they started using codes 128-255. Except, these codes were being used in different ways in different places. Greek characters bear little resemblance to French characters, so Greek computers would use a different set of characters in the 128-255 range. Computers would ship with a selection of so-called “code pages“, allowing the user to select their preferred character set.

Now, when sending email from Greece to France (because people sent lots of emails prior to 1991), the code page had to be specified to ensure your ωs weren’t changed into çs. Of course, this whole setup failed miserably. So Unicode was developed.

Rather than assigning each character a single value between 0 and 255, Unicode assigns each character a code-point value. (One of) a variety of encoding schemes are then used to store this code-point, the two most popular being UTF-8 and UTF-16. UTF-8 is based around byte-sized (8-bit) elements, while UTF-16 is based around word-sized (16-bit) elements - to use the C++ data types, char and wchar_t (or unsigned short).

The majority of code-points fit into 16 bits (ie. are less than 0xFFFF), making UTF-16 efficient for when a wide variety of languages (or even only one, providing it’s not English) will be used. The majority of code-points for the English language fit into 8 bits (ie. are less than 0xFF), making UTF-8 more efficient for files consisting primarily of Latin characters. UTF-8 also has the advantange of being able to decode ASCII encoded files, which is why it is the most commonly used encoding.

However, there is a catch. A program that supports UTF-8 can read ASCII files, but not necessarily the other way around. The reason for this is that ASCII is a single-byte character set (SBCS) while UTF-8 is a multi-byte character set (MBCS). Basically, ASCII guarantees that 8 bits is enough to describe every supportable character. UTF-8 guarantees nothing. If a character cannot be encoded into 8-bits (technically 7 bits), it will be split across two bytes. Or three. Or four. Or five. Or six! Suddenly everything taught about ASCII strings goes out the window.

Consider the following code:

char* myString
 
int count = 0;
for(char* c = myString; *c != \0′; ++c)
{
    count++;
}

Now, what does this snippet do? Think about it.

The obvious answer is that it counts the number of characters in the string. This is incorrect. It actually counts the number of bytes used by the string. The difference may not be obvious if you attended the same classes as I did. The char data type is a signed, 8-bit integer - one byte. Each increment counts as one, regardless of the contents of that byte, provided that it isn’t the null terminator.

For a SBCS, like ASCII, each character requires one byte, so the number of bytes will always equal the number of characters. For a MBCS, like UTF-8 (and UTF-16), a single character may be encoded in more than one byte. For an English string this is unlikely. However, throw in some accented characters and counting bytes is an incredibly inaccurate way of counting characters (though it is still relevant in terms of dealing with the encoded string, just not in decoding it back into individual characters).

The work-around? It’s relatively simple, though potentially not nicely portable. Microsoft’s Visual C++ library has macros for traversing a string, and the UTF8-CPP library looks quite similar, though neither are built into the language (no “batteries included”). Operating systems provide suitable functionality and support for character processing, and apparently current C and C++ standard libraries have MBCS support (I am somewhat doubtful with regards to the std::string class).

UTF-16 suffers from the same issue, though not to the same extent. UTF-16 is predominantly used on Windows systems and is the standard for that platform. Most, but not all API calls support using either UTF-8 or UTF-16 (incorrectly referred to as ANSI and Unicode, and providing the current code-page is set to UTF-8) though some more recent functions only support UTF-16. *nix based operating systems usually use UTF-8 and “wide characters” are actually 32-bits, making direct compatibility difficult.

So how should this be taught? A good start is actually Java, C# or VB, since the char data type in these languages represents an entire character, rather than an 8-bit integer. When C++ starts to be used for processing strings, the wchar_t type is a much better option. *nix compilers will treat this as a 32-bit integer, capable of encoding every known character, while Windows will treat it as a 16-bit integer, capable of encoding most. (The char type should only ever be used for strings in cases where memory is more important than processing speed, ie. never.) The standard C++ library provides wide versions of the iostream streams such as wcout and wcin, as well as wstring and wstringstream classes which, while probably not perfect, are considerably better than the 8-bit alternatives.

Finally, teach operating system specific function calls. There is going to be more than one subject where this comes up, so do one on Windows and one on Linux. Specifically, teach documentation reading, how to find functions in MSDN and whatever-the-Linux-equivalent-to-MSDN-is. Most graduates are going to end up programming for Windows anyway, probably in a language where OS specific calls are impossible. Ideologically teaching everything in an OS that doesn’t cost anything is of no help to students. Teaching a full range of content is.

To summarise (for all those tl;dr people), ASCII is dead and has been for a long long time. To still be teaching that one-char always equals one-character is inexcusable and is not setting up the next generation of developers to be able to handle a global audience. Higher level languages provide better support built-in, but an understanding of character sets and code pages is as essential to being a well-rounded software developer as is bits and bytes (ie. very essential, but that’s a different topic (no, not in this series)).

For futher reading, I recommend Joel Spolsky’s “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)”.

Comments (2)

stackoverflow.com

For those who don’t read around as much, stackoverflow.com is the brainchild of Jeff Atwood and Joel Spolsky. At the moment there are only two podcasts and a clever cartoon to show for it, but the general idea has been explained.

(A short summary follows. Alternatively you can read Jeff’s announcement, Joel’s announcement or get the podcasts (RSS). Following the summary is my take on the whole thing.)

In short, stackoverflow.com is a cross between Wikipedia and Reddit and a discussion board. Developers/programmers/geeks/people ask questions which are answered in a collaborative form. Responses can be voted up or down to increase your “karma”, which increases your ability to post on more important topics. The aim is to achieve the top Google result for each topic, much like Wikipedia has the top for everything.

(End summary)

Obviously the idea hasn’t been fully developed, or if it has, it hasn’t been expressed clearly. It could be nothing more than a generic discussion board, or it could be a thriving community developing a collection of code snippets. There are some obvious inconsistencies in my summary above, but that is Jeff and Joel’s fault, as far as I’m concerned.

In my opinion, a collaborative code-snippet collection would be great. A Yahoo! Answers site would be pointless. Unfortunately, based on the questions and answers in the second podcast, the latter appears to be more likely.

Specifically, the question was asked whether code would be given in multiple languages, with filtering available similar to the MSDN Library. The answer was along the lines of “it depends on the wording of the question.” A question like “how can I use VB to send an email” gets a different answer to “how can I send emails in any language.”

But the relevant part is sending email. Why does it matter what language they ask about? Why have one answer that gets voted as the ‘best’? Why not have a page titled “Sending email”? Sections of the page can be tagged with the language to allow for filtering. The sections can also be tagged with complete/incomplete and a BugMeNot style vote can be used to indicate whether the snippet worked. “Karma” can be awarded based on a person’s contributions and votes. (This is also similar to the model used at a number of song lyric sites.)

Encouraging questions encourages a few bad habits. Homework questions tend to be quite an issue. “Dorothy Dixer’s” that lead to advertising of a particular library or language are equally to be avoided. To some extent, both of these are avoided under the Wikipedia model. Homework questions don’t appear when the emphasis is on contributing, rather than asking. Blatant advertising can be easily removed by any contributor.

The idea still needs clarification. With the star-power of Spolsky and Atwood behind it the site is sure to start strongly, the difficulty will be in building a culture that works. I don’t read Joel’s discussion board or the comments on Jeff’s blog posts because the amount of noise is so high. If this carries over to stackoverflow.com, the highly skilled programmers simply will not hang around.

Jeff, Joel: you have a chance to do something great. Don’t waste it.

(Also, on the “should all programmers learn C” issue, I disagree. C is much too high-level for the proposed benefits. Everyone should learn assembly language, at least enough to implement the basic set of algorithms (search, sort, recursion, etc).)

Comments

The Benefit of Private Inheritance

(Yes the title is accurate. As you will see by the end of this piece, there is exactly one benefit of private inheritance.)

One of the (few) useful aspects unique to OO is inheritance. In general, inheritance allows you to expand an existing class to change or increase it’s functionality. Inheritance as known by Java and .NET programmers (and beginning C++ programmers) simply pulls down all the functions from a single base/super class and exposes them from your derived/sub class.

In the C++ world, though, inheritance is more complicated. The contents of more than one class may be used to construct a new class, and the accessibility of the imported members may be modified. The ’standard’ inheritance is known in C++ as public inheritance. All imported members have the same accessibility as they did in the base class.

class MyBaseClass
{
public:
    void doSomething();
protected:
    int aNumber;
};
 
class MyDerivedClass : public MyBaseClass
{
public:
    int getTheNumber() { return aNumber; }
};

Could have been written as follows:

class MyDerivedClass
{
public:
    void doSomething();
    int getTheNumber() { return aNumber; }
protected:
    int aNumber;
};

Private inheritance limits the accessibility of the imported members allowing only the immediate derived class access.

class MyBaseClass
{
public:
    void doSomething();
protected:
    int aNumber;
};
 
class MyDerivedClass : private MyBaseClass
{
public:
    int getTheNumber() { return aNumber; }
};

Is effectively the same as:

class MyDerivedClass
{
public:
    int getTheNumber() { return aNumber; }
private:
    void doSomething();
    int aNumber;
};

Protected inheritance exists but is basically never used. The same can be said for private inheritance, since it provides very little benefit over a second option. The example of private inheritance directly above could also have been written like this:

class MyDerivedClass2
{
private:
    MyBaseClass myClass;
public:
    int getTheNumber() { return myClass.aNumber; }
};

These snippets are actually examples of the Adapter pattern.

An adapter allows classes to work together … by wrapping its own interface around that of an already existing class.

To be more specific, the example using inheritance is an example of a class adapter, while the second is an example of an object adapter. Private inheritance is preferred for class adapters, since clients are prevented from modifying the underlying data structure. Programming languages that don’t support private inheritance must use an object adapter to achieve this.

So private inheritance may be used to allow the class you are writing to use a particular object without allowing the client or subclasses to also use it. Obviously (hopefully), this provides absolutely no benefit over simply declaring a private instance variable.

In fact, private inheritance is at a distinct disadvantage to an instance variable. How many other programmers will notice and understand the ‘private’ specification? Composing two concrete or partially implemented classes becomes mind-bendingly confusing and the risk of namespace collisions should turn even the most masochistic developer off private inheritance.

So what’s left? “Performance,” somebody yells. “Accessing a function attached to a member variable requires an extra lookup over accessing a function on the caller.” This is correct, under certain conditions.

Firstly, the function being called must have been marked virtual. A non-virtual function doesn’t appear in the virtual function table (duh) but is called directly.

Secondly, the instance variable must be a pointer to an instance, rather than the instance itself. Why? Because an actual instance must be a fully defined class. The following snippet shows both a valid and invalid example.

class LaunchableOutOfACannon
{
public:
    virtual void launch() = 0;
};
 
class PussyCat : public LaunchableOutOfACannon
{
public:
    void launch() { cout << “Meowwwwwwww!”; }
};
 
// A valid example
class MyObjectAdapterThatWorks
{
private:
    PussyCat obj;
public:
    void fireMe() { obj.launch(); }
};
 
// An invalid example
class MyObjectAdapterThatDoesntWork
{
private:
    LaunchableOutOfACannon obj;
public:
    void fireMe() { obj.launch(); }
};

Spot the difference? An abstract class cannot be instantiated, and so the type of an object that is specified in this manner (that is, not a pointer) must be fully known at compile time. As a result, the exact function call is known at compile time and a virtual function table lookup is not necessary. The other side of the same coin is that if a virtual function table lookup is required for the object adapter, it will also be required for the class adapter.

class MyClassAdapter
 : private PussyCat
{
public:
    void fireMe()
    {
        launch();
    }
};
class MyObjectAdapter
{
private:
    PussyCat obj;
public:
    void fireMe()
    {
        obj.launch();
    }
};
; void fireMe() { launch(); }
 
  mov  eax, [ecx]
  mov  edx, [eax]
  jmp  edx
; void fireMe(); { obj.launch(); }
 
  mov  eax, [ecx]
  mov  edx, [eax]
  jmp  edx

The optimised VC++ 9.0 output is identical for both the object adapter and the class adapter (as is the unoptimised output, but the optimised one is considerably neater). The virtual function table lookup occurs in both cases with the final function pointer ending up in edx. (Removing the pure virtual function from LaunchableOutOfACannon results in direct jumps to the function, as expected. Apparently VC++ 9.0 will not optimise out virtual function lookups.)

So private inheritance doesn’t have the benefit of speed or code size over a private instance variable. It doesn’t have the benefit of readability and can require incredible feats of mental gymnastics to follow code using it. The one and only advantage of private inheritance is that Java developers don’t get to use it. Nya nya nya nya nya nya! :P

Comments (1)

Quick Syntactic Sugar

VB has long held the reputation of making things easier for the developer at the cost of a syntax that can be understood by mere mortals. However, I have just discovered how much further this has gone…

From MSDN’s LINQ to XML Overview:

Creating XML Trees
The ease with which you can create XML trees is particularly significant. For example, to create a small XML tree, you can write C# code as follows:

C#

XElement contacts =
    new XElement(“Contacts”,
        new XElement(“Contact”,
            new XElement(“Name”, “Patrick Hines”),
            new XElement(“Phone”, “206-555-0144″,
                new XAttribute(“Type”, “Home”)),
            new XElement(“phone”, “425-555-0145″,
                new XAttribute(“Type”, “Work”)),
            new XElement(“Address”,
                new XElement(“Street1″, “123 Main St”),
                new XElement(“City”, “Mercer Island”),
                new XElement(“State”, “WA”),
                new XElement(“Postal”, “68042″)
            )
        )
    );

In Visual Basic, the code to construct the XML tree is even simpler, because it uses an XML literal:

Visual Basic

Dim contacts = _
    <Contacts>
        <Contact>
            <Name>Patrick Hines</Name>
            <Phone Type=“Home”>206-555-0144</Phone>
            <Phone Type=“Work”>425-555-0145</Phone>
            <Address>
                <Street1>123 Main St</Street1>
                <City>Mercer Island</City>
                <State>WA</State>
                <Postal>68042</Postal>
            </Address>
        </Contact>
    </Contacts>

The Visual Basic compiler translates XML literals into method calls into LINQ to XML.

Re-read that last bit. The Visual Basic compiler translates XML literals into method calls.

That is really really cool!

I doubt I’ll ever use it, since I’ve practically switched completed to C# (which has nice enough XML generation), but I can’t get over how cool that is! You just write XML right there, in-line with your code, and it parses it.

Wow…

Comments (1)

Can you find…

Here’s a small sheet of fun stuff from Stephan Pastis’s Pearls Before Swine comic. While some of it is a bit obscure and/or weird (par for the course for this comic), the spot-the-difference is pure gold.

Once you have finished/given up on the spot-the-difference part, the answer is reproduced below where it is readable.

1) First panel labeled “PANEL 1,” second panel labeled “PANEL 2″;
2) In Panel 1, Rat feels an internal urge to punch Pig. In Panel 2, urge is diminished;
3) Panel 2 is below Panel 1;
4) There is a flying toaster in Panel 2;
5 and 6) There are no fifth and sixth differences.
The question was, “Can you find at least 6 differences?” The answer was no.

Comments

Timing/Profiling Code

This is genuinely not a rant, nor is it some fluffy post about the virtues of profiling or hand-optimisation or anything like that.

This post contains real code. All of it is written in assembly language. If this is too scary, you may skip this post now before you get nightmares.

I have spent a lot of time programming in assembly language. Probably the only language I’ve used more is VB. Neither of these provides built in profiling, and both can be quite heavily optimised by hand (not so later versions of VB).

This results in a situation where some form of profiling tool is required for code snippets. In assembly language this can be implemented with macros and the RDTSC instruction.

The time-stamp-counter (TSC, as read by the ReaD TSC instruction) on IA32 based processors is a 64-bit integer that contains the number of clock cycles since the value was last reset.

rdtsc
; EDX:EAX now contains the current TSC value 

This instruction can be used to determine the number of clock cycles a snippet of code takes to execute:

rdtsc
push edx
push eax
 
; some code here
; probably looped so an average is taken
 
rdtsc
sub eax, [esp+0]
sbb edx, [esp+4]
add esp, 8
; EDX:EAX now contains the number of clock cycles for the code 

Great. Well, almost. The problem is that processors for many years now have had the ability to execute instructions out of order. The rdtsc command does not have any dependencies, so it may be executed before some of the preceeding code. For the same reason, the final rdtsc may be executed before the code snippet has finished. The general solution is to place a serialising instruction before it. There are only three non-privileged (aka. ring 3, aka. user-mode) serialising instructions, two of which change the instruction pointer (iret and rsm) and are therefore not suitable. The cpuid instruction is used.

cpuid
rdtsc
push edx
push eax
 
; some code here
; probably looped so an average is taken
 
cpuid
rdtsc
sub  eax, [esp+0]
sbb  edx, [esp+4]
add  esp, 8
; EDX:EAX now contains the number of clock cycles for the code contained 

The next problem that arises came with the Pentium M processor - SpeedStep (known as PowerNow! or Cool’n'Quiet by AMD). This technology basically (very basically) reduces the clock speed of the processor when nothing much is happening. Intel explain the issue in a paper:

While SpeedStep technology is good for conserving power when laptops are running on batteries, it changes the processor frequency. If the frequency changes while the targeted code is running, the final reading will be redundant since the initial and final readings were not taken using the same clock frequency. The number of clock ticks that occurred during this time will be accurate, but the elapsed time will be an unknown.

It is possible that since we are using the entire processor for our code snippet (profiling code that includes waits or operating system calls is pointless), SpeedStep will not reduce the running speed at all. Without transitioning into privileged mode to disable SpeedStep (not to mention detecting it or the AMD alternatives), confidence in the timing results must be lowered.

On the side problem of operating system calls mentioned above, generally the testbed will be set to run at the highest priority possible (REALTIME_PRIORITY_CLASS and THREAD_PRIORITY_TIME_CRITICAL, which is actually real-time) which reduces the effect of kernel-mode transitions and waits as much as is possible (ie. not very much).

The biggest problem now is multi-processor systems. The TSC is unique to an individual processor, and there is no guarantee (or even a suggestion) that two separate processors (or cores) will have synchronised counters. The result of this is that the timing value could be very inaccurate in the case where the first rdtsc executes on processor 0 and the second rdtsc executes on processor 1.

This is relatively easy to deal with. The Windows API provides the SetThreadAffinityMask function, which can be used to prevent the thread from being moved to another processor. (The assumption is made that processor 0 (mask of 00000001h) always exists. This is probably safe since the documentation says that Windows Me/9x require the mask to always be 1; these operating systems don’t support multiple processors)

call GetCurrentThread
push 1      ; assuming that processor 0 exists
push eax
call SetThreadAffinityMask
push eax    ; preserve the return value - the old affinity mask
 
cpuid
rdtsc
push edx
push eax
 
; some code here
; probably looped so an average is taken
 
cpuid
rdtsc
sub  eax, [esp+0]
sbb  edx, [esp+4]
add  esp, 8
; EDX:EAX now contains the number of clock cycles for the code
 
call GetCurrentThread
push eax    ; the old affinity mask is already on the stack
call SetThreadAffinityMask
; The original affinity mask has been restored 

Now our timing code is probably good enough to give comparable results on the same machine. However, it is not much good for comparing between different machines. As CISC processors move towards RISC instruction sets at higher clock rates, the number of cycles for a section of code may increase while the actual time taken decreases. Also, instruction pairing may affect the cycle count for the code.

To achieve comparable rates between separate computers, the time-stamp-counter needs to be converted to a ‘physical’ time - preferably nanoseconds, but micro is okay. Ultimately, it is not trivial to do this directly with the TSC, so a wrapper function is provided.

As much as they may be disliked by “purists”, the QueryPerformanceCount and QueryPerformanceFrequency functions do exactly this.

QueryPerformanceCount will return the value of some counter. In single-CPU systems where SpeedStep or similar is not in use, this may well be the hardware time-stamp-counter. However, it is more likely to be a dedicated timer that is unaffected by the current CPU or clock frequency. (The API warns that BIOS or HAL bugs may cause different processors to return different values, so SetThreadAffinityMask is still recommended)

The big advantage of using QueryPerformanceCount is its partner: QueryPerformanceFrequency. Using these two functions together allows the time taken by the code snippet to be converted to seconds, milliseconds, microseconds or smaller. Now it is possible to compare the real time taken by a section of code.

push 0
push 0
mov  eax, esp
push eax
call QueryPerformanceCounter
 
; some code here
; probably looped so an average is taken
 
push 0
push 0
mov  eax, esp
push eax
call QueryPerformanceCounter
pop  eax
pop  edx
 
sub  eax, [esp+0]
sbb  edx, [esp+4]
; EDX:EAX now contains the number of clock cycles for the code
 
mov  ecx, esp
push ecx
call QueryPerformanceFrequency
 
push64 1000000000           ; some theoretical macro for pushing 64-bit numbers
push eax                    ; the low and high DWORDs of the cycle count
push edx
call UnsignedInt64Multiply  ; some theoretical function for multiplying
 
; The frequency value is already on the stack
push eax                    ; the return values from UnsignedInt64Multiply
push edx
call UnsignedInt64Divide    ; some theoretical function for dividing
 
; EDX:EAX now contains the number of nanoseconds for the code 

And of course, the start and finish code gets wrapped up in nice macros. Probably thread affinity and priority code would also be included in the hope that they will make some difference at some point. Quite likely a little bit of range checking, though there isn’t really much that can be done that twos-complement subtraction won’t handle anyway.

An easier to follow coding style might also be used, but where’s the fun in that? :)

Comments (3)

Java Virtual Machines

Not a rant, just a little something else that I discovered to dislike about Java.

As everyone (who is reading this blog) knows, Microsoft ships a dodgy JVM with Windows.

As everyone (who is reading this blog) also knows, you should replace this with the Sun JVM as soon as possible.

Except!

After I last formatted my computer, I didn’t install the latest JRE, or any JRE for that matter.

I don’t use Java at all for programming - .NET is better in every way I can be bothered counting before I stop and assume that it’s better in every way (yes, the ideology behind .NET is better too :-P ) - so installing the JDK is irrelevant to me.

On my internet travels, I use one Java applet more than once every few months. Before formatting my computer (the format was routine - nothing to do with Java being dodgy), this applet would take at least two minutes to start and would cause graphical errors throughout Windows until logging off (even if all Java processes were killed before then). External applet windows would also take minutes to load and display.

Using the Microsoft JVM, I’ve noticed a few improvements. Firstly, the graphical errors don’t occur. Secondly, the text rendering is (IMHO) better under the Microsoft JVM (partially a slightly clearer font). Thirdly, the applet I mentioned above loads well before I have a chance to make coffee - which is something I’d regularly schedule in before. Applet windows load equally fast.

I haven’t done proper benchmarks partially because I don’t want to install Sun’s Java again and partially because it’s blatantly obvious to me that the Microsoft JVM is faster. Precise times mean nothing when you can measure one in cups-of-coffee and the other in steps-towards-the-kitchen.

The missing libraries don’t bother me because the only Java stuff I use is this one applet and the various animations that appear on physics sites around the net. If a Java application needs more than that then I don’t need the application. Write it in a real language.

Okay, that sounded more like a rant than I intended when I started, so we’ll flick the category switch on.

Comments

Too many controllers…

(Note: Due to the size of the images and generally annoying layout, I’ve put this post into an HTML page here for easier (though less colourful) reading.)

Ever looked at the list of your connected USB devices? It probably looks something like this:

I can count four USB controllers and four root hubs. A quick guess of one hub per controller and four ports per hub means I should have sixteen USB ports on my computer. Unfortunately not. I have only two ports on the back of my laptop (Inspiron 8600).

Both of these ports can operate at high-speed, which means they belong to the USB 2.0 Enhanced Host Controller rather than any of the others, so why bother wasting space on the motherboard for the other three?

Potentially one controller belongs to the USB-sized hole in one side of the case, but since there is no actual port there I can not test whether it also belongs to the USB 2.0 controller.

Time to go to the tools. USB View is a ‘free’ tool from Microsoft, in the sense that to get it you need to download and install the Windows Driver Development Kit and build the example yourself. The first link is my copy which I build myself for Windows XP. (Apparently it may not work correctly on older versions of Windows, just as a version built for Windows XP original (ie. no service packs) or earlier won’t work properly on WinXP SP2.)

Running it, the first thing to notice is that my guess of four ports per root hub was wrong.

The three USB 1.1 controllers have two ports each, while the USB 2.0 controller (highlighted) has six. (As you can see, I’ve disconnected everything so it’s just the controllers and their hubs.) It’s time to test the theory about which controller has which port.

First I connect a USB Mass Storage device to what I’ll call USB port 1 (the top one):

No surprises there. Looking at the speed value on the right, we can see that it is connected at high speed. (We can also see that the device manufacturer hasn’t bothered to get their own vendor ID, since they’ve just used Cypress Semiconductor’s, who makes the USB controller.)

Moving the device to USB port 2 (the bottom one):

Interestingly, it’s connected to the third port of the root hub. So perhaps the potential USB port which isn’t attached to this motherboard goes to the second port? Unfortunately, I have no way of testing this theory.

So ports 1 and 3 of the USB 2.0 controller are connected to the two ports on the back of the laptop. I am assuming that the missing port also connects to this controller. So why are there three others?

The answer comes from the specifications for the Intel 82801DB/DBM controller, section 5.17.8 on page 202.

The EHC shares the six USB ports with three UHCI Host controllers in the ICH4. The USB UHCI controller at D29:F0 shares ports 0 and 1; the USB UHCI controller at D29:F1 shares ports 2 and 3; and the USB UHCI controller at D29:F2 shares ports 4 and 5 with the EHCI controller. There is very little interaction between the USB EHCI and the USB UHCI controllers other than the muxing control which is provided as part of the EHCI controller.

So there are actually a total of six ports available, each of which is capable of acting at any speed. When a high speed device is connected, the USB 2.0 controller (called USB EHCI in the Intel document) takes responsibility. Otherwise, one of the other controllers does. This is presumably to lessen the workload on the high speed controller.

Time to test this out. USB View says:

So that’s it. The keyboard has actually connected to a different controller. If we unplug both devices and reconnect the keyboard only, we can see that it stays in the same place.

So that confirms that the controllers actually share the ports. Which leaves the minor problem of where the other plugs are…

The Intel documentation indicates that there are six actual ports exposed by the chipset. On the laptop, there are two on the back and one on the side (which doesn’t really exist…) which means three are hidden. One of these, I’m told, is used for versions of the laptop that include a BlueTooth module. Since I don’t have this module, I can’t be sure which port it is, but through much plugging and unplugging of devices, I’ve been able to label the rest of the ports.

Port Location High Speed Host
Device ID (Port)
Full/Low Speed Host
Device ID (Port)
Comments

Back of laptop 24CD (1) 24C2 (1) The top port

Unknown 24CD (2) 24C2 (2) Either the extra port or a BlueTooth module

Back of laptop 24CD (3) 24C4 (1) The bottom port

Unknown 24CD (4) 24C4 (2) Possibly the extra port or a BlueTooth module

Media Bay 24CD (5) 24C7 (1) The optical drive didn’t appear, but a floppy drive did

Port Replication 24CD (6) 24C7 (2) a.k.a. docking station

The setup is very similar on all the PCs I was able to get at to test though a desktop is more likely to expose the ports rather than use them internally.

I will probably post more on USB soon, since I’ve been working with drivers for USB devices quite a bit recently.

Oh, and too many controllers didn’t spoil anything, in this case. :)

Comments (1)

Knights of the Round Goto

Computer scientists are zealous in their beliefs, and when the discussion turns to gotos, they get out their jousting poles, armor, and maces, mount their horses, and charge through the gates of Camelot to the holy wars.

Steve McConnell - Code Complete

There has been far too much emphasis on go to elimination instead of on the really important issues

Donald Knuth - Structured Programming with go to Statements

These quotes are from 14 and 33 years ago respectively, which may mean I’m flogging a dead horse here, but there’s always some life in this one. Since I hadn’t come across the two articles above (the Knuth requires an ACM login, Swinburne people can log in here) I figured not many other people had either, so I thought I’d bring them out into the open.

Both articles bill themselves as unbiased reports on the state of the goto vs. gotoless programming war and both do quite well.

McConnell presents a brief summary of the arguments on both sides, which summarise as (my summary, not his):

  • No gotos equals higher quality
  • Some gotos equals higher quality
  • Lots of gotos is bad

Both sides agree that lots of gotos is bad (though one, unnamed side never seems to accept that the other side has made that concession). The gotoless side argues that gotos create code which is “hard to format”, “defeats compiler optimizations” and gotos are like “termites in a rotting house”. The goto side argues that gotos “can eliminate duplicate code”, can “reduce the danger of forgetting to deallocate resources” and gotos were “included as part of Ada.” (More on the silly sounding reasons soon.)

Knuth takes a more algorithm based approach. It’s important to recognise that the article linked above was one of the first major papers on the topic, so most of his work starts with algorithms involving conditional and unconditional branches which he attempts to remove. His results are almost perfectly balanced between gotoless and non-gotoless code in terms of improved efficiency and readability. (They are, in fact, so balanced it would appear the examples were contrived to produce a balanced result.)

Part way through, he makes the following observation:

Certain go to statements which arise in connection with well-understood transformations are acceptable, provided that the program documentation explains what the transformation was.

However, his final stance on the topic is that gotos should not be taught, “at least as an experiment in training people to formulate their abstractions more carefully.” This followed an observation that students believed they needed gotos when it was in fact their algorithm which required work. However, his stance on whether gotos should still exist in modern languages is that they should remain until suitable control structures exist such that a goto never provides any improvement over another form.

Many authors have suggested language features which provide roughly equivalent facilities, but which are expressed in terms of exit, jump-out, break, or leave statements. Kosaraju has proved that such statements are sufficient to express all programs without go to’s and without any extra computation, but only if an exit from arbitrarily many levels of control is permitted. [Emphasis mine]

The emphasised case is the biggest issue preventing gotos from being completely eliminated from C-based languages. As I discussed here, the break statement is not clear about which scope/s it is leaving, nor is it possible to neatly leave more than one nested loop (and no, named loops don’t count).

A smaller, but not less important (to some people), issue is the matter of recursion optimisation. Conventional logic dictates that iteration is always faster than recursion, which is true for most realistic cases. However, not all algorithms can be converted to iterative form with the basic control structures provided in C-based languages, as Knuth demonstrates.

The other place where a goto provides improved clarity is the n+1/2 problem which is extremely common everywhere really, but especially in first year programming subjects:

int value = readInt(“Enter an integer: “);
while(value >= 100)
{
    println(“Value must be less than 100″);
    value = readInt(“Enter an integer: “);
}

More formally, the problem is this: For each iteration, operation A, comparison B and operation C are performed in order. Comparison B depends on the results of operation A. If B returns a certain result, operation C does not execute and the iteration ends.

The example above uses a while loop and has operation A (the readInt()) performed before the loop. The comparison B is performed on entry to the loop, so if the particular result is obtained, operation C (the println()) never executes and the iteration never begins. If, however, the opposite result is obtained, operation C executes followed by operation A again. Operation A now exists in two separate locations.

Inverting the loop to produce a do-while construct causes the condition to be duplicated:

int value;
do
{
    value = readInt(“Enter an integer: “);
    if(value >= 100)
    {
        println(“Value must be less than 100″);
    }
}
while(value >= 100);

Either way, you run the risk of code becoming inconsistent as one version changes without the other.

I won’t insult anyone by assuming you need a version involving goto, so here’s one that doesn’t duplicate the input code or the condition or use goto.

int value;
do
{
    value = readInt(“Enter an integer: “);
    if(value >= 100)
    {
        println(“Value must be less than 100″);
        continue;
    }
}
while(false);

Instead we’ll settle for a continue statement (I personally don’t mind continue statements as long as they are very close to the loop statement and no more than one indent away). If you don’t like the condition ‘false’ there, you can swap it around.

int value;
while(true)
{
    value = readInt(“Enter an integer: “);
    if(value < 100)
    {
        break;
    }
    println(“Value must be less than 100″);
}

As I mentioned above, I hate the break statement, no matter where in the loop it appears.

As for the argument summary earlier, I wasn’t being serious. Those are in fact all listed on McConnell’s page, though they aren’t a complete list of the arguments for and against. Let’s hit them one by one (all the arguments, not just the ones I listed earlier):

Code without gotos is higher quality code
As I suggested earlier, quality is meaningless without a proper definition. A direct result of this is that this argument is meaningless. So let’s skip it.

Gotos make code hard to format
I really don’t know what to do with this one. Gotos only really have one way of being formatted and that’s at the same indentation as the code around it. Labels are either at the start of the line or one indent less than the surrounding code. Reasonable names are obviously required as for everything else involved in programming and label scope is generally limited to the current function. I think this argument generally reflects a lack of experience, instruction and practise in using goto statements. (They may also be scared of them, but who am I to judge?)

Use of gotos defeats compiler optimisation
This is true. Compilers use all sorts of fancy algorithms that rely on knowing the contents of registers and the location of the stack pointer at all points within a function. Throw a goto in there and the compiler needs to reload everything from memory and ensure everything is balanced. This will clearly reduce the performance of a tight loop. However, goto is generally the wrong way to optimise a tight loop. Branchless code is much better. (Branchless code contains the absolute minimum control structures, including if statements and logical operators.)

However, if you’re simply trying to get out of four nested loops and continue from somewhere else, it’s highly unlikely that the compiler had the state set up perfectly for what comes next. In this case, you’re losing very little and gaining a huge amount in terms of readability. Note that break and continue will also break optimisations in a similar way.

Goto supporters claim that they can make code faster and/or more efficient
Apparently this is an argument against using gotos. I personally find the C2 Wiki’s argument “most Goto defenders have died” much more compelling.

Use of gotos tend to violate structured programming principles
Considering a major intention of structured programming was to reduce reliance on the goto statement, this isn’t surprising. Arguing that something breaks rules which were designed to supress it contributes absolutely nothing to a debate.

Experience has shown the badness of goto-laden code
Again, as I said earlier, a certain side of the debate conveniently ignores that both sides actually agree on this. Code full of gotos is harder to follow than code that uses control structures.

Goto is like termites in a rotting house
Once the first one gets in, he’s going to call all his friends and before you know it, there’s gotos everywhere and your house has collapsed. This is the slippery slope argument and relies on the developer knowing what they are doing (easier said than done).



And the arguments for the other side (supporting the use of goto statements where appropriate):

Goto can make up for a gap in a structured language’s capabilities
I’ve already been through some of the gaps in question. Being unable to cleanly leave multiple nestings is the big one, followed by recursion optimisation and the n+1/2 problem.

Gotos can eliminate duplicate code
Again, the n+1/2 problem is the classic example of this. Also, a single goto can remove the need for many tests of a flag as demonstrated here.

Gotos are useful for cleaning up in one section of the code
This ties in with the previous point and the demonstration of it is here and a more specific example here.

Goto can make code faster and/or more efficient
This is clearly the reason for the very strange argument against gotos. It’s true. Goto can make code faster and/or more efficient. Note the words can make. That doesn’t mean always makes. The point of this argument is more that goto should not be banned completely, as opposed to arguing that gotos should take the place of if, while, for, Aunt Betty and Eddie McGuire.

Research has been inconclusive in demonstrating goto’s badness
Another reason not to outright ban them. Gotos have their place and always will until someone comes up with another instruction that performs exactly the same thing.

Goto was included as part of Ada
This is actually quite a legitimate point. Ada is “the most carefully engineered programming language in history” and it includes goto statements. C# is going through a similarly brutal process and so far it also includes goto (with some interesting new uses for it). If these experts are out designing languages that include goto, it can’t be all that bad, can it?


It’s been 39 years since Dijkstra wrote his article “A Case against the GO TO Statement” (renamed by Niklaus Wirth to “Go To Statement Considered Harmful”) and the debate has not been fully resolved. The apparent consensus is that goto is unnecessary, and yet it has never been deprecated or removed from a major language. This can only be interpreted as an indication that, despite all the talk, goto is useful and will remain an essential part of programming languages for many years.

Comments (1)

« Previous entries ·