Inside Windows Versioning: Why letting legacy code stick around isn’t the best idea

by Lucas, 2 years ago

If you happen to be a sharp-eyed Insider, you might have noticed build number 17000 appear on BuildFeed a few days ago, only to be seemingly superseded by build number 16350. Some speculate that these mark two completely different development stages, but that is not the actual reason and Microsoft employees have refuted it, so we at Inside Windows want to offer a different explanation.

The first and rather obvious pointer can be found on Twitter. Microsoft employee Brandon LeBlanc has hinted at the number 17000 being an issue in the following tweet:

The issue isn’t really apparent at first glance, and the best most people could do is guess. After all, why would a simple increase of the build number cause an issue that’s too complex for a tweet? Doubting the truthfulness of this statement wouldn’t be unreasonable either, as not every PR person has a good track record when it comes to being honest on social media.

We wanted to find out for ourselves whether this statement holds up. To kick things off, we had to do a little logical backtracking. First of all: what is the build number used for? Versioning, of course. The first task was to figure out how programs obtain the version of the currently running Windows instance. After following a trail of forwarded exports across multiple DLLs, we ended up all the way in NTOSKrnl, which has been the heart of NT-based operating systems since their inception.

The first surprising thing we came across was the fragmentation. If we asked you to guess how many times the build number can be found in the kernel, what would you say? One, perhaps two times? The truth isn’t all that pretty to hear: the build number is stored in 17 different places.

Our first idea was to modify the kernel to identify itself as build 17000 and look for issues in the running OS. After fiddling around with a disassembler to find the offsets we needed, a hex editor to produce the patched kernel, and a BCD editor to actually boot said kernel, we managed to start a virtual machine running Windows 10 Build “17000”. Everything seemed to work just fine — the boot didn’t take unexpectedly long, we logged in without any issues, and the desktop didn’t seem any different either. This impression, however, didn’t last too long as here’s what we saw once we opened CMD.

What is this? Build 616? That can’t be right. The kernel clearly states that it’s build 17000, and we double-checked our patches before applying them. At this point we were more curious than ever; we had to know what causes this to happen. Trying various mathematical operations to find out how this number is calculated seemed like a logical starting point, and it indeed was.

17000 – 616 = 16384

If you’ve been around the block for long enough, you may recognize the number 16384 (0x4000). It is the RTM revision number used in Windows versions starting with Vista and ending with Windows 10 TH1. There is another way to get the number 616 from 17000 besides the obvious subtraction of 16384: you can perform an AND operation on the numbers 17000 and 16383 (0x3FFF). In case you are not familiar with the AND operation, it compares two numbers bit by bit, and only the bits that are set in both of them persist.
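For readers who would rather check the arithmetic than take our word for it, here is a minimal Python sketch of that AND operation. The names BUILD_MASK and mask_build are made up for this example and do not come from Windows; the script only prints the two operands and the result in binary so the surviving bits are easy to spot.

```python
# Illustrative only: BUILD_MASK and mask_build are names invented for this
# example, they are not taken from any Windows source or header.
BUILD_MASK = 0x3FFF  # 16383, i.e. the lowest 14 bits set


def mask_build(build: int) -> int:
    """Keep only the lowest 14 bits of a build number."""
    return build & BUILD_MASK


if __name__ == "__main__":
    build = 17000
    # Print both operands and the result as 15-digit binary numbers.
    print(f"{build:015b}  ({build})")
    print(f"{BUILD_MASK:015b}  ({BUILD_MASK})")
    print(f"{mask_build(build):015b}  ({mask_build(build)})")
```

The only bit that 17000 has set and 0x3FFF does not is the one worth 16384, which is why the AND and the subtraction give the same answer here.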

With no evident usage of 0x4000 in version-related routines, we set off to look for 0x3FFF. This time, we actually hit something. A function called MmCreatePeb makes use of the build number as well as an AND 0x3FFF operation. Looking up the function name on Google yielded an interesting result: an article from 2009 on the Microsoft Press Store that details the creation of a process’s PEB, where one of the steps involves the build number variable, an AND operation, and the number 0x3FFF. Jackpot.

At this point we had a grasp on what causes the build numbers to practically loop over once they reach a multiple of 16384, and we wanted to figure out which build was the first to show this behavior. We initially expected this logic to have been implemented at some point during Vista’s development. After all, it was the first Windows version to utilize 16384 anywhere in its build tag. But to our surprise, even the very first build of Windows Vista available to the public already had this piece of code in place, so we decided to go even further back in time.
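The looping behavior is easy to reproduce outside the kernel. The sketch below is not the actual kernel code, and reported_build is a hypothetical helper named for this example; it simply applies the same AND 0x3FFF step to a handful of raw build numbers and checks that the result matches the remainder after division by 16384.

```python
# Hypothetical helper for illustration; this is not the actual kernel code.
WRAP = 0x4000  # 16384


def reported_build(raw_build: int) -> int:
    """Build number that is left after the AND 0x3FFF step."""
    return raw_build & (WRAP - 1)


if __name__ == "__main__":
    for raw in (1264, 16350, 16383, 16384, 17000, 32768):
        # For non-negative integers, masking with 0x3FFF is the same as
        # taking the remainder modulo 16384.
        assert reported_build(raw) == raw % WRAP
        print(f"raw {raw:>5} -> reported {reported_build(raw)}")
```

Every raw build number below 16384 passes through unchanged, while 16384 and 32768 both come out as 0 and 17000 comes out as 616.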

As we were doing that, our jaws kept dropping lower. Present in Windows Server 2003? Check. XP? Check. 2000? You must be kidding me! Check. This continued all the way back until we reached Windows NT 3.51, where the trail ended. We now knew that the AND operation was implemented at some point during Windows NT 4.0’s development. Thanks to friends who had beta builds of NT 4 saved locally, we were quickly able to find the very first publicly available build with this “feature”: build 1264.

To test our theory, we modified the kernel of build 1264 to identify itself as build 16384. This showed us the two expected faces of the build number: the raw data and the processed version info. While the boot screen shows the number as 16384 (raw), the OS’s interface shows the number 0 (processed as a version).

This adventure has left us with an interesting conclusion: the most likely reason build 17000 was superseded by 16350 has its roots all the way back in March 1996, effectively making it a 21-year-old issue. We hope Microsoft will be able to resolve this issue in a timely manner, considering there is about a month left (assuming one build per day) until they reach 16384, which is the number where all hell breaks loose.
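The "about a month" estimate is simple subtraction, spelled out here as a tiny Python sketch. The variable names are ours, and the one-build-per-day pace is the same assumption made above.

```python
# Back-of-the-envelope estimate; variable names are made up for this example.
WRAP = 0x4000          # 16384, the first build number that wraps to 0
current_build = 16350  # the build number mentioned in this article

# Assumption: the build number goes up by one per day.
builds_left = WRAP - current_build
print(f"{builds_left} builds (days) left until the build number wraps")
```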

It’s reasonable to assume that practices like this may also be used in other parts of the OS and its build system, so there may have been even more issues than those we found.

Did you enjoy this more in-depth article? Please let us know in the comments down below 😀

Article by @tfwboredom