> The 8086/8088 do not provide consistency ... self-modifying code can be used to determine the queue length, distinguishing the 8086 from the 8088 in software.
I sincerely doubt I was the first to work that out, but I remember being so incredibly happy when I figured it out and it solved a problem I had.
I can't now recall why the difference was significant; something about installing different routines for bashing serial ports, I think.
In the era before the internet, you either figured things out on your own or you couldn't get anything done.
If you were very lucky, some magazine might have mentioned it. Another way out was to just use a disassembler if some other software package did the same thing.
You still do on occasion. Some years ago I tried to write a precisely timed bitbanging loop on an STM8 microcontroller (a 650x/680x-alike with an allegedly 3-stage pipeline and a small prefetch buffer; cents per chip pre-Covid), and the instruction prefetch completely screwed me up. (At least I think it was the instruction prefetch? The thing depended on branch target alignment, whatever it was.) The one relevant question on Stack Overflow hasn’t received any answers in years; the manufacturer documentation mentions its instruction timings are “simplified”, aka a lie, and gives a more elaborate model of the pipeline that it admits is also a lie and that can’t reproduce the timings I’m seeing.
Many things are immeasurably easier than what I remember as a middle-schooler with an utterly anachronistic 286 in post-Soviet early 2000s Moscow, so that’s nice. It doesn’t make the blasted loop work, though.
(Many others are also worse. Today’s me could work out the motherboard design of the 286 by looking at it, even without the manuals; my current laptop’s manufacturer’s refusal to release the schematics annoys me enough that I’ve half a mind to ask some physicists if they have a CT machine they could run the board through.)
Compared to now, back then there was a lot more you had to figure out on your own.
I remember coding a game on the C64 in the eighties. Just figuring out how to print the player's score fast enough was a challenge. Dividing by 10 with modulo to convert numbers to digits was just way too slow.
My method was not to use normal math, but to directly manipulate screen RAM characters when the score increased.
That was a very cheap way to increase the player's score by, say, 1000: you didn't even have to care about the 3 lowest digits, just increment the thousands place by 1; if it overflowed past 9, reset it and increment the next position to the left, and so on.
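For anyone who hasn't seen the trick, here is a minimal sketch of the idea in C rather than 6502 assembly. The six-digit layout, the `score_bump` and `score_add_1000` names, and the plain `char` array standing in for screen RAM are all my own assumptions for illustration; the original would have poked screen codes directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: the score is SCORE_DIGITS characters, most
   significant digit first. A plain array of ASCII digits stands in
   for the score's cells in screen RAM. */
#define SCORE_DIGITS 6

/* Add one to the digit at index pos (0 = leftmost) and ripple any
   carry to the left. Digits to the right of pos are never touched. */
void score_bump(char *digits, size_t pos)
{
    assert(pos < SCORE_DIGITS);
    for (;;) {
        if (digits[pos] != '9') {
            digits[pos]++;
            return;
        }
        digits[pos] = '0';      /* overflowed past 9: wrap and carry */
        if (pos == 0)
            return;             /* carry out of the top digit is dropped */
        pos--;
    }
}

/* Add 1000 by bumping the thousands place (4th digit from the right). */
void score_add_1000(char *digits)
{
    score_bump(digits, SCORE_DIGITS - 4);
}
```

No division, no modulo, and in the common case a single increment of one byte that is already on screen.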
Looks to me like you, and perhaps others, could benefit from a set of timing tests.
You know, set up a test harness with timers and such, then walk through the cases.
Of course, that is exactly what the manufacturer should have done! I always wondered at the high errata metrics associated with some catalog parts. It is just not enough to work through the circuit and hope for the best!
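A rough sketch of what such a harness might look like, in C. Everything here is hypothetical: `measure`, `timer_fn`, and `case_fn` are names made up for illustration, and on real hardware `now` would read a free-running hardware counter instead of the fake one used to check the harness on a host machine.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*timer_fn)(void);  /* reads a free-running tick counter */
typedef void (*case_fn)(void);       /* one code sequence under test */

struct timing {
    uint32_t min_ticks;
    uint32_t max_ticks;
};

/* Run one test case `runs` times and record the fastest and slowest run. */
struct timing measure(case_fn run, timer_fn now, unsigned runs)
{
    struct timing t = { UINT32_MAX, 0 };
    assert(runs > 0);
    for (unsigned i = 0; i < runs; i++) {
        uint32_t start = now();
        run();
        /* Unsigned subtraction stays correct across one counter wraparound. */
        uint32_t ticks = now() - start;
        if (ticks < t.min_ticks) t.min_ticks = ticks;
        if (ticks > t.max_ticks) t.max_ticks = ticks;
    }
    return t;
}

/* Host-side stand-ins, used only to exercise the harness itself. */
uint32_t fake_ticks;
uint32_t fake_now(void)    { return fake_ticks; }
void     fake_case_7(void) { fake_ticks += 7; }
```

Walking through the cases then means calling `measure` once per variant (for instance the same loop placed at each branch-target alignment) and comparing the min/max pairs; a spread between min and max for a single variant is itself a hint that something stateful like a prefetch buffer is involved.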
There were all kinds of tricks prior to CPUID to figure out what kind of CPU you were running on. I had actually forgotten about that - thanks for reminding me!
CPUID was first added in the Pentium (66 MHz), in 1993.
Nevertheless, some late 486 variants were introduced after the first Pentium, in 1994 or later, and those had CPUID, e.g. the Intel 486DX4 (100 MHz).
AMD had 2 generations of 486DX4 (and of 486DX2): the first did not have CPUID (and had a write-through cache), while the second had CPUID (and a write-back cache).
Some Cyrix CPUs with properties intermediate between 486 and Pentium had CPUID, but it was disabled by default and it could be enabled in the BIOS.
Measuring the length of the prefetch queue was the standard method to tell an 8088 from an 8086, and it was used in several commercial CPU detection utilities for MS-DOS, e.g. in Norton Utilities or the like.
I discovered this at the time by disassembling such a utility program.
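To make the idea concrete, here is a toy model in C, not the real detection routine (which was a few instructions of self-modifying assembly). It assumes the prefetch queue is completely full when the store executes and that a store never updates bytes already sitting in the queue; real hardware depends on bus timing, so treat this purely as an illustration. The queue sizes (4 bytes on the 8088, 6 on the 8086) are the documented ones; the function names and opcode values are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_8088 4    /* prefetch queue size in bytes */
#define QUEUE_8086 6

enum { OLD_OPCODE = 0x90, NEW_OPCODE = 0x40 };  /* arbitrary marker values */

/* Model: the storing instruction overwrites the code byte `distance`
   bytes ahead of itself. Return the byte the CPU would then execute
   at that location. */
uint8_t executed_byte(unsigned queue_len, unsigned distance)
{
    uint8_t memory[16];
    uint8_t queue[16];

    assert(queue_len <= 16 && distance < 16);
    memset(memory, OLD_OPCODE, sizeof memory);
    memcpy(queue, memory, queue_len);   /* queue was filled before the store */
    memory[distance] = NEW_OPCODE;      /* the store reaches memory only */

    /* Bytes already prefetched execute stale; the rest come from memory. */
    return distance < queue_len ? queue[distance] : memory[distance];
}

/* Probe increasing distances until the modified byte becomes visible. */
unsigned detect_queue_len(unsigned real_queue_len)
{
    for (unsigned d = 0; d < 16; d++)
        if (executed_byte(real_queue_len, d) == NEW_OPCODE)
            return d;
    return 16;
}
```

In this model a patch 5 bytes ahead executes stale with a 6-byte queue and fresh with a 4-byte queue, which is the whole distinction; as far as I understand it, a real utility would patch at one fixed distance chosen to fall between the two queue sizes rather than probe in a loop.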
In the Pentium the cache was split into an instruction cache and a data cache.
This forced the introduction of the snooping workaround; otherwise stores into the data cache would not have affected the contents of the instruction cache.
It may also have something to do with why CPUID is strongly serializing? That always really confused me. It's not like the CPU type is going to race with a load or a store.