In the old days, the first real PC chip from Intel was the 8086. 16-bit architecture. However, it was too pricey / complex / whatever to build cheap machines around, so they came out with the 8088 -- same 16-bit guts, but with the external data bus cut down to 8 bits (that's the one that ended up in the original IBM PC). How's that for progress?
After that came breaking the 640K RAM barrier, still with a 16-bit architecture -- the 80286, or 286 for short. Some might remember it from the clunky all-in-one-monitor-and-CPU boxes IBM sold as the "PS/2". Anyway, it sucked. In real mode it could only address 1 MB directly (protected mode could reach 16 MB, but DOS never used it), and DOS took up a significant portion of that. DOS could be told to load itself above the 640K line (mostly) and thereby free up "real" memory; it still had to keep a residue (about 22K) in low memory, but the rest (about 40K, I think) could be parked in the high memory area. The point is somewhat moot, because several address ranges up there are reserved for legacy hardware (even today): B000-B7FF for the monochrome adapter, A000-AFFF for EGA/VGA graphics, C000-CFFF for adapter ROMs and other necessities, F000-FFFF for the system BIOS, and so on (E000-EFFF was usually free, which is where things like the EMS page frame tended to land). DOS could be configured to use the monochrome range as RAM, since monochrome adapters were out of fashion by then, but whatever. It could also get at more than 1 MB by a memory trick called paging -- expanded memory (EMS), which bank-switched the extra RAM into view in 16K pages through a 64K window. I think EMS 3.x topped out at 8 MB of expanded RAM; LIM 4.0 later pushed that to 32 MB.
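If the bank-switching bit sounds abstract, here's a rough C sketch of the idea -- not the real INT 67h EMS interface, just the mechanics, with names I made up for illustration (and real EMS remapped address lines instead of copying bytes around, but the effect on the program is the same: only 64K visible at a time):

/* toy model of EMS-style bank switching: a fixed 64K "page frame" window
   through which 16K chunks of a much larger expanded pool get mapped. */
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE   (16 * 1024)   /* EMS dealt in 16K pages */
#define FRAME_PAGES 4             /* the page frame was a 64K window = 4 slots */
#define POOL_PAGES  128           /* pretend we bought 2 MB of expanded RAM */

static unsigned char pool[POOL_PAGES][PAGE_SIZE];   /* RAM the CPU can't see directly */
static unsigned char frame[FRAME_PAGES][PAGE_SIZE]; /* the window below 1 MB it *can* see */
static int mapped[FRAME_PAGES] = { -1, -1, -1, -1 };

/* swing a 16K logical page into one of the four slots of the page frame */
static void ems_map(int slot, int logical_page)
{
    if (mapped[slot] >= 0)  /* push the old contents back out first */
        memcpy(pool[mapped[slot]], frame[slot], PAGE_SIZE);
    memcpy(frame[slot], pool[logical_page], PAGE_SIZE);
    mapped[slot] = logical_page;
}

int main(void)
{
    strcpy((char *)pool[57], "hello from logical page 57, out beyond 1 MB");
    ems_map(0, 57);                     /* map it in... */
    printf("%s\n", (char *)frame[0]);   /* ...and now ordinary code can read it */
    return 0;
}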
The really significant improvement in architecture started with the 386 (80386). It had a 32-bit address space, so it could address up to 4 GB of RAM (far more than any machine of the day actually had -- paging and virtual memory made up the difference), it could handle memory swapping pseudo-automatically, and it could even execute programs that resided above 1 MB without the code having to be swapped down into a low-memory spot first. But to make that work, a new concept came into being -- protected memory. Memory access now came in four flavors, called rings, that denoted privilege levels. Ring 0 is the most privileged; that's generally where the kernel code lived, along with the interrupt table and various other spots the BIOS needed. Applications lived in ring 3, the least privileged. For some reason, nobody ever writes for rings 1 and 2. Anyway, a process can demote itself to a less privileged ring, but only code that runs at boot time (during which the switch from real mode to protected mode happens) gets ring 0 privileges. (My recollection might be a little off, but I think that's right.) No code in any ring can execute or look at code or data belonging to a more privileged ring, and this is where a great many Windows GPFs come from: an application tries to poke at code or data that the kernel is maintaining in ring 0. The entire OS doesn't run in ring 0, just parts of it, but that's an aside. This is why syscalls are important: an app has to ask the kernel to do things in ring 0 that it cannot do on its own. Another aside: Windows/386 was brought into being by taking a mini-disassembler (like DEBUG, only slightly more sophisticated), stepping through Windows/286, flipping it into protected mode, and removing the GPFs one -- by -- one. That is why the Win9x kernels, which swallowed all of that code (the line that became Windows 3.0), are so damn buggy. It has nothing to do with supporting the DOS filesystem. But anyway.
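For the "ask the kernel" part, here's what that looks like on a present-day protected-mode OS -- a minimal sketch for Linux/x86-64, just to make the ring-3-vs-ring-0 point concrete (the address I poke at is just something up in the kernel half of the address space, picked for the demo):

/* a ring-3 program can't touch ring-0 stuff directly; it has to ask. */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(int argc, char **argv)
{
    /* the polite way: a system call. the CPU transitions to ring 0, the
       kernel does the privileged work (driving the console), then drops
       back to ring 3 and returns to us. */
    const char msg[] = "hello from ring 3\n";
    syscall(SYS_write, 1, msg, sizeof msg - 1);

    /* the impolite way: poke directly at an address in the kernel half of
       the address space. the MMU refuses, the kernel sends SIGSEGV, and the
       process dies -- same idea as a GPF. run with any argument to try it. */
    if (argc > 1) {
        volatile char *kernelish = (volatile char *)0xffffffff81000000UL;
        printf("kernel byte: %d\n", *kernelish);   /* never gets here */
    }
    return 0;
}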
Then came the dreaded 486 (80486). Great idea, poor implementation, but that's for later. In this generation of CPUs, Intel decided to "pipeline" the instruction cycle. Instead of the CPU taking one instruction, executing it, and using the results before starting the next, each instruction enters an execution pipeline: fetch, a couple of decode stages, execute, write back. That allowed several instructions to be in flight at once, greatly increasing code throughput (measured as the time it takes to load an instruction and execute it). The front end also tried to guess which way a conditional branch would go and told the prefetch unit to grab the expected next instructions and put them in the pipeline. At worst, it would guess wrong each and every time, the wrong execution path would have to be dumped out of the pipeline, and you would end up with something no faster than a 386. The horrendous part? The on-chip cache (8K on the 486, where the 386 had none on the die) loaded data in whole lines, so if the prefetch unit needed something that crossed a line boundary, another full line had to be pulled in from memory; and if a reference already in the pipeline got sufficiently modified at execution time, that data had to be dumped and fetched all over again. Remember, memory back then was 60-70ns access time, not 4-8ns. This is when optimising code became important: a reference had to be spaced far enough away from anything that would modify it so that the prefetch unit would load the correct data.
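Branch prediction (or the lack of it) is still easy to feel today. Here's a little C toy -- entirely my own numbers and names, nothing 486-specific -- where the same loop over the same values gets dramatically faster once the data is sorted, because the predictor finally starts guessing the if() right:

/* build with light optimisation (e.g. -O1); heavy optimisation may turn the
   branch into a branchless cmov and hide the effect. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    (1 << 20)
#define REPS 100

static long branchy_sum(const int *v, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        if (v[i] >= 128)          /* ~50/50 and random = predictor misery */
            sum += v[i];
    return sum;
}

static int cmp_int(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

static double time_it(const int *v, int n, long *out)
{
    clock_t t0 = clock();
    long s = 0;
    for (int r = 0; r < REPS; r++)   /* repeat to get a measurable time */
        s += branchy_sum(v, n);
    *out = s;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    int *v = malloc(N * sizeof *v);
    long s1, s2;
    if (!v)
        return 1;
    for (int i = 0; i < N; i++)
        v[i] = rand() % 256;

    double unsorted = time_it(v, N, &s1);
    qsort(v, N, sizeof *v, cmp_int);           /* same values, now predictable */
    double sorted = time_it(v, N, &s2);

    printf("unsorted: %.2fs   sorted: %.2fs   (sums %ld / %ld)\n",
           unsorted, sorted, s1, s2);
    free(v);
    return 0;
}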
Then Intel said, "what if we can get two instructions into two pipelines side by side?" Enter the U and V pipes of the 80586 architecture. (Note that I don't call it a 586...) Caveat: the V pipe could only handle about 8 instructions, and those could only be executed side by side with about 20 instructions in the U pipe -- out of something like 80 instructions in the instruction set at the time, as I recall. Very inefficient. Another way of putting it is that there were only 8 instructions allowed in the V pipe, compatible with only 20 instructions in the U pipe. But it was groundbreaking. Furthermore, the pipelines were lengthened to 5 stages (prefetch, two decode stages, execute, write-back, with branch prediction bolted onto the front), and a reference sitting in the prefetch/decode stages could be modified by an instruction further down the pipe without the whole queue being dumped. There would be a small bubble in the pipe, but that's better than losing the entire pipe (though that was still possible). Cyrix (remember them?) heard about the design and said "we can do this better", and they did. Intel had been the leader in chipmaking up until this point -- now the clones weren't satisfied with reverse-engineering finished products, they were building better implementations of Intel's not-yet-released designs. Cyrix also beat Intel to the punch on the "586" name (and the courts had already ruled that a bare number couldn't be trademarked), hence "Pentium". (I think Intel decided to say "Pentium II" instead of "Sextium" due to the potential for a bad phonetic joke.) Cyrix chips and boards were bug-prone and had a lot of hardware conflicts, so they didn't last long, but right behind them came AMD. Back to architecture: the clone pipelines were "x" and "y", and "y" could handle the entire instruction set (minus 4 or 6 instructions), all of it compatible with everything in the "x" pipe (minus the same 4 or 6). A pipeline dump would still happen if an instruction executing in one pipe modified a reference sitting in the other pipe's decode stages. But here end the great strides forward in architecture. The rest is mainly cosmetic.
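You can still see the "side by side, but only if they don't step on each other" rule on any modern machine. A quick C sketch (my own toy, not a Pentium-era benchmark): summing with one accumulator forces every add to wait on the previous one, while two independent accumulators can go down the pipes together:

/* build with -O1; heavier optimisation may vectorise and blur the point. */
#include <stdio.h>
#include <time.h>

#define N    (1 << 16)
#define REPS 10000

static volatile double sink;   /* so the compiler can't throw the sums away */

int main(void)
{
    static double v[N];
    for (int i = 0; i < N; i++)
        v[i] = i * 0.5;

    clock_t t0 = clock();
    for (int r = 0; r < REPS; r++) {
        v[0] = r;                      /* touch the data so nothing gets hoisted */
        double s = 0.0;
        for (int i = 0; i < N; i++)
            s += v[i];                 /* one long dependent chain of adds */
        sink = s;
    }
    double chained = (double)(clock() - t0) / CLOCKS_PER_SEC;

    t0 = clock();
    for (int r = 0; r < REPS; r++) {
        v[0] = r;
        double s0 = 0.0, s1 = 0.0;
        for (int i = 0; i < N; i += 2) {
            s0 += v[i];                /* these two adds are independent, */
            s1 += v[i + 1];            /* so they can run side by side */
        }
        sink = s0 + s1;
    }
    double paired = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("one accumulator: %.2fs   two accumulators: %.2fs\n", chained, paired);
    return 0;
}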
Intel, in the 80686 line, fixed the oversights in the 80586, added a few instructions to let the CPU handle multimedia directly (MMX, anyone? the pre-P-II MMX parts were mainly experimental), and moved the L2 cache off the motherboard and onto the CPU package. Yeah, huge strides forward.
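For what it's worth, here's the MMX idea in one tiny C example using the compiler intrinsics (needs an x86 compiler with MMX enabled, e.g. gcc -mmmx; nowadays you'd reach for SSE/AVX, but this is where it started):

/* the whole point of MMX: one instruction does the same arithmetic on a
   packed bundle of small values -- here, four 16-bit adds at once. */
#include <stdio.h>
#include <string.h>
#include <mmintrin.h>

int main(void)
{
    __m64 a = _mm_set_pi16(40, 30, 20, 10);   /* four 16-bit values in 64 bits */
    __m64 b = _mm_set_pi16(4, 3, 2, 1);
    __m64 c = _mm_add_pi16(a, b);             /* one instruction, four additions */

    short out[4];
    memcpy(out, &c, sizeof out);
    _mm_empty();                              /* MMX borrows the FPU registers; clean up */

    printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);  /* 11 22 33 44 */
    return 0;
}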
The 80786 / 80886 generations were mainly PR masturbation about clock speeds. (OOOOH!!! I HAVE A 2GHz Email/Web Browser!!!) They might have put additional pipes in place; I stopped paying attention.
I might be off about the x86 numbers; I think Intel held the number back but kept advancing the name somewhere around the P-II.
Someone correct me if I'm wrong...
-t.