Recently, I published a blog post on my EDA360 Insider blog about the ARM Cortex-M0 processor core and its expected influence on mixed-signal, low-power IC design. (See “What effect does the ARM Cortex-M0 core have on mixed-signal microcontroller design?”) As usual with my blog posts, I also posted a note about the blog as a discussion in several LinkedIn groups including the “ARM Based Group.” What has followed in that particular group is a really interesting technical discussion about 8- and 32-bit microcontrollers used for low-power designs. This is a compelling set of arguments and you really should read through them if you have anything to do with microcontroller design. I guarantee that these perspectives will help you with your next design.
I reproduce a substantial part of the LinkedIn discussion thread here because there is so much meat in this discussion that it is a shame to keep it confined to one relatively small social-media sphere. Watch carefully as the discussion becomes more and more technical:
Bill Giovino: This is an old argument. There is no migration from 8-bit to 32-bit. To claim a migration is being driven by 8-bit owners is also false.
Growth of 8-bit microcontrollers continues to surpass growth of 32-bit microcontrollers. Witness 2008-2010, when 32-bit growth faltered but 8-bit growth remained strong amongst companies focused on 8-bit.
Andy Neil: @BGiovino – have you told the guys at @NXP? They’re killing off the 8-bitters in favour of 32…
Bill Giovino: Andy, I’m not sure what the basis of your statement is. No one is “killing off 8-bitters.”
It’s simple – if 8-bit sales continue to grow, then no one is killing them off because they are thriving.
Benoit Dupuy: Hi Bill and Andy,
These days I go into a flat spin when I see 32-bit processors in a watch:
http://www.youtube.com/watch?v=5xa_GUzTb00&feature=related (with MTK6516, http://forum.xda-developers.com/wiki/index.php?title=Chinese_Clones_MTK6516)
A decade ago, an 8-bit architecture would have been the right architecture for this market. That is no longer the case today. I am not saying the bicycle is going to disappear just because there is a market for electric cars or electric motorcycles. The bicycle was here and will remain, but it is no longer the market’s priority. Manufacturing only bicycles does not put a company in the top ten in terms of net profit.
I have spoken about the watch, but I could also point to domestic appliances and the future metering market. To be sure, there will always be bicycle manufacturers.
Bill Giovino: Benoit, 8-bit put Microchip in the top ten in terms of net profit. The same could be said for NEC/Renesas and Freescale.
The fact is, for interrupt-driven applications an 8-bit architecture will be more efficient and lower power. Even the most efficient 32-bit architecture can use more than twice as many instruction cycles as an 8-bit part when vectoring to an interrupt.
Andy Neil: @Bill – sorry, I meant specifically *NXP* are killing off *their* 8-bitters in favour of *their* 32-bitters – Cortex-Mx in particular.
See: http://www.8052.com/forum/read/181200 – noting an NXP presentation in which the NXP Product Marketing Manager says they have “no roadmap” for 8-bit.
There have been more NXP 8-bit discontinuations since then – and, as we know, Microchip have been quick to capitalise on that…
Andy Neil: Sorry – the link in the 8052.com post to the NXP presentation now points to something completely different.
Bill Giovino: @Andy, you are absolutely right – thanks for clarifying.
However, the problem NXP is having is that they are inviting their competition in to scalp their 32-bit business. See, many complex systems have a 32-bit along with an 8-bit that does peripheral processing. Since NXP doesn’t bid on 8-bit anymore, the customer’s buyer has to invite NXP’s competitors that are 8-bit suppliers, like ST, Atmel, etc., to bid on the 8-bit. And while they are there, the competitors gaze at NXP’s 32-bit socket and lick their chops.
It’s not a technical issue; it’s a sales strategy issue. An 8-bit or low-end 16-bit is necessary in the portfolio to prevent poaching of the 32-bit socket, even if you never win the 8-bit socket. ST knows this. Texas Instruments has an official strategy based around this.
There have been times when I’ve had to quote business I knew I couldn’t win just to get visibility into the project and protect business I had already won.
Zoltán Kócsi: @Bill Giovino:
Well, I don’t know. I can get a 32-bit ARM from NXP at the price of an 8-bitter. It has as good as or better peripherals. Has the same amount or more FLASH/RAM. It has a much more powerful core. Why would I need to have an 8-bitter?
There are special cases when you need a *particular* 8-bit or 16-bit chip, because it has some specific peripheral unit or some must-have feature that only that chip offers. But for general microcontroller applications the low-end ARM offering beats the 8- and 16-bit chips on price, peripherals and computing power, while drawing no more power.
If I have a 32-bit ARM system which needs a peripheral processor for whatever reason, I would slap on one of the low-end 32-bit ARM processors. Same architecture, same compiler, same everything. That is, code maintenance is simpler, sharing code between the chips is simpler, everything is simpler. Why would I struggle with an 8-bitter when I can have an ARM for the same price?
Let’s compare the NXP LPC11Xxx series and the Atmel 8-bit AVR chips, an arguably very popular family.
For the same amount of FLASH/RAM the NXP is cheaper. Moreover, the FLASH consumption of the AVR will be higher, because many things which cost you a single 16-bit instruction on the ARM will cost you several 16-bit instructions on the AVR. The ARM runs at 50MHz, the AVR at 20. The ARM has 32-bit timers, a high-speed SPI with a FIFO, a UART with a FIFO, and a 12-bit ADC. The AVR’s SPI is slower and has no FIFO, its UART has no FIFO, its timers are 8 or 16 bits (and you have far fewer of them), and its ADC is 10-bit and slower. The NXP chip has pin-compatible drop-in variants with USB, with more analogue circuitry, or with a dual CAN controller on board. The ARM has a unified address space, making life very simple; the AVR has an explicit Harvard architecture, where data access to the FLASH requires special instructions and all sorts of hackery.
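To make the Harvard-architecture point concrete, here is a minimal C sketch (assuming avr-libc’s <avr/pgmspace.h> on the AVR side; the table and lookup names are illustrative): reading a constant table kept in flash needs special accessors on the AVR, while on the Cortex-M0 an ordinary array access works because flash sits in the same address space as RAM.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>

/* AVR: program flash is a separate address space, so the table must
 * be tagged PROGMEM and read with pgm_read_byte(); dereferencing a
 * plain pointer would fetch from RAM at the same numeric address. */
static const uint8_t table[4] PROGMEM = {1, 2, 3, 4};

uint8_t lookup(uint8_t i)
{
    return pgm_read_byte(&table[i]);
}
#else
/* Cortex-M0: flash, RAM and peripherals share one flat 32-bit address
 * space, so an ordinary array access compiles to a plain load. */
static const uint8_t table[4] = {1, 2, 3, 4};

uint8_t lookup(uint8_t i)
{
    return table[i];
}
#endif
```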
The only reason I’d use the AVR is if I absolutely, positively needed 5V operation: the NXP is 5V tolerant but can’t drive 5V logic levels directly. That’s about it, and even then I’d be tempted to use the ARM and slap a cheap level-translator chip on the lines that drive 5V logic.
I have used a lot of 8-bit chips: 8051, Z80, 6502 derivatives, 68HC1x, AVR. I also used 683xx microcontrollers a lot. When the 683xx was my 32-bit (well, 16/32-bit) main processor, I probably would indeed have used an 8-bitter for mundane slave processing functions or as the main processor for simple devices, basically because of the huge difference between the 8-bitter and the 683xx in price, power consumption, board real estate and circuit complexity. There is absolutely no such complexity with the NXP Cortex-M0 based series. You give it power, and that’s it. It doesn’t even need a crystal. Simpler than an 8-bitter. It is small. It is cheap. It is low-power.
I don’t really see any reason to keep using 8-bit microcontrollers except in very specific cases where a feature of the chip is needed – but that feature is independent of the core’s word width.
Bill Giovino: @Zoltán, I don’t think you understand the 8-bit segment. 8-bit applications aren’t driven by performance. Let me state that again – 8-bit applications aren’t driven by performance.
See, the vast majority of 8-bit applications can be handled by a mere 8MHz PIC16. So, for most 8-bit applications most any 8-bit microcontroller can do the job. Understand?
The technical aspects of an 8-bit application are driven by EFFICIENCY of architecture and interrupt response. For example, if it’s low power, every single clock cycle counts.
Interrupts on the Cortex-M0, for instance, require at least 12 clock cycles; some 8-bitters require only four. In some interrupt-driven applications, those 12 clock cycles are a very long time. It’s actually possible for some 32-bit microcontrollers to be slower than an 8-bit for a particular application because of interrupt response.
So, you have a socket looking for an 8-bit where 95% of the many thousands of 8-bit micros are ridiculously overpowered for the socket. What does a 32-bit bring to the socket if performance is unimportant?
I’ve been in semiconductor sales & marketing for 20 years and I can tell you with confidence that the most powerful microcontroller almost never wins the socket. The final decision is always based upon the channel.
Also, remember my post on how 8-bit is a defensive strategy.
And let’s not get started on 16-bit. TI’s 16-bit MSP430 is possibly the fastest-growing microcontroller out there – its sales are staggering, and by itself it is bigger than many mid-sized semiconductor companies. The MSP430’s sales growth alone is bigger than that of most semiconductor startups.
Benoit Dupuy: Thank you, Bill, for your interesting answers. I am following this discussion closely. At the same time, I have tried to find some documentation on the same subject. Here are some links to that documentation:
Now Bill, in the following STMicroelectronics presentation, slide 23, we can see that the 32-bit MCU trend (in millions of dollars) is far above the 8- and 16-bit markets:
I advise everyone to read this presentation, “Microcontrollers - Basics and Trends,” written by Anders Pettersson, FAE Manager Nordic and Baltic.
Bill Giovino: Benoit, first, the ST presentation contains many mistakes. Slide 7 incorrectly defines CISC vs. RISC.
Second, the MCU trend data is on slide 21 (not 23) and unfortunately comes from IC Insights. They have been discussed here before – to put it politely, their figures don’t add up and disagree with everyone else’s. I don’t know what IC Insights’ methodology is, but they even get the past trends wrong.
Zoltán Kócsi: @Bill:
Well, I believe you, however:
The 12 clock cycles of the IRQ response on the Cortex cost you 12 * 20ns = 240ns. That is less than the 320ns that 4 cycles take on a 12.5MHz 8-bitter. Plus, due to the higher clock frequency, the interrupt latency jitter is lower on the Cortex, and if you are really touchy about IRQ response time, latency jitter is an important factor. Furthermore, if it then takes 2 more instructions to actually *respond* to the interrupt, that adds 40ns on the Cortex but 160ns on the 12.5MHz chip, so you’d need a 22MHz 8-bitter to match the Cortex – and your jitter would still be higher.
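For anyone who wants to check the arithmetic, it can be reproduced mechanically (a tiny standalone C sketch; the clock rates are the ones quoted above):

```c
#include <stdio.h>

/* Converts the cycle counts quoted above into nanoseconds at the
 * two clock rates under discussion. */
int main(void)
{
    const double arm_ns = 1000.0 / 50.0;  /* 20 ns per clock at 50 MHz   */
    const double pic_ns = 1000.0 / 12.5;  /* 80 ns per cycle at 12.5 MHz */

    printf("Cortex-M0 IRQ entry: 12 x %.0fns = %.0fns\n", arm_ns, 12 * arm_ns);
    printf("8-bit IRQ entry:      4 x %.0fns = %.0fns\n", pic_ns, 4 * pic_ns);
    printf("Cortex-M0 entry + 2 insns: %.0fns\n", 14 * arm_ns);
    printf("8-bit entry + 2 insns:     %.0fns\n", 6 * pic_ns);

    /* Clock an 8-bitter would need to cover 6 cycles in the Cortex's
     * 280ns: 6 cycles / 280e-9 s = about 21.4 MHz. */
    printf("8-bit clock needed to match: %.1f MHz\n",
           6.0 / (14 * arm_ns) * 1000.0);
    return 0;
}
```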
Also, I stated that there can be specific requirements which warrant the use of a specific chip. If you need a particular feature more than anything else, you find a chip which has that feature. If the chip happens to be 8-bit, so be it. If it is 32-bit, all the better.
We can argue about how many embedded applications require some extra-special feature. Power consumption is one such feature for battery-powered devices; that is a frequent enough requirement. Extra-fast interrupt response, I believe, is not a requirement in the overwhelming majority of cases, and where it is, most of the time you need a fast processor anyway and/or you use hardware for the sub-microsecond responses (e.g. engine management systems).
I don’t care if the 32-bit chip is ridiculously overpowered and double overkill for blinking an LED. As long as it is as cheap as the 8-bit one and its overall power consumption (for 99% of the time the CPU will idle with its clock stopped) is within my power budget, it achieves its design goals and is therefore adequate. If I then factor in the development time, tools, ease of use and other features which are not directly technical, the 32-bit chip will win. Yes, it is wasted silicon, and as an engineer I don’t like wasting resources. But I consider that waste the price to pay for a simpler, cleaner system with more reserves in it.
Again, as I mentioned, if you have a specific problem which mandates the use of a specific chip or architecture, that’s one thing. But your garden variety consumer and industrial embedded systems do not have extra special requirements and in that case, I think, the low-end 32-bit chips will, more often than not, offer a cheaper and simpler solution.
This is my personal viewpoint. If you have proof that the market thinks otherwise, then the majority of embedded engineers disagree with me. I am open to arguments: convince me that, when there are no specific requirements forcing you to use a specific chip, there is still merit in using an 8-bit chip instead of a faster, cheaper, simpler 32-bit one.
Bill Giovino: @Zoltán, where power consumption is king (and it is for many of today’s 8-bit sockets), efficiency is what rules. Cranking up the clock speed to make those 24 cycles go faster isn’t an efficient solution if an 8-bitter can do the job at a slower clock speed.
Plus, an 8-bitter will always be lower power because it is a simpler architecture with one-quarter the bus size.
As I wrote before, the vast majority of 8-bit applications can be handled by an 8MHz PIC16. If performance isn’t the issue, and efficiency is, then a 32-bit offers no advantage over an 8-bit.
As you wrote, if you need a particular feature more than anything else, you find a chip which has that feature. If the chip happens to be 8-bit, so be it. If it is 32-bit, so be it – it is not “better”, it just is.
Now, this is based upon my understanding and experience. I suppose that if I worked for ARM I might be privy to a wider range of examples that would make me see things differently. That said, there are some 8-bit applications that CAN be served better by a 32-bit – usually when either the 8-bit reaches a 20MHz clock speed or the code has so many threads that it finally requires a more complex RTOS.
To me, the time when an 8-bit needs to move up in architecture is when the firmware has so many tasks that it needs a more complex RTOS. Without giving away proprietary numbers, I can tell you that most 8-bit sockets are at or below 16K of program memory. Above that, unless it is linear code, the 8-bit is encroaching on 16- and 32-bit territory.
But at that point the technical considerations take a back seat to political considerations. Picture a buyer at a large corporation who is soliciting bids for 8-bit MCUs. Trying to get him to replace them with a 32-bit is to threaten his job.
Jonny Doin: @Bill Giovino:
> “[...] if performance isn’t the issue, and efficiency is, then a 32bit offers no advantage over an 8bit”
Much to the contrary: efficiency has very little to do with register size. Efficiency, in a given semiconductor process, comes down to the energy-delay product (EDP). It is a measure of the amount of energy taken to perform some logic activity versus the time taken.
EDP is a function of the process (i.e. the feature size of a transistor) and also of the architecture of the computing circuit.
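In first-order terms (textbook CMOS scaling, sketched here for context rather than taken from the thread), the energy of switching a logic node and the resulting EDP are approximately

$$E_{\mathrm{switch}} \approx \tfrac{1}{2}\,C\,V_{dd}^{2}, \qquad \mathrm{EDP} = E_{\mathrm{switch}} \cdot t_{d},$$

so a smaller process wins twice: lower node capacitance C and lower supply voltage Vdd cut the energy (quadratically in Vdd), while faster transistors shrink the delay td.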
Speaking specifically of an 8-bit processor like the 8MHz PIC16 versus a 32-bit core based on the ARM Cortex-M0, there are large differences in both process (feature size) and architecture.
From an EDP (efficiency) perspective, a smaller process optimized for low power has smaller transistors with lower gate capacitances and much lower energy per bit. It is also much faster, meaning the same amount of logic processing can be done in less time. Comparing the PIC16 with the Cortex-M0, the latter is much more energy efficient simply because of the difference in process between the two chips.
Now, from the architecture perspective, both are simple designs with short pipelines, but the core designs are very different.
The ARM core has a large number of 32-bit registers that sit very close to the core and take very little energy to use. The instruction set can take advantage of those registers and perform complex operations in just a few instructions. Furthermore, several chains of operations can proceed in different registers, so you can, for example, use multiple pointers to manipulate structs very efficiently and several temporary registers to evaluate long expressions. Another architectural benefit is that the ALU can do much more work in a single clock, and all operations are 32-bit, meaning the program needs no stitching together of multiple registers to perform real-world math. Yet another important point is that memory is one flat address space, so the program needs no bank selection to perform jumps, table lookups or subroutine calls.
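To illustrate the register argument concretely (a made-up fragment, not drawn from any particular codebase): in the loop below, the two pointers, the counter, the accumulator and the temporaries can all be held in the Cortex-M0’s low registers at the same time, while a single-accumulator 8-bitter has to shuttle each of them through banked RAM.

```c
#include <stdint.h>

typedef struct {
    int32_t x;
    int32_t y;
} point_t;

/* Sum of squared deltas between two point streams. On a Cortex-M0
 * the pointers a and b, the counter n, the accumulator and the
 * dx/dy temporaries all fit in registers r0-r7 simultaneously, and
 * each 32-bit subtract, multiply and add is a single Thumb instruction. */
int32_t sum_sq_delta(const point_t *a, const point_t *b, int n)
{
    int32_t acc = 0;
    while (n--) {
        int32_t dx = a->x - b->x;
        int32_t dy = a->y - b->y;
        acc += dx * dx + dy * dy;
        a++;
        b++;
    }
    return acc;
}
```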
The PIC16 core architecture, on the other hand, was created at a time when the process size was huge, and it was designed under severe limits on core logic gates to keep it small and fast. Those limits, however, translated into a very constrained core architecture: a single working register, limited ALU operations, a single index register and a heavily banked addressing scheme. As with every other 8-bitter, you have to combine registers in memory to perform any useful numerical computing, using carry chaining and a single-operation ALU. The ALU has only ADD/SUB and logical ops, making even simple multiplication and division extremely slow. Another severe limitation is that the stack is only 8 calls/interrupts deep, which makes interrupt code especially tricky and limits call chains to very shallow functions.
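The carry-chain cost shows up in even a single line of C. With the usual compilers, the statement below becomes one 32-bit ADDS in Thumb, whereas an 8-bit ALU has to synthesize it from four byte-wide adds linked through the carry flag (the function and variable are illustrative):

```c
#include <stdint.h>

static uint32_t total;

/* One 32-bit addition. On a Cortex-M0 this is a single ADDS.
 * On an 8-bit ALU it is an ADD on the low byte plus three
 * add-with-carry steps; on a PIC16 every operand byte must also
 * pass through the single W register, with bank selection on top. */
void accumulate(uint32_t sample)
{
    total += sample;
}
```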
The effect of the PIC16’s overly simple architecture is that just about any operation is almost an order of magnitude less efficient than on a Cortex-M0, while some operations are thousands of times less efficient. Even interrupt response, as you mentioned, is much longer on a PIC16. At a 200ns cycle time (max core clock rate = 5MHz), it is 10x slower than a Cortex-M0 @ 50MHz, and it takes 4 cycles (pipeline latency) + 8 instructions (minimum context save + banking), or 2400ns, for any interrupt to reach its first instruction.
Comparatively, a Cortex-M0 will take 12 cycles, including the context save, or 240ns. That is a 10x longer latency for the PIC16. Furthermore, any interrupt routine will take many more instructions on the PIC16, due to the instruction-set, ALU and register limitations. Additionally, all the firmware on the PIC16 will need to be written in assembly, due to its very compiler-unfriendly microarchitecture. By comparison, the Cortex-M0 is C-compiler friendly, so even interrupt handlers can be written in C with high code efficiency.
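That last point is worth a concrete illustration. Because the Cortex-M0’s NVIC stacks the caller-saved registers in hardware on exception entry, a handler is just an ordinary C function whose address sits in the vector table. The sketch below uses the standard CMSIS SysTick_Handler name, with a counter invented for the example:

```c
#include <stdint.h>

volatile uint32_t g_ticks;   /* illustrative tick counter */

/* On Cortex-M the hardware stacks r0-r3, r12, lr, pc and xPSR
 * before vectoring here, so this plain C function is a complete
 * interrupt handler: no pragmas, no assembly wrapper. Its address
 * simply occupies the SysTick slot of the vector table. */
void SysTick_Handler(void)
{
    g_ticks++;   /* a single 32-bit increment, no banking needed */
}
```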
In any scenario where efficiency is involved, it is impossible to argue that an 8-bit PIC16, or any 8-bit processor, is better than an ARM Cortex-M0. The M0 is 10 to 100 times more efficient, depending on the 8-bit processor being compared.
In any scenario where performance is necessary, the M0’s advantages are even clearer.
Maybe the only scenario where a PIC16 would beat an ARM Cortex-M0 is at very high temperatures and noise levels, owing to its inherently more robust (and older) silicon process.
So, except for extremely specific applications, choosing an 8-bit over an ARM these days is hard to defend.
Jonny Doin: @Bill Giovino:
Perhaps some hard data makes the point very clear, without any adjectives.
For the Microchip 8-bit part:
- Idd @ 20MHz, 5V: 7mA (typ), 15mA (max)
- Idd @ 4MHz, 5V: 1.6mA (typ), 4mA (max)
- Idd @ 4MHz, 3V: 600uA (typ), 2mA (max)
For the NXP Cortex-M0 part:
- Idd @ 50MHz, 3.3V: 5.5mA (typ)
- Idd @ 12MHz, 3.3V: 1.4mA (typ)
- Idd @ 6MHz, 3.3V: 850uA (typ)
- Idd @ 3MHz, 3.3V: 600uA (typ)
So, simple and direct: you get more processing power for less energy.
Bill Giovino: @Jonny, C’mon now. Let’s play fair.
You are spec’ing an NXP part introduced four months ago against a Microchip part introduced NINE YEARS AGO.
I’m not sure why you chose to compare a modern NXP part against an obsoleted Microchip part.
The PIC16(L)F1939 is two years old and still beats the pants off the LPC1100XL. Plus, the PIC16 has a higher degree of integration, including an LCD controller.
Here is some honest hard data:
PIC16(L)F1939 (March 2010)
- Idd @ 4MHz 3.3V 380uA(typ) 450uA(max)
- Idd @ 4MHz 5V 450uA(typ) 520uA(max)
- Idd @ 3MHz, 3.3V 320uA(typ) 390uA(max)
PIC16(L)F1939 Sleep current: 20nA
LPC1100XL Sleep current: “below 2uA”
As you can see, the PIC16 uses half the run current of the NXP part, and 1% (one percent) of the NXP’s sleep current.
Bill Giovino: Efficiency is the instruction cycles needed to get work done. In heavy interrupt-driven applications (a typical 8-bit app) interrupt latency is key. For the PIC16(L)F1939, interrupt latency is spec’ed at 3-4 instruction cycles for synchronous and 3-5 instruction cycles for asynchronous.
So for a typical synchronous interrupt, the PIC16(L)F1939 needs a maximum of 10 cycles to round-trip service the interrupt versus 24 for the NXP part. If we assume only 8 cycles to process a simple request, then the PIC16 has an overhead of 125% versus 300% for the NXP part (10 ÷ 8 versus 24 ÷ 8). Nest those interrupts and the NXP part looks weaker and weaker.
If you look at the above-linked article, you’ll see that Steve bases his argument on this statement:
“Microcontrollers typically have applications where [they] wake up, take a sensor reading, and go back to sleep. Processors in the Cortex-M range are able to do this in fewer cycles and effectively reduce the amount of the active duty cycle for the device. A communications stack typically has 32-bit addresses. Moving this around with an 8-bit microcontroller, an 8051 for example, is going to take more cycles, so the entire device is powered up longer.”
As we can see, for modern 8-bit microcontrollers, this statement is false.
Zoltán Kócsi: @Bill:
Well, what is that application category where you need to respond to interrupts very fast but otherwise do not need to do any calculations, processing, communication or other activity? I understand from what you said that this is the main application field of the 8-bit processors, but I wonder what that field actually is.
Anyway, let’s take a look at your claims.
I am not a PIC man, so I downloaded the PIC16F1939 datasheet and looked at it in a bit more detail. It turns out that the oscillator of the PIC can run at 32MHz. Very respectable. Except that a CPU cycle is 4 clocks.
Therefore, a PIC running at 32MHz will have an effective CPU cycle rate of 8MHz. In your example the PIC services the interrupt in 18 cycles, that is, 2.25us. The ARM at 50MHz needs 32 clocks (and 1 clock there is 1 cycle), i.e. 0.64us. The interrupt latency on the PIC, according to the datasheet, is up to 5 instruction cycles, that is, 625ns. On the ARM it is at most 3+12 clocks, that is, 300ns. The PIC latency timing diagram in the datasheet, on pages 89 and 90 shows the relationship between the clock and the CPU cycles pretty clearly.
You said “Nest those interrupts and the NXP looks weaker and weaker”. Well, you brought it up, so let’s examine this issue in a bit more detail then.
The PIC does not have a stack to save the context; it saves it in shadow registers. Which means that you can’t nest interrupts unless you save the shadow registers to memory explicitly, in which case of course your assumption of 8 instructions per interrupt is blown to pieces.
In fact, your 8 instructions per interrupt is rather questionable in the first place. The PIC does not have vectored interrupts: you need to determine the source of the interrupt by reading the pending-interrupt registers and seeing which bit is set in which register. Since the chip is 8-bit and there are 20 interrupt sources, that means going through 3 registers. This is done in your interrupt routine, by your code, not by hardware. You are burning clock after clock just to work out which peripheral to attend to.
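For readers who have not written PIC16 interrupt code, the software dispatch just described looks roughly like this (an XC8-style sketch; the exact register and bit names vary by device and are shown only for illustration):

```c
#include <xc.h>

/* PIC16: one shared interrupt vector, so the ISR itself must poll
 * the pending-flag registers to find the source. */
void __interrupt() isr(void)
{
    if (PIR1bits.TMR1IF) {          /* Timer1 overflow? */
        PIR1bits.TMR1IF = 0;
        /* ... handle the timer ... */
    } else if (PIR1bits.RCIF) {     /* UART receive? */
        /* ... handle the UART ... */
    } else if (INTCONbits.IOCIF) {  /* pin change? */
        /* ... handle the pin change ... */
    }
    /* Every source checked in software burns cycles before any real
     * work starts; a vectored NVIC does this step in hardware. */
}
```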
In contrast, the ARM has a proper vectored, nested interrupt controller, with user programmable interrupt priorities and all. Each interrupt source has its own interrupt service routine, thus you do not need to work out who asked for the IRQ. Furthermore, the ARM does interrupt chaining, saving lots of clocks when the interrupt load on the chip is high.
Now, as for power consumption. Considering that your aim is to minimise interrupt latency and response time, I assume you run the processor at its maximum clock speed. Thus, you run the PIC at 32MHz. It consumes about 3mA @ 3.3V, with the core running at 8M cycles per second. Alas, you can’t put the chip into low-power sleep mode, because it needs 1024 clocks, or 32us, to get out of sleep, which of course is not an acceptable interrupt latency figure. To match the PIC’s interrupt response time, the ARM only has to run at 14.22MHz. In sleep mode at that frequency (and the ARM can be put to sleep, as it wakes up immediately), the ARM needs just a tad more than 1mA, i.e. one third that of the PIC. If you can afford the PIC’s power budget, then you can run the ARM at almost its full speed, in which case of course the PIC’s interrupt latency is over twice as long and, even granting the 8-instructions-per-IRQ assumption, its response time is 4 times slower than the ARM’s.
The LPC1100XL sleep (well, deep-sleep, actually) current is specified as below 2uA. The deep power-down mode, however, where you stop everything except the circuitry needed to wake up, is only 220nA. Admittedly, the PIC’s 20nA is still only 10% of the ARM’s figure; I’ll give you that.
So, as we can see, your selected modern 8-bit microcontroller is still massively inferior to the ARM in all your selected measures, except deep power-down consumption.
Jonny Doin: @Bill Giovino:
Continuing the analysis @Zoltán started, let’s pick a Cortex-M0 that is really low power, for example the EFM32ZG103, a very new part from EnergyMicro.
This chip’s process is optimized for low power, and it has a top frequency of 32MHz.
Its figures are very comparable to the PIC16LF1939’s, at 45uA/MHz @ 32MHz.
All the conditions @Zoltán mentioned apply. The enhanced PIC will save the minimum context, but the user usually still needs to detect the interrupt source, dispatch the interrupt and set the BSR before running any ISR code. That can add an average of 10 cycles to the 4 cycles of hardware latency. At 32MHz (8MHz core), these 14 cycles take 1750ns. If interrupts are to nest, the new interrupt must save context to memory and dispatch to the new ISR manually, which adds at least 10 more cycles, or 1250ns.
Comparatively, the ARM at 32MHz (one clock per cycle, at the same current) will take 12 cycles to vector to user code, or 375ns. For tail-chaining of nested interrupts, the ARM takes only 6 cycles, or 187ns. So, compared at the same clock, this ARM is 6 to 10 times faster than the PIC, and it can be put to sleep at a 600nA current, with a 2us wakeup.
This is both faster and lower power than the best PIC16 cores.
Despite some improvements aimed at better compiled C code, the PIC instruction set is still much less efficient than the 16-bit Thumb instruction set, and it takes many more instructions to execute the same operations.
All of that still points to the same answer, whether the basis is performance or efficiency.
I must say that I used PIC cores for several years, designed dozens of circuits with them, and am very fond of them. But today I cannot justify PICs or any other 8-bit cores, even for the smallest functions. Using ARMs everywhere, I can reuse code such as communication protocols, filters, and core functions.