Getting started with the Nucleo-F030R8

Many months ago, I thought I'd get myself an ARM development board to play with at home. I have access to plenty of ARM hardware at $work, so I am quite familiar with ARM MCUs, but my home development stack is mostly a bunch of old MSP430 boards. They have aged well, but it's getting harder and harder to find people familiar with them, and this makes IRC channels somewhat less fun.

So I bought an STM32 Nucleo-F030R8 development board. It's embarrassingly cheap, around 10 EUR, and comes with an on-board ST-LINK programmer/debugger, meaning that this board and a USB cable are literally all you need. It doesn't have any fancy on-board peripherals, but it exposes all the useful pins, which is exactly what you want in a development board.

What you'll find here: a quick guide for setting up a GCC-based toolchain and a minimal Makefile-based project, along with instructions on how to use OpenOCD and gdb. The astute reader is expected to bring their own IDE.

Note: technically, this guide applies to Linux (that's what I tested it on), but the instructions apply to FreeBSD as well (except that the toolchain is available in the ports collection). On OpenBSD, you will need to compile the toolchain by hand, but I expect everything else to work as well. The toolchain can be downloaded in binary form for OS X; I expect everything to work in the same manner there.

Summary

Here is what we'll do today:

  1. Set up the toolchain.
  2. Set up a minimal project, using some core CMSIS functions (if you are new to ARM, you probably don't know what CMSIS is; we'll cover the basics of that as well).
  3. Flash our minimal program using OpenOCD, and debug it using gdb.

Note: do not shy away from gdb. If there is one tool in the GNU ecosystem that you should learn, that tool is gdb. It may look unfriendly, but it's very powerful, and very scriptable. Learn to configure it well and use it well, and going back to debuggers with point-and-click interfaces will be very frustrating.

A few words about the toolchain: we are going to use GCC, which is freely available. I will show you how to create a minimal Makefile-driven project. This is portable information: it should be easy to adapt to any IDE you wish to use.

Ready?

Toolchain

The toolchain is available here. Statically-linked binary downloads are available for Linux and OS X; I suggest you get those, for minimal hassle. FreeBSD users will find this toolchain under devel/gcc-arm-embedded.

Extract the binary somewhere nice, or install it using your system's package manager, whichever is easier for you.

The directory structure I usually set up for these things is essentially like this:

    $HOME/workspace/stm32
    |
    +- toolchain
    |  |
    |  +- gcc-arm-none-eabi-7-2017-q4-major
    |  +- STM32Cube_FW_F0_V1.9.0
    |
    +- projects
    +- setupenv.sh

Everything platform-specific goes in its own directory. The toolchain, development libraries and other useful tools go in toolchain/ under that directory, and projects go in a separate directory. The setupenv.sh script sets up the right environment variables in the shell session where it is sourced, in order to make life easier when writing scripts and interacting with the toolchain from the terminal.

Your system will need to find these new binaries in its PATH variable. If you installed the toolchain using the package manager, it should land in a directory that is already in PATH. If not, you will have to set that up manually.

The way I like to do it is using the setupenv.sh script I mentioned above. I will show you how to set up such a script below (it's really just two lines, in our case), but if you would rather set things up manually or globally, what you need to do is:

$ export PATH=$PATH:path/to/gcc-arm-none-eabi-7-2017-q4-major/bin
  

Libraries

We have a compiler and the standard library. This is enough to compile ARM code, but the hardware-specific bits are missing. We have to get those separately.

Detour: If you're new to ARM land

In ARM land, the hardware-specific bits are implemented as part of, and abstracted behind, something called the Cortex Microcontroller Software Interface Standard or CMSIS.

CMSIS is a set of specifications that every ARM vendor implements for its own hardware. Now, CMSIS offers a vast array of functionality, and I mean vast. It includes specifications for DSP, for an RTOS API, a driver API, debugger interfaces: a lot of useful things, but what we will be interested in today is only one part of it, called CMSIS-CORE.

CMSIS-CORE is a hardware abstraction layer (HAL) that sits over all core (and many non-core) peripherals such as the system timer, and over the Cortex core itself.

Critically, it contains a unified interface for the initialization code, which performs the clock and system configuration needed to get the MCU to run useful code.

If you come from Linux land, you can think of CMSIS-CORE as a sort of supercharged combination between a BSP and a subset of what would normally go into the BIOS firmware.

OK, detour over.

What we need is in the STM32CubeF0 package, which you can get here. You will have to give your data to ST for this (but if you want to write embedded software for a living, you may want to subscribe to ST's newsletter. Yes, newsletters from all silicon vendors are big loads of marketing bullshit, but they are useful sources of news, too).

Extract this package in a nice place, too (preferably the same place where you extracted the toolchain).

We now have everything we need to write and compile useful code.

Setting up a minimal project

Setting up a Makefile-based project for a new hardware platform of this type, where you cross-compile bare metal applications, involves five things:

  1. Teaching your system about your toolchain
  2. Teaching your system how to get object files from C code and what object files it needs to compile (this is, of course, true for any Makefile)
  3. Teaching your system where to find any platform-specific code besides whatever came with your compiler.
  4. Teaching your system how to link this code.
  5. Teaching your system how to convert the resulting file into a format that you can write to non-volatile storage.

All this is done as part of the Makefile. These five things don't appear in this order in a Makefile, but it's a lot easier to explain them in this order, so I'm going to show you each piece individually, and then we'll look at the resulting Makefile for a minimal program.

Preamble

Remember what we said about the PATH variable? Your system needs to know where to find the right gcc binary to run. This step is optional, but it makes things a little easier later, because you don't need to work with absolute paths.

Here's a small script that I use for this:

export PATH=$PATH:$PWD/toolchain/gcc-arm-none-eabi-7-2017-q4-major/bin
export TOOLCHAINROOT=$PWD/toolchain
  

Source it from the top directory, the one that contains the toolchain/ directory where you extracted the toolchain and the STM32Cube package (the script relies on $PWD), and you will get the right magic in $PATH for that terminal session. You also get an environment variable, $TOOLCHAINROOT, that points to the toolchain/ directory; it will be useful when we write the Makefile.

Our first program

Here is our first program, in all its glory:

#include <stm32f0xx.h>

int main()
{
        volatile int x = 32;
        volatile int y = 64;
        volatile int z;

        z = x + y;

        for (;;)
        {
                z--;
        }

        return 0;
}
File: main.c

It does not do much, but it is enough to be able to validate our environment via gdb. We can flash LEDs later.

Now, time to assemble a Makefile and turn this into sweet machine code.

Teaching your system about your toolchain

With modern GCC, the compiler driver also takes care of invoking the assembler and the linker, so for now the only tool we need to tell make about is the compiler:

CC = arm-none-eabi-gcc

Compiling your program

A note on compilation: by compiling, I mean object code generation — everything up to linking. If you are not sure what this means, this is a very good overview.

This stuff is often obscured by higher-level tools (it doesn't help, of course, that the entire process is called compilation — it's not the tools' fault) and, in all honesty, it is somewhat less relevant when you're writing higher-level applications for full-blown operating systems. It's more relevant on embedded systems for reasons that will become apparent immediately.

What do we need in order to turn our program into a working binary?

Obviously, first we need the program code. It's in main.c, we just wrote it.

In order for that program to run, though, a number of peripherals need to be in working order. At a minimum, the system clock needs to be set up, so that the CPU core has a steady clock to work with. This part is performed by a function called SystemInit. SystemInit is part of the System module in CMSIS-CORE.

This function is found under STM32Cube_FW_F0_V1.9.0/Drivers/CMSIS/Device/ST/STM32F0xx/Source/Templates/system_stm32f0xx.c. You can use that file in place or copy it into your tree, since it's BSD-licensed. I prefer the latter.

Astute readers are probably unhappy about how I handwaved my way through this whole in order for that program to run thing. OK, SystemInit initializes the core peripherals, but who calls that? And who calls main(), for that matter? There is no dynamic loader on a tiny ARM core.

This part, astute reader, is called the startup code. The startup code takes care of a lot of things: it sets up default interrupt handlers, it sets initial values for SP and PC — and, yes, it calls SystemInit() and main().

The startup code is CPU-specific. In our case, it's under STM32Cube_FW_F0_V1.9.0/Drivers/CMSIS/Device/ST/STM32F0xx/Source/Templates/gcc/startup_stm32f030x8.s. This, too, is BSD-licensed, so you can copy it to your source tree.

In short: you need the program code (main.c), the system initialization code from CMSIS-CORE (system_stm32f0xx.c) and the startup code (startup_stm32f030x8.s).

Now all that's left is to tell our compiler how to produce object code from this:

system_stm32f0xx.o: system_stm32f0xx.c
	$(CC) $(CFLAGS) -c system_stm32f0xx.c -o system_stm32f0xx.o

main.o: main.c
	$(CC) $(CFLAGS) -c main.c -o main.o

startup_stm32f030x8.o: startup_stm32f030x8.s
	$(CC) $(CFLAGS) -c startup_stm32f030x8.s -o startup_stm32f030x8.o

Teaching your compiler about platform-specific code

Our compiler knows about ARM cores in general, but not about each platform in particular. The very first line of our program, #include <stm32f0xx.h>, should light up a red light somewhere in your brain. Clearly, GCC can't know about every ARM platform in this world — how is it supposed to know where to find this header file?

Well, these files are also part of the STM32Cube package. We just need to tell GCC what directories to look under (well, among other things):

CFLAGS = -I$(TOOLCHAINROOT)/STM32Cube_FW_F0_V1.9.0/Drivers/CMSIS/Device/ST/STM32F0xx/Include \
         -I$(TOOLCHAINROOT)/STM32Cube_FW_F0_V1.9.0/Drivers/CMSIS/Include \
         -Wall -mcpu=cortex-m0 -DSTM32F030x8 -Os -g

This is where the TOOLCHAINROOT environment variable in setupenv.sh comes in handy. Again, it's entirely optional; it just saves you from having to deal with absolute paths.

Note that we are also explicitly defining the platform that we use, by passing a preprocessor macro definition on the command line: -DSTM32F030x8. This particular macro is specific to ST's source tree, not to GCC. However, it is a very common idiom that you will see on many other platforms that have to support a wide array of MCUs with the same CPU core. We indicate which CPU core that is by passing -mcpu=cortex-m0.

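We are also passing -Wall (enable most warnings), -Os (optimize for size) and -g (produce debug symbols) to the compiler. The last option is only useful if you want to debug code. It makes the ELF file considerably larger, although the debug information lives in sections that are never loaded, so it does not end up in the HEX file or in flash; you may still want to leave it out of production builds.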

Teaching your compiler how to link code for this platform

OK, our compiler now knows everything it needs to know about how to produce object code for our CPU. Now we need to teach it about the last part of the compilation process — linking.

Why is this such a big deal that we need to talk about it separately? Normally, you don't think about this step at all when writing a Linux application; you know it's there, but not much else, right?

Things are a little different if you don't have an operating system, a loader and a dynamic linker to automate things away for you. Think about it for a moment: the fact that your program runs means that somehow the startup code knew where main() was. It means that, somehow, a memory section has been set aside for data, and objects with static storage duration work right, so we're talking about both initialized and uninitialized data. You can call functions, so clearly the C runtime knows about a stack. In other words, the compiler and the startup code know about .text, .data, .bss, about where the stack begins, and so on. How?

If this is not your first time working with this kind of system, you know the answer; for everyone else: all this is specified in something called a linker script.

The one we are going to use is available under STM32Cube_FW_F0_V1.9.0/Projects/STM32F030R8-Nucleo/Templates/SW4STM32/STM32F03R8-Nucleo/STM32F030R8Tx_FLASH.ld. You can use it in place, or copy it to your own source tree and modify it, but be careful — you are not allowed to distribute it. If you do need to distribute a copy of a linker script with your program (maybe you don't like the defaults), you will have to write one from scratch.

It's not hard once you learn the syntax, but the defaults will do for us (I warmly advise you to read it; you don't need this information most of the time, but it's useful to know what happens under the hood).

Now, we need to tell the linker to use this script (among other things):

LDFLAGS = -Wl,--gc-sections -TSTM32F030R8Tx_FLASH.ld

...and we need to tell the linker how to link everything together in order to finally give us the application binary:

minimal.elf: system_stm32f0xx.o main.o startup_stm32f030x8.o
	$(CC) $(CFLAGS) $(LDFLAGS) system_stm32f0xx.o main.o startup_stm32f030x8.o -o minimal.elf

Our application binary is going to be called minimal.elf. There's just one problem with it, and we'll figure it out in a minute.

Converting the application binary to HEX

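We are going to use OpenOCD to program our device, and we are going to hand it a file in Intel HEX format rather than the ELF file. (Strictly speaking, OpenOCD's flash write_image command can read ELF files as well, but Intel HEX is understood by just about every flashing tool out there, so it is a good format to know how to produce.)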

Fortunately, this is easy to do: objcopy knows how to do it. We just need to tell make where to find objcopy:

OBJCOPY = arm-none-eabi-objcopy

...and what to do with it:

minimal.hex: minimal.elf
	$(OBJCOPY) -Oihex minimal.elf minimal.hex

Putting it all together: the minimal Makefile

OK, let's assemble all that in a single file:

CC = arm-none-eabi-gcc
OBJCOPY = arm-none-eabi-objcopy
CFLAGS = -I$(TOOLCHAINROOT)/STM32Cube_FW_F0_V1.9.0/Drivers/CMSIS/Device/ST/STM32F0xx/Include \
         -I$(TOOLCHAINROOT)/STM32Cube_FW_F0_V1.9.0/Drivers/CMSIS/Include \
         -Wall -mcpu=cortex-m0 -DSTM32F030x8 -Os -g
LDFLAGS = -Wl,--gc-sections -TSTM32F030R8Tx_FLASH.ld

minimal.hex: minimal.elf
	$(OBJCOPY) -Oihex minimal.elf minimal.hex

minimal.elf: system_stm32f0xx.o main.o startup_stm32f030x8.o
	$(CC) $(CFLAGS) $(LDFLAGS) system_stm32f0xx.o main.o startup_stm32f030x8.o -o minimal.elf

system_stm32f0xx.o: system_stm32f0xx.c
	$(CC) $(CFLAGS) -c system_stm32f0xx.c -o system_stm32f0xx.o

main.o: main.c
	$(CC) $(CFLAGS) -c main.c -o main.o

startup_stm32f030x8.o: startup_stm32f030x8.s
	$(CC) $(CFLAGS) -c startup_stm32f030x8.s -o startup_stm32f030x8.o

.PHONY: clean
clean:
	rm -f *.o *~ minimal.elf minimal.hex
File: Makefile

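At this point, if you run make, all the magic should happen by itself. The Makefile as written expects system_stm32f0xx.c, startup_stm32f030x8.s and STM32F030R8Tx_FLASH.ld to sit in the project directory next to main.c (adjust the paths if you use them in place), and it expects $TOOLCHAINROOT to be set, so don't forget to source setupenv.sh from the root directory, or edit the include paths in the Makefile if you don't use it.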

You can find a ready-made package of this stuff (minus the linker script, which I cannot redistribute) here.

Loading and running the binary

We are going to use OpenOCD to flash the code, and gdb to run and debug it.

What's the deal with these things? OpenOCD is a Free and Open On-Chip Debugging, In-System Programming and Boundary-Scan Testing tool. That's a mouthful; in short, it's a program that knows how to talk with the ST-LINK debugger/programmer on this board (among many other things).

Now, OpenOCD has a low-level debugger interface. You can use it directly if you want, but it's not friendly. What you really want is a proper debugger that talks to OpenOCD in a platform-agnostic language. That, of course, is gdb.

OpenOCD needs only one parameter, a config file, which tells it, briefly, what's at the other end of the USB cable: what programmer, what CPU and so on. Recent OpenOCD versions have the right one for this board under /usr/share/openocd/scripts/board/st_nucleo_f0.cfg. The config file to use is specified when launching OpenOCD, using the -f switch.

Once launched, OpenOCD opens three ports: one for a telnet interface, one for a TCL interface, and one for gdb. By default, these ports are 4444, 6666 and 3333, respectively. On my box, one of those ports is taken by another service, so I used a slightly modified config file, which exposes the telnet interface over port 10000 and the gdb interface over port 9000.

So, let us launch OpenOCD:

$ openocd -f ./st_nucleo_f0.cfg

Then in another terminal, let us connect to its telnet interface:

$ telnet localhost 10000
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Open On-Chip Debugger
> 

Now to write the image:

> reset halt
Unable to match requested speed 1000 kHz, using 950 kHz
Unable to match requested speed 1000 kHz, using 950 kHz
adapter speed: 950 kHz
target halted due to debug-request, current mode: Thread 
xPSR: 0xc1000000 pc: 0x08000280 msp: 0x20002000
> flash write_image erase minimal.hex
auto erase enabled
device id = 0x20006440
flash size = 64kbytes
target halted due to breakpoint, current mode: Thread 
xPSR: 0x61000000 pc: 0x2000003a msp: 0x20002000
wrote 3072 bytes from file minimal.hex in 0.231592s (12.954 KiB/s)
> reset
Unable to match requested speed 1000 kHz, using 950 kHz
Unable to match requested speed 1000 kHz, using 950 kHz
adapter speed: 950 kHz
> reset halt

At this point, the program is written, and the CPU is halted. We are going to use gdb to run it and examine its state:

$ arm-none-eabi-gdb minimal.elf     
GNU gdb (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 8.0.50.20171128-git
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-linux-gnu --target=arm-none-eabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from minimal.elf...done.
(gdb) target remote localhost:9000
Remote debugging using localhost:9000
Reset_Handler () at startup_stm32f030x8.s:65
65        ldr   r0, =_estack
(gdb) b main
Breakpoint 1 at 0x80002e0: file main.c, line 5.

Great; now let's run the program. It should stop at the breakpoint we just placed at the beginning of main():

(gdb) c
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.

Breakpoint 1, main () at main.c:5
5               volatile int x = 32;
(gdb)

From this point on, nothing too fancy happens, of course. If you want more evidence that the program is running properly, you can interrupt it with Ctrl-C and check the value of z (print z). The tricky part, getting a program to run on this thing at all, is done. From here on, taking over the world is a simple matter of programming :-).

Ending Notes

Where to go from here? CMSIS is a useful thing to be familiar with. It's a pretty portable skill, and one that does not tie you to a manufacturer's MCU. It does tie you to ARM MCUs, but these are good times to know ARM.

Once I go through the really basic stuff, like flashing LEDs and registering button presses, my favourite exercise for a new development board is to interface it with a PC on one end (e.g. via USART), and with a small speaker and character display on the other end, then use it to play cute sounds and show things like new mail alerts. It's a straightforward enough project, but representative of several peripherals, as well as of the CPU's programming model and processing power.

On IDEs, toolchains and whatnot. There are various IDEs you can use, including big names like Keil. My advice, especially when it comes to this sort of device, is to figure out how to program them without an IDE. Once you understand everything that is happening behind the scenes, you can use an IDE if you are more productive with one.

On the matter of debuggers and ICEs. The relative... unfriendliness of embedded development tools, and the lack of patience of many hobbyists, has led to a curious state of affairs, where it is somehow considered acceptable to do embedded programming without an ICE and without any useful debugging features.

This is not OK. If a device does not support that, then you should not be doing production work on it. If you work somewhere and they ask you to do it, quit.

The only reasonable exceptions are legacy devices that simply do not support an ICE, and extremely small programmable devices (think tens of bytes of RAM and hundreds of bytes of flash) that do not have any useful debugging capabilities.

That's all for today. Happy hacking, folks!

Thursday, May 10 2018