PSoC5 bare metal

This tutorial has two purposes:
a) to demonstrate bringing up an ARM microcontroller from bare metal;
b) to create a basic open-source toolset for the PSoC microcontroller from Cypress.

The code for both can be found on GitHub: (external link)
Various programming tools can be found under (external link)

General Bare Metal Requirements

To work with bare metal one needs to understand:
  • the system's memory map - ie the address and size of ROM and RAM
  • size of heap and stack needed
  • how the compiler/assembler/linker decide what code and data goes where
  • how to bring up the system from reset
  • how to handle interrupts/exceptions

Requirements for GCC
  • GCC ARM cross compiler for Cortex M3 (thumb instruction set).
  • gcc linker script (gcc.ld)
  • bare metal GCC startup file (startup_ARMCM3.S)
  • test program to compile

One typically also needs:
  • libc or equivalent
  • system specific startup code

System Requirements

This tutorial and the PSoC environment were developed under OS X; however, they are intended to work under Linux as well.
For the PSoC side of things I used the CY8CKIT-050 PSoC 5LP Development Kit (external link). In general any PSoC5 ARM-based development kit should be fine; however, the programming software I've written relies on the onboard FX2LP USB interface chip to program the PSoC. (Note that PSoCs earlier than the 5 series use different microcontroller cores.)

GCC Compiler

Any GCC compiler that generates ARM Thumb instructions should work. The differences are typically in the linker scripts and support libraries, and we're supplying our own.
The compiler I used was downloaded from: (external link)


The first thing we need to do is tell the GCC linker about the above information.
gcc.ld contains both general and device-specific linker instructions.

The attached gcc.ld is a simplified linker script that I put together. It should be fine for most embedded C programs. As a pragmatic, usable teaching tool it deliberately omits the additional complexities introduced by C++, exceptions, debugging information etc.

Stack and Heap

The stack and heap sizes are explicitly set by two symbols in our linker script. Because these are very low-level symbols that share a namespace with the C program, they are prefixed with a double underscore.
Alternative schemes create dedicated output sections for the heap and stack. These slightly increase the on-disk ELF executable size (not the bin file size) but allow extra checking of stack space usage etc.
The heap grows up from the end of used RAM (__bss_end__) and the stack grows down, normally from the top of RAM.
The calculations at the end of the linker script perform a basic check that the expected heap doesn't overlap the expected stack. Note that this only checks the nominal sizes at link time; nothing enforces them at run time.
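The ASSERT at the end of gcc.ld is plain address arithmetic. As a rough sketch, here is the same check written in C. heap_stack_fit() is a hypothetical helper and is not part of the tutorial's code. The example sizes match the gcc.ld defaults of 0x1000 each.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the gcc.ld calculation:
 *   __HeapLimit  = __bss_end__ + __HeapSize
 *   __StackLimit = __StackTop  - __StackSize
 * Returns true when ASSERT(__HeapLimit < __StackLimit) would pass. */
bool heap_stack_fit(uint32_t bss_end, uint32_t heap_size,
                    uint32_t stack_top, uint32_t stack_size)
{
    uint32_t heap_limit  = bss_end + heap_size;
    uint32_t stack_limit = stack_top - stack_size;
    return heap_limit < stack_limit;
}
```

For example, with __bss_end__ at 0x1FFF8400 the heap ends at 0x1FFF9400, well below a stack limit of 0x20007000. Like the ASSERT, this only checks the nominal sizes, not the stack depth the program actually reaches.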

Linker Background

There are plenty of explanations of the GNU linker on the net. In summary, a program uses roughly the following types ("sections") of memory:
  • text - read only program code - ROM
  • data - variables with initial non-zero values - RAM
  • bss - variables with initial value of zero - RAM
  • rodata - read only data - ROM
  • heap - runtime dynamic allocation (from heap) - RAM
  • stack - runtime dynamic allocation (from stack) - RAM

where ROM implies not just (EP)ROM technology but equivalent primarily read-only storage eg Flash.

All of this information finds its way into the ELF file (and can be explored as shown later). Ultimately however there is just ROM and RAM so the above sections reduce in a bare metal binary file to user code plus code to initialise RAM contents appropriately (eg by copying data from ROM to RAM).
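To make the list concrete, here is a small C file annotated with where GCC typically puts each object. The names are made up. The exact section names depend on compiler options: for example, -fdata-sections produces per-object names such as .data.boot_count. Heap memory does not appear here because it only exists once something like sbrk() hands it out at run time.

```c
#include <stdint.h>

/* Illustrative only: typical GCC placement; section names vary with options. */
const uint32_t lookup_table[4] = {1, 2, 3, 4}; /* rodata: stays in ROM             */
uint32_t boot_count = 42;                      /* data: initial value copied to RAM */
uint32_t tick_count;                           /* bss (or COMMON with -fcommon):
                                                  zero-filled in RAM               */

uint32_t sum_lookup(void)                      /* the function's code is text (ROM) */
{
    uint32_t total = 0;                        /* local: stack (or a register)      */
    for (int i = 0; i < 4; i++)
        total += lookup_table[i];
    return total;
}
```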

The linker script below defines two memory regions. "FLASH" (ie ROM) is read-only and executable ("rx"), starts at address 0 and is 256K long. "RAM" is 64K and, unusually, is centred on address 0x20000000 (hence the 32K subtraction).
Unfortunately the linker does not let us use named constants in the ORIGIN and LENGTH expressions, so the RAM size (0x10000) has to be written out explicitly each time it is used.
Aside: some development systems remove the heap and stack from the "RAM" region (ie by reducing LENGTH by these two sizes). There are pros and cons, so I've just kept things simple.
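The address arithmetic behind the MEMORY block is easy to check by hand. As a sketch, here are the same numbers as C constants with compile-time checks, plus two range tests. The macro and function names are illustrative and are not taken from the original sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from the MEMORY block of gcc.ld (names are illustrative). */
#define FLASH_ORIGIN 0x0u
#define FLASH_SIZE   0x40000u                       /* 256K */
#define RAM_SIZE     0x10000u                       /* 64K */
#define RAM_ORIGIN   (0x20000000u - RAM_SIZE / 2)   /* 0x1FFF8000 */
#define RAM_END      (RAM_ORIGIN + RAM_SIZE)        /* 0x20008000 = __StackTop */

_Static_assert(RAM_ORIGIN == 0x1FFF8000u, "RAM starts 32K below 0x20000000");
_Static_assert(RAM_END == 0x20008000u, "RAM ends 32K above 0x20000000");

/* Unsigned subtraction wraps for addresses below the origin, so a single
 * comparison covers both ends of each region. */
bool addr_in_flash(uint32_t addr) { return addr - FLASH_ORIGIN < FLASH_SIZE; }
bool addr_in_ram(uint32_t addr)   { return addr - RAM_ORIGIN < RAM_SIZE; }
```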

The script defines three output sections, .text, .data and .bss:
  • .text is primarily the code and generally read-only data
  • .data contains variables with initial non-zero values
  • .bss contains variables initialised to zero (for historical reasons these are kept in a separate section so their all-zero initial values need not be stored in the image, which saves space)

Both .data and .bss live in RAM because the variables need to be read-write: .data must be copied there from ROM, and .bss must be zeroed.
Read-only data (rodata) is supported and is placed in the read-only .text section.
Occasionally some code and even rodata may be explicitly placed into RAM because RAM is much faster to access than ROM/Flash. This is typically done by creating special sections in the linker script, flagging the relevant functions in the high-level code and adding startup code to manage the copying.
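As an illustration of "flagging the relevant functions", GCC's section attribute can put a single function into its own input section. The section name .ramfunc below is an assumption, because the gcc.ld in this tutorial does not define it. On target you would also need a matching output section, placed > RAM with its load image in FLASH, and a startup copy loop, just as for .data.

```c
#include <stdint.h>

/* Hypothetical: ".ramfunc" must be added to the linker script (> RAM AT> FLASH)
 * and copied by the startup code before this function is first called. */
__attribute__((section(".ramfunc"), noinline))
uint32_t fast_checksum(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++)
        sum += buf[i];   /* simple additive checksum, just something to run */
    return sum;
}
```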

For PSoC use there are a couple of points to note apart from the MEMORY values:
  • ENTRY(Reset_Handler) specifies the program entry point. It is typically ignored in an embedded system in which only one program is run and its entry point is defined by other mechanisms (here, the reset vector).
  • the input section .isr_vector is critical. It is the first input section in .text because the ISR vector table is a table of exception handler addresses that must be located at address 0 when the CPU comes out of reset.

/* This linker script does NOT support
 *   - C++
 *   - exception handling
 *   - debugging
 */

MEMORY
{
  FLASH (rx) : ORIGIN = 0x0, LENGTH = 0x40000 /* 256K */
  RAM (rwx)  : ORIGIN = 0x20000000 - (0x10000 / 2), LENGTH = 0x10000  /* 64K */
}

__StackSize = 0x1000;
__HeapSize  = 0x1000;


ENTRY(Reset_Handler) /* this isn't used in bare metal */

SECTIONS
{
	.text :
	{
            . = ALIGN(4);
            KEEP(*(.isr_vector))  /* vector table first, at address 0 */
            *(.text*)             /* also catches .text.* input sections */
            *(.rodata*)           /* also catches .rodata.* (eg string literals) */
            . = ALIGN(4);
            __etext = .;

	} > FLASH

	.data : AT (__etext)
	{
            . = ALIGN(4);
            __data_start__ = .;
            *(.data*)
            . = ALIGN(4);
            __data_end__ = .;

	} > RAM  /* AT>FLASH */

	.bss :
	{
            . = ALIGN(4);
            __bss_start__ = .;
            *(.bss*)
            *(COMMON)
            . = ALIGN(4);
            __bss_end__ = .;
	} > RAM

	/* Set stack at top end of RAM */

	__StackTop = ORIGIN(RAM) + LENGTH(RAM);
	__StackLimit = __StackTop - __StackSize;

	__HeapStart = __bss_end__; /* use highest/last RAM section */
	__HeapLimit = __HeapStart + __HeapSize;

	/* Check if data + heap + stack exceeds RAM limit */
	ASSERT(__HeapLimit < __StackLimit, "Heap overlaps Stack")
}

System Reset

When the PSoC (ARM Cortex M3) comes out of reset it performs a number of initialisation steps (PSoC 5 Architecture TRM section 20.3ff). From a software perspective there are two key steps:
  • Sets the stack pointer
  • Executes code at a specified address.

Interrupt Service Routine (ISR) vectors

A key configuration table is the ISR_vector (Interrupt Service Routine) jump table. This is primarily a table of function addresses. Each address refers to a function that is executed when a specific hardware condition occurs. The PSoC/M3 has 15 standard "exceptions" generated by the CPU and an additional 32 user-configurable IRQs.
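The layout of the table follows from this. The first 16 entries are the initial stack pointer plus the CPU's exception slots, and device IRQ n sits at index 16 + n. The sketch below shows that arithmetic; the macro and function names are illustrative.

```c
#include <stdint.h>

#define CORE_VECTORS   16u   /* entry 0 (initial SP) + 15 CPU exception slots */
#define PSOC5_NUM_IRQS 32u   /* user-configurable IRQs */
#define VECTOR_COUNT   (CORE_VECTORS + PSOC5_NUM_IRQS)   /* 48 entries */

/* Index of device IRQ n in the vector table. */
uint32_t irq_to_vector_index(uint32_t irq) { return CORE_VECTORS + irq; }

/* Byte offset of that entry from the table base (each entry is 4 bytes). */
uint32_t irq_to_vector_offset(uint32_t irq) { return 4u * irq_to_vector_index(irq); }
```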

The two most critical entries in the ISR vector table are entries 0 and 1:
  • Index 0 in the ISR_vector table is special - it isn't an interrupt address, instead it is the initial stack pointer value.
  • Index 1 is the address of the first instruction to execute. After coming out of reset and having initialised the stack pointer, the CPU jumps to the address at Index 1.
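Because the CPU simply reads two 32-bit words at reset, a programming tool can sanity-check an image before flashing it. The helper below is hypothetical and is not part of the tutorial's tools. It reads entries 0 and 1 from a raw .bin image, which on this part is little-endian. It also checks that the reset address has its Thumb bit (LSB) set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian word. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Extract the two reset-critical vector table entries from an image that
 * starts with the table. Returns false if the image is too short or the
 * reset address is not a Thumb address. */
bool read_reset_vectors(const uint8_t *image, size_t len,
                        uint32_t *initial_sp, uint32_t *reset_pc)
{
    if (len < 8)
        return false;
    *initial_sp = read_le32(image);      /* index 0: initial stack pointer */
    *reset_pc   = read_le32(image + 4);  /* index 1: Reset_Handler address */
    return (*reset_pc & 1u) != 0;        /* Thumb bit must be set          */
}
```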

The code extract below shows the first few (minimal) ISR table entries.

Assembler extract of ISR table
.section .isr_vector
	.align	2
	.globl	__isr_vector
	.long	__StackTop            /* Top of Stack (index 0) */
	.long	Reset_Handler         /* Reset Handler (index 1) */
	.long	NMI_Handler           /* NMI Handler */
	.long	HardFault_Handler     /* Hard Fault Handler */
	.long	MemManage_Handler     /* MPU Fault Handler */
	.long	BusFault_Handler      /* Bus Fault Handler */
	.long	UsageFault_Handler    /* Usage Fault Handler */
	.long	0                     /* Reserved */
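Some startup files build the same table in C rather than assembler. The sketch below assumes GCC's section and used attributes, and the handler bodies are placeholders. The initial stack pointer is written as the literal __StackTop value from gcc.ld (0x20008000) so the fragment compiles on its own. Real code would normally use the linker symbol instead, and you would keep only one of the assembler and C tables.

```c
#include <stdint.h>

typedef void (*vector_fn)(void);

/* Placeholder handlers; a real startup file usually makes the fault and IRQ
 * handlers weak aliases of one default handler so the application can
 * override them. */
void Reset_Handler(void) { /* copy .data, zero .bss, SystemInit(), _start() */ }
void Default_Handler(void) { for (;;) { } }

/* __StackTop as computed by gcc.ld (ORIGIN(RAM) + LENGTH(RAM)). Real code
 * would declare `extern uint32_t __StackTop;` and use (vector_fn)&__StackTop. */
#define INITIAL_SP 0x20008000u

__attribute__((section(".isr_vector"), used))
const vector_fn isr_vector[] = {
    (vector_fn)(uintptr_t)INITIAL_SP, /* index 0: initial stack pointer */
    Reset_Handler,                    /* index 1: reset                 */
    Default_Handler,                  /* NMI                            */
    Default_Handler,                  /* HardFault                      */
    Default_Handler,                  /* MemManage (MPU fault)          */
    Default_Handler,                  /* BusFault                       */
    Default_Handler,                  /* UsageFault                     */
    0,                                /* reserved                       */
};
```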

  • ISR Addresses are all 4 bytes long.
  • Unlike some other ARM cores, the Cortex M3 ISR table contains addresses only, NOT instructions (typically branch/jump instructions on other cores). "The NVIC of PSoC 5 devices provides low latency by allowing the CPU to vector directly to the first address of the interrupt service routine, bypassing the jump instruction required by other architectures." (Section 1.3.2 PSoC 5 Architecture TRM) and "The call of the interrupt service routine corresponding to an interrupt line is not a branch instruction. The address of the interrupt service routine is stored in the vector table, which results in the direct call of the routine. This method of execution prevents latency in the call of the interrupt service routine." (Section 7.4.3)
  • Since the M3 only executes Thumb code, all handler addresses must have their LSB set to 1. The toolchain does this automatically for symbols it knows to be Thumb functions (eg C functions compiled with -mthumb, or assembler labels marked with .thumb_func or .type name, %function).
  • See PSoC TRM Section 7.4.3 for more details on interrupt handling.
  • The ISR vector table is initially stored starting at the first ROM address (thus the initial stack top value is at ROM address offset 0). For performance, or to allow handlers to be changed at run time, it can be useful to relocate the vector table into RAM. For the PSoC this also means updating the vector table address register (the Cortex-M3 Vector Table Offset Register, VTOR) appropriately.
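Relocating the table amounts to copying it and updating one register. The sketch below assumes the Cortex-M3 VTOR at 0xE000ED08 and a 48-entry table (16 core entries plus 32 IRQs). relocate_vectors() and ram_vectors are hypothetical names. The register is passed in as a pointer so the code can be exercised off-target.

```c
#include <stddef.h>
#include <stdint.h>

#define RAM_VECTOR_COUNT 48u   /* 16 core entries + 32 PSoC 5 IRQs */

/* The table must be aligned to a power of two at least as large as the
 * table itself: 48 * 4 = 192 bytes, so 256-byte alignment. */
uint32_t ram_vectors[RAM_VECTOR_COUNT] __attribute__((aligned(256)));

/* Copy the ROM table into RAM and point VTOR at the copy. On target, call it
 * with interrupts disabled, for example:
 *   relocate_vectors((const uint32_t *)__isr_vector, ram_vectors,
 *                    RAM_VECTOR_COUNT, (volatile uint32_t *)0xE000ED08u); */
void relocate_vectors(const uint32_t *rom_table, uint32_t *ram_table,
                      size_t count, volatile uint32_t *vtor)
{
    for (size_t i = 0; i < count; i++)
        ram_table[i] = rom_table[i];

    *vtor = (uint32_t)(uintptr_t)ram_table;   /* NVIC now vectors via RAM */

#if defined(__arm__)
    __asm volatile ("dsb" ::: "memory");      /* let the VTOR write complete */
#endif
}
```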

C Environment Startup

Having jumped to the Reset_Handler address (an assembler function) the real work begins.
The Reset_Handler must perform the following minimum tasks for C code:

  • initialise RAM as expected by C code
    • copy data section from ROM to RAM
    • zero BSS space in RAM
    • optionally relocate interrupt vectors from ROM to RAM (for performance or adjustment)
    • optionally relocate "performance" code from ROM to RAM (RAM is faster than ROM)
  • initialise system-specific hardware: SystemInit()
    • setup/adjust system clocks as needed
    • initialise device config
  • call _start()
    • call main()

  • C++ code needs additional startup and finalisation code (for example to run the constructors of static objects) and additional sections in the linker file.
  • The ISR and fast code relocation are omitted in this simple version.

C Variable Initialisation

The C program that is about to be run uses variables, but RAM contents after reset are undefined (or at best zeroed). The initial values of C variables are stored in ROM as the load image of the .data linker section, while the program is compiled to expect the variables at specific RAM addresses (gcc.ld instruction: .data : AT (__etext) {} > RAM). We initialise the C variables by copying the ROM values into RAM. For historical efficiency reasons this is a two-part process.
  1. Variables with an initial non-zero value are copied from ROM to RAM.
  2. Variables with an initial value of zero (from the .bss linker section) do not need to be copied; instead we just initialise the block of RAM they occupy (grouped together by the linker) to zero.
Read-only data does not need to be copied to RAM (except for performance reasons).

C variable initialisation is normally done in assembler although in principle it can be done with carefully written C code (bearing in mind the C environment is not yet set up).
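Taking the "carefully written C" route, the helper below performs the same two steps as the assembler Reset_Handler. init_c_runtime() is a hypothetical name. It takes the section boundaries as parameters so it can be tested on a host. The comment shows how a target Reset_Handler might call it with the gcc.ld symbols. One caution: GCC may turn these loops into calls to memcpy()/memset(), so those must be available, or you can build the file with -fno-tree-loop-distribute-patterns.

```c
#include <stdint.h>

/* Copy .data from its ROM image and zero .bss. It must not rely on any
 * initialised or zeroed globals, because it runs before they are set up.
 *
 * On target, a C Reset_Handler might call it like this:
 *   extern uint32_t __etext[], __data_start__[], __data_end__[],
 *                   __bss_start__[], __bss_end__[];
 *   void Reset_Handler(void) {
 *       init_c_runtime(__etext, __data_start__, __data_end__,
 *                      __bss_start__, __bss_end__);
 *       SystemInit();
 *       _start();
 *   }
 */
void init_c_runtime(const uint32_t *rom_data,
                    uint32_t *data_start, uint32_t *data_end,
                    uint32_t *bss_start, uint32_t *bss_end)
{
    /* 1. copy initial values of .data from ROM (stored after __etext) */
    while (data_start < data_end)
        *data_start++ = *rom_data++;

    /* 2. zero .bss */
    while (bss_start < bss_end)
        *bss_start++ = 0;
}
```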

copy data section from ROM to RAM pseudo-code
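	extern uint32_t __etext[], __data_start__[], __data_end__[];
	uint32_t *ram_addr = __data_start__;
	const uint32_t *rom_data = __etext;
	size_t data_len = __data_end__ - __data_start__;   /* length in 32-bit words */

	for (size_t i = 0; i < data_len; i++)
		*ram_addr++ = *rom_data++;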

zero BSS space in RAM pseudo-code
	extern uint32_t __bss_start__[], __bss_end__[];
	uint32_t *ram_addr = __bss_start__;
	size_t data_len = __bss_end__ - __bss_start__;     /* length in 32-bit words */

	for (size_t i = 0; i < data_len; i++)
		*ram_addr++ = 0;

The code extract below shows the actual assembly instructions found in startup_ARMCM3.S (where .S means pre-processed assembler source).

startup_ARMCM3.S : Reset_Handler
// copy data section
	ldr	r1, =__etext
	ldr	r2, =__data_start__
	ldr	r3, =__data_end__

.loop1:
	cmp	r2, r3
	ittt	lt
	ldrlt	r0, [r1], #4
	strlt	r0, [r2], #4
	blt	.loop1

// zero BSS
	ldr	r1, =__bss_start__
	ldr	r2, =__bss_end__

	movs	r0, 0
.loop2:
	cmp	r1, r2
	itt	lt
	strlt	r0, [r1], #4
	blt	.loop2

// call system init
	bl	SystemInit

// call _start
	bl	_start

  • the __xxx__ symbols used here (__etext, __data_start__ etc.) are defined in the linker file gcc.ld. In C one declares them as extern, eg extern char __xxx__[]; or extern uint8_t __xxx__;, but only ever takes their address, never their value.
  • The assembler calls two functions ("bl" is effectively a subroutine call instruction) SystemInit and _start. These can be C functions provided the prototype is void func(void).
  • DMA is used on some platforms to bulk copy data; however, on the ARM PSoCs assembler code is at least as fast as DMA for this use case.

Application Startup

There are three key functions in starting up a C application on an embedded system: SystemInit(), _start() and main(). The names are not magical, but they are more-or-less conventional and existing code may rely on them.

In our initial version SystemInit() is an empty function. In practice, however, it will probably need to contain hardware-specific instructions to adjust system clocks and correctly set up special registers in a complex device like the ARM Cortex series. This leaves us with _start(), which can be defined as shown below.

application startup
void main(void);

void SystemInit(void)
{
	// hardware specific initialisation
	// ...
}

void _start(void)
{
	// pre-main startup.

	main();

	// main should never return; if it does, park the CPU here.
	for (;;)
		;
}

void main(void)
{
	// your cool app here
}

  • In this case main is defined as void main(void) - given it probably needs no parameters and need not return anything.
  • main() should never exit. If it does, there's a "protective" infinite loop in _start. This loop could be moved into the exit() family of functions if necessary, but such functions should never be called in most embedded systems. The infinite loop could be replaced by a CPU halt instruction if appropriate/available.

Standard C Library

This leads us onto the standard C library, historically a mishmash of functions. Some make sense for an embedded system, others do not. There are several embedded-friendly C libraries (eg newlib, uClibc) that may be used.
A bare metal system does not exit(), for example; out of the box it does not even know how to malloc() or memcpy(), let alone the more challenging printf() etc.

In the spirit of starting light we don't need much to begin with; indeed, for our demo we'll only implement a few routines:
  • sbrk() - for heap allocation (although I haven't included malloc)
  • memset()
  • memcpy()
  • printf() (or a version thereof)
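Minimal versions of the two memory routines are short. The byte-at-a-time sketch below favours clarity over speed. The bm_ prefix is hypothetical and only avoids clashing with a host C library; on target the functions would simply be called memset and memcpy. GCC expects a freestanding environment to supply memcpy, memmove, memset and memcmp, because it may emit calls to them itself. For the same reason, compile these with -fno-tree-loop-distribute-patterns so the loops are not turned back into calls to themselves.

```c
#include <stddef.h>

/* Fill n bytes at dst with (unsigned char)value; returns dst. */
void *bm_memset(void *dst, int value, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)value;
    return dst;
}

/* Copy n bytes from src to dst (regions must not overlap); returns dst. */
void *bm_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
```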

Note that dynamic memory allocation is often frowned upon in embedded systems that need to be 24x7 reliable. If you've ever had to reboot a computer that's become 'slow' you'll understand why.
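For a system that does allocate, sbrk() is the primitive that a malloc() implementation such as newlib's builds on. It moves the end of the heap (the "break") and returns the previous end. The sketch below assumes the heap runs from __HeapStart to __HeapLimit as defined in gcc.ld. heap_init() and bm_sbrk() are hypothetical names, and the bounds are passed in so the code can be tested with an ordinary array. It returns (void *)-1 on failure, as the traditional sbrk() does, and leaves alignment to its caller.

```c
#include <stddef.h>

static char *heap_base;   /* start of the heap (eg __HeapStart) */
static char *heap_limit;  /* end of the heap (eg __HeapLimit)   */
static char *heap_brk;    /* current end of the allocated area  */

/* Must be called once before bm_sbrk(). On target, for example:
 *   extern char __HeapStart[], __HeapLimit[];
 *   heap_init(__HeapStart, __HeapLimit);                          */
void heap_init(char *start, char *limit)
{
    heap_base = heap_brk = start;
    heap_limit = limit;
}

/* Grow (or, with a negative incr, shrink) the heap by incr bytes. Returns the
 * previous break, or (void *)-1 if the new break would leave the heap. */
void *bm_sbrk(ptrdiff_t incr)
{
    if (incr > heap_limit - heap_brk || incr < heap_base - heap_brk)
        return (void *)-1;

    char *old_brk = heap_brk;
    heap_brk += incr;
    return old_brk;
}
```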



Related Sites