The other day I ran into a very strange issue while debugging a problem
at work. We were evaluating a new microprocessor for use in one of our
products and had been trying to exercise all of the features the
processor had to offer.
I was testing the input capture feature and discovered that
the first capture (rising edge of input pulse) always worked correctly,
but the second edge was never detected. After some digging, it appeared
as though the I2C communication routines were interfering with the
capture.
I turned on the debugger, ran the code, and it broke inside the I2C
routine, hanging while waiting for a reply that was never going to come
(the test code had no timeouts). I commented out all calls to the I2C
code within the
program, reran it, and the debugger once again broke inside the I2C
routine. Somehow, the program counter was jumping to an unused block of
code. I know it was unused because the compiler complained about the
unused segment.
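The hang itself is the kind of thing a bounded wait avoids. Below is a minimal sketch of a polling loop with a timeout, written to run on a host PC. The names `fake_i2c_status`, `I2C_ACK_BIT`, and `i2c_wait_ack` are made up for illustration; on a real part the status would come from a memory-mapped register in the vendor's header, and the bit position would come from the datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the I2C status register. On real hardware this
   would be a memory-mapped register defined by the vendor header. */
static volatile uint8_t fake_i2c_status = 0;

/* Assumed bit position, for illustration only. */
#define I2C_ACK_BIT 0x01u

/* Poll for the acknowledge bit, giving up after max_polls iterations.
   Returns true if the bit was seen, false if the wait timed out. */
static bool i2c_wait_ack(uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if (fake_i2c_status & I2C_ACK_BIT) {
            return true;
        }
    }
    return false;
}
```

With a wait like this, a reply that never comes turns into a `false` return the caller can report, instead of a loop the debugger has to be used to find.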
Putting on my debugger hat, I placed a breakpoint immediately after the
first capture and stepped through the code, one line at a time. On the
following line, something strange happened.
Instead of advancing to the next instruction, the program jumped almost
to the beginning of memory (15 bytes after the end of the memory mapped
register range). I repeated this several times, always with the same
result. As an experiment, I rearranged a few lines of code and reran.
This time, the code jumped on a different line, but landed in the same
location.
The only things I know of in microprocessor land that can change your
execution point unexpectedly are a faulty processor or, more likely, an
interrupt occurring. After poking through the map file, I realized the
I2C routine was the first executable block in code, and that 0xFF, the
value which fills all unused space, decodes as a move instruction.
It makes sense that a bad jump into unused space would eventually land
the code in the I2C routine, but what was triggering the interrupt? A
five minute search yielded the solution.
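The slide through unused space can be modelled on a host PC. This is only a rough sketch under loud assumptions: it treats each 0xFF fill byte as a one-byte instruction with no side effects that matter, which may not match the real instruction encoding, and the names `ERASED_FILL` and `slide_to_first_code` are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Value found in every unused byte of program memory. */
#define ERASED_FILL 0xFFu

/* Model of a stray jump: starting at pc, step over fill bytes until the
   first byte that is not fill, and return its index. Returns len if the
   rest of memory is all fill. Assumes one byte per fill instruction. */
static size_t slide_to_first_code(const uint8_t *mem, size_t len, size_t pc)
{
    while (pc < len && mem[pc] == ERASED_FILL) {
        pc++;
    }
    return pc;
}
```

In this model, any landing point in the fill region below the first real routine ends up at the start of that routine, which matches the debugger always stopping in the I2C code.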
The code we were using was based on example code provided by the vendor.
I hacked the various pieces together and changed them from
interrupt-driven functions to polling-driven functions. Unfortunately, I
missed the line in the initialization routine where the input capture
interrupt was enabled. Most surprisingly, the compiler allows us to
enable an
interrupt, without defining an interrupt handler. (In this processor,
all interrupt controls had direct bit access. They were not grouped
together requiring a byte mask.)
The interrupt event occurred and the program counter was loaded with the
default value from the interrupt table. We have no idea how 0xFFFF from
the table was translated into 0x034A which is where execution picked
back up. Commenting out the interrupt enable line resolved our issue and
we were able to finish evaluating the board.
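One common defence against this class of bug is to point every unused vector at a catch-all handler, so a stray interrupt lands somewhere predictable. The sketch below is a host-side model only: on a real target the vector table usually lives in flash and is filled in by the compiler or linker, and the names `NUM_VECTORS`, `isr_t`, `default_isr`, and `fill_unused_vectors` are hypothetical, not taken from the vendor's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed table size, for illustration only. */
#define NUM_VECTORS 8u

typedef void (*isr_t)(void);

/* Counts unexpected interrupts so they are visible in a debugger. */
static volatile uint32_t unexpected_irq_count = 0;

/* Catch-all handler for vectors nobody claimed. A real target might halt
   or log here instead; this model only counts. */
static void default_isr(void)
{
    unexpected_irq_count++;
}

/* Replace every empty (NULL) slot with the catch-all handler.
   Returns the number of slots that were filled. */
static size_t fill_unused_vectors(isr_t table[], size_t n)
{
    size_t filled = 0;
    for (size_t i = 0; i < n; i++) {
        if (table[i] == NULL) {
            table[i] = default_isr;
            filled++;
        }
    }
    return filled;
}
```

A breakpoint in `default_isr` would have pointed straight at the forgotten enable bit, rather than at an unrelated I2C routine.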
Morals:
1) Interrupts are evil. They introduce too much variability into the
execution process.
2) Sample code is a great place to start from, but don’t base your
demonstration code on it.
3) Debugging takes time and patience. Take a break once in a while to
relax and clear your thoughts.