runtime thread execution monitoring (identify cause of hung code)
See the following posts for some interesting ideas.
Re: runtime thread execution monitoring (identify cause of hung code)
The method used by a lot of the embedded-system watchdogs I've come across is the following:
A high-priority task maintains a collection of bits, one for each of the other periodic tasks. At the rate the other tasks are expected to run, the high-priority task sets the bits false. The other tasks set their bits true when they run. If the high-priority task, when it goes to clear the bits, notices that one hasn't been flipped back, it knows that the corresponding lower-priority task has not run in that time frame. Once a certain number of frames have passed without that task 'reporting in', it knows something is wrong.
You don't really need mutual exclusion or interrupts to maintain this, so long as you can ensure that the 'task monitor' thread runs at elevated priority. There are no metrics on how much of a task's available cycle was consumed before it completed, but that could be added fairly trivially. I work with several systems like this, if the 'tried and true' notion is of any value. :P
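For anyone who wants to see the shape of that scheme in code, here is a minimal C++ sketch. All names (TaskMonitor, check_in, tick, kMissLimit) are hypothetical and not taken from any of the systems mentioned. It also sets the bits with std::atomic operations, which is a swap-in on my part to sidestep the read/modify/write question raised later in the thread, not something the description above requires.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; a sketch of the check-in scheme only.
constexpr unsigned kMaxTasks = 32;  // one bit per monitored task
constexpr unsigned kMissLimit = 3;  // frames a task may miss before it is flagged

class TaskMonitor {
public:
    // Called by a monitored task each time it runs.
    void check_in(unsigned task) {
        assert(task < kMaxTasks);
        alive_.fetch_or(std::uint32_t{1} << task, std::memory_order_relaxed);
    }

    // Called by the high-priority monitor once per frame.
    // Returns a mask of watched tasks that have now missed kMissLimit
    // consecutive frames.
    std::uint32_t tick(std::uint32_t watched) {
        // Read and clear every bit in one atomic step.
        std::uint32_t seen = alive_.exchange(0, std::memory_order_relaxed);
        std::uint32_t hung = 0;
        for (unsigned t = 0; t < kMaxTasks; ++t) {
            std::uint32_t bit = std::uint32_t{1} << t;
            if (!(watched & bit)) continue;
            if (seen & bit) {
                missed_[t] = 0;              // task reported in this frame
            } else if (++missed_[t] >= kMissLimit) {
                hung |= bit;                 // too many silent frames
            }
        }
        return hung;
    }

private:
    std::atomic<std::uint32_t> alive_{0};
    unsigned missed_[kMaxTasks] = {};        // touched only by the monitor
};
```

In use, each periodic task would call check_in() with its own index, and the monitor would call tick() at the frame rate and act on any non-zero result.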
Re: runtime thread execution monitoring (identify cause of hung code)
Quote:
Has anybody else used this or something similar, and if so, did it prove helpful? |
Re: runtime thread execution monitoring (identify cause of hung code)
This bit system is similar to how the LV diagram does execution highlighting. The bits are cleared by the diagram before node code begins, and nodes flip their bit as they execute. But this data structure is problematic for performance on multi-core architectures, since you get cache coherency issues as each node sprinkles bits as a side effect of its run. It is still correct, but the overhead of this tracing goes up significantly with parallel cores and especially parallel packages.
Also, can you give more detail on why you don't need a mutex? From your description I can see folks writing a read/modify/write piece of code with no mutex, and getting very confused. Are you using something like __sync_fetch_and_or()?
Greg McKaskle
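To make the read/modify/write worry concrete, here is a small C++ sketch. The first function replays, in a single thread, one interleaving in which two tasks share a word and one bit gets lost; the second does the same two updates with std::atomic's fetch_or, the standard C++ counterpart of __sync_fetch_and_or(). The function names and bit values are made up for illustration.

```cpp
#include <atomic>
#include <cstdint>

// Replays, step by step, an interleaving that loses an update when two
// tasks share one word and each does a plain read/modify/write.
std::uint32_t lost_update_replay() {
    std::uint32_t word = 0;
    std::uint32_t a = word;  // task A reads 0
    std::uint32_t b = word;  // task B preempts A and also reads 0
    b |= 0x2;                // B ORs in its bit
    word = b;                // B writes back 0x2
    a |= 0x1;                // A resumes with its stale copy
    word = a;                // A writes back 0x1, wiping out B's bit
    return word;
}

// The same two updates done as atomic read-modify-write operations.
// Each fetch_or is indivisible, so neither bit can be lost.
std::uint32_t atomic_update() {
    std::atomic<std::uint32_t> word{0};
    word.fetch_or(0x1);      // task A
    word.fetch_or(0x2);      // task B
    return word.load();
}
```

The replay ends with only A's bit set, while the atomic version keeps both, whatever order the two tasks run in.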
Re: runtime thread execution monitoring (identify cause of hung code)
Quote:
Asserting a bit in a word is not an atomic operation. You have to read the word, OR it with a mask, and write it back. You could get interrupted in the middle of the sequence. Since memory is not an issue (for FRC), instead of flipping bits you could use an array of words*. Then there'd be no contention among the threads.
* the length of which is whatever is atomic for the processor and most efficiently handled by it
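A rough C++ sketch of the array-of-words idea, with hypothetical names (Slot, SlotMonitor, kSlots). Each task owns one word, so a check-in is a single store and never a shared read/modify/write. The alignas(64) padding is my assumption about cache-line size, added so that neighbouring slots don't share a line; drop or change it for your target.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSlots = 8;  // hypothetical number of monitored tasks

// One word per task. alignas(64) assumes 64-byte cache lines, so each
// slot sits on its own line.
struct alignas(64) Slot {
    std::atomic<std::uint32_t> ran{0};
};

class SlotMonitor {
public:
    // Task side: a single store to the task's own word.
    void check_in(std::size_t task) {
        assert(task < kSlots);
        slots_[task].ran.store(1, std::memory_order_relaxed);
    }

    // Monitor side: report whether the task ran since the last poll,
    // clearing its word in the same atomic step.
    bool poll(std::size_t task) {
        assert(task < kSlots);
        return slots_[task].ran.exchange(0, std::memory_order_relaxed) != 0;
    }

private:
    Slot slots_[kSlots];
};
```

The trade-off is memory for independence: eight tasks cost eight cache lines here instead of one byte, which fits the point that memory is not the constraint.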
Re: runtime thread execution monitoring (identify cause of hung code)
How about using a counter instead of a flag? Code:
* start thread * |
You're both right: an atomic fetch-and-set is required at a minimum. Once you accept that, and expand to fetch-and-increment and fetch-and-decrement operations, there's a wide variety of metrics that can be prepared and transmitted later on to a dashboard or logged in a file.
I'm imagining pursuing this with functional global variables in LabVIEW with our team this year.
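Here is one way the counter variant might look in C++, as a sketch only. HeartbeatCounter and its method names are hypothetical. The task does an atomic fetch-and-increment each time it runs; the monitor never writes the counter, it just compares against the value it saw last time, which also gives a simple "runs per frame" metric.

```cpp
#include <atomic>
#include <cstdint>

// Heartbeat counter: the task increments, the monitor compares against
// the value it saw on its previous poll.
class HeartbeatCounter {
public:
    // Task side: atomic fetch-and-increment on every run.
    void beat() { count_.fetch_add(1, std::memory_order_relaxed); }

    // Monitor side only: number of beats since the previous poll.
    // Zero means the task did not run in that interval.
    std::uint32_t beats_since_last_poll() {
        std::uint32_t now = count_.load(std::memory_order_relaxed);
        std::uint32_t delta = now - last_;  // unsigned math survives wraparound
        last_ = now;
        return delta;
    }

private:
    std::atomic<std::uint32_t> count_{0};
    std::uint32_t last_ = 0;                // touched only by the monitor
};
```

Because the monitor only reads, there is no clear step to race with the task, and the delta could be logged or sent to a dashboard as is.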
Re: runtime thread execution monitoring (identify cause of hung code)
If you're using C++, VxWorks has a built-in watchdog that can be useful.
You can use it as follows. You need to include wdLib.h. Code:
WDOG_ID watchDog = wdCreate();
[...] sysClkRateGet() function. If your code completes in time, you cancel the call with Code:
wdCancel(watchDog);
There may be equivalents for the other programming languages. If you are looking for more profiling tools, you can see the profiler we wrote. Our code is open source and posted on our website, lynbrookrobotics.com