Watchdog on Linux/Ubuntu

Watchdog timers are commonly found in embedded systems and other computer-controlled equipment where humans cannot easily access the equipment or would be unable to react to faults in a timely manner. In such systems, the computer cannot depend on a human to reboot it if it hangs; it must be self-reliant.

The Odroid C2 kernel supports the gxbb_wdt watchdog driver, which controls the PMU. The driver is configurable for the Odroid C2.

Once the driver is loaded, the device files /dev/watchdog and /dev/watchdog0 should exist.

target
odroid@odroid64:~$ ls -la /dev/watchdog*
crw------- 1 root root  10, 130 Feb 11 11:28 /dev/watchdog
crw------- 1 root root 248,   0 Feb 11 11:28 /dev/watchdog0
odroid@odroid64:~$
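The same check can be scripted, for example at the start of a provisioning script. The sketch below only tests that each path is a character device; check_watchdog_nodes is a hypothetical helper name, not a command shipped with any package.

```shell
# check_watchdog_nodes PATH...
# Prints one line per path and returns non-zero if any path is not a
# character device. (Hypothetical helper, shown for illustration only.)
check_watchdog_nodes() {
    status=0
    for node in "$@"; do
        if [ -c "$node" ]; then
            echo "ok: $node"
        else
            echo "missing: $node"
            status=1
        fi
    done
    return "$status"
}

# Example (on the board):
#   check_watchdog_nodes /dev/watchdog /dev/watchdog0
```

If either node is reported missing, the gxbb_wdt module is probably not loaded.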

Writing to the device file manually starts the watchdog timer. Because nothing keeps feeding the timer afterwards, the board reboots when the timeout expires.

target
root@odroid64:~# echo 3 > /dev/watchdog
[  186.570231] watchdog watchdog0: watchdog did not stop!
root@odroid64:~#

To stop the watchdog manually and prevent the reboot, write the character V to the device:

target
# echo V > /dev/watchdog
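The two commands above can be combined into a small script that keeps the timer fed for a while and then disarms it. This is only a sketch, assuming the driver treats any write as a keep-alive and honours the V character on close (the usual behaviour of the Linux watchdog device interface, although a "nowayout" setting can disable it). feed_watchdog and its arguments are hypothetical names chosen for this example.

```shell
# feed_watchdog DEVICE COUNT INTERVAL
# Opens DEVICE once, writes COUNT keep-alives INTERVAL seconds apart,
# then writes the magic character V so the timer stops when the
# device is closed. (Hypothetical helper, shown for illustration only.)
feed_watchdog() {
    dev=$1
    count=$2
    interval=$3
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '.' >&3      # any write is assumed to count as a keep-alive
            sleep "$interval"
            i=$((i + 1))
        done
        printf 'V' >&3          # magic close: ask the driver to disarm
    } 3>"$dev"                  # fd 3 stays open for the whole loop
}

# Example (as root, with the watchdog daemon stopped):
#   feed_watchdog /dev/watchdog 10 5
```

The example call feeds the timer every 5 seconds for about 50 seconds. The interval must stay below the hardware timeout, otherwise the board reboots in the middle of the loop. The watchdog daemon should be stopped first, since the device can normally be opened by only one process at a time.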

To install the watchdog daemon:

target
sudo apt-get install watchdog

Create a directory for the watchdog log files:

target
sudo mkdir -p /var/log/watchdog

Append the following settings to the default watchdog configuration file, /etc/default/watchdog:

# Start watchdog at boot time? 0 or 1
run_watchdog=1
# Start wd_keepalive after stopping watchdog? 0 or 1
run_wd_keepalive=1
# Load module before starting watchdog
watchdog_module=gxbb_wdt
# Specify additional watchdog options here (see manpage).
watchdog_options="-s -v -c /etc/watchdog.conf"

Edit /etc/watchdog.conf and uncomment the watchdog-device line so that the daemon actually uses the /dev/watchdog device provided by the module. Otherwise the daemon does not use the hardware and relies only on its internal code to soft-reboot a broken machine.
This example configuration sets the WDT timeout to 15 seconds. If you need a faster reboot, reduce the value of watchdog-timeout.

target
$ cat /etc/watchdog.conf
#ping                   = 172.31.14.1
#ping                   = 172.26.1.255
#interface              = eth0
#file                   = /var/log/messages
#change                 = 1407
 
# Uncomment to enable test. Setting one of these values to '0' disables it.
# These values will hopefully never reboot your machine during normal use
# (if your machine is really hung, the loadavg will go much higher than 25)
#max-load-1             = 24
#max-load-5             = 18
#max-load-15            = 12
 
# Note that this is the number of pages!
# To get the real size, check how large the pagesize is on your machine.
#min-memory             = 1
 
#repair-binary          = /usr/sbin/repair
#repair-timeout         =
#test-binary            =
#test-timeout           =
 
watchdog-device = /dev/watchdog
 
# Defaults compiled into the binary
#temperature-device     =
#max-temperature        = 120
 
# Defaults compiled into the binary
admin                   = root
interval                = 1
logtick                = 1
log-dir         = /var/log/watchdog
 
# This greatly decreases the chance that watchdog won't be scheduled before
# your machine is really loaded
realtime                = yes
priority                = 1
 
# Check if rsyslogd is still running by enabling the following line
pidfile         = /var/run/rsyslogd.pid
 
watchdog-timeout        = 15

Note: watchdog-timeout sets how long the hardware timer waits without a keep-alive from the daemon before it triggers a reboot.
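Since the daemon writes a keep-alive every "interval" seconds, that value has to be smaller than watchdog-timeout, or the board reboots even while healthy. The sketch below reads both values from a configuration file and compares them. conf_value and check_timing are hypothetical helper names, and the parsing assumes simple "key = value" lines like those shown above.

```shell
# conf_value KEY FILE
# Prints the value of the first uncommented "KEY = value" line.
conf_value() {
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }                 # skip commented-out settings
        {
            k = $1
            gsub(/[ \t]/, "", k)            # strip padding around the key
            if (k == key) {
                v = $2
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v
                exit
            }
        }
    ' "$2"
}

# check_timing FILE
# Returns 0 if interval < watchdog-timeout, 1 if not, 2 if either is unset.
check_timing() {
    interval=$(conf_value interval "$1")
    timeout=$(conf_value watchdog-timeout "$1")
    if [ -z "$interval" ] || [ -z "$timeout" ]; then
        echo "interval or watchdog-timeout not set"
        return 2
    fi
    if [ "$interval" -lt "$timeout" ]; then
        echo "ok: interval=${interval}s timeout=${timeout}s"
    else
        echo "bad: interval=${interval}s is not below timeout=${timeout}s"
        return 1
    fi
}

# Example:
#   check_timing /etc/watchdog.conf
```

With the configuration above (interval 1, watchdog-timeout 15) the check passes with a wide margin.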

For more configuration options, see http://www.sat.dundee.ac.uk/psc/watchdog/watchdog-configure.html

Enabling the watchdog service on Ubuntu 14.04.x

To start the watchdog service at boot, append the following line to /etc/rc.local (before the final "exit 0" line, if there is one):

target
service watchdog restart

Enabling the watchdog service on Ubuntu 16.04.x

To start the watchdog service at boot, create a soft link to the service file and then enable and start the service, as below.

target
sudo ln -s  /lib/systemd/system/watchdog.service /etc/systemd/system/multi-user.target.wants/watchdog.service
target
sudo systemctl enable watchdog.service
sudo systemctl start watchdog.service

Check that the watchdog service is running:

target
root@odroid64:~#
odroid@odroid64:~$ service watchdog status
● watchdog.service - watchdog daemon
   Loaded: loaded (/lib/systemd/system/watchdog.service; static; vendor preset:
   Active: active (running) since Wed 2016-06-22 01:52:23 EDT; 10s ago
  Process: 1384 ExecStopPost=/bin/sh -c [ $run_wd_keepalive != 1 ] || false (cod
  Process: 1959 ExecStart=/bin/sh -c [ $run_watchdog != 1 ] || exec /usr/sbin/wa
  Process: 1955 ExecStartPre=/bin/sh -c [ -z "${watchdog_module}" ] || [ "${watc
 Main PID: 1961 (watchdog)
   CGroup: /system.slice/watchdog.service
           └─1961 /usr/sbin/watchdog -s -v -c /etc/watchdog.conf
 
Jun 22 01:52:30 odroid64 watchdog[1961]: still alive after 6 interval(s)
Jun 22 01:52:30 odroid64 watchdog[1961]: was able to ping process 483 (/var/run/
Jun 22 01:52:31 odroid64 watchdog[1961]: still alive after 7 interval(s)
Jun 22 01:52:31 odroid64 watchdog[1961]: was able to ping process 483 (/var/run/
Jun 22 01:52:32 odroid64 watchdog[1961]: still alive after 8 interval(s)
Jun 22 01:52:32 odroid64 watchdog[1961]: was able to ping process 483 (/var/run/
Jun 22 01:52:33 odroid64 watchdog[1961]: still alive after 9 interval(s)
Jun 22 01:52:33 odroid64 watchdog[1961]: was able to ping process 483 (/var/run/
Jun 22 01:52:34 odroid64 watchdog[1961]: still alive after 10 interval(s)
Jun 22 01:52:34 odroid64 watchdog[1961]: was able to ping process 483 (/var/run/
lines 1-20/20 (END)

Once the watchdog daemon is configured, it continuously resets the watchdog timer. If it fails to do so because the system has become unresponsive, the timer expires and the board reboots.

Another way to test the watchdog device is to kill the watchdog daemon after it has started.

target
root@odroid64:~#
root@odroid64:~# pkill -9 watchdog

To test the watchdog daemon against a real kernel crash, use the command below.

Warning: this command crashes the kernel immediately. Use caution, and by no means run it on a production machine.

echo c > /proc/sysrq-trigger

This will force the Linux kernel to crash. If the watchdog works properly, it will reboot the system after 15 seconds.

target
root@odroid64:~# echo c > /proc/sysrq-trigger
[  133.451497] SysRq : Trigger a crash
[  133.451542] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  133.457380] pgd = ffffffc04cdae000
[  133.460742] [00000000] *pgd=0000000056721003, *pmd=0000000000000000
[  133.466956] Internal error: Oops: 96000046 [#1] PREEMPT SMP
[  133.472474] Modules linked in: fuse ir_lirc_codec ir_mce_kbd_decoder ir_sanyo_decoder ir_sony_decoder ir_jvc_decoder ir_nec_decoder ir_rc6_decoder lirc_dev ir_rc5_decoder meson_ir zram lz4_decompress lz4_compress meson_gpiomem gxbb_wdt ipv6 autofs4
[  133.494298] CPU: 0 PID: 1356 Comm: bash Not tainted 3.14.79-115 #1
[  134.610802] Call trace:
[  134.613233] [<ffffffc001088e40>] dump_backtrace+0x0/0x128
[  134.618565] [<ffffffc001088f8c>] show_stack+0x24/0x30
[  134.623574] [<ffffffc001888f44>] dump_stack+0x88/0xac
[  134.628571] [<ffffffc001090018>] handle_IPI+0x1c0/0x1d0
[  134.633745] [<ffffffc00108143c>] gic_handle_irq+0x84/0x88
[  134.639091] Exception stack(0xffffffc05b133e30 to 0xffffffc05b133f50)
[  134.645475] 3e20:                                     5b130000 ffffffc0 0189d000 ffffffc0
[  134.653583] 3e40: 5b133f70 ffffffc0 01085138 ffffffc0 011075a0 ffffffc0 00000000 00000000
[  134.661690] 3e60: 741add14 ffffffc0 00000000 01000000 01de7000 ffffffc0 00006924 00000000
[  134.669798] 3e80: 1bf14d20 00069134 08568253 00000000 3b9aca00 00000000 5b133da0 ffffffc0
[  134.677905] 3ea0: 00000400 00000000 f5da50b0 0000007f 4c2ff636 ffbd4ebe 0003ab80 00000000
[  134.686013] 3ec0: 10535586 8c220af8 00300000 00000000 01232ce0 ffffffc0 915441a8 0000007f
[  134.694121] 3ee0: 00000038 00000000 5b130000 ffffffc0 0189d000 ffffffc0 01d59000 ffffffc0
[  134.702228] 3f00: 01c89a6c ffffffc0 01d583ac ffffffc0 01aa5270 ffffffc0 5b130000 ffffffc0
[  134.710335] 3f20: 02181000 00000000 01080220 ffffffc0 00000000 00000040 5b133f70 ffffffc0
[  134.718441] 3f40: 01085134 ffffffc0 5b133f70 ffffffc0
[  134.723445] [<ffffffc001083dac>] el1_irq+0x6c/0xd8
[  134.728192] [<ffffffc0011075a0>] cpu_startup_entry+0x238/0x288
[  134.733968] [<ffffffc00108fa5c>] secondary_start_kernel+0x11c/0x128
[  134.740176] CPU1: stopping
[  134.742851] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D      3.14.79-115 #1
[  134.750093] Call trace:
[  134.752511] [<ffffffc001088e40>] dump_backtrace+0x0/0x128
[  134.757859] [<ffffffc001088f8c>] show_stack+0x24/0x30
[  134.762861] [<ffffffc001888f44>] dump_stack+0x88/0xac
[  134.767863] [<ffffffc001090018>] handle_IPI+0x1c0/0x1d0
[  134.773037] [<ffffffc00108143c>] gic_handle_irq+0x84/0x88
[  134.778384] Exception stack(0xffffffc05b12be30 to 0xffffffc05b12bf50)
[  134.784768] be20:                                     5b128000 ffffffc0 0189d000 ffffffc0
[  134.792876] be40: 5b12bf70 ffffffc0 01085138 ffffffc0 011075a0 ffffffc0 00000000 00000000
[  134.800984] be60: 741a1d14 ffffffc0 00000000 01000000 01de7000 ffffffc0 00006924 00000000
[  134.809091] be80: 1bf14d20 00069134 08568253 00000000 3b9aca00 00000000 5b12bda0 ffffffc0
[  134.817199] bea0: 000003ff 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  134.825306] bec0: 00000000 00000000 00000000 00000000 011fb1f8 ffffffc0 96171938 0000007f
[  134.833414] bee0: 00000000 00000000 5b128000 ffffffc0 0189d000 ffffffc0 01d59000 ffffffc0
[  134.841521] bf00: 01c89a6c ffffffc0 01d583ac ffffffc0 01aa5270 ffffffc0 5b128000 ffffffc0
[  134.849629] bf20: 02181000 00000000 01080220 ffffffc0 00000000 00000040 5b12bf70 ffffffc0
[  134.857735] bf40: 01085134 ffffffc0 5b12bf70 ffffffc0
[  134.862737] [<ffffffc001083dac>] el1_irq+0x6c/0xd8
[  134.867482] [<ffffffc0011075a0>] cpu_startup_entry+0x238/0x288
[  134.873260] [<ffffffc00108fa5c>] secondary_start_kernel+0x11c/0x128
[  134.879469] CPU3: stopping
[  134.882145] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G      D      3.14.79-115 #1
[  134.889387] Call trace:
[  134.891805] [<ffffffc001088e40>] dump_backtrace+0x0/0x128
[  134.897152] [<ffffffc001088f8c>] show_stack+0x24/0x30
[  134.902154] [<ffffffc001888f44>] dump_stack+0x88/0xac
[  134.907156] [<ffffffc001090018>] handle_IPI+0x1c0/0x1d0
[  134.912331] [<ffffffc00108143c>] gic_handle_irq+0x84/0x88
[  134.917678] Exception stack(0xffffffc05b137e30 to 0xffffffc05b137f50)
[  134.924062] 7e20:                                     5b134000 ffffffc0 0189d000 ffffffc0
[  134.932170] 7e40: 5b137f70 ffffffc0 01085138 ffffffc0 011075a0 ffffffc0 00000000 00000000
[  134.940278] 7e60: 741b9d14 ffffffc0 00000000 01000000 00000418 00000000 00000014 00000000
[  134.948385] 7e80: 47195ee0 000657fb 08560d23 00000000 3b9aca00 00000000 5b137da0 ffffffc0
[  134.956492] 7ea0: ffffbf61 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  134.964600] 7ec0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  134.972708] 7ee0: 00000014 00000000 5b134000 ffffffc0 0189d000 ffffffc0 01d59000 ffffffc0
[  134.980815] 7f00: 01c89a6c ffffffc0 01d583ac ffffffc0 01aa5270 ffffffc0 5b134000 ffffffc0
[  134.988923] 7f20: 02181000 00000000 01080220 ffffffc0 00000000 00000040 5b137f70 ffffffc0
[  134.997029] 7f40: 01085134 ffffffc0 5b137f70 ffffffc0
[  135.002031] [<ffffffc001083dac>] el1_irq+0x6c/0xd8
[  135.006775] [<ffffffc0011075a0>] cpu_startup_entry+0x238/0x288
[  135.012554] [<ffffffc00108fa5c>] secondary_start_kernel+0x11c/0x128
GXBB:BL1:08dafd:0a8993;FEAT:EDFC318C;POC:3;RCY:0;EMMC:800;NAND:81;SD:0;READ:0;CHK:0;
TE: 99044
no sdio debug board detected
 
BL2 Built : 11:44:26, Nov 25 2015.
gxb gfb13a3b-c2 - jcao@wonton
 
Board ID = 8
set vcck to 1100 mv
set vddee to 1050 mv
CPU clk: 1536MHz
DDR channel setting: DDR0 Rank0+1 same
DDR0: 2048MB(auto) @ 912MHz(2T)-13
DataBus test pass!
AddrBus test pass!
Load fip header from SD, src: 0x0000c200, des: 0x01400000, size: 0x000000b0
Load bl30 from SD, src: 0x00010200, des: 0x01000000, size: 0x00009ef0
Sending bl30........................................OK.
Run bl30...
Load bl301 from SD, src: 0x0001c200, des: 0x01000000, size: 0x000018c0
Wait bl30...Done
Sending bl301.......OK.
Run bl301...
D, src: 0x00020200, des: 0x10100000, size: 0x00011130
 
 
--- UART initialized after reboot ---
[Reset cause: unknown]
[Image: unknown, amlogic_v1.1.3046-00db630-dirty 2016-08-31 09:24:14 tao.zeng@droid04]
bl30: check_permit, count is 1
bl30: check_permit: ok!
chipid:Load bl33 from SD, src: 0x00034200, des: 0x01000000, size: 0x00073510
 ef be ad de d f0 ad ba ef be ad de not ES chip
[0.213749 Inits done]
secure task start!
high task start!
low task start!
NOTICE:  BL3-1: v1.0(debug):4d2e34d
NOTICE:  BL3-1: Built : 17:08:35, Oct 29 2015
INFO:    BL3-1: Initializing runtime services
INFO:    BL3-1: Preparing for EL3 exit to normal world
INFO:    BL3-1: Next image address = 0x1000000
INFO:    BL3-1: Next image spsr = 0x3c9
  • odroid-c2/application_note/software/watchdog_timer.txt
  • Last modified: 2018/03/16 17:47
  • by moon.linux