Recover from Unexpected Shutdowns with Lassie (Shutdown Watchdog)

Linode Compute Instances have a feature called Lassie (Linode Autonomous System Shutdown Intelligent rEbooter), also referred to as the Shutdown Watchdog. When this feature is enabled, a Compute Instance is automatically powered back on if it ever powers off unexpectedly.

Shutdown Recovery Behavior

The Shutdown Watchdog feature detects when a Compute Instance is powered off and checks if that directive came from the Linode platform (such as the Cloud Manager or Linode API). If the power off command did not originate from the Linode platform, the shutdown is considered unexpected and the Compute Instance is automatically powered back on.

Note
Shutdown Watchdog can power a Compute Instance back on up to 5 times within a 15-minute period. If a recurring issue causes 6 or more shutdowns within that period, the instance remains powered off until it is manually powered back on. This prevents endless reboot loops when there is an issue with the internal software of a Compute Instance.
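
The limit described in this note can be modeled with a small sliding-window counter. The sketch below is only an illustration of the stated policy (5 automatic power-ons per 15 minutes); it is not Linode's implementation. The class name RebootLimiter, the method should_reboot, and the choice of a sliding window rather than a fixed one are all assumptions made for the example.

```python
from collections import deque

MAX_REBOOTS = 5            # limit stated in the note above
WINDOW_SECONDS = 15 * 60   # 15-minute period stated in the note above


class RebootLimiter:
    """Hypothetical model of the reboot limit; not Linode's actual code."""

    def __init__(self, max_reboots=MAX_REBOOTS, window=WINDOW_SECONDS):
        self.max_reboots = max_reboots
        self.window = window
        self._reboots = deque()  # timestamps (seconds) of recent automatic power-ons

    def should_reboot(self, now):
        """Return True and record the power-on if one is still allowed at time `now`."""
        # Forget power-ons that have aged out of the window (sliding window: an assumption).
        while self._reboots and now - self._reboots[0] >= self.window:
            self._reboots.popleft()
        if len(self._reboots) >= self.max_reboots:
            return False  # limit reached: the instance would stay powered off
        self._reboots.append(now)
        return True


if __name__ == "__main__":
    limiter = RebootLimiter()
    # Six unexpected shutdowns, one minute apart.
    for minute in range(6):
        print(minute, limiter.should_reboot(minute * 60))
```

In this model the sixth shutdown inside the window is not followed by a power-on, which corresponds to the instance staying off until someone powers it on manually.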

Enable (or Disable) Shutdown Watchdog

By default, Shutdown Watchdog is enabled on all new Compute Instances. If you wish to disable or re-enable this feature, follow the instructions below:

  1. Log in to the Cloud Manager and navigate to the Linodes link in the sidebar.

  2. Select the Linode Compute Instance that you wish to modify.

  3. Navigate to the Settings tab.

  4. Scroll down to the section labeled Shutdown Watchdog.

  5. From here, click the corresponding toggle button to update this setting to the desired state, either enabled or disabled.
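
The same setting can also be changed without the Cloud Manager. The sketch below assumes the Linode API v4 instance update endpoint (PUT /linode/instances/{linodeId}) and its watchdog_enabled field; confirm both against the current API reference before relying on them. The function name build_watchdog_request, the instance ID 123, and the token YOUR_TOKEN are placeholders for illustration. The example only builds the request and does not send it.

```python
import json
import urllib.request

API_BASE = "https://api.linode.com/v4"


def build_watchdog_request(linode_id, enabled, token):
    """Build (but do not send) a request that turns Shutdown Watchdog on or off."""
    body = json.dumps({"watchdog_enabled": bool(enabled)}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/linode/instances/{linode_id}",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own instance ID and API token.
    req = build_watchdog_request(linode_id=123, enabled=False, token="YOUR_TOKEN")
    print(req.get_method(), req.full_url, req.data.decode())
    # To apply the change, send the request with urllib.request.urlopen(req).
```

Keeping request construction separate from sending makes it easy to inspect exactly what would be changed before touching a live instance.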

Reasons for an Unexpected Shutdown

An unexpected shutdown occurs when a Compute Instance powers off without receiving a power off command from the Linode platform (such as one issued by a user in the Cloud Manager or API). In general, the cause lies within the Compute Instance’s internal system or software configuration. The following list includes potential reasons for these unexpected shutdowns.

  • A user issues the shutdown command in the shell environment of a Compute Instance. In Linux, a system can be powered off by entering the shutdown command (or other similar commands) in the system’s terminal. Since Linode has no knowledge of internal commands issued on a Compute Instance, it is considered an unexpected shutdown.

  • Kernel panic: A kernel panic can occur when your system detects a fatal error and it isn’t able to safely recover. Here is an example of a console log entry that indicates a kernel panic has occurred:

    Kernel panic - not syncing: No working init found.
    
  • Out of memory (OOM) error: When a Linux system runs out of memory, it can start killing processes to free up additional memory. In many cases, your system remains accessible but some of the software you use may stop functioning properly. Occasionally, an out of memory condition causes the system to become unresponsive or crash, resulting in an unexpected shutdown. Here is an example of a log entry that indicates a process was killed to free memory:

    kernel: Out of memory: Kill process [...]
    
  • Other system crashes, such as a crash caused by the software installed on your system or a malicious process (such as malware).

Note
The Shutdown Watchdog feature never causes a Compute Instance to shut down and only ever powers on an instance if it detects an unexpected shutdown.
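
The two example log entries in the list above can be searched for automatically. The following is a minimal sketch, assuming the log text is already available as a list of lines (for example, copied from the console or read from a file). The function name find_crash_signatures and the labels are hypothetical. The optional "ed" in the second pattern is there because some kernels log "Killed process" instead of "Kill process".

```python
import re

# Patterns taken from the example log entries in this section.
SIGNATURES = {
    "kernel panic": re.compile(r"Kernel panic - not syncing"),
    # Some kernels log "Killed process"; the optional "ed" covers both forms.
    "out of memory": re.compile(r"Out of memory: Kill(?:ed)? process"),
}


def find_crash_signatures(lines):
    """Return (line_number, label, text) for each line matching a known signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((number, label, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    sample = [
        "an unrelated log line",  # filler, not real log output
        "kernel: Out of memory: Kill process [...]",
        "Kernel panic - not syncing: No working init found.",
    ]
    for number, label, text in find_crash_signatures(sample):
        print(f"line {number}: {label}: {text}")
```

A match only shows that one of these events was logged; it does not prove that the event caused the shutdown, so the surrounding log entries still need to be read.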

Investigate the Cause of a Shutdown

The underlying cause of these issues can vary. The most helpful course of action is to review your system logs.

  1. Open the Lish console. This displays your system’s boot log, followed by a login prompt if the system booted normally. If you do not see a login prompt, look for any errors or unexpected output that indicates a kernel panic, file system corruption, or other type of system crash.

  2. Log in to your system through either SSH or Lish and review the log files for your system using either journald or syslog. For systems using systemd-journald for logging, you can use the journalctl command to review system logs. See Use journalctl to View Your System’s Logs for instructions.

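    • journalctl -b: Log entries for the current boot. After an automatic power-on, the boot that ended in the shutdown is the previous one, which journalctl -b -1 shows if the journal is stored persistently.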
    • journalctl -k: Kernel messages

    For systems using syslog, you should review the following log files using your preferred text editor (such as nano or vim) or file viewer (such as cat or less).

    • /var/log/syslog: Most logs as recorded by syslog.
    • /var/log/boot.log: Log entries for the last system boot
    • /var/log/kern.log: Kernel messages
    • /var/log/messages: Various system notifications and messages typically recorded at boot.

    You may also want to review log files for any other software you have installed on your system that might be causing these issues.
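
The journalctl options mentioned in this step can be combined in a small helper. This is a sketch only: the function names journalctl_command and read_journal are hypothetical, and the options used (--no-pager, -k, --boot=) are standard journalctl options. An offset of 0 selects the current boot and -1 the previous boot. Entries from a previous boot are only available when the journal is stored persistently.

```python
import shutil
import subprocess


def journalctl_command(boot_offset=0, kernel_only=False):
    """Build a journalctl argument list.

    boot_offset=0 selects the current boot, -1 the previous boot, and so on.
    """
    cmd = ["journalctl", "--no-pager"]
    if kernel_only:
        cmd.append("-k")  # kernel messages only
    cmd.append(f"--boot={boot_offset}")
    return cmd


def read_journal(boot_offset=0, kernel_only=False):
    """Run journalctl and return its output, or None if journalctl is not installed."""
    if shutil.which("journalctl") is None:
        return None
    result = subprocess.run(
        journalctl_command(boot_offset, kernel_only),
        capture_output=True,
        text=True,
        check=False,  # a missing boot entry should not raise an exception
    )
    return result.stdout


if __name__ == "__main__":
    # Kernel messages from the boot before the current one.
    print(" ".join(journalctl_command(boot_offset=-1, kernel_only=True)))
```

Because Shutdown Watchdog powers the instance back on, the boot that ended in the shutdown is usually the previous one, which is why the usage example passes an offset of -1.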

Note
Unexpected shutdowns are primarily caused by issues with the internal software configuration of a Compute Instance. To investigate these issues further, it is recommended that you reach out to your own system administrators or ask on our Community Site. These issues are generally outside the scope of the Linode Support team.

File System Corruption

In some cases, unexpected shutdowns can cause file system corruption on a Compute Instance. If an error message (such as the one below) appears within your console logs, your file system may be corrupt or otherwise in an inconsistent state.

/dev/sda: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

In cases like this, it is recommended that you attempt to correct the issue by running the fsck tool in Rescue Mode. See Using fsck to Find and Repair Disk Errors and Bad Sectors for instructions.
