Linux gets frozen, what do you do?

TacticalCoder · on March 14, 2014

I'm using Linux since 20 years and used to do this all the time, back when Linux --or more specifically X Window system-- was less stable than today. I'd make sure to always compile the kernel with MagicSysRQ (you need it for the combo mentioned in TFA).

It wasn't even mandatory to restart the system: the trick was to use first MagicSysRQ and then issue a "vgareset" (IIRC I couldn't even see what I was typing, but the command was taken into account) and then, miracle: I could unlock frozen X Window sessions (more specifically: kill X, "reset" the GPU and then restart a new X Window session).

Note that very often X is fine: it's just X which is frozen. Heck, if you have another machine on your LAN and allow SSH in, you can SSH and kill X / vgareset without needing MagicSysRQ.

But since quite a few years X is so stable that such hackery ain't needed anymore. Moreover if I recall correctly "vgareset" did only exist for 32-bit systems (at least at one point). Nowadays my Linux workstation regularly reaches 6 months of uptime (there are only very rarely known remote root exploits mandating a kernel upgrade) so I've kinda "forgot" how MagicSysRQ works ^ ^

arethuza · on March 14, 2014

When I read that I thought of "Stop A" on Sun workstations - and indeed that is mentioned on the wikipedia page:

http://en.wikipedia.org/wiki/Magic_SysRq_key

Mind you I don't think I ever used "Stop A" on a Sun for anything constructive...

dice · on March 14, 2014

>Mind you I don't think I ever used "Stop A" on a Sun for anything constructive...

Back at the dot-com I worked at, we used to have a SPARCstation 5 that we used for SPARC-II arch builds. It was a nice machine, but it had a dead NVRAM battery so it would lose its configuration every time we lost power in the building (which was a lot, because rolling blackouts used to be a thing in California).

Anyway, an interesting quirk about the SPARCstation machines is that the MAC address for the NIC is stored in NVRAM, and when you lose the NVRAM settings it defaults to ff:ff:ff:ff:ff:ff AKA the broadcast address. So by default, on boot, this machine would start sending out DHCP requests and other network traffic with a source address of the broadcast address. The switches we were using did not like that, and would start flooding all of their ports with traffic. The only way we found to fix it was to reboot the switches.

So, there is at least one constructive use for "Stop A": you can use it to configure a MAC address on your SS5 so that it doesn't inadvertently bring down the whole network in a massive broadcast storm.

mturmon · on March 14, 2014

After stop-A dropped you into the monitor, you could boot into single-user mode with

  >b -s

and then, at the sh prompt, as the single (super) user, edit password

  # ed /etc/passwd

to add in a second root account for yourself. That could be pretty helpful.

gaius · on March 14, 2014

I thought Stop-A launched the Forth interpreter ;-)

jmj42 · on March 14, 2014

Sort of. Stop-A drops you into the firmware prompt (OpenBoot), suspending the OS in the process. OpenBoot, of course, implements a Forth interpreter shell.

Ahh, the good old days... I once implemented a nice little boot device selecting utility for some of my Suns. All written in Forth, and executed by the firmware on every boot.

arethuza · on March 14, 2014

Actually, I did a lot of RPN development on Suns. Not in Forth but in PostScript within NeWS/OpenWindows developing front ends for large scale Lisp applications.

matt-attack · on March 14, 2014

Yes, I believe that's true. Stop-A pops you to a prompt (which I believe is all running Forth) where you can do all kinds of cool stuff. It completely suspends the OS. I remember I was once able to TFTP in a custom boot logo that gets loaded right into the BIOS.

matt_heimer · on March 14, 2014

I've also seen one graphical app hang X. If you Ctrl-Alt-F1 and switch to your getty session you can sometimes kill the one app and return to your X session.

michaelmior · on March 15, 2014

> Note that very often X is fine: it's just X which is frozen

You mean "Linux" is fine?

jimmaswell · on March 14, 2014

Slightly off-topic, but was your first language French? The "using since 20 years" seems like something someone with a French background would say.

javanix · on March 14, 2014

Sysrq is kinda neat.

You can force a kernel panic by 'echo c > /proc/sysrq-trigger' if you really want.

graylights · on March 14, 2014

That is disabled in some distros, it can be toggled via /proc/sys/kernel/sysrq
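
For reference, the value in that file is either 0 (off), 1 (everything on), or a bitmask of allowed function groups. Here is a rough sketch for decoding it, with the bit meanings as I remember them from the kernel's sysrq documentation. `describe_sysrq` and the short labels are names made up for this example:

```shell
#!/bin/sh
# Decode a kernel.sysrq value into the function groups it allows.
# The labels are shorthand for this sketch, not official names.
describe_sysrq() {
    value=$1
    case $value in
        0) echo "disabled"; return ;;
        1) echo "all"; return ;;
    esac
    enabled=""
    for entry in 2:loglevel 4:keyboard 8:debug-dumps 16:sync \
                 32:remount-ro 64:signals 128:reboot 256:rt-nice; do
        bit=${entry%%:*}    # numeric bit before the colon
        name=${entry#*:}    # label after the colon
        if [ $(( value & bit )) -ne 0 ]; then
            enabled="$enabled $name"
        fi
    done
    # Strip the leading space left by the first append.
    echo "${enabled# }"
}
```

On a live system you would call it as `describe_sysrq "$(cat /proc/sys/kernel/sysrq)"`. A value of 176, for instance, allows sync, remount read-only and reboot but not the process-killing keys.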

dllthomas · on March 14, 2014

I remember those days, though ctrl-alt-backspace usually solved things for me.

nly · on March 14, 2014

It's a lot easier just to reboot and let fsck patch up your filesystem... flushing buffers isn't going to help you get back the last paragraph of your PhD thesis if Libre Office hasn't called write().

In any case, in ~8 years of using Linux, I don't think I've had a freeze that was the kernel and not just X.

hdevalence · on March 14, 2014

One easy way to get kernel lockups is to use fglrx.

But agreed on 'just reboot': you should be using a resilient filesystem anyways, and MagicSysRq only works when it's enabled beforehand, and only for some kinds of lockups, and if the data is in the frozen application, syncing the disks isn't going to help.

cdi · on March 15, 2014

>One easy way to get kernel lockups is to use fglrx.

Yup. Changed my GPU to nvidia, no more core lockups.

hdevalence · on March 16, 2014

For me, I kept the hardware, and switched to the open-source driver.

latj · on March 14, 2014

This is the most elaborate scheme to raise one's relative global uptime I have ever seen.

slashdotaccount · on March 14, 2014

And what if it's not your PhD thesis in Libre Office but a busy database server which can easily get corrupted if you don't flush.

> I don't think I've had a freeze that was the kernel and not just X.

Device drivers sometimes have bugs, especially when the device is not working properly (I had system freezes when there was a misbehaving capacitor on the graphics card).

zorlem · on March 14, 2014

> And what if it's not your PhD thesis in Libre Office but a busy database server which can easily get corrupted if you don't flush.

No properly configured and working ACID compliant RDBMS should lose any data when the server is reset or stopped. If it does, then it is either a problem with the hardware, OS, configuration or the RDBMS itself. The application must also be able to handle the DB disappearing, though. Sadly this is often not the case.

dalore · on March 14, 2014

You mean lose data after it's been committed. I can send something to the database just as it's dying and it's been lost.

fulafel · on March 14, 2014

SQL commit is not allowed to finish before the data is safely on disk (that's the D in ACID).
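
From the application side the same rule looks like "don't report success until the flush call has returned". A rough shell sketch of that idea, assuming GNU dd (its `conv=fsync` asks for an fsync before dd exits). `durable_write` is a made-up helper, not anything a database actually ships:

```shell
#!/bin/sh
# Write data to a temp file, fsync it, then rename it into place.
# Success is only reported after dd's fsync has returned.
durable_write() {
    target=$1
    data=$2
    tmp="$target.tmp.$$"
    # conv=fsync: physically write file data and metadata before dd finishes.
    printf '%s\n' "$data" | dd of="$tmp" conv=fsync 2>/dev/null || return 1
    # rename is atomic within one filesystem: readers see old or new, never half.
    mv "$tmp" "$target"
}
```

Whether the bytes are really on the platter after that still depends on the disk honouring the flush, and a stricter version would also sync the containing directory after the rename.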

icebraining · on March 14, 2014

That's assuming that the RDBMS can tell. The disks may lie:|

Filligree · on March 14, 2014

You should strive not to use such disks in your server. A machine reset won't power off the disks, though.

fulafel · on March 14, 2014

Or the CPU or the memory or the OS...

plorg · on March 14, 2014

Thus the 's' in reisub?

kelnos · on March 15, 2014

The disk may lie. The 's' will get the OS to send everything out to the disk. Actually writing it to the platter (or flash part, or whatever) is at the disk's discretion.

fulafel · on March 17, 2014

It's at the disk's discretion as far as the laws of physics are concerned, but this would be a severely broken disk prone to losing data and if it was a major server disk vendor, the vendor would take a pretty serious hit to its reputation.

dalore · on March 18, 2014

Yes, but my point was that I could send data to an ACID compliant server, and kill it before the commit happened and data will be lost. Just trying to point out to the parent poster that sending is not enough, you need to wait for the committing.

garblegarble · on March 14, 2014

> And what if it's not your PhD thesis in Libre Office but a busy database server which can easily get corrupted if you don't flush.

Wouldn't that make it an unsafe database server?

slashdotaccount · on March 14, 2014

Yes, I agree, but just several weeks ago I had to recover a MySQL InnoDB database after a power problem. MySQL is very popular, and it turns out it is an unsafe database server according to your definition. Well maybe it is.

Edit:

Besides, hard drives have write caches, and they can report a successful write operation to the OS when the data is still in its cache physically.

marcosdumay · on March 14, 2014

Yep, MySQL is an unsafe database. The fact that it's popular just goes to show how powerful marketing is on the mind of most people.

Hard drives are supposed to flush their caches before they report the end of a flush operation to the OS (some flush into flash, but they flush). If yours does not, it's defective. Go ahead and make use of the warranty.

deelowe · on March 14, 2014

I don't believe MySQL is ACID compliant. You may have just run into one of the reasons why.

cookiecaper · on March 14, 2014

MySQL is only ACID compliant under a very specific set of configuration parameters. This makes it even more dangerous than an RDBMS with binary acidity.

fulafel · on March 14, 2014

Re HD lying, this is rare except with dubious cheap flash sticks. The caching command set is well defined and operating systems know how to issue the relevant SCSI/SATA commands. It's critical for correct functioning of journaling filesystems such as NTFS and Ext4.

spaznode · on March 14, 2014

"sync; sync; sync; shutdown -r now"

I remember some sort of magic invocation like that years ago for our medical "device" product we had to manage hundreds of remote node instances of. Something along those lines. I don't remember why the developer (Gabe) came up with sync three times being the magic number. Maybe three times was just paranoia. =)

thejosh · on March 14, 2014

"sync, sync, sync your file systems gently down the shutdown -r..".

mikeash · on March 14, 2014

Back in the bad old days of pre-UNIX Macs, it was a common troubleshooting step to reset your PRAM. This is battery backed up RAM that holds some basic settings, and if it got corrupted somehow it could cause weird problems. You'd reboot while holding down command, option, P, and R, then wait for the boot chime to sound a second time indicating that it had been reset, then release the keys and boot normally.

Somehow this advice got mutated so that you'd keep holding the keys until you heard two boot chimes (thus resetting the stuff twice). And then it started to grow. Three was common. Some people would advise more. I'm pretty sure that doing it more than once never helped anything, but there we are.

(The cmd-opt-P-R sequence still works on modern Macs and I actually used it to resurrect a machine that wouldn't start up just a month ago, but it's far less frequently needed now.)

celebril · on March 14, 2014

It's the same with the battery stats resetting and the Dalvik cache wiping these days in Android land. You do it three+ times.

Or the "Repair permissions" thing in OS X. You do it several times as well.

It's like whenever there's this one-step fix thing that a system utility does, the Common Man will interpret it as needing to repeat 3+ times in order for it to be effective.

nzp · on March 14, 2014

One for the Father, one for the Son, and one for the Holy Ghost.

indrax · on March 14, 2014

Possibly related: http://brad.livejournal.com/2116715.html

>Run it and be amazed how much your disks/raid/OS lie. ("lie" = an fsync doesn't work)

>It seems everything from PATA consumer disks to high-end server-class SCSI disks lie like crazy. Yes, that includes SATA there in the middle. I'll discuss fixing your storage components in a second.

slight · on March 14, 2014

I believe the thinking is that because sync isn't instant, especially on older, slower hard drives, having to type it again gives it time to actually complete.

halfasleep · on March 14, 2014

sync will block until it completes.

derekp7 · on March 14, 2014

Back in the day (old Unix), the sync call would return right away, and the kernel would sync in the background. Unless there was a current background sync happening -- then sync would block until the first one finished, which is why you would have two sync's in a row. The third sync was thrown in just for luck.

mcguire · on March 15, 2014

I picked up the "sync three times" thing from an AIX kernel developer, who did it from before AIX had a "shutdown". (Yes, he would do "sync, sync, sync, power-off".) My theory was that using it three times gave the system time to actually sync the data.

linker3000 · on March 14, 2014

Somewhere in the dim and distant past I was told (or read) to use:

"sync; sync; sync; halt"

Presumably so you had thinking time before automatically restarting a possibly sick system.

jmj42 · on March 14, 2014

This is likely a Sun thing. Halt on (SPARC) Solaris does not shut down the OS. It issues a reset command to the firmware. The halt command on Solaris is roughly equivalent to pressing the reset button on a PC.

When your Sun is particularly hosed we used sync;sync;sync; halt to reset and (hopefully) not lose any data (sync forces OS write buffers to purge)

bad_user · on March 14, 2014

I wouldn't advise other people doing this as the filesystem can get seriously broken under certain circumstances.

For example, on a Linux laptop with the hard-drive encrypted with dm-crypt, I simply lost access to my drive due to repeated hard reboots. I don't know if I could have recovered my data from it or not, but after repeated attempts of googling for the error message and following advice I simply gave up and later reinstalled everything from scratch (it's a good thing I constantly make backups ;-)).

On Linux REISUB has been my friend.

slug · on March 14, 2014

when doing kernel development on embedded systems or on a real host (not within a virtual machine), it's sometimes useful to get system information when it crashes. the cool thing is that it's also possible to send the sysrq through a console serial port by first sending a break command.

anyway, someone already mentioned a few mnemonics, I learned one, a quick googling led me here:

http://fosswire.com/post/2007/09/fix-a-frozen-system-with-th...

sz4kerto · on March 14, 2014

I had many because of a not 100% stable hard drive + mdadm. Kernel panics during prolonged periods of high HDD load.

freiheit · on March 14, 2014

Or, if you're on anything approaching a modern system:

1. Tap the power button (one single brief tap)

2. Wait a minute

3. If the system hasn't already shut down or isn't obviously in the process of shutting down or you just get impatient: hold the power button down

4. Now that the system is off: tap the power button.

Modern systems (whether server, desktop tower or laptop) have a "soft" power button, that when tapped briefly sends a signal to the OS. Most flavors of linux are configured so that receiving that signal initiates a proper shutdown, just like "shutdown", "poweroff" or a GUI shutdown. All of this has been true for quite a few years now.

If the system is so locked up that the power button initiated shutdown doesn't work, you might as well just pull the power, which holding the button down for a second or two will do. Even if this happens, you're probably using a journaling filesystem that comes back up cleanly.

None of that will help with unsaved data in an application, because you almost certainly have to tell the application to save, and if X has locked up there is no way to do that. A clean shutdown might signal the app with a terminate signal and allow for data to be saved before doing a hard kill.
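
For what it's worth, the application has to cooperate for that to work. A minimal sketch of a process that saves on SIGTERM, which is what a normal shutdown sends before the hard kill. `run_worker` and its state file argument are hypothetical:

```shell
#!/bin/sh
# Worker loop that writes its state out when it receives SIGTERM.
run_worker() {
    state_file=$1
    # On TERM: save, then exit cleanly before the follow-up SIGKILL arrives.
    trap 'echo "saved on TERM" > "$state_file"; exit 0' TERM
    while :; do
        # Sleep in the background and wait on it, so the trap runs
        # as soon as the signal arrives instead of after the sleep.
        sleep 1 &
        wait $!
    done
}
```

Started as `run_worker /tmp/state &`, a plain `kill` on its PID leaves the marker in the state file. An app that ignores the signal, or is wedged along with X, gets no such chance.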

Nursie · on March 14, 2014

This is not for the situation in the article, but a slightly different one. You have a remote server, there's an ssh session still open to it but it's somehow lost access to all its mount points, including /. You have nothing to work with but bash builtins and /proc, you're 100 miles away and you need to get it up and running again NOW. Emergency reboot -

    echo 1 > /proc/sys/kernel/sysrq
    echo b > /proc/sysrq-trigger

herokusaki · on March 14, 2014

>it's somehow lost access to all its mount points

How could that happen and leave your system in a state where it is still accessible and can be fixed by a reboot?

If you think you might face that kind of trouble you should keep a copy of Busybox and whatever other tools you might need on a RAM drive. You'd have an opportunity to figure out if rebooting would lead to a usable system.

Nursie · on March 15, 2014

The machine would occasionally stop responding completely. I got into the habit of leaving an open ssh session going from another box so I could try and poke around. The root drive was on a (very fast) usb 3.0 stick on an internal header on an add-in card.

There was a problem with the card or the driver as every so often something would go wrong and everything would stop responding again. The shell I left open revealed almost nothing as the root drive was gone, but it could be used to reboot the machine (thanks to the trick above), which would then be good again for another few days.

I now have / on a proper SSD...

peterwaller · on March 14, 2014

Do a `sync` first! Also, this is quite like yanking the power chord and plugging it back in.

https://en.wikipedia.org/wiki/Sync_(Unix)

joncp · on March 14, 2014

http://www.folkpeople.com/guitar-lesson/pic/power_chord_tabl...

pit · on March 14, 2014

At first, I wasn't sure why you posted this, but I was too busy rocking out to mind.

Nursie · on March 14, 2014

Sync never returns, all filesystems have been lost.

Yes, it's exactly like pulling and replugging, which was exactly what I needed!

--edit-- is sync even a bash builtin? Looks like it's /bin/sync on my systems. / had been lost (it was on a usb stick on an internal header on an unreliable usb3 card, I later found out.)

--edit 2-- if you meant also echoing s to the sysrq-trigger, it seemed to kill the session

eikenberry · on March 14, 2014

Another way to deal with X freezing.

Use acpi to set your power button to switch you to console. Just edit your /etc/acpi/power.sh and change it to...

    #!/bin/sh
    /bin/chvt 1

The acpi system is separate from the X/keyboard interface and still works. So this way you can switch to the console even if your keyboard is locked.

Note you can use other keys as well. On my laptop I use my "thinkvantage" key for this. Any key that has an associated acpi event will work.

ingenium · on March 15, 2014

Which script in /etc/acpi/ corresponds to the thinkvantage key?

eikenberry · on March 15, 2014

  $ cd /etc/acpi
  $ cat events/thinkvantage-button 
  # Thinkpad 'ThinkVantage' button
  event=button/prog1
  action=/etc/acpi/thinkvantage.sh
  $ cat thinkvantage.sh
  #!/bin/sh            
  /bin/chvt 1

Hope this helps.

danparsonson · on March 14, 2014

Handy that 'reisub' is 'busier' backwards

chris_wot · on March 14, 2014

You rock.

Yuioup · on March 14, 2014

This sounds like a feature that needs to be automated, probably with a very handy key combination. I don't know ... maybe with Ctrl+Alt+Delete?

i386 · on March 14, 2014

Agreed. Why isn't this default behaviour? Is there some negative aspect that's not obvious?

marcosdumay · on March 14, 2014

That's because the default behaviour of CTRL + ALT + DEL is to issue a proper reboot.

That, and because X consumes those keys, making them useless when X goes bad (which is about every time Linux freezes and it's not a hardware fault).

Yuioup · on March 14, 2014

Okay maybe another key combination?

powerbook5300CS · on March 14, 2014

It's possible that their computer is frozen enough to need this but it's more probable that this person hasn't figured out how to disable DontZap in newer versions of xorg. You need to disable it in your xorg.conf in order to have ctrl-alt-bksp work again:

    Section "ServerFlags"
        Option "DontZap" "false"
    EndSection

snogglethorpe · on March 14, 2014

Note: you might want to think twice about doing this if you use Emacs...

(...from painful experience...)

jcastro · on March 14, 2014

Alt + SysRq + k will also do this if you don't feel like writing a xorg.conf.

mpyne · on March 14, 2014

That was one of my favorite Magic SysRq keys. It's the Linux analog to Windows's Ctrl-Alt-Del.

It's the "Secure Attention Key" (SAK): You press that key and it kills all programs hooked to the TTY (incl. X, in your case) and displays a proper login prompt so that you can know what you're about to log in to was run by the system and not a clever malware trying to steal your password.

Stolpe · on March 14, 2014

Raising Elephants Is So Utterly Boring

namarkiv · on March 14, 2014

Raising Skinny Elephants Is Utterly Boring

jeffcox · on March 14, 2014

Reboot Even If System Utterly Broken

notfoss · on March 15, 2014

So you find it more interesting to keep feeding fat elephants :P

jedanbik · on March 15, 2014

Linux gets frozen, what do you do? I remember to press nine buttons on my keyboard, duh. and why? Because a hard reset will "make you a lot of problems?" I am not convinced, and I am not impressed.

This is a laughably awful shortcut for anything. Why make the supposedly safe stuff so out of reach? I would have to use a separate computer to look up how to type this command. Even if I knew the command off the top of my head, what if my keyboard lacks a PrtSc (SysRq) button?

When folks are pointing to Windows for design cues, the open source community ought to reconsider some aspects of how it has implemented certain features.

clarry · on March 15, 2014

How do you perform the equivalent operations on Windows?

jedanbik · on March 15, 2014

It isn't equivalent, but it is approximate; folks were mentioning ctrl-alt-dlt. Three keys instead of nine keys represents a 66% increase in efficiency.

clarry · on March 15, 2014

I would've said it was approximate enough in the Windows 3.1 days when it could perform a soft reboot. But since Windows 2000, it has had completely different functionality (sometimes allowing you to log in, sometimes giving you a process manager oslt), not even remotely approximate to the magic sysrq sequence presented here. Do tell me if this has changed since Windows Vista.

Linux, by the way, can also respond to Ctrl-alt-del, and depending on the desktop it might perform functions similar to Windows.

TorKlingberg · on March 14, 2014

Before this try Ctrl+Alt+F1. You might get a full-screen terminal where you can log in and kill the offending process. Ctrl+Alt+F7 to get back to the desktop.

I had to use this just today when ddd stole all other mouse and keyboard input.

gnarbarian · on March 14, 2014

I used to use Ctrl+alt+backspace but it doesn't work anymore on many desktop environments.

anon4 · on March 14, 2014

That's because it's turned off by default on modern X.

You can enable them in the keyboard options of your DE, or directly with setxkbmap on startup (sorry can't look up the exact options right now) if you're not using one.

talles · on March 17, 2014

This. It's pretty rare that I actually have to restart the system.

markhahn · on March 14, 2014

if your X is hung (ie, NOT "linux"), why not switch to a VC and fix it? yes, it's good to know about magic-sysrq, but most people don't even understand the layering of X on a VC, and the fact that other VCs are available.

of course, fixing why your X hung would be wise too, since for any normal distro and mainstream hardware, that's just not going to happen. if you really want to be fubar, have that f@cking POS systemd die on you... (yes, on-topic, since only sysrq saves the day.)

stonogo · on March 14, 2014

Because X11 has trashed your video interface. You can't switch to a virtual console. X11 has ring0 access to the video card, to take advantage of DRM.

laumars · on March 14, 2014

More often than not, it's your WM, DE or a GUI application eating up your RAM (Chromium used to be deadly for this) that's causing the freeze.

In more than a decade of running Linux on my desktop (and at work), I genuinely can't think of a single instance when I've not been able to pull a virtual console from a frozen desktop (albeit it often performs laggy).

stonogo · on March 14, 2014

Thanks for trying to diagnose my computer over a web forum, but I am competent enough to identify when and how my computer has failed. I'm not interested in anecdotes from users. I develop video drivers. Am I allowed to experience these lockups now?

laumars · on March 14, 2014

Firstly, I'm not just a Linux user. Like you, I'm a developer too. Given the demographic of this forum, it would pay for you not to assume that you're the only one on here that works in the industry (in fact I even hinted at that when I said I use Linux at work - but never mind)

Secondly, I was making a general comment about people's desktops rather than talking specifically about your example (given the lack of details you posted, it would be insane of me to assume I could diagnose your fault with any precision). My point was that generally when people think their computer has locked up / X has crashed, it's actually one of the items I mentioned earlier that's at fault.

The snappy reply was appreciated though </sarcasm>. But given just how unusual your circumstances are (assuming what you said is true) and how much you seem to hate it when others discuss these topics with you; it might be an idea if you clarify your position a little better the next time such a topic arises. Like maybe saying "my crashes aren't typical because I'm a kernel developer, but.....". This way people don't accidentally post something that hits one of those raw nerves you have and it saves us all from a lot of unnecessary condescension.

andrewflnr · on March 14, 2014

When I learned this trick, it was because an X input driver was locking up[0]. The virtual consoles were completely inaccessible. Occasionally it would show me a blank screen for my efforts, but more often not even SysRq would work.

[0] I'm actually not 100% sure about that. Thank God I don't have to worry about it anymore.

herokusaki · on March 14, 2014

Will Wayland be different?

fulafel · on March 14, 2014

KMS fixed this.

jjsz · on March 14, 2014

If most people don't know about this then it should be in Arch Linux's Unofficial Beginner's Guide.

mikeweiss · on March 14, 2014

If it's a Linux desktop, a lot of the time it's just the X session that is frozen. Ctrl ALT F2 and sudo killall X will simply restart X

colechristensen · on March 14, 2014

Something very important to note which is missing in the comments and the original article: waiting!

Each one of these commands (r, e, i, s, u) takes or can take a few seconds to complete successfully, so let them do their thing.

ttsiodras · on March 16, 2014

Indeed!

In particular, if you have HD-activity LEDs, watch and wait until they stop blinking - after the 's' (sync) key.
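
If you still have a root shell (over SSH, say) rather than a working keyboard, the same wait-between-steps idea can be scripted. A minimal sketch, assuming the usual /proc/sysrq-trigger interface. `sysrq_sub`, `SYSRQ_TRIGGER` and `SYSRQ_DELAY` are names invented here. It only sends S, U and B: R is keyboard-only, and E/I would kill the very shell running the script. With `SYSRQ_TRIGGER` unset it just prints what it would do.

```shell
#!/bin/sh
# Send sync (s), remount read-only (u) and reboot (b), pausing between steps.
# SYSRQ_TRIGGER and SYSRQ_DELAY are hypothetical knobs for this sketch.
# Dry run by default: nothing is written unless SYSRQ_TRIGGER is set.
sysrq_sub() {
    trigger=${SYSRQ_TRIGGER:-}
    delay=${SYSRQ_DELAY:-5}
    for key in s u b; do
        if [ -n "$trigger" ]; then
            printf '%s' "$key" > "$trigger"
        else
            echo "would send: $key"
        fi
        # Give sync and remount time to finish; no need to sleep after the reboot key.
        if [ "$key" != b ]; then
            sleep "$delay"
        fi
    done
}
```

Running `SYSRQ_TRIGGER=/proc/sysrq-trigger sysrq_sub` as root would do it for real. The fixed sleep is a crude stand-in for watching the disk LED.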

blcknight · on March 14, 2014

Only if you have magic SysRq keys enabled in the kernel... are kernels shipping these days with this by default?

mistercow · on March 14, 2014

I just tried it on my Ubuntu 12.04 install, and it worked. I definitely didn't change any kernel config options.

ominous_prime · on March 14, 2014

They used to at least. I had to use this quite a bit years ago when working on RHEL5 systems with new/experimental drivers.

Aardwolf · on March 14, 2014

When visiting the site I get:

unused The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator, (etc...)

Did their Linux get frozen?

[edit] it works again

manca · on March 14, 2014

Yeah, being first on Hacker News is not good for your box :)

AaronFriel · on March 14, 2014

What about when an SSH terminal gets locked up? If I'm running something and I want to abort it, I often find I can't because of the kernel thrashing. Often I see a nontrivial amount of CPU time given to kswapd but sometimes not, and more often I don't have any monitoring up to see what's going on, or the terminal I'm monitoring is also nonresponsive.

This is especially a problem with Docker, I've found, because when I've got a container thrashing the system like this, I can't access the host or any other containers.

Is this because I'm running single-core instances and the problem goes away once I have more than one thread available? Or should I be restricting workloads to a block device not used for caching? Does anyone else experience this?

derefr · on March 14, 2014

Docker can set CPU, memory, and network-bandwidth quotas for containers. If you can guarantee that the aggregate total of each quota is less than the amount of that resource available to the host, then in the worst case you'll always have some of each left for a "rescue" container.

This throws away one of the neat advantages of containers compared to VMs, though, which is that you can overcommit (or "thin-provision", if you want to be charitable) the host's resource allocation, since it's extremely unlikely that all your services will balloon in resource-consumption simultaneously.

A less guaranteed, but more economical strategy, is to just set quotas on each container such that if it individually starts using more than, say, 80% of the host's resources for an extended period, it'll be terminated. This doesn't save you from a bad interaction between containers that makes everything explode in parallel (e.g. all your containers attempting to reconnect to a stuck service with no backoff) but it should save you in the majority of cases where each host has a heterogeneous container-load, and horizontal scale happens between hosts rather than internal to them.

(The dev-side solution to this problem, though, is to just set up your software architecture such that any host that gets into such a state can hard-reboot, without you losing anything of consequence. http://12factor.net -type containers excel at this, and it looks to be the strategy https://coreos.com embraces as well.)
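
The first strategy above (keep the sum of the quotas under what the host has) is easy to sanity-check before you start anything. A small sketch for the memory case only; `quotas_fit` and its argument order are made up for illustration, and the limits would then be passed to the containers, e.g. via `docker run --memory=...`:

```shell
#!/bin/sh
# Succeeds if the per-container memory limits leave the reserve untouched.
# Arguments: host memory (MB), reserve for a rescue container (MB),
# then one limit (MB) per container.
quotas_fit() {
    host_mb=$1
    reserve_mb=$2
    shift 2
    total=0
    for limit in "$@"; do
        total=$(( total + limit ))
    done
    [ "$total" -le $(( host_mb - reserve_mb )) ]
}
```

For example, `quotas_fit 4096 512 1024 1024 1024` succeeds (3072 MB against 3584 MB available), while two 2048 MB containers on the same host would not fit.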

AaronFriel · on March 14, 2014

Docker supports (supported?) what LXC supported, which was relative CPU shares. I am not sure it would help if it supported something else (like quota minimum/maximum). When the IO subsystem is hit with a memory intensive workload that thrashes swap, all your guarantees go out the window. From my perspective (staring at top while an interactive terminal is not responding), it's the kernel that owns the CPU shares being used, and the IO volume, and it looks like the container is "almost idle" because most of the CPU time is being spent by the kernel.

I don't know if that's something Docker can fix. It might be something kernel devs can fix, though.

deathanatos · on March 15, 2014

The most common reason I've had to do this is that I've exhausted RAM and swap, and the system is disk thrashing for spare pages. In some setups, Linux does not like to OOM a process when it really should. (That's adjustable, by the way.)

Also, you don't need to go all the way through it. In my above example, MagicSysReq+RE or MagicSysReq+REI is enough usually. E and I are for sending SIGTERM and SIGKILL, so all the processes are now dead, and we have plenty of RAM. Sometimes it does take a bit of waiting: the system is disk thrashing. That said, you can then just resume, though you'll likely need to switch to a VT and restart X and other things. (But you get to keep your uptime.)

There's also F, which invokes the OOM killer, though I often forget about it.
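
If you're curious who F would pick, the kernel exposes a per-process badness score. A rough sketch that lists the current top candidates, assuming a Linux /proc with `oom_score` and `comm` files. `oom_candidates` is just a name for this example:

```shell
#!/bin/sh
# Print "score pid name" for the processes with the highest OOM score.
# A higher score means the OOM killer is more likely to pick that process.
oom_candidates() {
    count=${1:-5}
    for dir in /proc/[0-9]*; do
        [ -r "$dir/oom_score" ] || continue
        # Processes can exit mid-scan; skip any whose files vanish.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        echo "$score ${dir#/proc/} $name"
    done | sort -rn | head -n "$count"
}
```

One of the adjustable knobs is `/proc/<pid>/oom_score_adj` (range -1000 to 1000): writing -1000 there as root exempts a process such as sshd, so your rescue session is not the thing that gets killed.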

mikecarlton · on March 14, 2014

What do you mean 'linux gets frozen'? I literally can't remember the last time a linux box froze on me. As others said, I would just reboot it now and let fsck pick up the pieces.

That's going to be faster than the time spent to search google for the magic incantation.

thanatropism · on March 14, 2014

This would be where keyboards with hardware-level macro support (like the Kinesis Advantage) come in handy.

... to be honest, I can't think of another scenario. I have an Advantage for every place I spend extended computer time in -- it's too big and clunky to carry around...

jerf · on March 14, 2014

You do not necessarily want to type that too quickly; sync may take a moment.

blueblob · on March 14, 2014

The only time my Linux machine has ever locked up was when the USB driver seemed to crash, in which case I couldn't do this anyway. The only other way I could have restarted was to SSH to my machine from my phone; I opted for the power button instead.

cordite · on March 14, 2014

Hmm, I wonder if something like this extends to mac. Sometimes the login manager crashes and I can't do anything. Music still playing, but no mouse or keyboard reaction. I can ssh in, but I don't know what to run.

Any ideas?

girvo · on March 14, 2014

I learnt about this a few months back... and never had to use it. Seriously, on my AMD APU netbook running Mint, I've had no freezes whatsoever, despite the hardware being a bit odd. A neat trick though.

donniezazen · on March 14, 2014

Can't get it to work on Fedora 20. Are you supposed to press and hold Ctrl, Alt, PrtSc(SysRq), r, e, i, s, u, and b all together? Does it have to be in that order?

pbhjpbhj · on March 14, 2014

Switch to a terminal (Ctrl+Alt+F2) and do AltGr+PrtScr+H - if you've got Magic SysReq enabled that will print the help information (a list of commands with their (k)eycodes) to the terminal. If it doesn't print anything then you need to RTFM to find out how to enable it. AFAIR I had to do this shortly after I switched to Kubuntu as it was disabled by default - I often had to use Ctrl+Alt+Backspace or AltGr+SysRq+K to kill X as there's a hardware bug on my Nvidia graphics card.

My 8yo has a mnemonic for it something about Elephants and Umbrellas but I've always done reissub (now with extra sync'ing power). For a time the process for him was login, Alt+F2, konsole, Ctrl+R, mine, Enter, AltGr+SysRq+R, E, I, S, U, B; then repeat the first part and you're finally ready to play Minecraft!

donniezazen · on March 15, 2014

If AltGr is right Alt, it does print a bunch of stuff. On my system, SysRq is under PrtSc. Does that mean I have to do Fn+PrtSc to get SysRq, or is just PrtSc enough?

pbhjpbhj · on March 17, 2014

If it prints out without pressing Fn then you're doing it right. Laptop keyboards are weird but usually it's just the PrtScr key as it acts as SysRq when using the Alt[Gr] modifier.

It should print something like:

"SysRq : HELP : loglevel(0-9) reboot(b) [...]"

noselasd · on March 15, 2014

You'll have to enable it first (http://fedoraproject.org/wiki/QA/Sysrq )

    sysctl -w kernel.sysrq=1

Stuff kernel.sysrq = 1 in /etc/sysctl.conf to make it permanent.

I do not really understand why it isn't enabled by default though...
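
To check whether the setting took effect, the current value can be read back from /proc/sys/kernel/sysrq. Here is a small sketch under that assumption. The function name `sysrq_status` is hypothetical, and the optional file argument exists only so the logic can be tried against a scratch file.

```shell
#!/bin/sh
# Hypothetical helper: report the SysRq state stored in a sysctl file.
# Defaults to the live kernel setting; pass another path to test the logic.
sysrq_status() {
    file=${1:-/proc/sys/kernel/sysrq}
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    value=$(cat "$file")
    case "$value" in
        0) echo "disabled" ;;
        1) echo "fully enabled" ;;
        *) echo "partially enabled (bitmask $value)" ;;
    esac
}
```

Calling `sysrq_status` with no argument reports on the running kernel. Any value other than 0 or 1 is a bitmask that allows only some of the key commands.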

mavroprovato · on March 14, 2014

> You need to press and hold Ctrl, Alt and PrtSc(SysRq) buttons, and while holding them, you need to press r, e, i, s, u, b

I think I need an extra hand to do this

DaCapoo · on March 14, 2014

Control isn't necessary - I'm not sure why it would be specified. On a laptop you might have to hold the Fn key to hit PrtScr, but aside from that it's two fingers to hit 'Right Alt' and 'PrtScr' and the other hand can mash r,e,i,s,u,b.

Source: http://en.wikipedia.org/wiki/Magic_SysRq_key

mavroprovato · on March 14, 2014

In my workplace I have a Dell Keyboard that has the SysRq/Scroll Lock/Pause Keys on the top right, above the numpad. Holding Alt+SysRq with one hand is just impossible.

Maybe it's easier in saner keyboards.

DaCapoo · on March 14, 2014

Yeah, with a typical ANSI layout it's a small stretch but definitely doable.

This is the layout that I have (and the only keyboard layout I'll ever buy, because every other throws the pipe key and backslash in a random spot along with randomly sizing the enter key)

http://commons.wikimedia.org/wiki/File:ANSI_Keyboard_Layout_...

pbhjpbhj · on March 14, 2014

Nose.

samsaga2 · on March 14, 2014

"You need to press and hold Ctrl, Alt and PrtSc(SysRq) buttons, and while holding them, you need to press r, e, i, s, u, b"

I'll need three hands.

iwwr · on March 14, 2014

Bind xkill to a global hotkey. If it's just a fullscreen or screen-grabbing app that crashed, xkill will take care of it.

jsemrau · on March 14, 2014

Let it go, let it go! Can't hold it back any more. Let it go, let it go! Turn away and slam the door. I don't care what they're going to say. Let the storm rage on. The server never bothered me anyway.

jack57 · on March 15, 2014

Sorry for the double posts. An app I was using caused it.

otikik · on March 14, 2014

If you only have one hand, just press the power button.

squigs25 · on March 14, 2014

If only they had made this more ergonomically possible

tomrod · on March 14, 2014

Using Ubuntu minimal + xmonad + dmenu. Does not work.

plorg · on March 14, 2014

Certain Magic SysRq sequences are now disabled by default in Ubuntu. You can re-enable them by editing /etc/sysctl.d/10-magic-sysrq.conf. Alternatively, you can enable them temporarily by running this as root: echo 1 > /proc/sys/kernel/sysrq (or replace "1" with one of the numbers described in 10-magic-sysrq.conf, above).
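
The number is a bitmask: each bit switches on one group of SysRq functions. The sketch below decodes a value into those groups. The bit meanings are taken from the kernel's SysRq documentation as I understand it; the function name `decode_sysrq` is hypothetical, and 176 is used only as an example input (it is the value often cited as Ubuntu's default).

```shell
#!/bin/sh
# Hypothetical helper: list the SysRq function groups a kernel.sysrq value enables.
# Expects a non-negative integer argument.
decode_sysrq() {
    value=$1
    case "$value" in
        0) echo "disabled"; return 0 ;;
        1) echo "all functions enabled"; return 0 ;;
    esac
    # Each entry is "bit:description"; print those whose bit is set in the value.
    for entry in \
        "2:console log level" \
        "4:keyboard control (SAK, unraw)" \
        "8:process debugging dumps" \
        "16:sync" \
        "32:remount read-only" \
        "64:signalling processes (term, kill, oom-kill)" \
        "128:reboot/poweroff" \
        "256:nicing of real-time tasks"
    do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( value & bit )) -ne 0 ]; then
            echo "$name"
        fi
    done
}
```

For example, `decode_sysrq 176` lists sync, remount read-only and reboot/poweroff, so S, U and B would work while E and I (which need the signalling bit, 64) would be ignored. That would explain a REISUB that appears to do nothing until the very end.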

sgtnasty · on March 14, 2014

I have not had to do this in a while now.

jack57 · on March 14, 2014

This complexity, my technically oriented friends, is part of the reason normal people do not use Linux.

NotHereNotThere · on March 14, 2014

How do you even come to this conclusion? Knowing the magic key sequence is not required to operate a Linux system at all.

If a Windows box froze, and you had a (somewhat slim) chance of gracefully shutting it down, would you not use it?

Would you call pressing F8 to access magical boot options in Windows, a reason non-technical people wouldn't use Windows?

ragsagar · on March 14, 2014

The site is blocked in UAE.

013 · on March 14, 2014

What's UAE?

LoneWolf · on March 14, 2014

I would say United Arab Emirates

chris_wot · on March 14, 2014

Censor's not happy with magic?

aruggirello · on March 14, 2014

The Ubiquitous Amiga Emulator?