For the sysadmin community and anyone else that feels like chiming in

Literally 9 out of 10 disasters I read about in IT are caused by someone typing an intrusive/destructive command (I’m looking at you, GitLab) into the wrong window or on the wrong server.

What, in your collective opinion, are some effective strategies to mitigate this problem? I’ve seen things like different color prompts, having other people review the command before it’s entered, and so on, but that doesn’t seem to make a difference when fatigue is involved.

Thoughts?

My thought? Yes, it happens. People try to be careful but make mistakes.

I have used different color backgrounds when running with elevated privileges. It may raise my awareness a bit, but I’m sure I’ve still done something stupid.

Some old ideas

  • make any window that is root tiny (e.g. 10 rows) and always put it in the same position
  • script what you do as root… having a record helps you undo things
  • never be in a hurry

My favorite mistake… mount a filesystem that belongs to some other Linux, cd into there, then cd /etc and make some config changes… to the WRONG Linux. I needed cd etc of course, without the /… stupid
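To make the cd /etc versus cd etc slip concrete, here is a minimal Python sketch of how the two targets resolve. The mount point /mnt/other and the helper name resolve_cd are made up for illustration, and the sketch ignores symlinks.

```python
import posixpath


def resolve_cd(cwd, target):
    """Return the directory a shell would land in after `cd target` from cwd.

    An absolute target (leading "/") discards cwd entirely, which is
    exactly how you end up editing the running system's /etc.
    """
    return posixpath.normpath(posixpath.join(cwd, target))


# Hypothetical mount point for the other Linux install.
mounted = "/mnt/other"

print(resolve_cd(mounted, "/etc"))  # the running system's config
print(resolve_cd(mounted, "etc"))   # the mounted system's config
```

The first call lands in /etc and the second in /mnt/other/etc, so the only thing separating the right Linux from the wrong one is a single leading slash.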

Is there a way to automate the scripting of the root window?

Sounds like a job for … da dadada …python

Yes, put the command "script filename" in root’s .bashrc file.
You might need something to terminate it when you exit
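Since Python got a mention, here is a rough sketch of the same idea using the standard pty module instead of the script command. It is not a drop-in replacement: the log directory, the helper names log_path and record_session, and the choice of /bin/bash are all assumptions, and it only works on Unix-like systems.

```python
import os
import pty
import time


def log_path(log_dir, user, now=None):
    """Build a timestamped log file name, e.g. <log_dir>/root-20240101-120000.log."""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    return os.path.join(log_dir, f"{user}-{stamp}.log")


def record_session(path, argv=("/bin/bash",)):
    """Run argv in a pseudo-terminal and append everything it prints to path.

    pty.spawn() returns when the child exits, so the recording stops
    by itself when you leave the shell.
    """
    with open(path, "ab") as log:
        def read(fd):
            data = os.read(fd, 1024)
            log.write(data)
            log.flush()
            return data

        return pty.spawn(list(argv), read)


# Possible usage (hypothetical directory, must already exist):
#   record_session(log_path("/var/log/rootlog", "root"))
```

Like script, this records what the terminal displayed (including echoed keystrokes), not a clean list of commands, so the log is a memory aid rather than an audit trail.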

100% - I need to escalate this issue with my bosses, 'cause one of my main customers is deploying Linux workloads in AWS, and when you log in to a server its name is some bullshit derived from its IP address (with dashes instead of dots). How the F are you supposed to know which one you’re working on? And they don’t even get me to do the deployment, some other numpties do it. One of them had ZERO clue about CPU architecture and deployed an arm64 system that I had to install iperf and iperf3 on. YOU IDIOT! How the actual “F” do you survive in IT not knowing different CPU architectures exist?

Colours can be ineffective with people like me (red-green colourblind). Most of my customers have naming conventions so you can tell if the server is Test or Prod, e.g. a T or a P (or a “D” for dev) somewhere before the suffix number in the server name. I say “most”, but not all…


I did laugh out loud at this. I’d like to think they may have done that on purpose because the Graviton instances are cheaper. I’m guessing that was not the case though.

Exactly what we do. It’s not 100% followed, but servers have a prefix of PRD/UAT/DEV/STG depending on their environment. The location (AWS/CDR/KOP) is included in the middle, followed by a project code and number.
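A naming scheme like this can also be checked by a script, for example to print a plain-text warning at login that does not depend on colour. The sketch below assumes hyphen-separated names such as PRD-AWS-BILLING-03; the real separator and field widths are not given above, so the pattern and the helper names parse_hostname and banner are guesses for illustration.

```python
import re

# Codes taken from the convention described above.
ENVIRONMENTS = {"PRD", "UAT", "DEV", "STG"}
LOCATIONS = {"AWS", "CDR", "KOP"}

# Assumed layout: ENV-LOC-PROJECT-NUMBER (the separator is a guess).
_PATTERN = re.compile(r"^([A-Z]{3})-([A-Z]{3})-([A-Z0-9]+)-(\d+)$")


def parse_hostname(name):
    """Split a hostname into its fields, or return None if it does not fit."""
    match = _PATTERN.match(name.upper())
    if not match:
        return None
    env, location, project, number = match.groups()
    if env not in ENVIRONMENTS or location not in LOCATIONS:
        return None
    return {"env": env, "location": location,
            "project": project, "number": int(number)}


def banner(name):
    """Return a text-only warning line suitable for a login message."""
    info = parse_hostname(name)
    if info is None:
        return f"UNKNOWN ENVIRONMENT: {name}"
    if info["env"] == "PRD":
        return f"*** PRODUCTION *** {name}"
    return f"{info['env']} {name}"


print(banner("PRD-AWS-BILLING-03"))
print(banner("ip-10-0-1-23"))
```

Treating anything that does not match the convention as unknown, rather than as safe, means an IP-style default name gets flagged instead of quietly passing as a test box.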
