Tuesday 8 January 2013

A lazy sysadmin is a good sysadmin

As the sysadmin, it is your job to keep the IT systems running smoothly. If everything is running, your users already know. If it isn't, they aren't interested in your petty excuses.
Unfortunately, that's the reality of the situation. That being the case, it is in your best interest to keep everything operational with as little downtime or interruption as possible. Human expectation, perception and reality are all mixed up here, but essentially this means that in order to be good at your job it helps to be lazy.

Characteristics of a lazy sysadmin

Backups

Lazy sysadmins will be anally retentive when it comes to backups. They will ensure that backups are not only run, but tested to ensure they actually work. Backups will be stored offsite and rotated regularly. Initial backups will be to disk and then flushed to tape. Backup agents will be purchased for every system possible to make granular restoration easier. Complete system backups will also be kept and refreshed every 3 to 6 months so that entire systems can be restored in minimal time. Trial runs will be conducted to familiarise the sysadmin with the process of disaster recovery.
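
One way to test that a backup "actually works" is to do a trial restore into a scratch directory and compare checksums against the live data. The following is only a rough sketch of that idea, not a replacement for your backup product's own verification. The function names are made up for illustration, and it assumes the source files have not changed since the backup was taken.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) onto its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(source: Path, restored: Path) -> list[str]:
    """List relative paths that are missing or differ in the restored copy."""
    want = tree_digests(source)
    got = tree_digests(restored)
    return sorted(name for name in want if got.get(name) != want[name])
```

An empty list from `verify_restore` means every source file came back byte for byte. Anything else is a file you would have lost.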

Virtualisation

Lazy sysadmins will also virtualise every system they possibly can. Virtualised servers make life easier by streamlining tasks and removing the servers' dependence on specific hardware. Snapshots will be made to enable easy rollback from upgrades and service pack applications (if required). Lazy sysadmins will also have snapshots stored on redundant hardware for DR purposes.

Clustering / High Availability

All mission-critical server applications will be clustered with failover/failback capability. This will allow the sysadmin to sleep at night if a single server happens to fail. Lazy sysadmins recognise that a 3 (or 5) server cluster is the ideal approach as it allows for redundancy even if one server is down for maintenance.
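
The preference for 3 or 5 servers can be put into numbers. The sketch below assumes a cluster that needs a strict majority of its nodes (a quorum) to stay in service. That is an assumption for illustration, since the clustering product is not named here and not every product works this way. The function names are hypothetical.

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def failures_tolerated(nodes: int) -> int:
    """How many nodes can be lost while a majority is still up."""
    return nodes - quorum_size(nodes)


def survives(nodes: int, in_maintenance: int, failed: int) -> bool:
    """True if the nodes still running can form a majority."""
    return nodes - in_maintenance - failed >= quorum_size(nodes)


if __name__ == "__main__":
    for size in (2, 3, 4, 5):
        print(size, "nodes tolerate", failures_tolerated(size), "failure(s)")
```

Under that assumption a 3-node cluster copes with one node being out, and a 5-node cluster copes with one node in maintenance plus one unexpected failure. A 4-node cluster tolerates no more failures than a 3-node one, which is one reason odd sizes are popular.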

UPS / Generator / Airconditioning

Lazy sysadmins will insist that all IT systems are protected by good quality server-grade UPSes that are either continually online or line-interactive. The UPSes will be managed, have remote sensors, produce regular environmental reports and issue alerts. Lazy sysadmins will configure their servers to shut down gracefully on power failure or in unfavourable environmental conditions. They will push for backup generators for the UPS and for airconditioning, pointing out that the UPS may run the equipment - but not the airconditioners. They will also push for computer room quality airconditioners - preferably redundant - and not settle for domestic grade split systems.
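
The shutdown rule itself is simple enough to write down. This is a minimal sketch of the decision only, assuming a managed UPS reports remaining runtime and a room sensor reports temperature. The `Reading` type and both thresholds (10 minutes, 35 degrees C) are made up for illustration, so pick values that suit your own room and equipment.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One sample from the UPS and its environmental sensor (hypothetical)."""
    on_battery: bool
    runtime_minutes: float
    temperature_c: float


def should_shut_down(
    reading: Reading,
    min_runtime_minutes: float = 10.0,  # illustrative threshold
    max_temperature_c: float = 35.0,    # illustrative threshold
) -> bool:
    """Decide whether servers should start a graceful shutdown."""
    # Overheating is a reason to shut down even on mains power,
    # for example when the airconditioning has failed.
    if reading.temperature_c >= max_temperature_c:
        return True
    # On battery, shut down while there is still enough runtime
    # left for the shutdown itself to finish cleanly.
    return reading.on_battery and reading.runtime_minutes <= min_runtime_minutes
```

In practice the UPS vendor's agent usually makes this call for you. The point of the sketch is that both power and environment feed into the same decision.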

Hardware

Lazy sysadmins will ensure the IT equipment that is purchased is tier 1 quality (HP, Cisco, IBM, Dell etc) with capability for expansion and at least 60% headroom over current requirements. They will not settle for tier 2 or white box equipment.

Remote Access

Lazy sysadmins will ensure that as many tasks as possible can be conducted from home or on the road and where possible by phone or tablet. The time to connect should be as low as possible.

Monitoring

Lazy sysadmins will set up detailed, granular monitoring of all the equipment, servers and services in a hierarchical fashion. A dashboard will be available for overview with external monitoring and alerts sent by email or SMS depending upon the severity. The lazy sysadmin will regularly check the log files of their systems looking for inconsistencies that may lead to larger problems at a later date.
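
To show what "hierarchical" buys you, here is a small sketch. When a switch dies, everything behind it looks dead too, and you only want one alert. The names (`PARENTS`, `root_causes`, `channels_for`) and the severity-to-channel mapping are assumptions for illustration, not the configuration of any particular monitoring product. The sketch also assumes that everything behind a failed device is reported as down.

```python
from __future__ import annotations

# Each monitored item points at the item it depends on (None = top level).
PARENTS: dict[str, str | None] = {
    "switch": None,
    "server": "switch",
    "web": "server",
    "mail": "server",
}

# Illustrative routing: which channels each severity is sent to.
CHANNELS: dict[str, list[str]] = {
    "critical": ["email", "sms"],
    "warning": ["email"],
}


def root_causes(parents: dict[str, str | None], down: set[str]) -> list[str]:
    """Return the failed items whose parent is not itself down."""
    return sorted(item for item in down if parents.get(item) not in down)


def channels_for(severity: str) -> list[str]:
    """Pick alert channels by severity; unknown severities get none."""
    return CHANNELS.get(severity, [])
```

With `server`, `web` and `mail` all down, `root_causes` reports only `server`, so you get one SMS instead of three.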

Self-Healing Systems

Lazy sysadmins will make sure all essential services can restart themselves. Scripts will be written to monitor and record the system configuration before and after service restart. Ideally this will simply be an extension of the capabilities of the monitoring system.
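
The restart-and-record idea might look something like this. It is a minimal sketch with the real work passed in as callables, because how you check, restart and snapshot a service depends entirely on your platform. The `heal` function and its parameters are hypothetical names.

```python
from typing import Callable


def heal(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    snapshot: Callable[[], str],
) -> dict:
    """Restart an unhealthy service, recording its config before and after."""
    if is_healthy():
        return {"restarted": False}
    before = snapshot()   # configuration as it was when the fault was found
    restart()
    after = snapshot()    # configuration once the service has been restarted
    return {
        "restarted": True,
        "recovered": is_healthy(),
        "before": before,
        "after": after,
        "config_changed": before != after,
    }
```

The returned record is what you would log or hand to the monitoring system. A restart that changes the configuration, or one that does not bring the service back, is worth a human look.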

Security

Lazy sysadmins will never compromise on system security. They will establish secure firewalls, secure VPNs, VLANs, DMZ access, email scanning, forward and reverse proxies, virus protection and enforced password security, and apply multi-factor authentication where possible.

Patches and Updates

Lazy sysadmins will apply patches and software updates on a regular basis. Patches increase stability and security of your system. Updates extend functionality and reduce time when external support is required.

Documentation

Lazy sysadmins recognise they have a poor memory, so they make sure that all new systems are built three times: once to familiarise, once to document and the last time to test the build documentation. That way if and when it comes time to rebuild that system, they know the documentation is accurate. Lazy sysadmins also write their system documentation aspirationally: that is, the system is documented how they would like it to be rather than as a snapshot of its current condition. That way, over time the documentation becomes more accurate rather than less accurate.

Training

Lazy sysadmins recognise that the more people who know what they do, the less likely they will be called out after hours. They will train their juniors to know as much as they do and encourage them to learn more independently. They will encourage juniors to become mini-experts in the smaller systems and document their systems accordingly.

So, if you are a sysadmin, make sure you are a good one by being as lazy as possible and following the tips listed above.

2 comments:

  1. Ewald van Geffen, June 04, 2013 3:53 pm

    Love this.

    I only recently started sysadmining in SMB (<50 users) professional environments as a side gig during my studies. We figured documentation (i.e. company continuity) is _the_ most important point. We use dokuwiki for documentation, although we are thinking about switching over to confluence.

    I agree with all the other points. Our weak points are updates and no self-healing. In our experience there are layer 8 and upwards problems with getting A-grade hardware, so we often compromise with Supermicro.

    The net effect is that we are on site maybe once a month, because there's no need otherwise: things just keep running or can be fixed remotely. It's even come so far that we pro-actively ask our users for problems because we feel guilty getting paid for doing nothing.

    How extensive is your monitoring? Every host + service monitored? What software would you recommend (preferably low-budget/free/OSS)?

    I actually tried to send you a PM earlier today about the Linux admin FB group. Specifically our shared annoyance of the beginner questions, but apparently that costs €0.8 nowadays. 'Social' network :x

    Replies
    1. I use Nagios for monitoring. It has a bit of a learning curve but it is well worth it, and there is an excellent Nagios Exchange site for swapping check scripts. If the learning curve bothers you, there is a commercial variant that is very easy to get up and going.
