The main thrust of the keynote was highlighting the insatiable greed for control over technology as exhibited by Governments and legislators - particular when they have little to no understanding of the technologies they are trying to legislate. Particularly damaging is Governments desire to hamstring encryption technology and impose export controls on intangibles and its effect on open source. George emphasised the need to communicate technical concepts to lay people in language they can understand.
Day 2 was also the second day on the miniconfs. For me, that meant the sysadmin miniconf. This one did not have the structure exhibited by he opencloud symposium - each talk was an island of knowledge and there were thicknesses and thinnesses. The talks were also shorter - meaning more of them. The sysadmin miniconf has its own page, so you could ignore this blog entry completely and go there.
2/1 Is that a data-center in your pocket? by Steven Ellis
Subtitled "There will be dragons" rather than the predictable "...or are you pleased to see me."Steven provided a walk-through on how to create a portable, virtualised cloud infrastructure for demo, training and development purposes. This talk was heavy on detail and I found myself wanting to explore this more at a later date. He utilised a USB3 attached SSD drive connected to an ARM Pine64. The setup utilised nested virtualisation, thin LVM and docker.
According to Steve, the "cloud" will very soon be mostly ARM64 - so it's time to prepare for that. He also demonstrated how UEFI can be used to secure boot virtual machines.
2/2 Revisiting Unix principles for modern systems automation by Martin Kraff
Martin highlighted the fact that in the transition from Unix to Linux, somehow we forgot the habits born from Unix administration - in particular, we forgot about system automation, to whit:
- Monitoring
- Data collection
- Policy enforcement
2/3 A Gentle Introduction to Ceph by Tim Serong (Suse)
2/4 Keeping Pinterest Running by Joe Gordon
Joe talked about the challenges and differences in supporting a service as opposed to supporting a piece of software. His basic description is that it's like changing tyres whilst driving at 100MPH. The differences include:
- stable branches
- no drivers and configurations
- no support matrix
- dependency versions
- dev support their own service
- testing against prod traffic
One thing that really interested me is their use of a "Run Book" for the on-call support team. All recent changes are documented in the Run Book against anything it could potentially affect and who to contact about those changes. If on-call support has to respond to a problem, they consult the Run Book first.
In addition to a staging environment, they also have what they call the "canary" environment - akin to the canary in a coal mine metaphor. However, Joe said it was more akin to a rabbit in a sarin gas plant metaphor (insert chuckles).
dev->staging->canary->prod
The staging system uses dark traffic, however the canary system operates of a minimal set of live traffic. If problems occur at any point, they rollback and conduct a blameless post-mortem. Joe emphasised that the blameless component was the most critical.
Before deployment, they conduct a pre-mortem covering:
- Dependencies
- Define an SLA
- Alerting
- Capacity Planning
- Testing
- On call rotation
- Decider to turn feature off if needed
- Incremental launch plan
- Rate limiting
2/5 Site Reliability Engineering at Dropbox by Tammy Butow
She also emphasised the need for a "Captain's Log" - which is a log of every on-call alert. Also for cross team disaster recovery testing.
2/6 Network Performance Tuning by Jamie Bainbridge
2/7 'Can you hear me now?' Networking for containers by Jay Coles
I felt a little lost in this talk. This was part 3/3 in a series of talks by Jay on containerisation. As mentioned before, I have neglected containers - something I need to remediate as it seems everyone has embraced them. Much of the material for this talk is available here.2/8 Pingbeat: y'know, for pings! by Joshua Rich
Pingbeats power lies in its ability to write the ping response to Elasticsearch, an open-source NoSQL-like data-store with powerful, built-in search and analytics. Combined with Kibana, a web-based front-end to Elasticsearch, you get an interactive interface to track, search and visualise your network health in near real-time.
2/9 The life of a sysadmin in a research environment by Eric Burgueno
2/10 Creating bespoke logging systems and dashboards with Grafana, in fifteen minutes by Andrew McDonnell
Andrew gave a live install, config and run demonstration of Grafana, starting from a fresh Ubuntu 14 VM with Docker (again!) where he installed and setup Graphite using Carbon to log both host CPU resources and MQTT feeds and created a custom dashboard to suit.
2/11 Order in the chaos: or lessons learnt on planning in operations by Peter Hall (realestate.com.au)
- Fixed time iterations
- Plan the scope the known work for the next 2-3 weeks
- Leave sufficient slack for urgent work
- Be realistic
Interruptions:
- Assign team members to dev teams
- Have a rotating “ops goal keeper” with a day pager who is free from other work
- Have developers on pager as well. This helps in closing the feedback loop so that they are aware of issues in production
2/12 From Commit to Cloud by Daniel Hall
- Fast (10 minutes)
- Small (ideally a single commit, aware of whole change)
- Easy (as little human involvement as possible, minimise context switching, simple to understand)
This leaves less to break, easier rollbacks and allows the dev team to focus on just one thing at a time rather than a multitude of tracked changes. The basic idea is that deployments should be frequent and nobody should be afraid to deploy.
In the setup Daniel works with, they have:
- 30 separate microservices
- 88 docker machines across 15 workers
- 7 deployments to prod each working day!
- Only 4 rollbacks in 1.5 years
Their deployment steps are:
- write some code
- push to git reporting, build app
- automated tests run
- app is packaged
- deployed to staging
- test in staging
- approve for prod (single click)
- deploy to production
2/13 LNAV
The final talk was a 10 minute adhoc one discussing lnav as a replacement to using tail -f /var/log/syslog when looking at the systemlog. I am fully converted to this tool and will be using it everywhere from now one. It uses static libraries, so you can simply copy it from one system to another as a standalone binary.
No comments:
Post a Comment