How Hiscox insurance adopted agile working in a £0.5bn bet on the cloud

Paul Wilford is the platform services group lead at Hiscox. He spoke at the Agile Delivery 2017 conference on how the insurance company learned how to deliver in more effective ways, and move their core applications to cloud infrastructure…

Steve Parks
Convivio

--

Originally, Hiscox had their data centre in the basement — but one day it flooded and there was no disaster recovery plan.

Following this, their applications were moved to proper data centres in London and Paris. But then, in 2014, there were two storage array failures on different occasions. There were plans for failover between the two data centres, but they hadn’t been tested and the idea was too scary — so they ploughed on with fixing the problem rather than following the plan.

They decided that this wasn’t resilient enough, and the development team felt that a move to a cloud-based architecture was needed. The company took a risk by backing them on this, and the team took a risk by stepping forward to do it. They were given some freedom, but they weren’t going to be able to go straight to the ideal system, just take a big leap forward.

Here’s their journey…

2015 — Phase 1

The purpose of this phase was to find out if they could get business benefits from moving to the cloud. They felt from their knowledge that they could, but the CEO wanted proof.

To gain this proof they put together a small project with a team of pioneers from across the range of skillsets that would be needed — from devops, QA, build and release, project management and more. They also involved stakeholders and the application development teams.

For political reasons, the team had to include far too many people. Through having so many stakeholders in the room there were lots of mumblings about agile not being suitable as ‘you need to plan’ such projects, as if agile means there is no planning. They ploughed on.

In this phase they tried putting a straightforward web app and database into the cloud, on two platforms: Azure and AWS. The intention was just to try it roughly to learn, then bin it, rather than invest the time to make it production ready.

Once they’d completed this phase, Azure gave them all sorts of discounts and offers upfront, while Amazon just said the price list was the price list, so they went with Azure on price.

2015 — Phase 2

In this phase, the aim was to identify which apps could be migrated and the benefits of migrating them, and to understand whether a cloud management platform could help.

They brought in third-party expertise to assess the suitability of their applications for cloud deployment, and also to assess cloud management platforms (CMPs).

They were disappointed by the CMPs available. They wanted a management tool to plug their automation into, but the products on offer were expensive and didn’t quite fit their process. They decided that they’d need to build their process themselves.

The Hiscox team started to experiment, and to train their ops experts in:
* coding principles for infrastructure as code (IaC)
* managing source control
* CI pipelines and testing of IaC
* orchestrating Azure through ARM/PowerShell and beyond
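
As a rough illustration of what "testing of IaC" in a CI pipeline can look like, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the resource schema, the naming rule, the required tags and the function names are hypothetical, and it is not a description of the checks Hiscox actually ran.

```python
from __future__ import annotations

# Hypothetical policy check for an infrastructure-as-code definition.
# The resource schema, naming rule and required tags are illustrative
# assumptions, not Hiscox's actual conventions.
import re

NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]{2,30}$")
REQUIRED_TAGS = {"owner", "environment"}


def check_resource(resource: dict) -> list[str]:
    """Return human-readable policy violations for one resource."""
    problems = []
    name = resource.get("name", "")
    if not NAME_PATTERN.match(name):
        problems.append(f"{name!r}: name does not match naming convention")
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        problems.append(f"{name!r}: missing tags {sorted(missing)}")
    return problems


def check_definition(resources: list[dict]) -> list[str]:
    """Collect violations across every resource in a definition."""
    return [p for r in resources for p in check_resource(r)]


if __name__ == "__main__":
    definition = [
        {"name": "policy-admin-web",
         "tags": {"owner": "platform", "environment": "test"}},
        {"name": "DB01", "tags": {"owner": "platform"}},
    ]
    for problem in check_definition(definition):
        print(problem)
```

A CI job could run a check like this on every commit and fail the build if any violations are returned, so that mistakes are caught before anything is deployed.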

At the same time they ran into some issues with stakeholders and the application development teams.

They’d been doing lots of comms, but got feedback that they weren’t doing any comms. They realised that their comms consisted of invitations to things and sending things to read, which relied on others to fit that communication in amongst their other work. They decided to do more pushy comms: getting people in a room where they had to pay attention.

2016 — Phase 3

The purpose of this phase was to establish a base capability to take advantage of cloud computing benefits, migrate a subset of applications to the cloud, and deliver a support capability for applications that move to the cloud.

This meant lots more learning for the team.

The infrastructure and ops specialists:
- read lots of books
- learnt about code
- learnt about Azure
- taught the code geeks about real-world constraints
- explained the legacy systems, because context is key

The code geeks:
- learnt about the Hiscox infrastructure
- learnt about the Hiscox network
- fought with issues in legacy applications

2017 — Phase 4

In this phase the key work was to migrate the core UK policy admin system to the cloud. This was long, hard work because the system is a ‘Java monolith’, pretty much the exact opposite of a cloud native app. That meant delivery was all-consuming. They were using a lot of servers (around 24 environments at peak), which was expensive but still cheaper than the cost of delay.

The infrastructure was actually more agile than the system it hosted, as the system was being developed in waterfall. Application development inevitably fell behind schedule, and delayed the migration to the cloud.

While this work was underway, the cloud migration team was also supporting wider interest in cloud from around the business, supporting the organisation’s IT services in cloud adoption and developing the skills required, and continuing to build the core platform.

The infrastructure was designed to have multi-region failover, with the ability to move applications from Dublin to Amsterdam in 20 minutes. But four days before they were due to go live with the migrated application, their testing discovered something was wrong with this functionality. The flexibility of the cloud system allowed them to rapidly reconfigure, and still deploy on time.
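
A failover target like this only means something if it is tested regularly, which is what caught the problem here. The sketch below shows one way such a drill could be timed against a budget. It is a hypothetical illustration: the function names, the callables passed in and the result format are all assumptions, and only the Dublin-to-Amsterdam move and the 20-minute figure come from the talk.

```python
# Hypothetical failover drill. The callables and the result format are
# illustrative assumptions, not Hiscox's tooling.
import time
from typing import Callable

FAILOVER_BUDGET_SECONDS = 20 * 60  # the 20-minute target from the talk


def run_failover_drill(
    move_application: Callable[[str, str], None],
    is_healthy: Callable[[str], bool],
    source: str = "dublin",
    target: str = "amsterdam",
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Move the application, then report whether the drill met the budget."""
    started = clock()
    move_application(source, target)   # assumed to block until the move ends
    elapsed = clock() - started
    healthy = is_healthy(target)       # assumed post-move health probe
    return {
        "elapsed_seconds": elapsed,
        "target_healthy": healthy,
        "passed": healthy and elapsed <= FAILOVER_BUDGET_SECONDS,
    }
```

Passing the clock in as a parameter lets the drill logic itself be tested with a fake clock, without waiting for a real failover.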

Their CTO, Jonathan Fletcher, praised them highly for this work to take the organisation to the cloud and introduce agile ways of working, saying it was the highlight of his 20-year career in IT.

Lessons learned

The overarching lesson was that if you’re not outside your comfort zone you won’t learn. On top of this, Paul’s top 5 lessons learned are:

5. Embrace the new tech, the relentless, constant new tech
The cloud and devops tools are continually evolving, and new tools keep appearing. The teams would constantly discover new approaches and experiment with them. Over a period of a few months he’d see the number of tools bloom, then gradually contract again to a sensible level, and then the pattern would repeat, like natural selection. He felt it was important to let this process happen rather than try to constrain it.

The core toolset they have ended up with is: Bitbucket for source control, Terraform for infrastructure management, Puppet for configuration management, and Bamboo for automation. For monitoring they use OMS, Dynatrace and Splunk.

4. Hybrid is hard
The team struggled at times with the balance between accepting constraints and achieving tech excellence, and the balance between legacy systems and the new world in the gradual changeover.

3. Know when to stop/pause
Paul says that they are having to split time between how they run stuff in the cloud (providing indirectly attributable value) and actually getting stuff running in the cloud (directly attributable value).

They now have 13 applications migrated, which are visible to the business, but to do this they have had to work on dozens of things that are less visible, but enabled that migration and the successful running of the apps.

The team is now overloaded with requests from the business, and can’t keep up with the demand. Therefore they have asked to stop all work and spend some time working on the platform and tools to enable a major leap in productivity. Their aim now is to be developing the toolkit that application teams in the business can use themselves in a self-service way. That’s the route to scaling the cloud rollout.

2. Nobody knows
With new stuff, even the experts’ experience is months out of date. Therefore you need a team willing to explore and a business willing to run survivable experiments.

1. Resistance is not futile
Just because there is a direct conflict between operations and development, you can’t simply bin one of those two mindsets. You need to facilitate them working together to solve problems. Strong facilitation and agile coaching are absolutely essential to a successful project.

Conclusion

The benefits are definitely there in agile working, DevOps and migration to the cloud, but it is far from easy to do.
It’s important to involve all the skillsets and professional disciplines and ensure everyone has their part to play in the transition.

These were my notes from Paul’s session at the Agile Delivery 2017 conference.

You can continue the conversation on Twitter with Paul, or with me.

Convivio is a digital agency that brings people together using technology.

We work with government and large organisations to deliver digital services.

Find out more at www.weareconvivio.com

--

Writing, Media, Tech, Entrepreneurship, Food - and particularly any crossover between them. I lead Convivio, a boutique digital services agency.