Cloud covers Netflix resiliency and reliability issues

Editor ©RapidTVNews | 18-06-2012

In what could be an insight into an issue that may plague online TV and online video providers, over-the-top (OTT) leader Netflix has revealed the extent to which it has had to invest in infrastructure to ensure end-user quality as the service has rapidly developed.

Indeed, Netflix concedes that its move to cloud-based distribution, seen as absolutely necessary to meet its long-term ambitions as it scales out across the world, forced it to develop new design patterns for running a reliable and resilient distributed system.

Yet in addition to the technical hurdles it overcame to run successfully in the cloud, Netflix says that it also had to undergo operational and organisational transformations.

Netflix revealed that as the scale of web applications has grown over time through added features and growing usage, its application architecture has changed radically. For example, Netflix described itself, in the days when it ran from its data centre, as “a large monolithic Java application running inside of a tomcat container.”

As a result, deployments made as part of a planned rollout were “heavy and risk-laden” and, because of all the various elements involved in each one, they were handled by a centralised team that was part of IT operations. That team also carried out production support using a traditional network operations centre (NOC), which was itself organisationally separate from the development team.

In moving to the cloud, Netflix says that it saw an opportunity to “recast the mould” for how it built and deployed its software, using the cloud migration to re-architect its system into a service-oriented architecture with hundreds of individual services. Each service could be “revved on its own deployment schedule…empowering each team to deliver innovation at its own desired pace.”