March 2013 – List<Stuff>

Charles Darwin and Continuous Delivery

Charles Darwin published On the Origin of Species in 1859. It is somewhat remarkable that some of the theories set out in this work can be verified over 150 years later, in fields of human knowledge as different from biology as software development.

To set the context for the subjects discussed below, and before addressing how Darwin's statements affect us, let's take a little trip to visit some of the most successful Internet companies, whose websites have astronomical numbers of users, serve millions of pages per day and complete countless transactions.

The voyage aboard the Beagle

What is dangerous is not to evolve – Jeff Bezos, CEO & President of Amazon.com

Darwin embarked on a journey of nearly five years aboard HMS Beagle, which allowed him to study many animal species and to gather valuable information to support the theories he later set out in his work. Studying the species we are interested in is much easier for us, since a quick look at some publicly available online content is all we need.

We start in Seattle, where a small online bookstore founded in a garage back in the '90s has ended up becoming a global bazaar where you can buy anything from the mythical t-shirt of three wolves howling at the full moon (if you did not know about this product, I recommend reading the customer reviews) to genuine uranium.

Amazon serves 137 million customers per week and has an annual revenue of 34 billion dollars. If all of its active users came together in one country, it would have twice as many people as Canada. You can imagine that, with such figures, introducing new features on the website must be something they consider very carefully, and something they can't afford to do often because of the risk of bugs and unexpected errors, which could lead to huge losses.

Right?

No. Nothing could be further from the truth. By 2011, Amazon was releasing changes to production every 11.6 seconds on average, affecting up to 30,000 servers simultaneously. I don't have more recent numbers, but given how the business has evolved, everything suggests that these figures have only become more striking.

The biggest risk is not taking any risk… In a world that's changing really quickly, the only strategy that is guaranteed to fail is not taking risks. – Mark Zuckerberg, CEO & Chairman of Facebook

800 miles south, in Menlo Park (California), what began as a social experiment by a group of undergraduates is serving more than one billion active users, who upload 250 million images a day and view one trillion pages per month. The figures are dizzying. The effect is so strong that some parents have even named their children Facebook, literally.

The source code for Facebook is compiled into a 1.5 GB binary and is maintained by more than 500 developers. Stakes are high for every change and deployment. Anyone would expect every change to be made only after thorough verification by a strict QA team, and never without the explicit approval of a horde of managers armed to the teeth with the most inflexible bureaucracy.

Or maybe not?

In fact, whoever imagines it that way is completely mistaken. Minor changes are released to production at least once a day, and a major version is deployed once a week. Almost all the code is modified directly on the main line; they don't use branches to protect its stability. Everyone does testing and can file bugs. Everything is automated to the maximum.

In 100 years people will look back on now and say, ‘That was the Internet Age.’ And computers will be seen as a mere ingredient to the Internet Age. – Reed Hastings, CEO of Netflix

Not far from there, also in California, Netflix does business from a town called Los Gatos. It is the largest online service for movies and television shows, which it streams to its subscribers, who number more than 25 million.

Netflix services suffer frequent attacks that put them at risk and even cause failures in specific nodes, making interventions necessary to prevent bigger problems. So far, this is no different from any other big company providing Internet services. What makes Netflix extraordinary is that most of these attacks are caused… by themselves!

How can it be possible? Have they gone nuts? Are there disgruntled employees trying to sabotage the company from within?

Not really. These attacks are carefully orchestrated by the Simian Army, the horde of little nuisances developed by Netflix to push the boundaries of its own infrastructure and applications. The Chaos Monkey randomly disables instances to ensure that the system can survive this type of failure. The Latency Monkey simulates delays and loss of connectivity. The Conformity Monkey shuts down instances that don't adhere to a defined set of best practices, immediately and without remorse. And so on… there's even a Chaos Gorilla, the Chaos Monkey's bodyguard, which takes down an entire cloud availability zone wherever it monkeys around. The consequence is that, when these problems occur unexpectedly, their systems and teams are already prepared to deal with them, since they have been able to rehearse the procedures, and the code has already been modified to mitigate the consequences. If this sounds interesting, you can even take a look at how it is implemented.
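To make the idea concrete, here is a minimal simulation of what a Chaos Monkey does, assuming a toy in-memory model of services and their instances. `Service`, `chaos_monkey` and `single_points_of_failure` are hypothetical names made up for this sketch; they are not part of the Simian Army's actual code, which works against real cloud instances.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Service:
    """Toy model of a service backed by a pool of redundant instances (hypothetical)."""
    name: str
    instances: set[str] = field(default_factory=set)

    def is_available(self) -> bool:
        # The service stays up as long as at least one instance is alive.
        return len(self.instances) > 0


def chaos_monkey(services: list[Service], rng: random.Random) -> tuple[str, str] | None:
    """Terminate one random instance of one random service.

    Returns (service name, instance id) of the victim, or None if nothing is left to kill.
    """
    candidates = [s for s in services if s.instances]
    if not candidates:
        return None
    target = rng.choice(candidates)
    victim = rng.choice(sorted(target.instances))  # sorted() so a seeded run is reproducible
    target.instances.remove(victim)
    return target.name, victim


def single_points_of_failure(services: list[Service]) -> list[str]:
    """Names of services that would go down if the monkey picked their last instance."""
    return [s.name for s in services if len(s.instances) < 2]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for a reproducible run
    fleet = [
        Service("api", {"api-1", "api-2", "api-3"}),
        Service("search", {"search-1"}),  # deliberately not redundant
    ]
    print("at risk before the attack:", single_points_of_failure(fleet))
    print("terminated:", chaos_monkey(fleet, rng))
    for s in fleet:
        print(f"{s.name} available: {s.is_available()}")
```

Running the monkey regularly turns a hidden weakness (the single `search` instance) into a failure you discover on your own schedule, rather than during a real outage.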

In this business, by the time you realize you're in trouble, it's too late to save yourself. Unless you're running scared all the time, you're gone. – Bill Gates, Co-founder of Microsoft

Now we return to our first stop, Seattle. Nearby, in Redmond, the Team Foundation Service team serves many other development teams worldwide, providing a tool that supports the complete application lifecycle: planning, collaboration, version control, testing, automated builds, etc. With a worldwide user base working in every time zone, availability is critical; anyone working in software development knows the hassle of losing access to version control or being unable to use the build server. The usual approach in these cases is to focus on a stable set of features that provides adequate service to users and changes little over time; that way, the availability of the service is not affected by defects introduced along with new features.

Do you agree with this approach?

They don't! Since the product launch, the trend has been to introduce new features continuously, at a cadence of about three weeks. And we're not talking about minor or cosmetic changes; those updates have included features as important as automated deployment to Azure, Git integration and customizable Kanban boards. The few service outages so far have mostly been predictable, and for many of them users were warned in advance so they could prepare.

Just as Darwin did aboard the Beagle, we could continue our journey indefinitely in search of peculiar species: many other organizations working in ways that seem to defy common sense and established rules:

Flickr deploys several times a day, and until recently its website showed when the last deployment had taken place and how many changes it included.
At Spotify, which maintains over 100 different systems (clients, backend services, components, etc.), any of its 250 developers is authorized to modify any of these systems directly if that is needed to implement a feature.
Etsy experiments with new features directly in production, using a technique known as A/B testing, to identify the changes that attract the most interest from customers.
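A/B testing needs a stable way of deciding which users see the new feature. The sketch below uses hash-based bucketing, a common approach but an assumption on our part, not a description of how Etsy does it; `assign_variant`, `conversion_rate` and the experiment name `"new-checkout-button"` are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically put a user in 'A' (current version) or 'B' (new feature)."""
    if not 0.0 <= treatment_share <= 1.0:
        raise ValueError("treatment_share must be between 0 and 1")
    # Hash experiment + user so the same user always lands in the same bucket,
    # while different experiments get independent splits.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return "B" if bucket < treatment_share * 10_000 else "A"


def conversion_rate(conversions: int, visitors: int) -> float:
    """Share of visitors in a variant who did what we are measuring (e.g. bought)."""
    return conversions / visitors if visitors else 0.0


if __name__ == "__main__":
    groups = {"A": 0, "B": 0}
    for i in range(1_000):
        groups[assign_variant(f"user-{i}", "new-checkout-button")] += 1
    print(groups)  # roughly a 50/50 split
```

Because assignment depends only on the user and the experiment, a returning visitor keeps seeing the same variant, so the conversion rates of A and B can be compared fairly once enough traffic has gone through each.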

What conclusions can be drawn in light of all this information?

Are they all gone completely mad?

Or are we discovering a new way of working that breaks with many of the preconceived ideas considered valid in software development until now?

For example, it may seem counter-intuitive that the more deployments you do, the fewer problems you'll have while deploying. We've all had that fateful release on a Friday (you know, if it's not on a Friday, it's not a real deployment…), which forced us to spend the whole weekend struggling to get the ~~fuc#$@&~~ application up. The natural reaction is to resist deploying again with all our strength and to postpone it as long as possible, because we know it will hurt again. After all, more attempts mean more chances to fail, don't they?

Well, usually not. In fact, repetition leads to more predictable and controllable deployments, with less uncertainty and much smaller, more manageable issues. The underlying philosophy is that if something hurts, rather than avoiding it, you should do it more often, and that way you'll make the pain more bearable. Put another way, a succession of small pains is easier to bear than one large, concentrated, traumatic pain.

Is it feasible that any single developer has the power to release any changes she deems ready to deliver? Yes, if that change is subjected to a verification process that ensures that it will not break anything once it has been released.

Is it reckless to remove from this verification process a whole chain of bureaucracy, requests, approvals, meetings between departments and a comprehensive control of the process by adequately trained roles?

It is not reckless, as long as all or most of these verifications have been coded and run as automated acceptance, regression and smoke tests and unattended deployments, and the status of the whole process can be checked easily. Not only is it not reckless, it will far exceed the reliability of a group of humans doing the same process manually (or, even worse, a random variation of it), often bored and with poor concentration. I am not talking about eliminating manual steps completely, which is usually impossible: at the very least there will be a first manual run of the acceptance tests so that the user can verify that the development team has understood what the particular requirement was meant to address. Or there might be some special device in our environment for which we cannot set up a fully automated deployment. But we can always tackle the rest of our process, and aim to reduce these manual steps as much as possible.

Continuous Delivery is a discipline, a way of working, or a set of patterns and practices, that takes all these factors into account and exploits them to the maximum. We rely on techniques such as test and deployment automation, continuous integration, transparency and visibility throughout the entire process, detailed control of all the dependencies and configuration parameters that affect the delivery of our software, early detection and handling of problematic changes, and many others, so that any change in our code, however small, becomes a candidate for release as soon as it is committed to version control, and actually reaches production unless something makes us discard it (ideally automatically) along the way.
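The core mechanism, every commit travelling through the same ordered sequence of automated checks and being discarded at the first failure, can be sketched as follows. This is a toy model under clear assumptions: the boolean flags on the hypothetical `Commit` class stand in for real builds and test suites, and the stage names are only loosely borrowed from the deployment pipeline described in Humble and Farley's book.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Commit:
    """A change committed to version control (hypothetical minimal model).

    The boolean flags stand in for the outcome of real builds and test suites.
    """
    sha: str
    unit_tests_pass: bool = True
    acceptance_tests_pass: bool = True
    smoke_tests_pass: bool = True


# A stage inspects the release candidate and returns True to let it move on.
Stage = Callable[[Commit], bool]

PIPELINE: list[tuple[str, Stage]] = [
    ("commit stage (build + unit tests)", lambda c: c.unit_tests_pass),
    ("automated acceptance tests", lambda c: c.acceptance_tests_pass),
    ("unattended deploy to staging + smoke tests", lambda c: c.smoke_tests_pass),
]


def run_pipeline(commit: Commit, stages: list[tuple[str, Stage]]) -> tuple[bool, list[str]]:
    """Push every commit through the same ordered stages; discard it at the first failure."""
    log: list[str] = []
    for name, stage in stages:
        if not stage(commit):
            log.append(f"{name}: FAILED, {commit.sha} discarded")
            return False, log
        log.append(f"{name}: passed")
    log.append(f"{commit.sha} released to production")
    return True, log


if __name__ == "__main__":
    for candidate in (Commit("a1b2c3"), Commit("d4e5f6", acceptance_tests_pass=False)):
        released, log = run_pipeline(candidate, PIPELINE)
        print("\n".join(log))
```

Note that there is no approval step in the loop: the only gates are the automated stages, and the log gives the visibility into the process that the text calls for. A manual acceptance check could be modelled as one more stage if needed.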

It is not only about continuous deployment, as many mistakenly assume, since you can be deploying crap and still do it automatically and continuously. Nor is it just continuous automated testing. It comprises these practices but also many others: all those needed to state with confidence that a change is ready for use and that the user can benefit from it.

Of course, for this to succeed, close collaboration among everyone involved is needed, in an environment where barriers and departmental silos have been removed. This is something that movements like DevOps are also addressing.

What is the benefit?

If we stick to the results, the figures from these companies, we could say that the benefit is huge. But to avoid falling into the "correlation implies causation" fallacy, we should be more specific and focus on the context of the software development process. What we find then is:

A transparent and predictable delivery process. For each change, we always go through the same sequence of steps, and these are automated as far as possible. No surprises.
Fewer defects in production. Defects surface and are addressed at earlier stages, even automatically. The standardized delivery process prevents any of these defects from reaching production because of a misunderstanding or because the work was done in a different way. It also provides traceability of the origin of these problems.
Flexibility to undertake changes. Changes are addressed in smaller, more manageable pieces. They are implemented and delivered promptly.
Immediate and useful feedback about changes, even from the production environment: whether they are running smoothly, how users are accepting them, or their impact on the business.
Less time required to deploy and release into production, since everything has been automated as much as possible.
Empowered teams, motivated by the confidence that has been put in them, and the continued feeling of delivering increments of tangible value.

All of this sounds great, but it’s not for me

The adoption of Continuous Delivery practices can be worth it, even if you do not need or do not want to release your software so frequently. Overall, the aforementioned benefits should be the same, so any team willing to improve could consider adopting this approach.

There are very special cases where Continuous Delivery might not be the best option, or even be counterproductive. I, for one, have found very few of them. Most times, these are scenarios where the effort to adopt the practices will not justify the results: legacy systems, outdated technologies, lack of adequate tools to set up the environment, designs not prepared for automation or testing, etc.

But the real problem, which unfortunately appears quite often, comes when the team or the organization itself does not adopt an open attitude to change and improvement. We could say that deep inside, even unconsciously, they do not want a transparent delivery process, they don't need fewer defects in production, or they don't want the flexibility to cope with change. Externally, this manifests itself as a decision not to invest in the necessary improvements. The most common example is the typical argument along the lines of 'our case is very unusual,' 'our system is very complex,' 'we deal with a very delicate business,' 'our users are very special,' 'my boss would never let me,' 'my mom won't let me' or 'insert your favorite excuse here.' Among these, it is quite common to hear 'we can't afford to invest in it,' when in fact, as we will see in a moment, what you really can't afford is not to invest in it.

Is your system bigger than Team Foundation Service?

Do you deal with a more complex business than Amazon?

Are your users more demanding than those from Facebook? Do they have more special requirements?

Do you have to serve a bigger volume of data than Netflix?

If you’re among the vast majority, those who would respond negatively to these questions, chances are that you are just feeling lazy about addressing the transition to a Continuous Delivery model. In that case, my advice is to be careful, because your organization can suffer the same fate as the dodo or the thylacine.

Natural selection

We had left Darwin aboard the Beagle, sailing the seven seas in search of unique species. At the end of his voyage, he was perplexed by the variety of wildlife and fossils he had found, so he began an investigation that led him to set out the theory of natural selection in his book "On the Origin of Species".

It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change.

In the long history of humankind (and animal kind, too) those who learned to collaborate and improvise most effectively have prevailed.

Charles Darwin, English naturalist

Natural selection states that the members of a population whose characteristics are better adapted to their environment are more likely to survive. What about the others? Well, sooner or later they end up disappearing.

It is a law that applies to living organisms, but if you think about it for a moment, what is an organization but a big living organism? Of course natural selection applies to companies and organizations, as any list of extinct companies demonstrates.

In the constantly changing environment in which most businesses operate, it is no longer enough to offer nice, cheap products. You have to deliver them sooner, and evolve them quickly in response to users' demands. Tracking metrics such as team velocity or defect rate is not enough. The metrics that make the difference between those who succeed and those who get stuck along the way are different:

Cycle time: the average time elapsed from when we start working on a feature until it is released to production.
Mean time to failure (MTTF): how long it takes, on average, for my system to suffer a major issue or outage.
Mean time to recover (MTTR): how long it takes, on average, for my system to be fully functional again after a major problem or outage.
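These three metrics are simple averages over timestamps you probably already have. A minimal sketch following the definitions above, assuming features are recorded as (started, released) pairs and outages as sorted, non-overlapping (start, recovered) pairs; the function names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def mean(deltas: list[timedelta]) -> timedelta:
    if not deltas:
        raise ValueError("need at least one interval")
    return sum(deltas, timedelta()) / len(deltas)


def cycle_time(features: list[tuple[datetime, datetime]]) -> timedelta:
    """Average time from starting work on a feature to releasing it to production."""
    return mean([released - started for started, released in features])


def mttr(outages: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to recover: average duration from outage start to full recovery."""
    return mean([recovered - start for start, recovered in outages])


def mttf(service_start: datetime, outages: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to failure: average stretch of normal operation before each outage.

    Assumes `outages` is sorted and non-overlapping; normal operation restarts at each recovery.
    """
    uptimes: list[timedelta] = []
    up_since = service_start
    for start, recovered in outages:
        uptimes.append(start - up_since)
        up_since = recovered
    return mean(uptimes)
```

For instance, two features taking 10 and 4 days give a cycle time of 7 days, and two outages lasting 2 hours and 1 hour give an MTTR of 1 hour 30 minutes. Keep in mind the direction of each metric: cycle time and MTTR should go down, while MTTF should go up.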

Natural selection will favor those who manage to keep cycle time and MTTR as short as possible and MTTF as long as possible, and this is exactly one of the areas where Continuous Delivery can help most.

OK, I don’t want to become extinct. Where should I start?

Throughout this article we have focused on showing the benefits of Continuous Delivery and what could happen if we ignore it. But we have not covered in any depth how to implement it, and given the large number of patterns and practices to consider, it can end up being a process that is far from trivial.

Undoubtedly, there is a cultural side, which will require us to work within our organization to remove barriers and silos and improve collaboration as much as possible.

There are lots of available resources that can help us to get started, but without any doubt the most valuable one is the excellent book by Jez Humble and David Farley, Continuous Delivery.

If your environment is based on Microsoft technologies, fortunately there are great tools available that support most of the aforementioned practices. Visual Studio, Team Foundation Server and related tools will help us implement the whole Continuous Delivery 'pipeline', from automating all kinds of testing and deployment to more specific topics such as static code analysis or continuous integration. It is true that these tools require some customization and tweaking to suit the model we are proposing, but here at Plain Concepts we can help you prepare the environment that best suits your project; we've done it before for many organizations, and it seems that none of them has become extinct yet.

Also, if you want to get an overall idea about how Team Foundation Server can be customized in order to support Continuous Delivery, you can have a look at my presentation on this topic at ALM Summit 3.

And if you can afford to wait a bit longer, I'm currently working with the Microsoft Patterns & Practices team on a new book about the subject, which will be available in a few months and will cover in depth everything needed to put these ideas into practice effectively. More news about it very soon!

ALM Summit 3 – Setting up a Continuous Delivery Deployment Pipeline with TFS from Jose Luis Soria Teruel

Charles Darwin y la Entrega Continua

Charles Darwin publicó El Origen de las Especies en 1859. No deja de ser admirable que algunas de las teorías que enunció en esta obra se pueden verificar más de 150 años después, en campos del conocimiento humano tan dispares a la biología como puede ser el desarrollo de software.

Para establecer el contexto de los temas que veremos a continuación, antes de abordar cómo nos afectan los enunciados de Darwin, vamos a hacer un pequeño viaje, en el que visitaremos algunas de las empresas más exitosas de Internet, cuyas webs cuentan con cifras astronómicas de usuarios, millones de páginas servidas al día, e incontables cantidades de transacciones completadas.

El viaje del Beagle

Lo peligroso es no evolucionar – Jeff Bezos, CEO & Presidente de Amazon.com

Darwin se embarcó en un viaje de casi cinco años a bordo del buque HMS Beagle, que le permitió estudiar multitud de especies animales y obtener información valiosa para soportar las teorías que posteriormente reflejó en su obra. Para estudiar las especies que nos interesan a nosotros, lo tenemos mucho más fácil, pues nos basta con un par de vistazos a algunos contenidos que podemos encontrar en la red, disponibles públicamente.

Empezamos en Seattle, donde una pequeña librería online fundada en un garaje en los años 90, ha acabado convirtiéndose un bazar global donde se puede comprar desde la mítica camiseta de los tres lobos aullando a la luna llena (si no conocías el producto, te recomiendo que leas las evaluaciones de los clientes), a verdadero uranio.

Amazon sirve a 137 millones de clientes por semana y tiene unos beneficios de 34.000 millones de dólares anuales. Si todos sus usuarios activos se juntasen en un país, éste tendría el doble de habitantes que Canadá. Os podéis imaginar que con semejantes cifras, introducir nuevas funcionalidades en el sitio web debe ser algo que se piensen muy mucho, y no se puedan permitir hacer con demasiada frecuencia por el riesgo de aparición de bugs y errores inesperados, que podrían conllevar pérdidas millonarias.

¿Correcto?

No. Nada más lejos de la realidad. En 2011 Amazon estaba liberando cambios en producción cada 11,6 segundos de media, que podían estar afectando hasta a 30.000 servidores a la vez. No dispongo de datos más actualizados, pero por la evolución del negocio, todo hace pensar que estas cifras no habrán hecho más que volverse aún más sorprendentes.

El mayor riesgo es no asumir ningún riesgo… En un mundo que está cambiando realmente rápido, la única estrategia con garantías de fallar es no asumir riesgos. – Mark Zuckerberg, CEO & Chairman de Facebook

1.300 kilómetros al sur, en Menlo Park (California), lo que empezó como el experimento social de un grupo de universitarios, está dando servicio a más de mil millones de usuarios activos, que suben 250 millones de imágenes al día y consultan un billón de páginas al mes. Las cifras que se manejan son simplemente mareantes. La repercusión es tal que incluso hay padres que ponen a sus hijos el nombre de Facebook, literalmente.

El código fuente de Facebook es compilado en un binario que pesa 1,5 GB y es mantenido por más de 500 desarrolladores. En cada modificación y despliegue, hay mucho en juego. Es de esperar que cualquier cambio se haga tras la verificación exhaustiva de un estricto equipo de QA, y nunca sin la aprobación expresa de un ejército de gerentes armados hasta los dientes con la más inflexible burocracia.

¿O quizás no?

La verdad es que, el que se lo imagine así, está muy equivocado. Se sale a producción un mínimo de una vez al día con cambios menores, y una vez a la semana se despliega una versión mayor. Casi todo el código se modifica directamente sobre la línea principal; no usan ramas para proteger la estabilidad de la misma. Todo el mundo hace pruebas y puede reportar defectos. Todo está automatizado al máximo.

En 100 años, la gente echará la vista atrás y dirá: “Eso fue la Era de Internet”. Y los ordenadores se verán como simples ingredientes de esta Era de Internet. – Reed Hastings, CEO de Netflix

No muy lejos de allí, desde un pueblo de California llamado Los Gatos, opera NetFlix. Se trata del mayor servicio de películas y series de televisión online ofrecidas por streaming a sus suscriptores, que suman más de 25 millones.

Los servicios de NetFlix reciben de forma frecuente ataques que hacen peligrar el correcto funcionamiento de los mismos, e incluso provocan caídas en nodos concretos que hacen necesarias intervenciones para evitar problemas mayores. Hasta aquí no es diferente de cualquier otra gran empresa que proporcione servicios en Internet. Lo particular del caso de NetFlix es que gran parte de estos ataques están provocados… ¡por ellos mismos!

¿Cómo es posible? ¿Han perdido la cabeza? ¿Hay empleados descontentos intentando sabotear la compañía desde dentro?

En realidad no. Los ataques están perfectamente orquestados por la Simian Army, el ejército de pequeños incordios desarrollado por NetFlix para llevar al límite su propia infraestructura y aplicaciones. El Chaos Monkey se ocupa de deshabilitar instancias aleatoriamente para asegurarse de que se puede sobrevivir a este tipo de fallo. El Latency Monkey simula retardos y pérdida de conectividad. El Conformity Monkey cierra instancias que no cumplen un conjunto definido de buenas prácticas, directamente y sin mayores contemplaciones. Y así sucesivamente… incluso hay un Chaos Gorilla, el primo de zumosol del Chaos Monkey, que corta de un plumazo el servicio en toda la zona de disponibilidad de la nube en la que desempeña sus monerías. El resultado es que cuando este tipo de problemas aparecen de forma inesperada, todos sus sistemas y equipos ya están preparado para tratar con ellos, puesto que han podido ensayar los procedimientos y el código se ha protegido para mitigarlos. Si la idea te parece interesante, incluso puedes echar un vistazo a cómo está implementado.

En este negocio, para cuando te has dado cuenta de que tienes problemas, ya es demasiado tarde para salvarte. A no ser que te estés preocupando continuamente, estás acabado. – Bill Gates, Co-fundador de Microsoft

Volvemos a nuestra primera parada, Seattle. Justo al lado, en Redmond, el equipo de Team Foundation Service da servicio a muchos otros equipos de desarrollo a nivel mundial, proporcionando una herramienta completa para dar soporte al ciclo de vida de las aplicaciones: planificación, colaboración, control de versiones, ejecución de pruebas, construcciones automatizadas, etc. Con una base de usuarios a nivel mundial, trabajando en todas las zonas horarias, los requisitos de disponibilidad son críticos; todos los que trabajamos en desarrollo de software sabemos el fastidio que supone quedarse sin acceso al control de versiones o que el servidor de construcciones automatizadas no esté disponible. El enfoque usual en estos casos es centrarse en una base estable de características que permitan dar un servicio adecuado a los usuarios y que cambien poco en el tiempo, de esa forma garantizamos que la disponibilidad del servicio no se ve afectada por defectos derivados de la introducción de características nuevas.

¿Estás de acuerdo con este enfoque?

¡Pues ellos no! La tendencia desde el lanzamiento del producto ha sido la de introducir nuevas funcionalidades de forma continua, con cadencias de aproximadamente tres semanas. Y no estamos hablando de cambios menores o estéticos; en esas actualizaciones han entrado características de tanta entidad como el despliegue automatizado a Azure, la integración con Git, o la personalización de tableros Kanban. Las mínimas caídas de servicio acontecidas hasta la fecha han sido en su mayoría previsibles, y en muchas de ellas el usuario es alertado para que pueda estar listo.

De la misma forma que hizo Darwin a bordo del Beagle, podríamos seguir indefinidamente con nuestro periplo en busca de especies peculiares, de otras muchas organizaciones que usan un modo de trabajo que parece desafiar el sentido común o las reglas establecidas:

Flickr realiza varios despliegues al día, y hasta hace poco informaba en su web de la hora del último despliegue, y de cuántos cambios había incluido.
En Spotify, donde se mantienen más de 100 sistemas distintos entre clientes, servicios de backend, componentes, etc., cualquiera de los 250 desarrolladores está autorizado a modificar cualquiera de estos sistemas directamente si lo necesita para implementar una característica.
Etsy experimenta con características nuevas directamente en producción, una técnica conocida como A/B testing, para identificar aquellos cambios que atraen un mayor interés de los clientes.

¿Qué conclusiones podemos sacar a la vista de toda esta información?

¿Se han vuelto todos completamente locos?

¿O estamos ante un nuevo modelo de trabajo, que rompe con muchas de las ideas que considerábamos válidas hasta ahora en desarrollo de software?

Por ejemplo, puede parecer anti-intuitivo pensar que hacer más despliegues te lleve a tener menos problemas al desplegar. Todos hemos tenido esa salida a producción fatídica un viernes (ya se sabe que si no es en viernes, no es un verdadero despliegue…), que nos ha obligado a estar todo el fin de semana luchando para poner la ~~jod$#@~~ aplicación en marcha. Y la reacción natural es resistirse a volver a desplegar con todas nuestras fuerzas, demorarlo al máximo, ya que sabemos que nos va a doler otra vez. Al fin y al cabo, al haber muchos más intentos, hay también muchas más posibilidades de fallar ¿no es así?

Pues por lo general no. En realidad, el efecto es que la repetición nos lleva a que los despliegues son más predecibles, más controlados, con menos incertidumbre y con problemas mucho más pequeños y controlables. La filosofía subyacente es que si algo duele, en lugar de evitarlo deberías hacerlo más frecuentemente, y así harás el dolor más llevadero. O dicho de otra forma, es más asumible una sucesión de dolores pequeños que un gran dolor traumático concentrado.

¿Es viable que cualquier simple desarrollador tenga el poder de poner en producción cualquier cambio que él considere listo para entregar? Sí, si dicho cambio es sometido a todo un proceso de verificación que asegura que no se va a romper nada si lo liberamos.

¿Es imprudente eliminar de ese proceso de verificación toda una cadena de burocracia, solicitudes, aprobaciones, reuniones entre departamentos y control exhaustivo del proceso por parte de los roles adecuadamente capacitados?

No es para nada imprudente, si todas o la mayoría de esas verificaciones han sido codificadas y se ejecutan en la forma de pruebas de aceptación, de regresión y de humo automatizadas, de despliegues desatendidos, y con la posibilidad de visualizar fácilmente el estado de todo el proceso. No sólo no es imprudente, sino que va a superar con creces la fiabilidad de un grupo de humanos que hagan el mismo proceso (o lo que es peor, una variación aleatoria del mismo) de forma manual, muchas veces en un estado de aburrimiento absoluto y con la concentración bajo mínimos. No me estoy refiriendo a eliminar por completo los pasos manuales, lo cual es por lo general imposible: al menos habrá una primera ejecución manual de las pruebas de aceptación para que el usuario pueda verificar si el equipo de desarrollo ha entendido bien lo que se pretendía conseguir con el requisito concreto. O quizá pueda haber algún dispositivo especial en nuestro entorno para el que no podamos configurar un despliegue totalmente automatizado. Pero sí podemos abordar el resto de nuestro proceso, y tender a minimizar estos pasos manuales en la medida de lo posible.

La Entrega Continua (Continuous Delivery) es una disciplina, un modo de trabajo, o un conjunto de patrones y prácticas, que tiene en cuenta todos estos factores posibles de optimización y los explota al máximo. Nos vamos a basar en técnicas como la automatización de las pruebas y despliegues, la integración continua, la transparencia y visibilidad a lo largo de todo el proceso, el control exhaustivo de todas las dependencias y parámetros de configuración que afectan a la entrega de nuestro software, la detección y tratamiento temprano de cambios problemáticos, y otras muchas, para habilitar la posibilidad de que cualquier mínimo cambio en nuestro código, que se suba al control de versiones, sea candidato a acabar en producción en el mínimo tiempo posible, y de hecho acabe allí si nada nos hace desecharlo (a ser posible automáticamente) en el transcurso todo este proceso.

It is not just about continuous deployment, as many people mistakenly assume, since you can be deploying garbage and still be doing it continuously and automatically. Nor is it only about continuous automated testing. It is about those practices and many others as well: all the ones we need in order to state with confidence that a change is ready for users to use and benefit from.

Of course, for this to succeed, close collaboration among everyone involved is needed, along with an environment from which departmental barriers and silos have been removed. This is closely related to what movements such as DevOps are working to promote.

What do we gain from all this?

If we go by the figures of the companies that work this way, we could simply answer: a lot. But to avoid the “correlation implies causation” fallacy, we should be more specific and focus on the context of the software development process. What we find then is:

A transparent and predictable delivery process. Every change always goes through the same sequence of steps, automated as far as possible. There are no surprises.
Fewer defects in production. Defects surface and are addressed in earlier phases, sometimes automatically. The standardized delivery process prevents any of these defects from reaching production because of a misunderstanding or because people do things differently. It also gives us traceability of where these problems originate.
Flexibility to take on changes. Changes are tackled in smaller, more manageable pieces, and are implemented and delivered as soon as possible.
More immediate and useful feedback about changes, even in the production environment itself: whether they are working smoothly, how users receive them, or how they affect the business.
Less time spent deploying and releasing to production, since everything is automated as much as possible.
Teams motivated by the trust placed in them and by the ongoing sense of contributing tangible increments of value.

All this sounds great, but it’s not for me

Adopting the kind of practices that Continuous Delivery proposes can be worthwhile even if we do not need, or do not want, to release to production that often. The benefits listed above will generally be the same, so any team eager to improve could consider working this way.

There are very special cases in which Continuous Delivery might not be the best option, or might even be counterproductive. To be honest, I have come across very few. Most of the time they are scenarios in which the results will not justify the effort of adopting all these practices: legacy systems, obsolete technologies, a lack of suitable tools to build the necessary environment, designs that hinder automation or testing, and so on.

But the real problem, which unfortunately comes up quite often, arises when the team itself or the organization is not open to change and possible improvements. We could say that deep down, unconsciously, they do not want a transparent delivery process, do not need fewer defects in production, and are not looking for flexibility in the face of change. To the outside world, this shows up as a decision not to invest in the necessary improvements. The most common example is the typical argument along the lines of “our case is very unusual”, “our system is very complex”, “the business we are in is very delicate”, “our users are very special”, “my boss would never let me”, “my mom won’t let me”, or “insert your favorite excuse here”. Especially frequent is “we can’t afford to invest in that”, when in reality, as we will see in a moment, what you probably cannot afford is not to invest.

Is your system more complex than Team Foundation Service?

Is your business more complex than Amazon’s?

Are your users more demanding, with more varied needs, than Facebook’s?

Do you have to serve a larger volume of data than Netflix?

If you are among the vast majority who would answer no to all of these questions, it is quite likely that you simply feel lazy at the prospect of moving to the Continuous Delivery model. In that case, my advice is to be careful, because your organization may meet the same fate as the dodo or the thylacine.

Natural selection

Meanwhile, we had left Darwin aboard the Beagle, sailing the seven seas in search of unusual species. At the end of his voyage he was puzzled by the variety of fauna and fossils he had found, and he began an investigation that led him to formulate the theory of natural selection in On the Origin of Species.

It is not the strongest of the species that survives, nor the most intelligent. It is the one most adaptable to change.

In the long history of humankind (and animals, too), those who learned to collaborate and improvise most effectively have prevailed.

Charles Darwin, English naturalist (attributed)

Natural selection tells us that the members of a population whose traits are better adapted to their environment are the most likely to survive. What happens to the rest? Sooner or later, they die out.

It is a law that applies to living organisms, and if we stop to think for a moment, what is an organization if not a large living organism? Of course natural selection applies to companies and organizations, as any list of defunct companies will show.

In the ever-changing environment in which most businesses operate, it is no longer enough to have products that are good, attractive, and cheap. You have to have them first, and make them evolve quickly according to user demand. Nor is it enough to track metrics such as team velocity or defect rate. The metrics that make the difference between those who succeed and those who fall by the wayside are different:

Cycle time: the time that elapses from when I start working on a feature until it is in production.
Mean time to failure (MTTF): how long, on average, my system runs before it suffers an outage or service interruption.
Mean time to recover (MTTR): how long, on average, it takes me to get my system running again after an outage.

Natural selection will favor those who can keep cycle time and MTTR as short as possible while keeping MTTF as long as possible. This is precisely one of the areas where Continuous Delivery can help us most.
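As a rough illustration of how these three metrics can be measured, here is a small Python sketch. It assumes a hypothetical record format (not taken from any particular tool): each feature is a `(started, deployed)` pair of timestamps, and each outage is a `(failed_at, recovered_at)` pair, sorted by time. Cycle time averages the start-to-production interval per feature, MTTF averages the uptime before each failure, and MTTR averages the downtime until recovery.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Tuple

# Hypothetical record format: (start, end) timestamps.
Interval = Tuple[datetime, datetime]


def mean_hours(deltas: List[timedelta]) -> float:
    """Average a list of durations, expressed in hours."""
    return mean(d.total_seconds() for d in deltas) / 3600


def cycle_time(features: List[Interval]) -> float:
    """Average hours from starting work on a feature to having it in production."""
    return mean_hours([deployed - started for started, deployed in features])


def mttf(service_start: datetime, outages: List[Interval]) -> float:
    """Average hours of operation before each failure (outages sorted by time)."""
    uptimes = []
    up_since = service_start
    for failed_at, recovered_at in outages:
        uptimes.append(failed_at - up_since)
        up_since = recovered_at  # the next uptime period begins after recovery
    return mean_hours(uptimes)


def mttr(outages: List[Interval]) -> float:
    """Average hours from a failure until the system is running again."""
    return mean_hours([recovered - failed for failed, recovered in outages])


if __name__ == "__main__":
    t0 = datetime(2013, 3, 1)

    def at(hours: int) -> datetime:
        return t0 + timedelta(hours=hours)

    features = [(at(0), at(48)), (at(10), at(34))]
    outages = [(at(100), at(102)), (at(202), at(206))]
    print(cycle_time(features))  # 36.0
    print(mttf(t0, outages))     # 100.0
    print(mttr(outages))         # 3.0
```

Note the asymmetry the sketch makes visible: lowering cycle time and MTTR is good, but MTTF is the one number you want to grow.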

OK, I don’t want to go extinct! Where do I start?

Throughout this article we have focused on the benefits of Continuous Delivery and on what can happen if we ignore it. But we have not said much about how to put it into practice, and given the large number of patterns and practices involved, the process can be far from trivial.

Part of the process is undoubtedly cultural, and we will have to work on it within our organization to remove barriers and silos and improve collaboration as much as possible.

There are many resources available to guide us through the necessary steps, but the best and most complete is, without a doubt, the excellent book by Jez Humble and David Farley, Continuous Delivery.

If your environment is based on Microsoft technologies, fortunately there are excellent tools available that can support most of the practices we have mentioned. Visual Studio, Team Foundation Server, and other related tools will let us implement the entire Continuous Delivery pipeline, from automating all kinds of tests and deployments to more specific concerns such as static code analysis or continuous integration. It is true that these tools need customization and tuning to fit the way of working we are proposing, but at Plain Concepts we can help you set up the environment that best suits your project. We have already done it for lots of organizations, and none of them seems to have gone extinct yet.

To get an idea of how Team Foundation Server can be customized to support Continuous Delivery, you can also take a look at my presentation on this topic at ALM Summit 3.

And if you can afford to wait, in a few months there will be a book, which I am working on with the Microsoft Patterns & Practices team, explaining in full detail what it takes to put all these ideas into practice effectively. More news about this soon!

ALM Summit 3 – Setting up a Continuous Delivery Deployment Pipeline with TFS from Jose Luis Soria Teruel