Cloud computing has transformed the way businesses operate, allowing them to store and access data on a global scale. Organisations have traded the pain of managing hardware and on-premises infrastructure for hosting their solutions on cloud providers such as Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure (Azure). These providers are not just reliable; they also offer a host of managed services, cutting out what could be months of engineering effort. That being said, to remain competitive, businesses need to understand the challenges they may face as well as how to maximise the value of their cloud-based solutions. This is especially true as new features and tools are launched every year.
Succeeding in the cloud involves many moving parts, but I believe it can be broken down into a few key topics:
- Choosing services: Picking services that meet the business's needs, and choosing between managed services and self-hosted open-source software
- Handling scale: Designing your cloud architecture so that it is optimised and scales with your business objectives
- Observability: How do I not just monitor or collect logs of my system, but how do I understand why something went wrong?
- Security: How do I make sure sensitive data and internal tools and services are not compromised?
- Developer effectiveness: How do I automate deployments and maintain their integrity, so my engineers can focus on building new features rather than on updating production?
Note: As AWS is the most popular cloud platform, our examples will predominantly use it, but the same strategy applies to any cloud platform.
Choosing services
AWS and Azure have over 200 services while GCP has well over 100. How does one begin to pick? There are usually several combinations of services that deliver the same thing, so the final selection should be based on business needs. For example, you may want to launch a single-page application written in React. One option, serving the static build from Amazon S3 through CloudFront, will ensure your website is delivered quickly to users across the globe. But let's say your application gains more and more complicated UI features, making rendering time a pain for users, especially those on slow computers. In that case, a better solution may be to host a React service with server-side rendering on an Amazon ECS cluster.
Additionally, you may need to decide between a cloud provider's managed service and an open-source solution that you host yourself on the cloud. For example, compare AWS API Gateway backed by Lambda with ECS hosting an API powered by a popular framework like Flask. Both solve the same problem, but they come with different costs and benefits. Lambda can be lightweight and cost-efficient, but you now have to understand its limitations, such as maximum run time and memory, as well as other nuances such as cold starts and the need for RDS Proxy to connect reliably to databases.
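To give a feel for the Lambda side of that comparison, here is a minimal sketch of a Python function behind API Gateway, assuming the proxy integration event format. The function name `handler`, the `name` query parameter and the greeting are hypothetical choices for illustration only.

```python
import json


def handler(event, context):
    """Hypothetical Lambda entry point for an API Gateway proxy integration."""
    # With proxy integration, query parameters arrive under this key;
    # the value is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")

    # API Gateway expects a status code and a string body in the response.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Because the handler is a plain function, it can be called locally with a hand-built event, for example `handler({"queryStringParameters": {"name": "Ada"}}, None)`. The limits mentioned above (run time, memory, cold starts) only show up once it is deployed.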
All of that may come across as very dense information, but it is important to keep in mind that new services are added to the cloud every year, so you must keep up to date with the latest information. Doing so can help organisations adopt newer services that offer better cost savings and better reliability.
Handling scale
Your organisation will grow. There is a big difference between the 100 daily users you may have right now and the 100,000 you will have in the future. Similarly, there is a difference between transforming and moving megabytes of data and doing the same once it grows to gigabytes. You need to pick tools that solve not only the needs of today, but also the needs of tomorrow. The key is to design your cloud architecture so that both performance and cost scale in line with your needs. Leveraging features like load balancing and auto scaling is a popular way to tackle this.
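To make the auto scaling idea concrete, here is a minimal sketch of a proportional scaling rule. It is an illustrative simplification, not the algorithm any cloud provider actually runs; the function name, the choice of CPU as the metric and the size bounds are all assumptions.

```python
import math


def desired_instances(current, observed_cpu, target_cpu, min_size=1, max_size=20):
    """Proportional rule: size the fleet so average CPU moves towards the target."""
    if target_cpu <= 0:
        raise ValueError("target_cpu must be positive")
    # Example: 4 instances at 90% CPU with a 60% target need 4 * 90 / 60 = 6.
    wanted = math.ceil(current * observed_cpu / target_cpu)
    # Clamp to the configured bounds: the upper bound caps cost,
    # the lower bound keeps at least some capacity running.
    return max(min_size, min(max_size, wanted))
```

With these assumptions, `desired_instances(4, 90, 60)` returns 6, and a traffic spike that would call for 50 instances is capped at `max_size`, which is how the bounds tie performance back to cost.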
Observability
Log collection and monitoring are great sources of information that tell you when something goes wrong. But this only gets you so far. As things get bigger and more critical, you want to know not just whether your system has failures, but also why it might be failing. This is called observability. Monitoring is reactive, whereas observability is proactive. This is especially important in a large cloud architecture, where multiple resources are running and connected to one another. Popular tools like Datadog and Dynatrace, and even cloud-native ones like AWS X-Ray, can help trace the origins of failures and the reasons behind them. Good practices matter here too, such as having Slack and email notifications set up so you know immediately when a key service is failing, and ideally why. Adopting observability as a practice will help your organisation resolve bugs quickly.
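One small building block for answering "why" is attaching the same request ID to every log line a request produces, so related events can be found together later. The sketch below uses only Python's standard `logging` module; the names `JsonFormatter`, `build_logger` and `handle_order`, and the log messages, are hypothetical, and dedicated tracing tools do considerably more than this.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so log tools can filter by field."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
            # request_id is supplied through `extra=`; None if it was not given.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


def build_logger(service_name, stream=None):
    """Create a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace handlers so repeated calls do not duplicate output
    logger.propagate = False
    return logger


def handle_order(logger, request_id=None):
    """Hypothetical request handler: every log line carries the same request ID."""
    request_id = request_id or str(uuid.uuid4())
    logger.info("order received", extra={"request_id": request_id})
    logger.error("payment provider timed out", extra={"request_id": request_id})
    return request_id
```

Searching the collected logs for one `request_id` then shows the full story of a failed request across log lines, rather than a single isolated error.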
Security
There may be important data and services that you do not want to be public. This means you need to make sure sensitive resources are not exposed on the public internet by restricting access policies, isolating them in private cloud networks, setting up IP whitelisting and blacklisting, or even adding OAuth and SSO authentication layers. On top of that, you want to scan any open-source dependencies for vulnerabilities before they are adopted in production. A great place to put these scans is in your CI/CD pipelines. How you choose to protect things will again depend on the services you decide to use and your business's needs.
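As a rough illustration of IP whitelisting and blacklisting, the sketch below checks a client address against allowed networks and blocked addresses using Python's standard `ipaddress` module. The network ranges and the function name are made up for the example; in practice this check usually lives in the cloud provider's network or firewall rules rather than in application code.

```python
import ipaddress

# Hypothetical ranges: an internal network and one partner office.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/16"),
    ipaddress.ip_network("203.0.113.0/24"),
]
# Hypothetical address blocked even though it sits inside an allowed range.
BLOCKED_ADDRESSES = {ipaddress.ip_address("203.0.113.66")}


def is_allowed(client_ip):
    """Return True only if the address is in an allowed network and not blocked."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed input is denied rather than trusted
    if addr in BLOCKED_ADDRESSES:
        return False  # the block list takes priority over the allow list
    return any(addr in network for network in ALLOWED_NETWORKS)
```

The design choice worth noting is deny by default: anything that is malformed, blocked, or simply not listed is rejected.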
Developer effectiveness
The final part of your strategy is setting up your system in a way that makes it easier for engineers to focus on building new features rather than worrying about how to deploy them. There are two integral concepts in the DevOps world that will help you do this. The first is infrastructure as code. This makes it easier to spin up new cloud resources and maintain multiple environments such as development, staging and production. The second is continuous integration and deployment. This lets engineers have new features deployed automatically as soon as they are developed. Aside from these, another key aspect is picking cloud technologies that do not add friction to the development process.
Often we see organisations choosing services in a way that adds too much friction to the development process. A common example is using AWS Lambda instead of something like a simple cron job running on an EC2 instance. Sure, one may be cheaper or easier to set up initially, but there are many other challenges you may face, as we covered previously, and if it's set up improperly it will be hard and time-consuming for engineers to test. In the end, while some services may "cost less", you will pay more in time and engineering hours, making things more expensive in the long run.
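The core idea behind infrastructure as code, mentioned earlier in this section, is that you describe the resources you want and let a tool work out the changes. The toy sketch below shows that comparison step only. The `plan` function and the resource names are hypothetical, and real tools such as Terraform or AWS CloudFormation handle far more (dependencies, state, rollbacks).

```python
def plan(desired, actual):
    """Compare two {resource_name: settings} dicts and list the changes needed."""
    desired_names, actual_names = set(desired), set(actual)
    return {
        # Declared in code but not yet running.
        "create": sorted(desired_names - actual_names),
        # Present in both, but the settings have drifted apart.
        "update": sorted(
            name for name in desired_names & actual_names
            if desired[name] != actual[name]
        ),
        # Running but no longer declared in code.
        "delete": sorted(actual_names - desired_names),
    }


# Hypothetical staging environment, declared as data.
staging_desired = {
    "web-server": {"size": "small", "count": 2},
    "database": {"size": "medium"},
}
staging_actual = {
    "web-server": {"size": "small", "count": 1},
    "old-queue": {"size": "small"},
}
changes = plan(staging_desired, staging_actual)
```

Keeping a declaration like `staging_desired` in version control for each environment is what makes development, staging and production reproducible and reviewable.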
All these challenges should not scare you, but inspire you. Cloud computing is an essential part of modern business operations. It continues to evolve, with new technologies such as artificial intelligence, machine learning, and blockchain empowering companies to do more. Organisations must tackle the challenges of implementing cloud applications and improve their cloud performance, and when they do it properly, they will handsomely reap the benefits of this technology.