Cloud Solutions Software Architect Certification: A Practical Guide
Becoming a certified cloud solutions architect is simpler than you might think. Despite the differences among cloud providers, their core architectures share many similarities, and the services required for certification follow a consistent pattern. In this guide, I’ll walk you through the process and highlight the benefits of mastering these key services in a vendor-agnostic manner as you work toward becoming a certified cloud solutions software architect.
On this journey, we will cover the key categories that every cloud service addresses: Compute, Networking, Storage, Monitoring, and Security.
The cloud:
Public clouds are essentially collections of servers distributed across various regions around the world, allowing customers to leverage hardware infrastructure without owning it. In contrast, private clouds—often referred to as on-premises solutions—are maintained and managed by the company itself. The hybrid model combines the benefits of both approaches.
Among the cloud's key benefits are the ability to scale easily, faster deployment, and reduced capital expenditure (CAPEX). Building your own infrastructure requires a high upfront investment in hardware, whereas cloud services convert these expenses into operational costs (OPEX).
Clouds allow you to focus on managing operational expenditures (OPEX) while the provider handles the underlying infrastructure. This model supports both vertical scaling (scale-up), which increases the capacity of existing resources, and horizontal scaling (scale-out), which adds more servers to distribute workloads.
Note that I will discuss strategies and services in a vendor-agnostic manner. Although providers may use different names or require additional services to achieve similar goals, the principles outlined here apply across AWS, Azure, GCP, and others.
Public cloud providers operate data centers in multiple regions around the world. A region typically comprises multiple isolated data centers, known as availability zones, which enhance fault tolerance by ensuring that applications remain operational even if one zone fails. Within each availability zone, fault domains (often individual racks or groups of hardware) help further isolate and mitigate risks. Additionally, edge locations are used to cache data closer to users, thereby reducing latency for global applications.
In-memory cache examples: AWS ElastiCache, Azure Cache for Redis, GCP Memorystore
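To illustrate why zone-aware placement improves fault tolerance, here is a minimal Python sketch (the zone and instance names are made up for illustration) that spreads replicas across availability zones in round-robin order:

```python
from itertools import cycle

def spread_across_zones(instances, zones):
    """Round-robin placement: assign instances to availability zones
    so that losing one zone takes down as few replicas as possible."""
    zone_cycle = cycle(zones)
    return {instance: next(zone_cycle) for instance in instances}

placement = spread_across_zones(
    ["web-1", "web-2", "web-3", "web-4"],
    ["zone-a", "zone-b", "zone-c"],
)
# placement == {"web-1": "zone-a", "web-2": "zone-b",
#               "web-3": "zone-c", "web-4": "zone-a"}
```

With four replicas over three zones, any single zone failure leaves at least two replicas running.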
Before deploying your application, it’s crucial to understand your responsibilities when using a public cloud. The provider manages the physical infrastructure, but you must comply with service-specific regulations and local laws in your chosen region. This division is outlined in the shared responsibility model, which clearly defines what is managed by the cloud provider and what remains your responsibility.
Cloud service models determine the level of control and responsibility you have over your environment.
IaaS (Infrastructure as a Service):
As the name suggests, in an IaaS model, the cloud provider manages the physical infrastructure (servers, networking, etc.), while you are responsible for the operating system, middleware, applications, data, security patches, and more. A common example is the use of virtual machines (VMs).
IaaS is often appealing for companies with existing on-premises infrastructure, as they are already familiar with managing their own systems. This familiarity allows for a smoother transition to a hybrid model. However, the main trade-off is that you must manage the entire software stack yourself, including the operating system, middleware, and applications.
IaaS Examples: AWS Elastic Compute Cloud (EC2), Azure Virtual Machines, GCP Compute Engine
PaaS (Platform as a Service):
PaaS builds on IaaS by also managing the operating system and runtime environment, allowing you to focus solely on your application. This model is especially useful for modern applications that need to be deployed quickly. However, the convenience comes with trade-offs: since the provider manages the operating system and runtime, only certain versions are supported for security and stability reasons. For example, if your application relies on Node.js v12, there's no guarantee that this runtime will be available, which might force you to update your application or find an alternative solution.
PaaS Examples: AWS Elastic Beanstalk, Azure App Service, Azure Cloud Services, GCP App Engine
SaaS (Software as a Service):
In the SaaS model, the cloud provider manages everything—including the application—allowing you to use a ready-made solution. The trade-offs include limited customization, potential cost implications, and the risk of vendor lock-in, where you become dependent on a single provider.
SaaS examples: Datadog, Salesforce, ServiceNow, Slack
Serverless:
Finally, in the serverless model, servers still exist but are fully abstracted away, meaning you don’t manage the underlying infrastructure. The most common example is FaaS (Function as a Service), where you write and deploy individual functions that run independently. Additional variations include CaaS (Container as a Service), BaaS (Backend as a Service), and XaaS (Anything as a Service), all of which further extend the service-based model.
FaaS examples: AWS Lambda, Azure Functions, GCP Cloud Functions
CaaS examples: AWS Elastic Kubernetes Service (EKS), AWS Elastic Container Service (ECS), Azure Kubernetes Service (AKS), Azure Container Apps, GCP Kubernetes Engine
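The FaaS model described above is easiest to see in code. Below is a minimal sketch of a handler in the style of AWS Lambda's Python programming model (the `(event, context)` signature); other providers use similar shapes. The event contents here are invented for illustration:

```python
import json

def handler(event, context):
    """A minimal FaaS-style handler: stateless and event-driven.
    It reads a value from the triggering event and returns an
    HTTP-style response; no state survives between invocations."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Invoked locally with a sample event; in production the platform
# supplies the event and context objects.
response = handler({"name": "cloud"}, None)
```

Because the function holds no state of its own, the platform can run as many copies in parallel as demand requires.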
With this foundation in place, let’s now explore how to apply this knowledge to build a highly available architecture—a key focus of solution architect certification exams.
To cover the essential components of web solutions in cloud certifications, let’s begin at the very start—with a simple request.
Domain Name System:
Accessing servers around the world starts with name resolution. DNS translates a human-readable domain (e.g., www.example.com) into the IP address of a reachable server. Additionally, DNS can route traffic to different servers based on factors such as latency, geographical proximity, and health checks.
DNS examples: AWS Route 53, Azure DNS, GCP Cloud DNS
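A latency-based routing policy with health checks can be sketched in a few lines of Python. The endpoints, latencies, and IPs below are invented for illustration:

```python
def resolve(endpoints):
    """Pick the healthy endpoint with the lowest measured latency,
    mimicking a latency-based DNS routing policy with health checks."""
    healthy = [e for e in endpoints if e["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy endpoints")
    return min(healthy, key=lambda e: e["latency_ms"])["ip"]

endpoints = [
    {"ip": "192.0.2.10", "latency_ms": 120, "healthy": True},
    {"ip": "192.0.2.20", "latency_ms": 40,  "healthy": True},
    {"ip": "192.0.2.30", "latency_ms": 15,  "healthy": False},  # fails health check
]
best = resolve(endpoints)
```

Note that the fastest endpoint is skipped because it fails its health check, which is exactly the behavior a health-checked routing policy provides.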
Load balancing:
To achieve high availability—ensuring that your application remains online to handle incoming requests—multiple servers are deployed in various locations. To create a single entry point, gateways (often implemented as load balancers) are used.
Layer 7 load balancers operate at the application layer of the OSI model. They distribute requests across servers and offer advanced features such as request inspection and web application firewall (WAF) integration, effectively acting as a reverse proxy.
Cloud providers also offer Layer 4 load balancers, which operate at the transport layer (supporting protocols such as TCP and UDP). While these provide faster distribution due to their lower-level processing, they do not offer the same payload manipulation capabilities as Layer 7 load balancers.
API Management examples: AWS API Gateway, Azure API Management, GCP Apigee
Load Balancing examples: AWS Elastic Load Balancing (ELB), Azure Load Balancer, Azure Application Gateway, GCP Cloud Load Balancing
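The core distribution strategy behind most load balancers is round-robin with health checks. Here is a minimal Python sketch (backend names are invented) showing how requests rotate across backends while unhealthy ones are skipped:

```python
import itertools

class RoundRobinBalancer:
    """Round-robin distribution across backends, skipping any
    backend currently failing its health check."""
    def __init__(self, backends):
        self.backends = backends
        self._cycle = itertools.cycle(backends)

    def pick(self, healthy):
        # Try each backend at most once per request.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if healthy.get(backend, True):
                return backend
        raise RuntimeError("no healthy backends")

lb = RoundRobinBalancer(["srv-a", "srv-b", "srv-c"])
picks = [lb.pick({"srv-b": False}) for _ in range(3)]
# srv-b is skipped: picks == ["srv-a", "srv-c", "srv-a"]
```

A real Layer 7 balancer layers request inspection and routing rules on top of this loop; a Layer 4 balancer does essentially this, but on connections rather than parsed requests.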
Servers behind the gateway also require a dedicated network layer for internal communication. Cloud providers typically offer an isolated virtual network where you can create additional networks for tasks like network peering and establishing private endpoints for specific services. Additionally, you can configure a NAT gateway to allow outbound internet access, and set inbound and outbound rules to control traffic to and from each IP address.
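The inbound and outbound rules mentioned above behave like an ordered firewall list with an implicit deny. A rough Python sketch (rule fields simplified; real rules also match protocols and CIDR ranges):

```python
def evaluate(rules, direction, port):
    """First-match rule evaluation with an implicit deny:
    traffic is blocked unless a rule explicitly allows it."""
    for rule in rules:
        if rule["direction"] == direction and rule["from_port"] <= port <= rule["to_port"]:
            return rule["action"] == "allow"
    return False

rules = [
    {"direction": "inbound",  "from_port": 443, "to_port": 443,   "action": "allow"},
    {"direction": "inbound",  "from_port": 0,   "to_port": 65535, "action": "deny"},
    {"direction": "outbound", "from_port": 0,   "to_port": 65535, "action": "allow"},
]
```

With these rules, only HTTPS (port 443) is reachable from outside, while all outbound traffic (e.g., through a NAT gateway) is permitted.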
Within our network, various services—such as virtual machines (VMs) and serverless functions—operate. To ensure these services can handle increased load, you must consider your scaling strategy:
Vertical Scaling: Increasing the capacity of an individual server (e.g., adding more memory or CPU).
Horizontal Scaling: Adding more instances to distribute the load, with the load balancer routing traffic accordingly.
In contrast, serverless functions scale automatically out of the box, handling increased demand without manual intervention.
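Horizontal scaling is usually automated with a target-tracking policy: the fleet is resized so average utilization returns to a target. A simplified sketch (the 60% target and instance limits are arbitrary example values):

```python
import math

def desired_instances(current, cpu_pct, target_pct=60, max_instances=10):
    """Target-tracking sketch: resize the fleet so average CPU
    utilization moves back toward the target percentage."""
    desired = math.ceil(current * cpu_pct / target_pct)
    return max(1, min(desired, max_instances))

# A fleet of 4 running hot at 90% CPU against a 60% target scales out to 6;
# a fleet of 6 idling at 30% CPU scales in to 3.
```

The `max_instances` cap and the floor of one instance mirror the minimum/maximum bounds that real autoscaling groups let you configure.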
Functions should be simple, fast, and stateless, and are typically triggered by specific events. There are also solutions available for building more complex, stateful workflows, enabling the development of comprehensive serverless applications.
Storage:
But what is an application without data? Servers typically come with attached disk volumes, which can be configured in various ways depending on your needs. Each cloud provider has its own naming conventions and options for these storage solutions. In addition to local volumes, shared volumes are available within internal networks. Always choose the appropriate file system protocol, for example SMB for Windows-based workloads and NFS for Linux-based ones.
File storage examples: AWS Elastic File System (EFS), AWS FSx, Azure Files, GCP Filestore
Block storage examples: AWS Elastic Block Store (EBS), Azure Disk Storage, GCP Persistent Disk
For virtually unlimited storage, object storage is used to save almost any type of data. Many cloud providers also allow you to host static websites directly from object storage. Typically, object storage offers three tiers:
Hot Storage: Optimized for frequently accessed data.
Cold Storage: Designed for infrequently accessed data.
Archive Storage: Intended for long-term retention and compliance purposes. Files in the archive tier may require a restoration (or "hydration") process before they can be accessed.
Object storage examples: AWS Simple Storage Service (S3), Azure Blob Storage, GCP Cloud Storage Buckets
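The tiering logic above is typically automated with lifecycle rules that move objects to colder tiers as they go unread. A minimal sketch (the 30- and 180-day thresholds are arbitrary example values, not any provider's defaults):

```python
from datetime import datetime, timedelta

def choose_tier(last_accessed, now):
    """Lifecycle-style tiering sketch: objects migrate to colder
    (cheaper, slower) tiers the longer they go without being read."""
    age = now - last_accessed
    if age < timedelta(days=30):
        return "hot"
    if age < timedelta(days=180):
        return "cold"
    return "archive"

now = datetime(2024, 6, 1)
# An object read 12 days ago stays hot; one untouched since January
# goes cold; one untouched for over a year is archived.
```

Remember that objects landing in the archive tier may need a restoration ("hydration") step before they can be read again.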
Object storage latency can be reduced through the use of a Content Delivery Network (CDN). A CDN consists of geographically distributed caching servers that store copies of your content. When an end user requests the content, the response is delivered from a nearby location, significantly improving performance.
CDN examples: AWS CloudFront, Azure Content Delivery Network, GCP Cloud CDN
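At its core, a CDN edge node is a cache with a time-to-live (TTL): content is served locally until the TTL expires, then re-fetched from the origin. A small Python sketch (the origin function and paths are invented for illustration):

```python
import time

class TTLCache:
    """Edge-cache sketch: serve cached content until its TTL expires,
    then fetch a fresh copy from the origin."""
    def __init__(self, origin_fetch, ttl_seconds=60):
        self.origin_fetch = origin_fetch
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]                # hit: served from the edge
        value = self.origin_fetch(key)     # miss or expired: go to origin
        self._store[key] = (value, now)
        return value

origin_calls = []

def origin(path):
    origin_calls.append(path)
    return f"contents of {path}"

cdn = TTLCache(origin, ttl_seconds=60)
cdn.get("/logo.png", now=0)     # miss: fetched from origin
cdn.get("/logo.png", now=30)    # hit: served from cache
cdn.get("/logo.png", now=120)   # TTL expired: fetched again
```

Only two of the three requests reach the origin, which is exactly how a CDN cuts both latency and origin load.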
Databases:
In addition, cloud providers offer a wide range of database options, including:
Relational Databases: Traditional SQL databases that are ideal for managing relational and normalized data.
Examples: AWS Relational Database Service (RDS), AWS Aurora, Azure SQL Database, GCP Cloud SQL
Document Databases: Designed to store structured or semi-structured data, typically in JSON or BSON formats.
Examples: AWS DocumentDB, Azure Cosmos DB, GCP Firestore
Graph Databases: Optimized for storing data along with its relationships, making it easier to query complex networks of interconnected records.
Examples: AWS Neptune, Azure Cosmos DB (Gremlin API)
Key-Value Databases: Utilize a simple key-value format, commonly used for caching and quick data retrieval.
Examples: AWS DynamoDB, Azure Table Storage, GCP Cloud Bigtable
Wide-Column Databases: Tabular databases optimized for storing and querying large volumes of data by column rather than by row.
Examples: AWS Keyspaces
Time-Series Databases: Specialized for storing event data and tracking values over time, ideal for monitoring and IoT applications.
Examples: AWS Timestream, Azure Time Series Insights
Additionally, many cloud providers offer indexing and search services that help users quickly locate information across various databases. These services often integrate with the underlying databases to provide fast, scalable search capabilities. Moreover, most cloud databases support strategies such as read replicas, sharding, and automatic partitioning to further enhance performance and scalability.
Text Search examples: AWS OpenSearch Service, Azure Cognitive Search
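Of the scaling strategies mentioned above, sharding is the easiest to sketch: a hash of the key deterministically selects a partition, so data spreads roughly evenly and every reader can find a record again. A minimal illustration (the key format is invented):

```python
import hashlib

def shard_for(key, num_shards):
    """Hash-based sharding: a key deterministically maps to one shard,
    spreading records roughly evenly across partitions."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same key always lands on the same shard, so routing a read
# for "user-42" never requires scanning every partition.
shard = shard_for("user-42", 4)
```

Read replicas address the complementary problem: rather than splitting data, they copy it, so read traffic can fan out while writes go to a single primary.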
Queues:
When system load increases dramatically, simply adding more resources might not be the best strategy—especially for large-scale systems or those that require strict control over processing order. In these cases, using queues can be an effective solution to handle events and maintain control over the processing sequence.
Queue Types in the Cloud:
Cloud-based queues typically appear in three forms:
Standard Queues: Provide durable message storage and decouple producers from consumers. Multiple workers can pull from the same queue, which spreads the load and adds redundancy.
Examples: AWS Simple Queue Service (SQS), Azure Queue Storage, GCP Cloud Tasks
Brokered Queues: In this model, the queue acts as an intermediary by converting a push-based messaging approach into a pull-based one. This allows services to retrieve messages at their own pace, ensuring controlled processing.
Examples: Amazon MQ, Amazon Simple Notification Service (SNS), Azure Service Bus
Priority and FIFO Queues: For systems where processing order is critical, First-In-First-Out (FIFO) queues ensure that messages are processed in the exact order they are received.
Examples: AWS SQS FIFO queues, Azure Service Bus (with sessions)
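Two FIFO-queue guarantees are worth internalizing for exams: strict ordering and exactly-once delivery via message de-duplication. A minimal sketch (message IDs and bodies invented; real services bound the de-duplication window in time):

```python
from collections import deque

class FifoQueue:
    """FIFO queue sketch with de-duplication by message ID,
    so a retried send does not create a duplicate message."""
    def __init__(self):
        self._messages = deque()
        self._seen_ids = set()

    def send(self, message_id, body):
        if message_id in self._seen_ids:
            return False            # duplicate delivery, ignored
        self._seen_ids.add(message_id)
        self._messages.append(body)
        return True

    def receive(self):
        return self._messages.popleft() if self._messages else None

q = FifoQueue()
q.send("m-1", "charge card")
q.send("m-1", "charge card")    # network retry of the same send: ignored
q.send("m-2", "send receipt")
```

Consumers now see "charge card" strictly before "send receipt", and exactly once each.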
Streaming Options:
In addition to traditional queues, cloud providers offer streaming services that are ideal for real-time applications. These services allow for continuous ingestion and processing of data streams, making them well-suited for Internet of Things (IoT) implementations and other applications where low-latency processing is crucial.
Streaming examples: AWS Kinesis, Azure Event Hubs, GCP Pub/Sub
Together, queues and streaming solutions provide flexible options to handle varying loads and ensure that event processing remains orderly and efficient.
Observability and Monitoring
To ensure that your cloud solution operates smoothly, observability tools are critical. These tools provide real-time analytics, enabling you to monitor system performance, track application behavior, and manage alerts and events programmatically. In addition, big data platforms, data lakes, and data warehouses allow you to persist large volumes of information for historical analysis and continuous improvement.
Monitoring examples: Amazon CloudWatch, Azure Monitor, Google Cloud Observability
Security
This is a must-have for any cloud solution. Effective access management—including user authentication, authorization, and permissions—is essential. Be sure to take advantage of the security features provided by your cloud provider, which often include identity management services and tools to monitor and enforce security policies.
Identity management examples: AWS Identity and Access Management (IAM), Microsoft Entra ID (formerly Azure Active Directory), GCP Cloud IAM
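The authorization half of access management is, at its simplest, a deny-by-default role check. A small sketch (the role names and action strings are invented; real IAM policies add resources, conditions, and explicit denies):

```python
# Hypothetical role-to-permission mapping for illustration.
ROLES = {
    "reader": {"storage:read"},
    "admin":  {"storage:read", "storage:write", "iam:manage"},
}

def is_authorized(user_roles, action):
    """Deny-by-default role check: an action is permitted only when
    at least one of the user's assigned roles explicitly grants it."""
    return any(action in ROLES.get(role, set()) for role in user_roles)
```

The key exam takeaway is the default: with no matching grant, the answer is always "deny".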
Artificial Intelligence and Machine Learning
Finally, an increasingly important topic on cloud certification exams is artificial intelligence (AI) and machine learning (ML). Cloud providers now offer services that allow you to create and train your own models, as well as pre-built solutions for tasks such as chatbots, computer vision, natural language processing, and speech recognition. These services empower organizations to leverage AI/ML without the need for extensive in-house expertise.
Conclusion
This journey has provided an overview of the key services covered in solution architect certification exams. Our goal was to help you understand the cloud tools and strategies commonly employed in cloud-based applications—and how they are assessed in certification scenarios. Although there is much more to explore, this guide serves as a starting point for familiarizing yourself with the most common services and their use cases. Now it’s time to choose your provider, evaluate how their offerings align with these strategies, and begin your path to certification. Let’s get certified!