The Layered Architecture of AWS AI/ML Services - Flexibility Through the Three Tiers of SageMaker, Bedrock, and API Services

This article organizes AWS AI/ML services into three layers - SageMaker (full control), Bedrock (managed generative AI), and Rekognition/Comprehend/etc. (API-based) - and explains AWS's flexibility through comparisons with GCP Vertex AI and Azure OpenAI Service, including custom silicon integration.

Why AI/ML Services Need a 'Layered' Approach

AI/ML maturity varies significantly across organizations. Some have data scientists building models from scratch, while others only need to call pre-trained models via APIs. With the emergence of generative AI, the intermediate need to customize foundation models has also expanded rapidly. AWS addresses this diverse range of needs with a three-layer structure: SageMaker for full control, Bedrock for managed generative AI, and API-based services like Rekognition, Comprehend, and Transcribe. Each layer is independent yet interoperable, allowing organizations to start at the appropriate layer for their AI maturity and gradually progress to more advanced usage.

SageMaker - A Full-Control End-to-End ML Platform

SageMaker is a platform that covers the entire ML workflow. It provides a consistent environment for data preprocessing (Data Wrangler, Processing), model training (Training, Hyperparameter Tuning), deployment (Endpoints, Serverless Inference), and monitoring (Model Monitor). SageMaker Studio serves as a browser-based IDE offering Jupyter notebooks, experiment management, model registry, and pipeline visualization. SageMaker's strength lies in its deep integration with AWS compute infrastructure. Training jobs can use GPU instances (P5, P4d) or AWS's custom Trainium chips, while inference can run on Inferentia chips for better cost efficiency. Built-in distributed training libraries (SageMaker Distributed Training) also streamline large-scale model training.
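To make the workflow concrete, here is a minimal sketch of the request body for a SageMaker training job via the boto3 `create_training_job` API. The account IDs, role ARN, bucket names, and container image URI are placeholders, not real resources; in practice the image would be a built-in algorithm or your own training container in ECR.

```python
def build_training_job_request(job_name: str,
                               instance_type: str = "ml.p4d.24xlarge") -> dict:
    """Assemble a request body for sagemaker.create_training_job()."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            # Placeholder training image URI.
            "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training:latest",
            "TrainingInputMode": "File",
        },
        "RoleArn": "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/train/",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},
        "ResourceConfig": {
            # Swapping in "ml.trn1.32xlarge" here targets Trainium instead of GPUs.
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 100,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
    }

request = build_training_job_request("demo-job")
# boto3.client("sagemaker").create_training_job(**request)  # actual call
```

Note that the choice between GPU and Trainium instances is isolated to a single field in `ResourceConfig`, which is what makes the custom-silicon option low-friction from SageMaker.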

Bedrock - A Multi-Model Strategy for Managed Generative AI

Bedrock is a platform that provides foundation models for generative AI as a managed service. It offers models from multiple providers - including Anthropic's Claude, Meta's Llama, Stability AI's Stable Diffusion, and Amazon's own Nova - through a unified API. This "multi-model" approach is the key differentiator from Azure OpenAI Service. Azure OpenAI Service specializes in OpenAI models, and while the quality of GPT-4 and DALL-E is high, the choice of model providers is limited. With Bedrock, you can select the optimal model for each use case, avoiding lock-in to a specific provider. Knowledge Bases for RAG (Retrieval-Augmented Generation), Guardrails for content filtering, and fine-tuning for model customization are all provided as integrated Bedrock features.
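The unified API can be sketched with Bedrock's Converse API in boto3: the request shape stays the same across providers, and only the model ID changes. The model IDs below are illustrative examples; check the Bedrock console for the identifiers currently available in your region.

```python
def build_converse_request(model_id: str, prompt: str) -> dict:
    """Same request shape regardless of the model provider."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }

# Switching providers is a one-line change to the model ID:
claude_req = build_converse_request(
    "anthropic.claude-3-5-sonnet-20240620-v1:0", "Summarize our Q3 report.")
llama_req = build_converse_request(
    "meta.llama3-70b-instruct-v1:0", "Summarize our Q3 report.")

# client = boto3.client("bedrock-runtime")
# response = client.converse(**claude_req)  # actual call
```

Because both requests share the same structure, swapping models for A/B testing or cost tuning requires no change to application code beyond the identifier.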

API-Based Services - Embedding AI with Just Code

The third layer of AWS AI services consists of API-based services specialized for specific tasks. Over 10 services are available, including Rekognition (image and video analysis), Comprehend (natural language processing), Transcribe (speech recognition), Translate (translation), Polly (text-to-speech), Textract (document analysis), and Personalize (recommendations). These services require no ML expertise at all - you can embed AI capabilities into your applications simply by calling REST APIs. GCP also offers API-based services such as Vision AI, Natural Language AI, and Speech-to-Text, but AWS has a wider variety of services, with particularly deep optimization for specific use cases like Textract's form analysis and Personalize's real-time recommendations. API-based services have not lost their value since the emergence of generative AI, and there are many scenarios where they are more advantageous than general-purpose LLMs in terms of latency and cost.
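As a sketch of how little code these services need, here are the request payloads for two representative boto3 calls: Comprehend's `detect_sentiment` and Rekognition's `detect_labels`. The bucket and object names are placeholders.

```python
def sentiment_request(text: str) -> dict:
    """Request body for comprehend.detect_sentiment()."""
    return {"Text": text, "LanguageCode": "en"}

def label_request(bucket: str, key: str, max_labels: int = 10) -> dict:
    """Request body for rekognition.detect_labels() on an image in S3."""
    return {
        "Image": {"S3Object": {"Bucket": bucket, "Name": key}},
        "MaxLabels": max_labels,
        "MinConfidence": 80.0,
    }

sentiment_req = sentiment_request("Great product, fast shipping!")
labels_req = label_request("my-bucket", "photos/storefront.jpg")

# comprehend = boto3.client("comprehend")
# result = comprehend.detect_sentiment(**sentiment_req)   # actual call
# rekognition = boto3.client("rekognition")
# labels = rekognition.detect_labels(**labels_req)        # actual call
```

No training data, endpoints, or model management are involved; the entire integration is a handful of request fields per call.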

Custom Silicon Integration - Inferentia and Trainium

Custom silicon is an essential part of the AWS AI/ML strategy. Inferentia for inference and Trainium for training are AI-specific chips designed in-house by AWS, offering cost-performance advantages compared to NVIDIA GPUs. Inferentia2 is reported to achieve up to 40% cost reduction for large language model inference compared to equivalent GPU instances. Trainium2 is optimized for distributed training of large models and can be used transparently from SageMaker or EKS. GCP's TPU (Tensor Processing Unit) also delivers high performance as an AI-specific chip, but TPUs are only available within the GCP cloud environment, limiting their usage patterns. Azure has announced its own Maia AI accelerator but, as of this writing, still relies largely on NVIDIA GPUs in production. Having custom silicon options provides a long-term competitive advantage in cost optimization for AI workloads.
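From SageMaker's perspective, targeting Inferentia is largely a deployment-configuration choice, as this sketch of a `create_endpoint_config` request body shows. The config and model names are placeholders, and in practice the model artifact must be compiled for the Neuron runtime before it can serve on Inferentia.

```python
def build_endpoint_config(config_name: str, model_name: str,
                          instance_type: str = "ml.inf2.xlarge") -> dict:
    """Request body for sagemaker.create_endpoint_config()."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "primary",
            "ModelName": model_name,
            # "ml.g5.xlarge" here would target NVIDIA GPUs instead.
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
    }

config = build_endpoint_config("llm-inf2-config", "my-neuron-compiled-model")
# boto3.client("sagemaker").create_endpoint_config(**config)  # actual call
```

Keeping the silicon choice inside the endpoint config means GPU and Inferentia variants can be benchmarked against each other without touching application code.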

Usage Patterns for the Three-Layer Structure

AWS's three-layer AI/ML structure enables staged adoption aligned with organizational maturity. In the early stages of AI adoption, you can quickly prove value with API-based services; as generative AI usage advances, you can transition to customization with Bedrock; and when custom model development becomes necessary, you can introduce SageMaker. This provides a clear growth path. The three layers are not mutually exclusive and can be used together within the same application. For example, you could classify user inquiries with Comprehend, generate responses with a Bedrock LLM, and evaluate response quality with a custom model trained in SageMaker.
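The inquiry-handling example above can be sketched as a single request path. The client calls are stubbed with simple placeholder logic so the flow is visible and runnable offline; in practice, each step would be a boto3 call (`comprehend`, `bedrock-runtime`, and `sagemaker-runtime` respectively), and the category names and scoring logic are invented for illustration.

```python
def classify(text: str) -> str:
    """Layer 3 (API service): stand-in for a Comprehend classification call."""
    return "billing" if "invoice" in text.lower() else "general"

def generate_reply(text: str, category: str) -> str:
    """Layer 2 (Bedrock): stand-in for bedrock_runtime.converse() with an LLM."""
    return f"[{category}] Thanks for reaching out about: {text}"

def quality_score(reply: str) -> float:
    """Layer 1 (SageMaker): stand-in for a custom model behind invoke_endpoint()."""
    return 0.9 if reply else 0.0

def handle_inquiry(text: str) -> dict:
    """Chain all three layers for one user inquiry."""
    category = classify(text)
    reply = generate_reply(text, category)
    return {"category": category, "reply": reply, "score": quality_score(reply)}

result = handle_inquiry("Question about my invoice")
```

The point of the sketch is the shape of the composition: each layer is a separate, independently replaceable call, so any one of the three can be upgraded (e.g., swapping the Bedrock model) without touching the others.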

Summary

AWS AI/ML services provide flexibility for organizations at any AI maturity level through the three-layer structure of SageMaker, Bedrock, and API-based services. Azure OpenAI Service led with access to OpenAI models, but Bedrock holds the advantage in model provider diversity. GCP's Vertex AI excels as an integrated platform, but AWS surpasses it in the variety and depth of API-based services. Furthermore, the custom silicon options of Inferentia and Trainium serve as a long-term differentiator for AI workload cost optimization. AI/ML adoption does not happen overnight - it deepens gradually. AWS, with its layered structure that supports this gradual growth, is a solid choice as a platform for AI strategy.