Cloud AI services
Explore core cloud services to enhance your applications with advanced AI capabilities
Amazon Web Services
Amazon Bedrock
Amazon Bedrock is a fully managed AWS service that provides access to a broad range of foundation models (FMs) from leading AI companies through a single API. It supports models from third-party providers (such as Anthropic, Meta, and Mistral AI) as well as Amazon’s own Titan models. Developers can use Bedrock to build generative AI applications like chatbots, virtual assistants, search, and content generation with enterprise-grade security and privacy built in.
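For instance, a minimal single-turn call through Bedrock’s unified Converse API might look like the sketch below, using boto3 (the region and the Anthropic model ID are examples; the model must be enabled for your account):

```python
import boto3

# Runtime client for model invocation; assumes AWS credentials are configured
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example model ID
    messages=[
        {"role": "user", "content": [{"text": "Summarize our refund policy in one paragraph."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.5},
)

print(response["output"]["message"]["content"][0]["text"])
```

Because Converse normalizes request and response shapes across providers, swapping in a different model is usually just a matter of changing the modelId.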
The service offers flexible model customization. You can perform distillation or fine-tune certain models on your own data to improve task-specific performance or adapt the model’s tone to your brand. Bedrock also introduced Custom Model Import, which lets you bring your own fine-tuned models (e.g. Llama or Mistral) from SageMaker, EC2, or on-premises environments and deploy them on Bedrock’s serverless infrastructure. This means developers can leverage open-source model weights they have customized, while letting AWS handle scaling and serving.
Bedrock includes integrated tools for retrieval-augmented generation (RAG) to ground model outputs in your enterprise data, and it can orchestrate multi-step workflows via Bedrock Agents. With Bedrock Agents, developers can set up agents that break down user requests, call other AWS services or APIs, and use the reasoning capabilities of FMs to fulfill complex tasks. These agents can maintain memory across interactions, apply guardrails for safety, and even coordinate multiple specialized sub-agents for sophisticated use cases. Amazon Bedrock is accessed via API/SDK and integrates with other AWS services (such as data stores and security controls), offering a flexible yet managed way to incorporate cutting-edge generative AI into applications without managing underlying model infrastructure.
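As a rough sketch, invoking a previously created Bedrock Agent from code could look like this (the agent and alias IDs are placeholders; the response arrives as an event stream of chunks):

```python
import boto3

agents = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agents.invoke_agent(
    agentId="AGENT_ID",            # placeholder: ID of an agent you created
    agentAliasId="ALIAS_ID",       # placeholder: its published alias
    sessionId="user-42-session",   # reusing the same ID preserves conversation memory
    inputText="What is the status of order 1234, and can you start a refund?",
)

# Assemble the streamed completion into a single answer string
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)
```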
You can find a lot of information about Bedrock in this official playlist.
Amazon SageMaker
Amazon SageMaker is a fully managed platform that enables developers and data scientists to build, train, and deploy machine learning models at scale on AWS. It covers the end-to-end ML workflow: from data preparation and labeling, through model training and hyperparameter tuning, to deployment and monitoring of ML models.
SageMaker provides a hosted Jupyter notebook environment (SageMaker Studio) for development and integrates with popular frameworks like TensorFlow, PyTorch, and Hugging Face for custom model development. Developers can bring open-source models or custom architectures and train them on dedicated GPU instances or distributed clusters managed by SageMaker.
You can fine-tune pre-trained models (including large transformers) on SageMaker using your data, leveraging deep learning optimizations (e.g. AWS Deep Learning Containers with CUDA, DeepSpeed, etc.). It supports MLOps best practices via SageMaker Pipelines for CI/CD, a Model Registry for versioning, and SageMaker Model Monitor to track data drift and model quality in production. With integration to other AWS services (like S3 for data storage, AWS Glue for data prep, CloudWatch for logging, and IAM for security), SageMaker provides a robust, enterprise-ready environment for developing AI models including LLMs.
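To make this concrete, here is a minimal sketch of training and deploying a custom PyTorch model with the SageMaker Python SDK (the script name, role ARN, S3 path, framework version, and instance types are placeholders to adapt):

```python
from sagemaker.pytorch import PyTorch

# Managed training job: SageMaker provisions the instance, runs train.py, tears it down
estimator = PyTorch(
    entry_point="train.py",   # your training script
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder role
    instance_type="ml.g5.xlarge",
    instance_count=1,
    framework_version="2.1",
    py_version="py310",
)

estimator.fit({"train": "s3://my-bucket/train"})  # channel name -> S3 data location

# Deploy the trained model artifact to a managed real-time endpoint
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")
```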
Amazon Lex
Amazon Lex is an AWS service for building conversational interfaces (chatbots and voicebots) using natural language voice and text. It provides the same deep learning-powered automatic speech recognition (ASR) and natural language understanding (NLU) technology that powers Amazon Alexa, made available for any developer.
Amazon Lex manages the dialogue flow, including multi-turn conversations and slot filling, based on the structure the developer defines, without requiring expertise in NLP or ML. It offers a graphical console as well as APIs/SDKs for integrating bots into applications.
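For example, sending user text to a deployed Lex V2 bot via boto3 might look like this (the bot and alias IDs are placeholders from a bot you have already built and published):

```python
import boto3

lex = boto3.client("lexv2-runtime", region_name="us-east-1")

response = lex.recognize_text(
    botId="BOT_ID",          # placeholder
    botAliasId="ALIAS_ID",   # placeholder
    localeId="en_US",
    sessionId="user-42",     # Lex tracks dialogue state (intent, slots) per session
    text="I'd like to book a hotel in Rome for two nights",
)

# The bot's reply, plus any follow-up prompts for unfilled slots
for message in response.get("messages", []):
    print(message["content"])
```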
For voice interactions, Lex can integrate with Amazon Polly (for text-to-speech responses) and perform speech recognition on audio input, enabling phone or voice-assistant style bots.
Recently, Amazon Lex has evolved to leverage generative AI for unanswered queries: Lex V2 introduced an AMAZON.QnAIntent that, when a user's question doesn’t match any predefined intent, can invoke a foundation model via Amazon Bedrock to find answers from a knowledge base. Using this, Lex can perform a fallback to search FAQs or company documents (through Amazon Kendra or OpenSearch indices) and then have a model like Anthropic Claude generate a helpful answer.
Amazon Q
Amazon Q is a generative AI-powered assistant offered by AWS that is designed to enhance productivity for both developers and business users across various AWS and enterprise applications. It essentially serves as an AI copilot: for developers, Amazon Q Developer acts as a coding and cloud operations assistant integrated deeply with AWS tooling, and for non-technical users, Amazon Q Business functions as an enterprise knowledge assistant that can securely answer questions and automate tasks using company data.
Other services
Amazon also offers a range of other AI and ML services that can be used in conjunction with the above; a short usage sketch follows the list:
- Amazon Transcribe: A service for automatic speech recognition (ASR) that converts audio to text, enabling voice interfaces and transcription of meetings or media.
- Amazon Textract: A service for extracting text and structured data from documents, such as PDFs or images, using OCR (Optical Character Recognition).
- Amazon Comprehend: A natural language processing (NLP) service that can analyze text for sentiment, entities, key phrases, and language detection.
- Amazon Rekognition: A computer vision service that can analyze images and videos for object detection, facial analysis, text in images, and more.
- Amazon Translate: A neural machine translation service that provides real-time translation between languages.
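As a quick illustration of how these services compose, the sketch below runs a customer review through Comprehend for sentiment analysis and through Translate for localization (the region and text are arbitrary examples):

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")
translate = boto3.client("translate", region_name="us-east-1")

review = "The checkout flow was fast and the support team was great."

# Sentiment comes back as POSITIVE / NEGATIVE / NEUTRAL / MIXED, plus scores
sentiment = comprehend.detect_sentiment(Text=review, LanguageCode="en")
print(sentiment["Sentiment"])

# Translate the review for a Spanish-language dashboard
result = translate.translate_text(
    Text=review, SourceLanguageCode="en", TargetLanguageCode="es"
)
print(result["TranslatedText"])
```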
Azure
Azure AI Foundry
Azure AI Foundry is Microsoft’s unified platform for designing, customizing, and managing AI applications and agents across the enterprise. It brings together a suite of powerful AI services, tools, and model catalogs under one roof to streamline the development of generative AI and large-scale machine learning solutions.
With Azure AI Foundry, developers and data scientists can access frontier AI models (including both Microsoft’s own models and those from OpenAI and other partners) as well as popular open-source models, and use them as building blocks in their applications. The platform provides an integrated web interface along with SDKs, enabling users to experiment with prompts, fine-tune models, orchestrate workflows, and deploy AI-powered apps in a collaborative environment.
Key capabilities of Foundry include the Model Catalog (Foundry Models) that offers hundreds of pre-trained models for various modalities (text, vision, code, etc.), an Agent development environment for creating AI agents that can utilize tools and enterprise data, and deep integration with Azure’s cloud infrastructure for deployment and security.
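As an illustration, chatting with a model deployed from the Foundry catalog might look like the sketch below, using the azure-ai-inference package (the endpoint shape, key, and deployment name are placeholders and depend on how the model is deployed in your project):

```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential("<api-key>"),                       # placeholder
)

response = client.complete(
    model="<deployment-name>",  # a model deployed from the Foundry catalog
    messages=[
        SystemMessage(content="You are a concise technical assistant."),
        UserMessage(content="Explain vector search in two sentences."),
    ],
)
print(response.choices[0].message.content)
```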
Foundry emphasizes a pro-code approach for complex scenarios, complementing Microsoft’s low-code AI offerings: it integrates with Visual Studio, GitHub, and CI/CD pipelines to fit naturally into a developer’s workflow. Users can bring their own data to customize models – whether via fine-tuning a model on proprietary data or via retrieval augmentation – all while keeping data and model artifacts within Azure’s secure environment.
Azure AI Foundry also ensures that AI solutions can meet enterprise requirements: it provides features for responsible AI (tools to evaluate model fairness and explainability), compliance logging, and the ability to deploy models in isolated networks or on-prem via Azure Arc. Another aspect of Foundry is Azure AI Foundry Labs, which showcases cutting-edge AI research projects (like new model prototypes or algorithms) that advanced users can experiment with.
Azure AI Foundry Agent service
As AI agents are a popular topic right now, let's talk about them a bit more. The Azure AI Foundry Agent Service is a platform for building, orchestrating, and deploying AI agents. It is part of the Azure AI Foundry suite, focused on the “agentic” side of AI applications.
Foundry Agent Service enables developers to securely design and scale agents that leverage the rich model ecosystem of Foundry along with enterprise data and APIs. An agent in this context can be something like a customer service assistant that not only answers questions but can look up order information and initiate refunds, or a marketing agent that can generate content and then schedule a post via an API.
The Agent Service works by connecting three main elements:
- Models: it integrates with the Foundry model catalog, so an agent can use powerful foundation models (GPT-4, etc.) for its reasoning and language generation;
- Knowledge sources: it can be grounded with enterprise knowledge via connectors to data sources like internal wikis, SharePoint, databases, Bing web search, or an Azure Cognitive Search index;
- Action connectors: it has a library of 1,400+ actions through Azure Logic Apps connectors, allowing the agent to perform operations like sending an email, creating a ticket in ServiceNow, updating a CRM record, or calling a custom API.
Developers can compose agents that use these building blocks using the Foundry Agent SDK or through the Foundry Studio interface. Foundry Agent Service provides an orchestration layer that manages how agents plan and execute tasks.
Agents built here have a planning capability – meaning they can break down user requests into subtasks, decide which tools or data to use at each step, and carry out sequences of actions. They also maintain memory of past interactions, enabling multi-turn conversations and context carry-over. The service supports multi-agent scenarios as well.
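A rough sketch of that flow with the azure-ai-projects SDK is shown below. This package has been in preview and method names have shifted between releases, so treat the exact calls as indicative rather than definitive:

```python
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Placeholder connection string from the Foundry project's overview page
project = AIProjectClient.from_connection_string(
    conn_str="<project-connection-string>",
    credential=DefaultAzureCredential(),
)

# Create an agent backed by a model deployed in the project
agent = project.agents.create_agent(
    model="gpt-4o",
    name="support-agent",
    instructions="Answer order questions and escalate refund requests.",
)

# Conversations happen on threads; the service plans and executes the run
thread = project.agents.create_thread()
project.agents.create_message(
    thread_id=thread.id, role="user", content="Where is order 1234?"
)
run = project.agents.create_and_process_run(thread_id=thread.id, agent_id=agent.id)

for message in project.agents.list_messages(thread_id=thread.id).data:
    print(message.role, message.content)
```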
Azure OpenAI service
Azure OpenAI Service is a managed AI platform that provides access to OpenAI’s advanced language and vision models on Azure’s cloud, combining the power of models like GPT-4 with the enterprise-grade security, compliance, and regional availability of Azure.
In essence, it delivers the OpenAI API’s capabilities as an Azure service. With Azure OpenAI, developers can use state-of-the-art GPT-family models, Codex models for code completion, and DALL-E models for image generation through a REST API or SDK; the difference is that the infrastructure is Azure’s – the models are hosted in Azure data centers and your data never leaves Azure.
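Concretely, calling a deployed chat model uses the familiar openai Python package pointed at your Azure resource; the endpoint, key, API version, and deployment name below are placeholders:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<api-key>",                                        # placeholder
    api_version="2024-06-01",
)

# "model" is the deployment name you chose in Azure, not the raw model ID
response = client.chat.completions.create(
    model="my-gpt4o-deployment",
    messages=[{"role": "user", "content": "Draft a short status update for the team."}],
)
print(response.choices[0].message.content)
```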
This arrangement is particularly appealing to enterprises that require strict data privacy: prompts and outputs are not sent to OpenAI’s servers and are not used to improve OpenAI’s models. Microsoft ensures that your inputs and fine-tuning data are kept confidential to your instance; they explicitly state that customer data is not seen or used by OpenAI, and the service operates in isolation from the public ChatGPT/OpenAI systems.
Azure OpenAI requires customers to apply for access (to ensure responsible use), and once approved, you can deploy models to your Azure resource just like you would spin up a database or VM. You can choose the region and even the capacity (there are options like picking model series and scaling units).
One big feature is fine-tuning: Azure OpenAI allows fine-tuning certain models with your own training data. This means you can train a custom version of, say, GPT-4 on your proprietary dataset (under strict controls) to better suit your application’s vocabulary or style. The fine-tuned model is then available as a private model endpoint in your Azure OpenAI instance, exclusively for your use.
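A minimal sketch of kicking off such a job, assuming a chat-formatted JSONL dataset and a base model that supports fine-tuning in your region:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<api-key>",                                        # placeholder
    api_version="2024-06-01",
)

# Upload the JSONL training data
training_file = client.files.create(
    file=open("train.jsonl", "rb"), purpose="fine-tune"
)

# Start the fine-tuning job; check which base models are tunable in your region
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # example base model
)
print(job.id, job.status)
```

Once the job completes, you deploy the resulting fine-tuned model like any other deployment and call it by its deployment name.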
Other services
Azure also offers a range of other services ready to be used when building AI applications; a short usage sketch follows the list:
- Content Understanding: A service that helps enterprises transform unstructured multimodal data into insights.
- AI Search: A service that provides AI-powered search capabilities, including semantic search and vector search, to enhance search experiences (like RAG) in applications.
- Content Safety: Monitor text and images to detect offensive or inappropriate content.
- Translator: Translate text instantly or in batches across more than 100 languages.
- Speech: Convert text to speech and vice versa, with support for multiple languages and voices.
- Vision: Analyze images and videos for object detection, OCR, facial recognition, and more.
- Language: Build conversational interfaces, summarize documents, and analyze text using prebuilt AI-powered features.
- Document Intelligence: A service that applies advanced machine learning to extract text, key-value pairs, tables, and structures from documents.
- Custom Vision: An image recognition service that lets you build, deploy, and improve your own image identifier models.
- Face service: AI algorithms that detect, recognize, and analyze human faces in images.
- Video Indexer: A service that extracts insights from your videos using Azure AI Video Indexer video and audio models.
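As one example from this list, here is a minimal sentiment call with the azure-ai-textanalytics package (the endpoint and key are placeholders from an Azure AI Language resource):

```python
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",  # placeholder
    credential=AzureKeyCredential("<api-key>"),                      # placeholder
)

docs = ["The new dashboard is fantastic, but login is still slow."]
for result in client.analyze_sentiment(docs):
    # Overall label plus per-class confidence scores
    print(result.sentiment, result.confidence_scores)
```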
Google Cloud Platform (GCP)
Vertex AI
Vertex AI is Google Cloud’s unified platform for developing and deploying machine learning models, including training custom models and leveraging Google’s latest generative AI offerings. It brings together the tools needed by data scientists, ML engineers, and developers under a single, collaborative environment – covering data preparation, model training, hyperparameter tuning, model evaluation, deployment, and ongoing monitoring.
For traditional ML, Vertex AI includes capabilities like AutoML (which lets you train high-quality models on your dataset with no code, for vision, text, tabular data, etc.) and Custom Training (where you can run your own training code on managed compute resources, with support for distributed training).
Once models are trained, they can be registered in the Model Registry and then deployed to endpoints for serving predictions with a few clicks or API calls.
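For example, registering and deploying an already-trained model with the Vertex AI SDK might look like this sketch (the project, bucket path, and serving container are placeholders; pick a prebuilt container matching your framework):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholders

# Register the trained artifact in the Model Registry
model = aiplatform.Model.upload(
    display_name="churn-model",
    artifact_uri="gs://my-bucket/model/",  # placeholder: exported model files
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest"
    ),
)

# Deploy to a managed endpoint for online predictions
endpoint = model.deploy(machine_type="n1-standard-4")
print(endpoint.predict(instances=[[0.2, 1.5, 3.1]]))
```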
For LLMs and generative AI, Vertex AI offers Model Garden – a catalog of both Google’s foundation models and select open-source models, which developers can discover, test, and tune within the platform. Google’s own models are accessible via the Vertex AI API.
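Calling one of Google’s foundation models through the Vertex AI SDK can be as short as the following (the project, region, and model name are examples):

```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")  # placeholders

model = GenerativeModel("gemini-1.5-pro")  # example model from Model Garden
response = model.generate_content("Write a haiku about data pipelines.")
print(response.text)
```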
Importantly, Vertex AI allows you to fine-tune Google’s models on your data.
From a developer’s view, Vertex AI offers multiple interfaces: Vertex AI SDK (Python), a Web Console (Vertex AI Studio) for visual workflow and experimentation, and support for deploying via Terraform or gcloud CLI for infrastructure-as-code automation.
Vertex AI Agent Builder
Vertex AI Agent Builder is Google Cloud’s framework and toolkit for creating multi-agent systems and AI agents that can perform complex tasks through collaboration and tool use. It enables developers to design agents (or a network of agents) that can operate with a degree of autonomy — processing user requests, breaking them into steps, calling external tools/APIs, and managing dialogue.
At its core, Agent Builder provides an Agent Development Kit (ADK), a programming framework (currently Python only) where you can define agents and their behaviors, in many cases in under 100 lines of code.
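A minimal sketch of an ADK agent with a single tool, following the shape of the ADK quickstarts (the order-lookup tool is a hypothetical stand-in; ADK derives the tool schema from the function signature and docstring):

```python
from google.adk.agents import Agent

def get_order_status(order_id: str) -> dict:
    """Look up the shipping status for an order."""
    # Hypothetical stand-in for a real backend call
    return {"order_id": order_id, "status": "shipped"}

root_agent = Agent(
    name="order_assistant",
    model="gemini-2.0-flash",  # the model that powers the agent's reasoning
    instruction="Help users track their orders using the available tools.",
    tools=[get_order_status],
)
```

An agent defined this way can be exercised locally with ADK's CLI (for example, the adk web developer UI) before being wired into a larger multi-agent system.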
Vertex AI Agent Builder is designed to be open and interoperable. It supports connecting agents built on different frameworks via the Agent2Agent (A2A) protocol. A2A is essentially a communication standard that lets agents talk to each other irrespective of whether one is built with Google’s ADK and another with, say, LangChain or a third-party system.
Cloud Run for AI
Cloud Run for AI refers to Google Cloud’s usage of Cloud Run (a fully managed serverless container platform) enhanced with the capability to run AI inference workloads on GPUs. Cloud Run traditionally allows you to deploy any stateless container and have it scale automatically with incoming requests. With the introduction of GPU support in Cloud Run, developers can now deploy containers that use NVIDIA GPUs to serve machine learning models and other GPU-accelerated workloads.
For AI and LLM scenarios, Cloud Run for AI is a great fit for deploying custom models, especially open-source ones or ones you fine-tuned yourself (you can use Ollama for that), encapsulated in a lightweight serving runtime. You can see it in action in this DeepSeek video.
Other services
Some other Google Cloud services that can be used in AI applications include the following (a short usage sketch follows the list):
- Natural Language AI: Derive insights from unstructured text using Google machine learning.
- Speech-to-Text: Accurately convert speech into text using an API powered by Google's AI technologies.
- Text-to-Speech: Convert text into natural-sounding speech using a Google AI powered API.
- Translation AI: Make your content and apps multilingual with fast, dynamic machine translation.
- Vision AI: Derive insights from your images in the cloud or at the edge with AutoML Vision or use pre-trained Vision API models to detect objects, understand text, and more.
- Video AI: Enable powerful content discovery and engaging video experiences.
- Document AI: Document AI includes pre-trained models for data extraction, Document AI Workbench to create new custom models or uptrain existing ones, and Document AI Warehouse to search and store documents.
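As one example from this list, a sentiment call with the Natural Language API’s Python client looks roughly like this:

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()  # uses your gcloud credentials

document = language_v1.Document(
    content="The migration went smoothly and the team is thrilled.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

response = client.analyze_sentiment(request={"document": document})
# Score runs from -1.0 (negative) to 1.0 (positive)
print(response.document_sentiment.score)
```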