AWS Data Exchange
A data marketplace service for subscribing to third-party datasets via AWS Marketplace and ingesting them directly into S3 or Redshift
Overview
AWS Data Exchange is a data marketplace that lets you discover and subscribe to third-party datasets through AWS Marketplace and ingest them directly into S3 buckets or Redshift clusters. Over 3,000 data products are available, spanning financial market, weather, geospatial, and healthcare data, and data acquisition and updates can be automated through APIs and event-driven workflows. You can also publish and sell your own data as a provider.
Dataset and Revision Delivery Model
Data Exchange organizes data in three layers: datasets, revisions, and assets. A dataset is a logical unit of data (e.g., daily stock price data), a revision is a point-in-time snapshot of it (e.g., March 2026), and assets are the individual files within a revision (e.g., CSV, Parquet). When a provider publishes a new revision, subscribers automatically gain access to it. Four delivery types are available: S3 snapshots, API access via API Gateway, Redshift data shares, and Lake Formation tables. Choose among them based on the nature of the data and how it will be used: S3 snapshots suit batch analytics, while API access suits real-time queries. Integration with EventBridge lets you trigger a Lambda function whenever a new revision is published, which automates the start of a data ingestion pipeline.
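The EventBridge integration can be sketched as a Lambda handler that pulls the dataset and revision identifiers out of the incoming event. This is a minimal sketch under stated assumptions: the event pattern strings (`aws.dataexchange`, `Revision Published To Data Set`) and the event shape (dataset ID in `resources`, revision IDs under `detail.RevisionIds`) should be checked against the current Data Exchange event documentation, and the names `extract_revisions` and `REVISION_PUBLISHED_PATTERN` are hypothetical.

```python
import json

# Assumed event pattern for the EventBridge rule that targets this Lambda.
# Verify the source and detail-type strings against the AWS documentation.
REVISION_PUBLISHED_PATTERN = {
    "source": ["aws.dataexchange"],
    "detail-type": ["Revision Published To Data Set"],
}


def extract_revisions(event):
    """Return (data_set_id, revision_ids) from a revision-published event.

    Assumes the dataset ID is the first entry in `resources` and the
    revision IDs are listed under `detail.RevisionIds`.
    """
    resources = event.get("resources") or []
    data_set_id = resources[0] if resources else None
    revision_ids = list(event.get("detail", {}).get("RevisionIds", []))
    return data_set_id, revision_ids


def lambda_handler(event, context):
    """Lambda entry point: record what was published and hand off to ingestion."""
    data_set_id, revision_ids = extract_revisions(event)
    if data_set_id is None or not revision_ids:
        # Not the event shape we expect; skip rather than fail the invocation.
        return {"status": "ignored"}
    print(json.dumps({"data_set_id": data_set_id, "revision_ids": revision_ids}))
    # A real pipeline would start an export job for each revision at this point.
    return {"status": "ok", "data_set_id": data_set_id, "revision_ids": revision_ids}
```

Keeping the parsing in a separate function makes it possible to test the handler locally with a hand-written event before wiring up the rule.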
Publishing Data as a Provider
Data Exchange supports publishing and selling data as a provider, not only consuming it. The publishing flow starts with creating a dataset, creating a revision, importing assets into that revision, and finalizing it. Next, you register the dataset as a data product on AWS Marketplace, defining pricing (monthly subscription, annual, or free) and terms of use. Once the product passes Marketplace review, any AWS customer can discover and subscribe to it. Providers commit to an update frequency (daily, weekly, or monthly) and publish new revisions on that schedule. Revision publishing can be automated via the API and is typically the final step of a data pipeline. Subscription management lets you track who is accessing the data, and usage reports provide revenue visibility. Private data products are published only to specific AWS accounts, which is commonly used for data sharing between group companies or partners.
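The dataset, revision, and asset steps of the publishing flow can be sketched with the boto3 `dataexchange` client. This is an illustrative sketch, not a complete publishing pipeline: the function names, the bucket and key arguments, and the polling interval are hypothetical, and the Marketplace product registration, pricing, and review steps are not covered. The client is passed in as a parameter so the flow can be exercised with a stub.

```python
import time


def wait_for_job(client, job_id, poll_seconds=5, max_polls=120):
    """Poll get_job until the job completes, fails, or the poll budget runs out."""
    for _ in range(max_polls):
        state = client.get_job(JobId=job_id)["State"]
        if state == "COMPLETED":
            return
        if state in ("ERROR", "CANCELLED", "TIMED_OUT"):
            raise RuntimeError(f"job {job_id} ended in state {state}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish in time")


def publish_revision(client, name, description, bucket, keys,
                     comment="Initial revision", poll_seconds=5):
    """Create a dataset, import S3 objects into a new revision, and finalize it."""
    # 1. Create the dataset (the logical container).
    data_set_id = client.create_data_set(
        AssetType="S3_SNAPSHOT", Name=name, Description=description
    )["Id"]

    # 2. Create an empty revision inside the dataset.
    revision_id = client.create_revision(
        DataSetId=data_set_id, Comment=comment
    )["Id"]

    # 3. Import the assets from the provider's own S3 bucket (asynchronous job).
    job_id = client.create_job(
        Type="IMPORT_ASSETS_FROM_S3",
        Details={
            "ImportAssetsFromS3": {
                "DataSetId": data_set_id,
                "RevisionId": revision_id,
                "AssetSources": [{"Bucket": bucket, "Key": key} for key in keys],
            }
        },
    )["Id"]
    client.start_job(JobId=job_id)
    wait_for_job(client, job_id, poll_seconds=poll_seconds)

    # 4. Finalize the revision so it can be published to subscribers.
    client.update_revision(
        DataSetId=data_set_id, RevisionId=revision_id, Finalized=True
    )
    return data_set_id, revision_id
```

With boto3 installed and credentials configured, a call might look like `publish_revision(boto3.client("dataexchange"), "Daily prices", "End-of-day prices", "my-source-bucket", ["2026-03/prices.parquet"])`, where the bucket and key are placeholders. For recurring updates, a pipeline would reuse the existing dataset ID and repeat only steps 2 through 4.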
Data Retrieval Patterns via S3, Redshift, and API
The most basic retrieval pattern for subscribers is an S3 export job: you specify a revision from a subscribed dataset and export its assets to your own S3 bucket. Export jobs run asynchronously and emit an EventBridge event upon completion, enabling automatic triggering of downstream ETL processing. With Redshift data share products, you query data directly from the provider's Redshift cluster without copying it. In analytics workloads, you can also join S3-exported data, read through Redshift Spectrum, with data from a Redshift data share in a single query. API products provide real-time data retrieval through API Gateway endpoints and suit use cases that need immediate access to the latest data, such as exchange rate or stock price feeds. Lake Formation products are registered as tables in the Glue Data Catalog and can be queried directly from Athena or EMR.
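The S3 export pattern can be sketched with the boto3 `dataexchange` client as follows. The function names, the destination bucket, and the key pattern are illustrative assumptions. To stay self-contained, the sketch polls `get_job` for the job state instead of reacting to a completion event.

```python
import time


def build_export_details(data_set_id, revision_id, bucket,
                         key_pattern="${Revision.Id}/${Asset.Name}"):
    """Build the Details payload for an EXPORT_REVISIONS_TO_S3 job."""
    return {
        "ExportRevisionsToS3": {
            "DataSetId": data_set_id,
            "RevisionDestinations": [
                {
                    "RevisionId": revision_id,
                    "Bucket": bucket,
                    # Placeholders are expanded by the service per exported asset.
                    "KeyPattern": key_pattern,
                }
            ],
        }
    }


def export_revision(client, data_set_id, revision_id, bucket,
                    poll_seconds=5, max_polls=120):
    """Export one revision to the subscriber's bucket and wait for the job to end."""
    job_id = client.create_job(
        Type="EXPORT_REVISIONS_TO_S3",
        Details=build_export_details(data_set_id, revision_id, bucket),
    )["Id"]
    client.start_job(JobId=job_id)  # jobs are created in a waiting state

    for _ in range(max_polls):
        state = client.get_job(JobId=job_id)["State"]
        if state == "COMPLETED":
            return job_id
        if state in ("ERROR", "CANCELLED", "TIMED_OUT"):
            raise RuntimeError(f"export job {job_id} ended in state {state}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"export job {job_id} did not finish in time")
```

Prefixing keys with the revision ID keeps successive revisions in separate S3 prefixes, which makes it straightforward for downstream ETL to process only the newest snapshot.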