Amazon AppFlow
A service that builds data flows between SaaS applications (Salesforce, Slack, SAP, etc.) and AWS services
Overview
Amazon AppFlow is a fully managed integration service that securely transfers data between SaaS applications and AWS services. It comes with over 50 built-in connectors for Salesforce, SAP, Slack, Google Analytics, ServiceNow, and more, enabling you to build data flows without writing code. You can apply transformation operations such as filtering, mapping, masking, and validation to data in transit, making it useful as a preprocessing step for ETL pipelines. With PrivateLink support for private connectivity, data integration with SaaS applications can be achieved without traversing the public internet.
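As a quick way to see which connectors are available in your account and Region, the AppFlow API can enumerate them. The following is a minimal boto3 sketch assuming default credentials and region configuration; the connectorConfigurations response key reflects the DescribeConnectors API as exposed by current boto3 versions.

```python
import boto3

# Minimal sketch: enumerate the connectors AppFlow exposes in this Region.
# Assumes default AWS credentials/region configuration.
appflow = boto3.client("appflow")

resp = appflow.describe_connectors()
# connectorConfigurations is a mapping keyed by connector type (Salesforce, Slack, ...).
for connector_type in sorted(resp["connectorConfigurations"]):
    print(connector_type)
# Large result sets are paginated; follow resp.get("nextToken") if present.
```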
Flow Design and Data Transformation Mapping
An AppFlow flow consists of three elements: a source (where data is fetched from), a destination (where data is sent to), and transformation rules. You can apply field mapping (renaming, type conversion), filtering (transferring only records matching specific conditions), masking (redacting personally identifiable information), and validation (range checks on values) to records retrieved from the source. A typical use case is exporting Salesforce opportunity data to S3 in Parquet format for analysis with Athena. The field mapping feature can flatten nested JSON structures from the source into flat columns, converting complex SaaS data models into formats suitable for a data lake. There is no upper limit on the number of records transferred per flow execution, but you need to be aware of source-side constraints such as Salesforce's Bulk API query limits and daily API call quotas. When S3 is selected as the destination, you can specify file aggregation and split sizes as well as the partition structure (for example, date-based prefixes) to stay consistent with downstream analytics infrastructure.
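As an illustration of the source / destination / tasks structure, here is a hedged boto3 sketch of the Salesforce-opportunities-to-Parquet flow described above. The connection profile name salesforce-prod, the bucket my-datalake-raw, and the field list are hypothetical placeholders.

```python
import boto3

appflow = boto3.client("appflow")

FIELDS = ["Id", "Name", "Amount", "StageName", "CloseDate"]  # hypothetical field list

appflow.create_flow(
    flowName="salesforce-opportunity-to-s3",
    description="Export Salesforce Opportunity records to S3 as Parquet for Athena",
    triggerConfig={"triggerType": "OnDemand"},
    sourceFlowConfig={
        "connectorType": "Salesforce",
        "connectorProfileName": "salesforce-prod",  # existing connection profile (hypothetical)
        "sourceConnectorProperties": {"Salesforce": {"object": "Opportunity"}},
    },
    destinationFlowConfigList=[
        {
            "connectorType": "S3",
            "destinationConnectorProperties": {
                "S3": {
                    "bucketName": "my-datalake-raw",           # hypothetical bucket
                    "bucketPrefix": "salesforce/opportunity",
                    "s3OutputFormatConfig": {
                        "fileType": "PARQUET",
                        # Date-based prefixes so output lines up with date partitions.
                        "prefixConfig": {"prefixType": "PATH", "prefixFormat": "DAY"},
                    },
                }
            },
        }
    ],
    tasks=[
        # Projection: restrict the flow to the fields we actually need.
        {
            "taskType": "Filter",
            "sourceFields": FIELDS,
            "connectorOperator": {"Salesforce": "PROJECTION"},
            "taskProperties": {},
        },
        # Map each projected field straight through to a destination column of the same name.
        *[
            {
                "taskType": "Map",
                "sourceFields": [field],
                "destinationField": field,
                "connectorOperator": {"Salesforce": "NO_OP"},
                "taskProperties": {},
            }
            for field in FIELDS
        ],
    ],
)
```

Calling start_flow(flowName="salesforce-opportunity-to-s3") afterwards would run the transfer on demand.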
Connection Profiles and OAuth Authentication Management
AppFlow connection profiles are a mechanism for securely managing authentication credentials for SaaS applications. For OAuth 2.0-based connectors (Salesforce, Slack, Google Analytics, etc.), you complete the authorization flow in a browser during the initial connection, and the refresh token is automatically stored in AWS Secrets Manager. Token renewal is handled automatically by AppFlow, so operators do not need to manually manage token expiration. However, if a password change or application reauthorization occurs on the SaaS side, the connection profile must be re-authenticated. For connectors that use Basic authentication or API keys, such as SAP and ServiceNow, credentials are stored in Secrets Manager and referenced from the connection profile. Using PrivateLink-enabled connectors (Salesforce, Snowflake, etc.) keeps data transfers on the AWS private network via PrivateLink rather than the public internet. Connection profiles are also reusable: a single profile for one Salesforce organization can be shared by multiple flows that each retrieve a different object.
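For a Basic-auth connector like ServiceNow, the profile can also be created through the API. This is a minimal sketch; the profile name, instance URL, and credentials are hypothetical, and in practice the password would come from your own secret store or environment rather than source code.

```python
import os
import boto3

appflow = boto3.client("appflow")

# Hypothetical ServiceNow profile; AppFlow stores the credentials it receives
# in Secrets Manager on your behalf.
appflow.create_connector_profile(
    connectorProfileName="servicenow-prod",
    connectorType="Servicenow",
    connectionMode="Public",  # "Private" routes via PrivateLink where the connector supports it
    connectorProfileConfig={
        "connectorProfileProperties": {
            "ServiceNow": {"instanceUrl": "https://example.service-now.com"}
        },
        "connectorProfileCredentials": {
            "ServiceNow": {
                "username": "integration-user",
                "password": os.environ["SERVICENOW_PASSWORD"],  # hypothetical env var
            }
        },
    },
)
```

Any number of flows can then reference connectorProfileName="servicenow-prod" in their source configuration.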
Scheduled Execution and Event-Driven Triggers
AppFlow offers three trigger modes for flow execution: on-demand, scheduled, and event-driven. Scheduled execution can be configured at intervals as short as 1 minute and supports incremental transfers (fetching only records changed since the last execution). Incremental transfers are driven by a timestamp field on the source side; for Salesforce, fields such as SystemModstamp or LastModifiedDate serve as the change-tracking timestamp. Event-driven triggers integrate with Salesforce Change Data Capture (CDC) and Platform Events, detecting record creation, updates, and deletions in real time to trigger flows. This enables near-real-time data synchronization. Flow run results are automatically published to EventBridge, allowing you to build downstream processing (invoking Lambda functions, sending SNS notifications, starting Step Functions workflows) based on success or failure. Pricing is usage-based, combining a small charge per flow run with a data processing charge based on the volume of data processed. For large initial data loads these charges can add up, so filtering the transfer down to only the records and fields you actually need is the fundamental cost optimization strategy.
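The two pieces below sketch what this looks like in code: a scheduled, incremental trigger configuration (passed as triggerConfig to create_flow or update_flow) and an EventBridge rule that reacts to flow run reports. The rule name, Lambda ARN, and 15-minute interval are hypothetical, and the detail-type string should be verified against the events your flows actually emit.

```python
import json
import boto3

# Scheduled, incremental trigger: pass this as triggerConfig to create_flow/update_flow.
# The rate() interval is an arbitrary example; check the AppFlow docs for the accepted syntax.
scheduled_trigger = {
    "triggerType": "Scheduled",
    "triggerProperties": {
        "Scheduled": {
            "scheduleExpression": "rate(15minutes)",
            "dataPullMode": "Incremental",  # only records changed since the last run
        }
    },
}

events = boto3.client("events")

# EventBridge rule that fires on AppFlow flow run reports for downstream processing.
rule_name = "appflow-run-completed"
events.put_rule(
    Name=rule_name,
    EventPattern=json.dumps({
        "source": ["aws.appflow"],
        # Assumed detail-type for end-of-run reports; confirm against real events.
        "detail-type": ["AppFlow End Flow Run Report"],
    }),
    State="ENABLED",
)

events.put_targets(
    Rule=rule_name,
    Targets=[{
        "Id": "run-handler",
        # Hypothetical Lambda that inspects the event detail (flow name, status,
        # record counts) and fans out to SNS or Step Functions on failure.
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:appflow-run-handler",
    }],
)
# Note: the target Lambda also needs a resource-based permission (lambda add_permission)
# allowing events.amazonaws.com to invoke it.
```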