Amazon Kendra

A machine learning-powered enterprise search service that returns accurate answers to natural language questions

Overview

Amazon Kendra is an intelligent enterprise search service powered by machine learning. It ingests documents from diverse data sources such as S3, RDS, SharePoint, Confluence, and Salesforce, then extracts and returns pinpoint answers from within documents in response to natural language questions. Its greatest strength is the ability to return semantically relevant information that traditional keyword searches cannot find, dramatically improving search accuracy for internal knowledge bases and help desks.

Indexes and Data Source Connectors

A Kendra index is a logical container that stores the documents to be searched. When creating an index, you select an edition (Developer or Enterprise) and provision capacity based on document count and query frequency. The Developer edition supports up to 10,000 documents and 4,000 queries per day, making it suitable for evaluation purposes. Over 40 data source connectors are available, including S3, RDS, SharePoint Online, Confluence, ServiceNow, and Google Drive, with each connector automatically handling data source-specific authentication, incremental synchronization, and metadata extraction. Sync schedules can be set to on-demand or periodic execution (minimum 1-hour intervals), and incremental sync re-indexes only changed documents. The BatchPutDocument API lets you ingest data from custom systems as a custom data source. Documents can be tagged with custom attributes (department name, project name, confidentiality level, etc.) for use in faceted search and filtering.
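The custom data source path described above can be sketched as follows. This is a minimal illustration of building a `BatchPutDocument` request with custom attributes; the index ID, document content, and attribute names (`department`, `project`) are hypothetical placeholders, and the attribute keys would need to be registered in the index's metadata configuration beforehand.

```python
def build_document(doc_id, title, body, department, project):
    """Build one Document entry with custom attributes for faceting/filtering."""
    return {
        "Id": doc_id,
        "Title": title,
        "Blob": body.encode("utf-8"),      # inline content; an S3Path can be used instead
        "ContentType": "PLAIN_TEXT",
        "Attributes": [
            {"Key": "department", "Value": {"StringValue": department}},
            {"Key": "project", "Value": {"StringValue": project}},
        ],
    }

request = {
    "IndexId": "example-index-id",  # placeholder
    "Documents": [
        build_document("doc-001", "VPN Setup Guide",
                       "Steps to configure the corporate VPN...", "IT", "helpdesk"),
    ],
}

# With boto3, this payload would be submitted as:
#   boto3.client("kendra").batch_put_document(**request)
```

Batches are limited to a small number of documents per call, so larger custom ingestions typically loop over chunks of documents and check the per-document failure list in the response.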

Query Processing and Ranking Tuning

Kendra's query processing is built on semantic search powered by natural language understanding (NLU) models. It analyzes the intent behind a user's question and includes semantically related documents in search results, not just exact keyword matches. Search results are classified into three types: Suggested Answers extract the relevant passage from a document as a direct answer, Documents return highly relevant documents in their entirety, and FAQs match against pre-registered question-and-answer pairs. To improve ranking accuracy, use Relevance Tuning to adjust field-level weights: for example, increase the importance of the title field or boost documents with more recent update dates. Additionally, submitting user click feedback via the SubmitFeedback API enables the machine learning model to continuously improve rankings. Query response times are typically under one second, maintaining practical speed even with large document sets.
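The three result types and the click-feedback loop can be illustrated with a short sketch. The result items below are mocked in the shape returned by the Query API (where Suggested Answers appear as type `ANSWER` and FAQ matches as `QUESTION_ANSWER`); in practice they would come from `boto3.client("kendra").query(IndexId=..., QueryText=...)`, and the IDs are placeholders.

```python
from datetime import datetime, timezone

def partition_results(result_items):
    """Group Query result items into Kendra's three result types."""
    buckets = {"ANSWER": [], "QUESTION_ANSWER": [], "DOCUMENT": []}
    for item in result_items:
        buckets[item["Type"]].append(item)
    return buckets

# Mocked response items in the shape of the Query API's ResultItems list.
result_items = [
    {"Id": "r-1", "Type": "ANSWER", "DocumentTitle": {"Text": "VPN Setup Guide"}},
    {"Id": "r-2", "Type": "DOCUMENT", "DocumentTitle": {"Text": "IT Handbook"}},
]
buckets = partition_results(result_items)

# When the user opens result r-1, record a click for ranking improvement.
# With boto3 this would be sent as:
#   client.submit_feedback(IndexId=..., QueryId=query_id,
#                          ClickFeedbackItems=click_feedback)
click_feedback = [{"ResultId": "r-1", "ClickTime": datetime.now(timezone.utc)}]
```

The `QueryId` needed for feedback is returned with every Query response, which is what ties a click back to the ranking that produced it.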

Access Control and User Context

Kendra natively supports document-level access control lists (ACLs). Data source connectors automatically import permission information from the source system, returning only documents the user is authorized to view based on their group membership at query time. For SharePoint and Confluence connectors, the source system's ACLs are directly reflected in Kendra, eliminating the need for duplicate permission management. For custom data sources, you explicitly set per-user or per-group allow/deny permissions using Principal objects. User context is passed as a UserContext parameter at query time, containing user ID and group information. Integration with IAM Identity Center enables automatic retrieval of user information from SSO tokens. An important operational consideration is that ACL synchronization depends on the data source sync schedule, so permission changes may not be reflected immediately. For highly sensitive environments, it is recommended to set shorter sync intervals and manually trigger synchronization after permission changes.
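For a custom data source, the Principal-based ACL and the query-time UserContext described above can be sketched like this. All names, user IDs, and group names are illustrative placeholders; the ACL list is attached to each document at ingestion time (for example in a BatchPutDocument request), while the UserContext travels with each query.

```python
def build_acl(allowed_groups, denied_users=()):
    """Principal list granting group-level access and explicit per-user denies."""
    acl = [{"Name": g, "Type": "GROUP", "Access": "ALLOW"} for g in allowed_groups]
    acl += [{"Name": u, "Type": "USER", "Access": "DENY"} for u in denied_users]
    return acl

# Attached to a document as its AccessControlList at ingestion time.
document_acl = build_acl(["engineering", "security"], denied_users=["contractor-42"])

# Passed at query time; Kendra returns only documents whose ACL matches.
# With boto3:
#   client.query(IndexId=..., QueryText=..., UserContext=user_context)
user_context = {"UserId": "alice", "Groups": ["engineering"]}
```

Because deny entries take precedence over group-level allows, the explicit per-user deny above excludes that user even if they later join an allowed group.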
