Amazon Comprehend (Specialized, since 2017)
A service that uses natural language processing to extract entities, sentiment, and key phrases from text
What It Does
Amazon Comprehend is a fully managed service that analyzes text data using natural language processing (NLP). It extracts named entities such as person names, place names, and organization names, determines the sentiment of the text (positive, negative, neutral, or mixed), and identifies key phrases. No ML expertise is required: you simply call the API to run the analysis.
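The three core calls can be sketched with the AWS SDK for Python (boto3). This sketch assumes credentials and a region are already configured. The client is passed in as a parameter so the function can also be exercised with a stub. `analyze_text` and the shape of its summary dict are hypothetical. `detect_sentiment`, `detect_entities`, and `detect_key_phrases` are the boto3 methods for the DetectSentiment, DetectEntities, and DetectKeyPhrases APIs.

```python
from typing import Any, Dict


def analyze_text(client: Any, text: str, language_code: str = "en") -> Dict[str, Any]:
    """Run sentiment, entity, and key phrase detection on one piece of text.

    `client` is expected to be a boto3 Comprehend client, for example
    boto3.client("comprehend", region_name="us-east-1").
    """
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)

    return {
        # Overall label: POSITIVE, NEGATIVE, NEUTRAL, or MIXED
        "sentiment": sentiment["Sentiment"],
        # (surface text, entity type) pairs, e.g. ("Tokyo", "LOCATION")
        "entities": [(e["Text"], e["Type"]) for e in entities["Entities"]],
        # Key phrases in the order the API returned them
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
    }
```

Usage: `analyze_text(boto3.client("comprehend"), "The delivery to Tokyo was late again.")`. The actual labels and scores depend on the service's models.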
Use Cases
Comprehend is used for sentiment analysis of customer support feedback (automatically classifying satisfaction vs. dissatisfaction), social listening to analyze brand reputation from social media posts, and data organization tasks like automatically extracting person and company names from large volumes of documents.
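For the customer-feedback case, one approach is to send comments in batches and tally the sentiment labels. This minimal sketch assumes a boto3 Comprehend client. It relies on `batch_detect_sentiment`, which accepts up to 25 documents per request and reports per-document failures in `ErrorList` rather than failing the whole call. `tally_feedback` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, List

# BatchDetectSentiment accepts up to 25 documents per request.
BATCH_SIZE = 25


def tally_feedback(client: Any, comments: List[str], language_code: str = "en") -> Dict[str, Any]:
    """Classify each comment's sentiment and count how often each label occurs.

    Also returns the indexes (into `comments`) of any documents that the
    service reported as failed in ErrorList.
    """
    counts: Counter = Counter()
    failed: List[int] = []
    for start in range(0, len(comments), BATCH_SIZE):
        chunk = comments[start:start + BATCH_SIZE]
        resp = client.batch_detect_sentiment(TextList=chunk, LanguageCode=language_code)
        for result in resp["ResultList"]:
            counts[result["Sentiment"]] += 1
        for error in resp["ErrorList"]:
            # Index is relative to the chunk, so shift it back to the full list
            failed.append(start + error["Index"])
    return {"counts": dict(counts), "failed": failed}
```

A share of `NEGATIVE` labels that rises over time is one simple dissatisfaction signal. How the counts map to "satisfied" and "dissatisfied" is a business decision, not something the API defines.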
Everyday Analogy
Think of it like having an excellent reader as your assistant. Hand over a stack of letters (text data), and the assistant reads each one and summarizes the key points: "This letter is angry," "This letter mentions Mr. Tanaka and Tokyo," "The main issue in this letter is a delivery delay." Comprehend plays this assistant's role at machine speed, across far larger volumes of text than a person could read.
What Is Comprehend?
Amazon Comprehend is a service that automatically understands and analyzes the content of text. Just as a human reads text and judges "this passage is positive" or "there's a company name here," Comprehend uses machine learning models to parse the meaning of text. It supports numerous languages including Japanese, and analysis results can be retrieved by simply calling the API.
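DetectSentiment needs a language code. When the input language is not known in advance (for example, a mix of Japanese and English comments), one option is to detect the dominant language first and pass it along. This sketch assumes a boto3 Comprehend client. `detect_dominant_language` is the boto3 method for DetectDominantLanguage, and `sentiment_any_language` is a hypothetical helper. If the detected language is not supported for sentiment analysis, the service returns an error, which this sketch does not handle.

```python
from typing import Any, Tuple


def sentiment_any_language(client: Any, text: str) -> Tuple[str, str]:
    """Detect the dominant language, then run sentiment analysis in that language.

    Returns (language_code, sentiment_label).
    """
    langs = client.detect_dominant_language(Text=text)["Languages"]
    # Choose the candidate language with the highest confidence score
    best = max(langs, key=lambda lang: lang["Score"])
    code = best["LanguageCode"]
    result = client.detect_sentiment(Text=text, LanguageCode=code)
    return code, result["Sentiment"]
```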
Key Analysis Features
Comprehend provides multiple analysis features. Sentiment analysis classifies text sentiment into four categories: positive, negative, neutral, and mixed. Entity recognition extracts named entities such as person names, place names, organization names, dates, and quantities. Key phrase extraction identifies important terms within text. Language detection determines what language a piece of text is written in. By combining these features, you can gain multifaceted insights from text data.
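Each entity in a DetectEntities response carries a `Type` and a confidence `Score`. Post-processing those two fields is usually how the raw output becomes something like "all people and companies mentioned in this document." In this sketch, `group_entities`, the default type filter, and the 0.8 score threshold are all illustrative assumptions to tune for your data.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_entities(
    entities_response: Dict[str, Any],
    wanted_types: Iterable[str] = ("PERSON", "ORGANIZATION"),
    min_score: float = 0.8,
) -> Dict[str, List[str]]:
    """Group entity mentions from a DetectEntities response by entity type.

    Keeps only entities whose type is in `wanted_types` and whose score is at
    least `min_score`; repeated mentions are de-duplicated in first-seen order.
    """
    wanted = set(wanted_types)
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities_response["Entities"]:
        if entity["Type"] in wanted and entity["Score"] >= min_score:
            if entity["Text"] not in grouped[entity["Type"]]:
                grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)
```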
Custom Classification and Custom Entities
In addition to the standard analysis features, you can build custom models that recognize categories and entities you define yourself. For example, you can create a custom classification model that automatically sorts customer support tickets into "technical issues," "billing inquiries," and "feature requests," or a custom entity recognition model that extracts industry-specific terminology. You supply labeled training data, and Comprehend trains a specialized NLP model from it.
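Training data for a custom classifier is uploaded to S3. This sketch assumes the multi-class CSV layout: two columns (label, then document text), one document per line, and no header row. It also builds the keyword arguments for boto3's `create_document_classifier`. The helper names, label names, role ARN, and bucket URI are hypothetical placeholders.

```python
import csv
import io
from typing import Any, Dict, List, Tuple


def to_training_csv(samples: List[Tuple[str, str]]) -> str:
    """Serialize (label, text) pairs as a two-column CSV with no header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for label, text in samples:
        # Collapse internal newlines/whitespace so each document stays on one line
        writer.writerow([label, " ".join(text.split())])
    return buf.getvalue()


def classifier_request(name: str, role_arn: str, s3_uri: str, language_code: str = "en") -> Dict[str, Any]:
    """Keyword arguments for client.create_document_classifier(**kwargs)."""
    return {
        "DocumentClassifierName": name,
        "DataAccessRoleArn": role_arn,  # IAM role that can read the S3 training data
        "InputDataConfig": {"S3Uri": s3_uri},
        "LanguageCode": language_code,
        "Mode": "MULTI_CLASS",  # exactly one label per document
    }
```

The `csv` module quotes fields that contain commas, so ticket text with punctuation does not break the two-column layout.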
Getting Started
To get started, enter text in the "Real-time analysis" section of the Comprehend console to see sentiment analysis and entity extraction results immediately. To use Comprehend programmatically, call the DetectSentiment or DetectEntities API through an AWS SDK. For bulk text analysis, create an asynchronous batch job that processes data stored in S3.
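A batch job can be started and polled roughly as follows. This sketch assumes a boto3 Comprehend client, an IAM role that can read the input and write the output S3 locations, and input formatted as one document per line. `start_sentiment_detection_job` and `describe_sentiment_detection_job` are the boto3 methods for the asynchronous sentiment API. `run_sentiment_job` and its 30-second polling interval are hypothetical choices.

```python
import time
from typing import Any, Callable


def run_sentiment_job(
    client: Any,
    input_s3_uri: str,
    output_s3_uri: str,
    role_arn: str,
    language_code: str = "en",
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start an asynchronous sentiment job over S3 data and wait until it ends.

    Returns the final job status (COMPLETED, FAILED, or STOPPED).
    """
    job = client.start_sentiment_detection_job(
        InputDataConfig={"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_LINE"},
        OutputDataConfig={"S3Uri": output_s3_uri},
        DataAccessRoleArn=role_arn,
        LanguageCode=language_code,
    )
    job_id = job["JobId"]
    while True:
        props = client.describe_sentiment_detection_job(JobId=job_id)["SentimentDetectionJobProperties"]
        status = props["JobStatus"]
        if status in ("COMPLETED", "FAILED", "STOPPED"):
            return status
        sleep(poll_seconds)  # still SUBMITTED / IN_PROGRESS / STOP_REQUESTED
```

When the job completes, the results are written as an archive under the output S3 prefix.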
Things to Watch Out For
- Sentiment analysis and entity recognition for Japanese may be less accurate than for English. Validate the results before relying on them for important decisions.
- The real-time (synchronous) APIs limit the size of each document, measured in UTF-8 bytes: 100 KB for operations such as DetectEntities and DetectKeyPhrases, but only 5 KB for DetectSentiment. Use batch processing to analyze long documents.
- Custom model training requires at least several hundred labeled samples. Accuracy drops when training data is insufficient, so prepare enough samples.
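If a long document still has to go through a real-time API, it can be split into pieces that each fit under the byte limit. Because the limit is counted in UTF-8 bytes, Japanese text (typically 3 bytes per character) reaches it with far fewer characters than ASCII text. `split_utf8` is a hypothetical helper that splits only on character boundaries. Set `max_bytes` to the limit of the operation you call, with some margin.

```python
from typing import List


def split_utf8(text: str, max_bytes: int) -> List[str]:
    """Split text into pieces whose UTF-8 encoding is at most max_bytes each.

    Never cuts a multi-byte character in half; does not look for sentence
    or word boundaries.
    """
    if max_bytes < 4:
        raise ValueError("max_bytes must fit at least one 4-byte character")
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if size + n > max_bytes:
            # Current piece is full; start a new one with this character
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks
```

Splitting at sentence boundaries instead would keep more context for sentiment analysis, but it needs language-aware sentence detection that this sketch omits.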