Implementing Natural Language Processing with Amazon Comprehend - Sentiment Analysis and Entity Extraction

Learn about sentiment analysis, entity extraction, and building custom classification models with Comprehend.

Overview of Comprehend

Amazon Comprehend is a managed service that provides natural language processing (NLP) APIs supporting over 25 languages, although language coverage varies by feature. Given text input, it returns results for sentiment analysis, entity extraction, key phrase extraction, language detection, and syntax analysis, so you can incorporate NLP features into your application without ML expertise. Custom classification and custom entity recognition extend these built-in features to industry-specific categories and entity types.
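As a minimal sketch of how language detection and entity extraction fit together, the function below first asks Comprehend for the dominant language and then passes that code to entity detection. It assumes a boto3 Comprehend client is supplied by the caller; the function name extract_entities and the min_score threshold are hypothetical choices for illustration, not part of the service.

```python
from typing import Any


def extract_entities(client: Any, text: str, min_score: float = 0.8) -> list[tuple[str, str]]:
    """Return (entity_type, entity_text) pairs scoring at or above min_score.

    `client` is assumed to be a boto3 Comprehend client, for example
    boto3.client("comprehend"). `min_score` is an illustrative threshold.
    """
    # DetectDominantLanguage returns candidate languages with confidence scores.
    languages = client.detect_dominant_language(Text=text)["Languages"]
    if not languages:
        raise ValueError("no language detected for the given text")
    language_code = max(languages, key=lambda lang: lang["Score"])["LanguageCode"]

    # DetectEntities needs an explicit language code.
    response = client.detect_entities(Text=text, LanguageCode=language_code)
    return [
        (entity["Type"], entity["Text"])
        for entity in response["Entities"]
        if entity["Score"] >= min_score
    ]
```

With AWS credentials configured, a call might look like extract_entities(boto3.client("comprehend"), "Amazon opened an office in Seattle."). Passing the client in as an argument keeps the function easy to test with a stub.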

Sentiment Analysis and Custom Models

The sentiment analysis API classifies the overall sentiment of a text into one of four categories - Positive, Negative, Neutral, or Mixed - and returns a confidence score for each category. Typical uses include e-commerce review analysis and call center transcript analysis. Custom classification trains a model on your own labeled data so that text can be classified into industry-specific categories. PII detection identifies personal information in text so that it can be masked, which supports compliance with GDPR and other privacy laws.
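The sentiment call described above can be sketched as follows. This assumes a boto3 Comprehend client passed in by the caller; classify_sentiment is a hypothetical helper name, and the default language code "en" is an assumption.

```python
from typing import Any


def classify_sentiment(client: Any, text: str, language_code: str = "en") -> tuple[str, float]:
    """Return the predicted sentiment label and its confidence score.

    `client` is assumed to be a boto3 Comprehend client, for example
    boto3.client("comprehend").
    """
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)

    # "Sentiment" is one of POSITIVE, NEGATIVE, NEUTRAL, MIXED.
    label = response["Sentiment"]
    # "SentimentScore" holds one score per category, keyed as
    # Positive, Negative, Neutral, Mixed.
    scores = response["SentimentScore"]
    return label, scores[label.capitalize()]
```

For review analysis, the returned score can be used to route low-confidence results to manual review instead of trusting the label alone.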

PII Detection and Real-Time Analysis

Comprehend's PII detection API identifies personal information in text (names, addresses, phone numbers, email addresses, credit card numbers, social security numbers). The ContainsPiiEntities API reports which types of PII are present in a document, with confidence scores, while the DetectPiiEntities API also returns the character offsets of each occurrence.

Creating a real-time analysis endpoint keeps a custom classification or custom entity recognition model running continuously, so inference results are returned immediately via API calls. Asynchronous batch jobs read large volumes of documents from S3 and write their results back to S3. Comprehend Medical specializes in medical text, extracting drug names, disease names, and procedure names.
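To illustrate how the location information from DetectPiiEntities can be used, here is a minimal masking sketch. It assumes a boto3 Comprehend client supplied by the caller and assumes the returned offsets index characters of the Python string; redact_pii, the min_score threshold, and the "[TYPE]" placeholder format are hypothetical choices for illustration.

```python
from typing import Any


def redact_pii(client: Any, text: str, language_code: str = "en", min_score: float = 0.5) -> str:
    """Replace each detected PII span with a "[TYPE]" placeholder.

    `client` is assumed to be a boto3 Comprehend client, for example
    boto3.client("comprehend"). Offsets are assumed to be character offsets.
    """
    response = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    entities = [e for e in response["Entities"] if e["Score"] >= min_score]

    # Replace from the end of the text backwards so that earlier
    # offsets remain valid after each substitution.
    redacted = text
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        placeholder = f"[{entity['Type']}]"
        redacted = redacted[: entity["BeginOffset"]] + placeholder + redacted[entity["EndOffset"] :]
    return redacted
```

Keeping the entity type in the placeholder preserves some context for downstream readers while removing the value itself.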

Comprehend Pricing

Comprehend pricing is based on the number of characters per API call. Sentiment analysis, entity extraction, and key phrase extraction are each billed in 100-character units, with a minimum of 3 units (300 characters) per request, at approximately $0.0001 per unit, or about $1 per 10,000 units. Custom model training costs approximately $3 per hour. Real-time endpoints are billed per inference unit for as long as they are running, at approximately $0.0005 per second (about $1.80 per hour), regardless of traffic. Asynchronous batch jobs are billed only for the text they process, making them suitable for analyzing large volumes of documents. Because real-time endpoints incur continuous charges, deleting them during low-traffic periods and switching to asynchronous jobs reduces costs. Comprehend Flywheel automates continuous model improvement, reducing the operational burden of retraining.
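The unit-based billing rule for the built-in APIs can be worked through with a small estimator. This is a rough sketch that assumes character counts are rounded up to the next 100-character unit; the function names are hypothetical, and the per-unit price is passed in by the caller rather than hard-coded, since rates vary by region and change over time.

```python
import math
from typing import Iterable

UNIT_CHARS = 100            # one billing unit = 100 characters
MIN_UNITS_PER_REQUEST = 3   # each request is billed for at least 300 characters


def billable_units(text: str) -> int:
    """Units billed for one request (assumes rounding up to whole units)."""
    return max(MIN_UNITS_PER_REQUEST, math.ceil(len(text) / UNIT_CHARS))


def estimate_cost(texts: Iterable[str], price_per_unit: float) -> float:
    """Estimated cost of sending each text as a separate request to one API."""
    return sum(billable_units(text) for text in texts) * price_per_unit
```

Under these assumptions, 1,000 reviews of 1,000 characters each come to 10,000 units per API. Because of the 300-character minimum, very short texts such as one-line comments cost the same as a 300-character document, which is worth factoring in when estimating the cost of analyzing chat messages or short reviews.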

Summary

Comprehend is a service that provides NLP features such as sentiment analysis, entity extraction, and PII detection via API. Custom classification models handle industry-specific text classification, and batch processing and real-time endpoints handle diverse analysis workloads. Comprehend Medical also supports extracting drug names and disease names from medical text.