Implementing Image and Video Analysis with Amazon Rekognition - From Label Detection to Custom Models

Learn how to implement label detection, facial analysis, and text detection using pre-trained APIs, and build domain-specific image recognition models with Custom Labels.

Key Features of Rekognition

Amazon Rekognition is an image and video analysis service that exposes pre-trained deep learning models as APIs. Key features include label detection (object and scene classification), face detection and analysis (estimated age range, emotions, head pose), text detection (reading characters in images), content moderation (flagging inappropriate content), and face comparison (a similarity score between two faces). Every feature is available through an API call, so there are no models to train and no infrastructure to manage. Pricing is pay-per-use, based on the number of images and the minutes of video processed, and the free tier includes up to 5,000 images per month.
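To make the face analysis feature concrete, here is a minimal sketch that calls DetectFaces and keeps the estimated age range, the strongest emotion, and the head yaw for each face. It assumes a boto3 Rekognition client is passed in; the function name summarize_faces and the bucket and key values are hypothetical and not part of the original article.

```python
from typing import Any, Dict, List


def summarize_faces(client: Any, bucket: str, key: str) -> List[Dict]:
    """Call DetectFaces on an S3 image and keep a few attributes per face.

    `client` is assumed to be a boto3 Rekognition client,
    e.g. boto3.client("rekognition").
    """
    response = client.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        # The default attribute set omits age range and emotions.
        Attributes=["ALL"],
    )
    summaries = []
    for face in response["FaceDetails"]:
        # Emotions is a list of {Type, Confidence}; keep the strongest one.
        top = max(face.get("Emotions", []),
                  key=lambda e: e["Confidence"], default=None)
        summaries.append({
            "age_range": (face["AgeRange"]["Low"], face["AgeRange"]["High"]),
            "emotion": top["Type"] if top else None,
            "yaw": face["Pose"]["Yaw"],  # head rotation in degrees
        })
    return summaries


# Usage (requires AWS credentials; names below are placeholders):
#   import boto3
#   faces = summarize_faces(boto3.client("rekognition"),
#                           "my-bucket", "photos/team.jpg")
```

Passing the client in as an argument keeps the function easy to exercise with a stub, without touching AWS.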

Implementing Label Detection and Text Detection

The DetectLabels API accepts an image either as a reference to an object in an S3 bucket or as raw image bytes (the AWS SDKs handle Base64 encoding for you; you only need to encode the bytes yourself when calling the HTTP API directly). It returns the detected labels (e.g., Car, Tree, Person), each with a confidence score from 0 to 100. The MinConfidence parameter sets a threshold below which labels are dropped, which helps reduce false positives. For labels that correspond to individual objects, bounding box coordinates are also returned, so you can pinpoint where each object appears in the image.

The DetectText API detects printed and handwritten text in images and extracts it as strings. It can detect up to 100 words per image and returns position information and a confidence score for each detected line and word. It supports a wide range of OCR use cases, including product label reading on manufacturing lines, license plate recognition in parking lots, and document digitization.
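The two calls described above can be sketched as follows. This is a minimal illustration, assuming a boto3 Rekognition client is passed in; the helper names detect_labels and detect_text_lines, the default thresholds, and the bucket and key values are hypothetical.

```python
from typing import Any, Dict, List


def detect_labels(client: Any, bucket: str, key: str,
                  min_confidence: float = 80.0) -> List[Dict]:
    """Run DetectLabels on an S3 image and flatten the response."""
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=20,
        MinConfidence=min_confidence,  # drop labels below this score
    )
    results = []
    for label in response["Labels"]:
        # Instances is empty for scene-level labels; object labels carry
        # one bounding box per instance, with values given as ratios of
        # the image width and height.
        boxes = [inst["BoundingBox"] for inst in label.get("Instances", [])]
        results.append({
            "name": label["Name"],
            "confidence": label["Confidence"],
            "boxes": boxes,
        })
    return results


def detect_text_lines(client: Any, image_bytes: bytes,
                      min_confidence: float = 80.0) -> List[str]:
    """Run DetectText on raw image bytes and return the detected lines."""
    response = client.detect_text(Image={"Bytes": image_bytes})
    # Each detection is either a LINE or a WORD; keep whole lines only.
    return [
        d["DetectedText"]
        for d in response["TextDetections"]
        if d["Type"] == "LINE" and d["Confidence"] >= min_confidence
    ]


# Usage (requires AWS credentials; names below are placeholders):
#   import boto3
#   client = boto3.client("rekognition")
#   labels = detect_labels(client, "my-bucket", "photos/street.jpg")
#   with open("label.jpg", "rb") as f:
#       lines = detect_text_lines(client, f.read())
```

Filtering DetectText results on the client side is a design choice of this sketch; raising or lowering min_confidence trades missed text against spurious matches.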

Building Custom Models with Custom Labels

Custom Labels is a feature for classification and detection tasks that Rekognition's pre-trained models can't handle. Typical examples are defect detection on manufacturing lines, retail shelf analysis, and crop disease detection in agriculture, all of which require domain-specific image recognition. You can start building a model with only a few dozen training images. Upload the images to S3, label them using the Rekognition console or SageMaker Ground Truth, and start training. After training completes, start the model to bring up an inference endpoint and run predictions against your custom model via the API. The inference endpoint is billed by the hour while it is running, so stop it during idle periods to control costs.
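The start, predict, and stop cycle can be sketched as below. This is a hedged illustration, assuming a boto3 Rekognition client and an already trained model version; the helper names running_model and classify_image and all ARNs are hypothetical. The context manager stops the model even if a prediction fails, which guards against paying for a forgotten endpoint.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


@contextmanager
def running_model(client: Any, project_arn: str, version_name: str,
                  version_arn: str, min_inference_units: int = 1) -> Iterator[str]:
    """Start a Custom Labels model, yield its ARN, and always stop it."""
    client.start_project_version(
        ProjectVersionArn=version_arn,
        MinInferenceUnits=min_inference_units,
    )
    try:
        # Block until the model reports that it is running.
        client.get_waiter("project_version_running").wait(
            ProjectArn=project_arn, VersionNames=[version_name]
        )
        yield version_arn
    finally:
        # Hourly billing continues until the model is stopped.
        client.stop_project_version(ProjectVersionArn=version_arn)


def classify_image(client: Any, version_arn: str, bucket: str, key: str,
                   min_confidence: float = 70.0) -> List[Dict]:
    """Run DetectCustomLabels against a running model version."""
    response = client.detect_custom_labels(
        ProjectVersionArn=version_arn,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=min_confidence,
    )
    return [
        {"name": c["Name"], "confidence": c["Confidence"]}
        for c in response["CustomLabels"]
    ]


# Usage (requires AWS credentials; ARNs and names are placeholders):
#   import boto3
#   client = boto3.client("rekognition")
#   with running_model(client, PROJECT_ARN, "v1", VERSION_ARN) as arn:
#       print(classify_image(client, arn, "my-bucket", "parts/0001.jpg"))
```

Starting a model can take several minutes, so this pattern suits batch jobs better than one-off requests.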

Rekognition Pricing

Rekognition pricing is pay-per-use, based on the number of images processed. Label detection, face detection, and text detection each cost approximately $1.00 per 1,000 images for the first 1 million images per month. Up to 5,000 images per month are included in the free tier. For Custom Labels, the main cost is the hourly charge for the running inference endpoint (approximately $4.00 per hour), so stop it during idle periods to reduce costs. Training costs approximately $1.00 per hour.
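As a worked example of the arithmetic, the sketch below estimates a monthly bill from the approximate prices quoted above. The constants are the article's figures, not authoritative rates, the function names are hypothetical, and the image estimate only covers the first pricing tier (up to 1 million images per month).

```python
# Approximate prices taken from the text above; check current AWS pricing
# before relying on them.
IMAGE_PRICE_PER_1000 = 1.00       # USD, first 1 million images per month
FREE_TIER_IMAGES = 5_000          # images per month covered by the free tier
INFERENCE_PRICE_PER_HOUR = 4.00   # USD, Custom Labels endpoint
TRAINING_PRICE_PER_HOUR = 1.00    # USD, Custom Labels training


def image_api_cost(images_per_month: int, free_tier: bool = True) -> float:
    """Estimated monthly cost of a pre-trained image API (first tier only)."""
    free = FREE_TIER_IMAGES if free_tier else 0
    billable = max(0, images_per_month - free)
    return billable / 1000 * IMAGE_PRICE_PER_1000


def custom_labels_cost(inference_hours: float,
                       training_hours: float = 0.0) -> float:
    """Estimated Custom Labels cost for the given running and training hours."""
    return (inference_hours * INFERENCE_PRICE_PER_HOUR
            + training_hours * TRAINING_PRICE_PER_HOUR)


print(image_api_cost(100_000))                      # 95.0
print(custom_labels_cost(24 * 30))                  # 2880.0 (always on)
print(custom_labels_cost(8 * 22, training_hours=2)) # 706.0 (business hours)
```

Under these figures, running the endpoint only 8 hours a day on 22 working days costs about a quarter of leaving it on all month.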

Summary

Rekognition lets you implement image and video analysis without ML expertise. Pre-trained APIs cover common use cases, while Custom Labels handles domain-specific requirements. By combining S3 and Lambda in an event-driven architecture, you can build automated analysis pipelines triggered by image uploads.
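The event-driven pipeline mentioned above might look like the following Lambda function. This is a sketch under stated assumptions: the bucket is configured to send object-created notifications to the function, the function's role allows rekognition:DetectLabels and read access to the bucket, and the helper name process_s3_event is hypothetical. Where the results go (a database, a queue) is left out.

```python
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def process_s3_event(event: Dict, client: Any,
                     min_confidence: float = 80.0) -> List[Dict]:
    """Run DetectLabels on every object named in an S3 event notification."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = client.detect_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MinConfidence=min_confidence,
        )
        results.append({
            "bucket": bucket,
            "key": key,
            "labels": [label["Name"] for label in response["Labels"]],
        })
    return results


def lambda_handler(event, context):
    """Lambda entry point; boto3 ships with the Lambda Python runtime."""
    import boto3
    return process_s3_event(event, boto3.client("rekognition"))
```

Keeping the event parsing in process_s3_event, separate from lambda_handler, lets you test the logic locally with a stub client and a sample event.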