Data Search and Analytics in Practice - Building Full-Text Search and Visualization with OpenSearch
Learn about data search and analytics design with Amazon OpenSearch Service, including how to build an analytics foundation with full-text search, log analysis, and dashboard visualization.
OpenSearch as an Integrated Search and Analytics Platform
Modern applications need to search large volumes of data instantly and to visualize trends and patterns in that data. Amazon OpenSearch Service is a fully managed service for running open-source OpenSearch, and it supports use cases including full-text search, log analysis, real-time monitoring, and security analytics. OpenSearch was forked from Elasticsearch 7.10.2 and remains largely API-compatible with that version, and it is developed by the open-source community under the Apache 2.0 license. The serverless option lets you get started without capacity planning and scales automatically with the workload.
Using OpenSearch as a Full-Text Search Engine
OpenSearch's full-text search is built on inverted indexes, which keep lookups fast. For Japanese text, the kuromoji analyzer performs morphological analysis, which segments text into words in a way suited to Japanese and improves search accuracy. OpenSearch supports fuzzy, phrase, wildcard, and regex queries, so it can accommodate different kinds of user search intent. Results are scored with the BM25 algorithm, which ranks the most relevant results first, and custom scoring lets you adjust the ranking to reflect business logic. Suggest and autocomplete features present candidates while users are still typing, and the highlight feature marks the portions of each result that match the search keywords, making results easier to scan. The following example creates an index with a Japanese search configuration, using the kuromoji tokenizer for morphological analysis:

    curl -X PUT "https://search-domain.ap-northeast-1.es.amazonaws.com/products" \
      -H "Content-Type: application/json" \
      -d '{
        "settings": {
          "analysis": {
            "analyzer": {
              "ja_analyzer": {
                "type": "custom",
                "tokenizer": "kuromoji_tokenizer",
                "filter": ["kuromoji_baseform", "ja_stop"]
              }
            }
          }
        },
        "mappings": {
          "properties": {
            "name": {"type": "text", "analyzer": "ja_analyzer"}
          }
        }
      }'
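As a rough illustration of the fuzzy search and highlight features described above, the following Python sketch builds a search request body for the `name` field of the `products` index from the example. The function name `build_search_body` and the sample keyword are hypothetical, and the sketch only constructs the JSON body; it does not send a request.

```python
import json


def build_search_body(keyword: str, size: int = 10) -> dict:
    """Build a search request body for the `name` field (hypothetical helper)."""
    return {
        "size": size,
        "query": {
            "match": {
                "name": {
                    "query": keyword,
                    # "AUTO" picks an allowed edit distance from the term
                    # length, so small typos can still match.
                    "fuzziness": "AUTO",
                }
            }
        },
        # Ask for matched fragments of `name`, wrapped in <em> tags.
        "highlight": {
            "pre_tags": ["<em>"],
            "post_tags": ["</em>"],
            "fields": {"name": {}},
        },
    }


# Sample keyword chosen for illustration only.
body = build_search_body("ワイヤレスイヤホン")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The resulting body would be sent as the JSON payload of a `POST` request to `/products/_search` on the domain endpoint.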
Log Analysis and Observability
OpenSearch Service is widely adopted as a log analysis platform and can directly ingest AWS service logs, including CloudWatch Logs, VPC Flow Logs, CloudTrail logs, and ALB access logs. With Amazon Data Firehose (formerly Kinesis Data Firehose), you can deliver streaming data to OpenSearch automatically and build a real-time log analysis pipeline. The Trace Analytics feature visualizes distributed tracing data, showing request flows between microservices and where latency bottlenecks occur. The Anomaly Detection feature uses machine learning to detect anomalous metric patterns automatically, including anomalies that are difficult to catch with manually set thresholds. The Alerting feature sends notifications to SNS, Slack, or custom webhooks when the results of a search query meet specified conditions, so operations teams are alerted immediately.
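The idea behind alerting, a search query whose results are checked against a condition, can be sketched in Python as follows. This is a minimal illustration and not the Alerting feature's own configuration format: the field names `@timestamp` and `status`, the helper names, and the threshold are all assumptions.

```python
def build_error_count_query(window: str = "5m") -> dict:
    """Count server errors within a recent time window (hypothetical fields)."""
    return {
        "size": 0,  # only the hit count is needed, not the documents
        "track_total_hits": True,  # request an exact count instead of a capped one
        "query": {
            "bool": {
                "filter": [
                    # Assumed timestamp field on each log document.
                    {"range": {"@timestamp": {"gte": f"now-{window}"}}},
                    # Assumed numeric HTTP status field; 500 and above are errors.
                    {"range": {"status": {"gte": 500}}},
                ]
            }
        },
    }


def should_alert(response: dict, threshold: int) -> bool:
    """Return True when the number of matching log lines exceeds the threshold."""
    return response["hits"]["total"]["value"] > threshold


# A hand-written response fragment in the shape a search returns.
sample_response = {"hits": {"total": {"value": 42, "relation": "eq"}, "hits": []}}
print(should_alert(sample_response, threshold=10))
```

In the managed Alerting feature, the query and the trigger condition are defined on a monitor rather than in application code, but the division of work is the same: a query produces a result, and a condition on that result decides whether to notify.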
Visualization with OpenSearch Dashboards
OpenSearch Dashboards is an integrated tool for data visualization and dashboard construction. It provides diverse visualization types including line charts, bar charts, pie charts, heatmaps, and geographic maps, enabling intuitive understanding of data trends and patterns. Dashboards can combine multiple visualizations and update in real time, building monitoring screens that stay current. The Notebooks feature lets you create interactive reports combining query results with markdown explanations, streamlining analysis result sharing and documentation. Direct queries against data stored in S3 are also possible, enabling cross-cutting analysis that includes data not indexed in OpenSearch. SAML authentication and fine-grained access control let you set different dashboard and data access permissions for each team.
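Time-series charts of this kind are backed by aggregations over the indexed data. As a sketch of what a line chart of log volume over time corresponds to, the Python below builds a `date_histogram` aggregation and converts the resulting buckets into chart points. The field name `@timestamp`, the aggregation name `requests_over_time`, and the helper names are assumptions made for illustration.

```python
from typing import List, Tuple


def build_histogram_body(interval: str = "1h") -> dict:
    """Build a request body that counts documents per time bucket."""
    return {
        "size": 0,  # return aggregation results only
        "aggs": {
            "requests_over_time": {
                "date_histogram": {
                    "field": "@timestamp",  # assumed timestamp field
                    "fixed_interval": interval,
                }
            }
        },
    }


def to_series(response: dict) -> List[Tuple[str, int]]:
    """Convert histogram buckets into (timestamp, count) points for a chart."""
    buckets = response["aggregations"]["requests_over_time"]["buckets"]
    return [(b["key_as_string"], b["doc_count"]) for b in buckets]


# A hand-written response fragment in the shape the aggregation returns.
sample_response = {
    "aggregations": {
        "requests_over_time": {
            "buckets": [
                {"key_as_string": "2024-01-01T00:00:00.000Z", "key": 1704067200000, "doc_count": 120},
                {"key_as_string": "2024-01-01T01:00:00.000Z", "key": 1704070800000, "doc_count": 95},
            ]
        }
    }
}
print(to_series(sample_response))
```

Writing the query out by hand like this can help when a chart shows unexpected values, since the same aggregation can be run directly against the index and compared with what the dashboard displays.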
OpenSearch Pricing
The figures in this section are approximate and vary by region. An r6g.large.search instance on OpenSearch Service costs approximately $0.167 per hour, or about $120 per month at 720 hours. Storage uses EBS gp3 at approximately $0.08 per GB per month. OpenSearch Serverless bills by the hour in OCUs (OpenSearch Compute Units) at approximately $0.24 per OCU-hour. Serverless requires a minimum of 2 OCUs (one for indexing, one for search) running at all times, so the minimum monthly cost is approximately $345 (2 OCUs × $0.24 × 720 hours). For small-scale environments, a provisioned cluster is therefore more cost-effective: a single r6g.large.search instance costs roughly a third of the serverless minimum, before storage.
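The comparison above can be reproduced with a few lines of arithmetic. The Python sketch below uses the approximate prices quoted in this section and assumes a 720-hour month, a single instance, and a hypothetical 100 GB of gp3 storage; the serverless figure covers the minimum compute charge only.

```python
HOURS_PER_MONTH = 720  # 30 days; an assumption for a rough monthly estimate


def provisioned_monthly(instance_hourly: float, instances: int,
                        storage_gb: float, storage_per_gb_month: float) -> float:
    """Monthly cost of a provisioned cluster: instance hours plus EBS storage."""
    compute = instance_hourly * HOURS_PER_MONTH * instances
    storage = storage_gb * storage_per_gb_month
    return compute + storage


def serverless_minimum_monthly(ocu_hourly: float, min_ocus: int = 2) -> float:
    """Minimum monthly compute cost of serverless, with min_ocus always running."""
    return ocu_hourly * min_ocus * HOURS_PER_MONTH


# Prices from the text; the 100 GB storage size is a hypothetical example.
provisioned = provisioned_monthly(0.167, 1, 100, 0.08)
serverless = serverless_minimum_monthly(0.24)
print(f"provisioned: ${provisioned:.2f}, serverless minimum: ${serverless:.2f}")
```

Under these assumptions the provisioned cluster comes to about $128 per month against about $346 for the serverless minimum. The gap narrows as instance count or instance size grows, so it is worth rerunning the estimate with your own sizing.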
Summary
Amazon OpenSearch Service is a fully managed platform integrating full-text search, log analysis, and data visualization, addressing diverse search and analytics use cases. Japanese full-text search with the kuromoji analyzer, BM25 scoring, and suggest features deliver a high-quality search experience. For log analysis, direct ingestion of AWS service logs and Anomaly Detection improve operational monitoring efficiency. Visualization and real-time monitoring with OpenSearch Dashboards support data-driven decision-making. For organizations looking to build a data search and analytics foundation, OpenSearch Service provides a comprehensive solution.