Relational Database: Requires predefined tables and columns.
DynamoDB: Schema-less; stores all details of a movie in a single JSON-like item, so one-to-many relations (such as a movie's actors) are nested inside the item instead of being split across tables.
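A minimal sketch of what such a document-style item might look like, written as a Python dictionary and serialized to JSON. The attribute names (title, year, actors, ratings) and values are illustrative assumptions, not taken from a real dataset.

```python
import json

# Hypothetical movie item: the one-to-many relations (actors, ratings)
# are nested inside the item instead of living in separate tables.
movie_item = {
    "title": "Example Movie",
    "year": 2020,
    "actors": ["Actor One", "Actor Two"],
    "ratings": [
        {"source": "critics", "score": 8.1},
        {"source": "audience", "score": 7.4},
    ],
}

# Serialize to JSON to show the document form.
movie_json = json.dumps(movie_item, indent=2)
print(movie_json)
```

A relational design would typically represent the same data with separate movie, actor, and rating tables linked by foreign keys.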
Primary Keys
Simple Primary Key: A single attribute, the partition key (e.g., movie title).
Composite Primary Key: Consists of a partition key and a sort key (e.g., year and movie title).
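The two key types can be sketched as follows. The key schema dictionaries use the shape of DynamoDB's CreateTable request ("HASH" for the partition key, "RANGE" for the sort key). The table contents and the helpers get_item and query_partition are hypothetical in-memory stand-ins, not calls to a real client.

```python
# Key schema definitions in the shape used by DynamoDB's CreateTable API:
# "HASH" marks the partition key, "RANGE" marks the sort key.
simple_key_schema = [
    {"AttributeName": "title", "KeyType": "HASH"},
]
composite_key_schema = [
    {"AttributeName": "year", "KeyType": "HASH"},
    {"AttributeName": "title", "KeyType": "RANGE"},
]

# In-memory stand-in for a table with a composite key (not a real client):
# items are addressed by the (partition key, sort key) pair.
movies = {
    (2019, "Movie A"): {"genre": "drama"},
    (2019, "Movie B"): {"genre": "comedy"},
    (2020, "Movie A"): {"genre": "remake"},
}

def get_item(year, title):
    """Look up one item by its full composite key."""
    return movies.get((year, title))

def query_partition(year):
    """Return all titles in one partition, ordered by sort key."""
    return sorted(t for (y, t) in movies if y == year)

print(get_item(2020, "Movie A"))   # same title, different year: a distinct item
print(query_partition(2019))
```

With the composite key, two movies may share a title as long as their years differ; with a simple key on title alone they would collide.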
Data Distribution and Partitions
Partition Key: Determines data partitioning; should be chosen to evenly distribute data.
Example Scenarios:
User ID as partition key and game title as sort key for even distribution.
Game title as partition key risks uneven distribution, since popular games hold far more items than others.
Country as partition key can lead to a hot partition (one partition receiving a disproportionate share of data and traffic).
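The scenarios above can be illustrated with a small simulation. This is only a sketch: DynamoDB's internal hash function is not public, so MD5 is used as a stand-in, and the partition count and traffic mix are assumed values.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed partition count, for illustration only

def partition_for(key: str) -> int:
    """Map a partition key to a partition via a stable hash.

    MD5 is a stand-in; it is not the hash DynamoDB uses internally.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

def distribution(keys):
    """Count how many items land on each partition."""
    return Counter(partition_for(k) for k in keys)

# 1,000 items keyed by distinct user IDs: many distinct keys spread out.
by_user = distribution(f"user-{i}" for i in range(1000))

# 1,000 items keyed by country, with 90% of items from one country.
countries = ["US"] * 900 + ["DE"] * 50 + ["JP"] * 50
by_country = distribution(countries)

print("user ID :", sorted(by_user.values()))
print("country :", sorted(by_country.values()))
```

Because every item with the same partition key lands on the same partition, the dominant country concentrates at least 900 of the 1,000 items on one partition, while the user ID keys spread across all of them.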
DynamoDB Features
High Availability: Automatic replication across Availability Zones.
Global Tables: Multi-master, multi-region replication for global applications.
Transactions: Supports transactions across multiple items and tables.
Continuous Backup to S3: Backups are retained for 35 days and support point-in-time restore.
Time To Live (TTL): Automatically deletes expired items.
Item Size Limitation: Maximum of 400 KB per item; larger items can be stored in S3 with a reference in DynamoDB.
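The TTL and item size points can be sketched in a few lines. DynamoDB TTL expects the expiry time as a Unix epoch timestamp in seconds. Everything else here is an assumption made for illustration: the attribute name expires_at, the bucket name, the key layout, and the size check, which only approximates how DynamoDB measures item size. The actual S3 upload is left out.

```python
import json
import time

MAX_ITEM_BYTES = 400 * 1024  # DynamoDB's 400 KB item size limit

def is_expired(item, ttl_attribute="expires_at", now=None):
    """Return True if the item's TTL timestamp (Unix epoch seconds) has passed."""
    now = time.time() if now is None else now
    expires_at = item.get(ttl_attribute)
    return expires_at is not None and expires_at <= now

def prepare_item(item_id, payload, bucket="example-bucket"):
    """Keep small payloads inline; replace large ones with an S3 reference.

    The size check is approximate (JSON-encoded length), and the bucket
    name and key layout are hypothetical.
    """
    size = len(json.dumps(payload).encode("utf-8"))
    if size <= MAX_ITEM_BYTES:
        return {"id": item_id, "payload": payload}
    return {"id": item_id, "payload_s3_uri": f"s3://{bucket}/items/{item_id}.json"}

session = {"id": "s1", "expires_at": 1_700_000_000}
print(is_expired(session, now=1_700_000_001))            # True
print(prepare_item("small", {"note": "hi"}))
print(prepare_item("large", {"blob": "x" * 500_000}))    # stored as a reference
```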
Use Cases
Ideal for high-traffic web applications, e-commerce systems, gaming platforms, and more.
Summary
DynamoDB offers high performance, scalability, and flexibility for NoSQL database needs.
Its unique features like schema-less design, global tables, and transaction support make it a versatile choice for various applications.
253. Cassandra and DocumentDB
Introduction
AWS provides managed services for migrating open-source databases Cassandra and MongoDB to the cloud.
Amazon Managed Cassandra (now named Amazon Keyspaces for Apache Cassandra)
Cloud-optimized Cassandra: AWS-managed, Cassandra-compatible service designed for easy migration from on-premises deployments.
Performance: Comparable to DynamoDB; offers single-digit millisecond response time, linear scaling, and a flexible schema.
Use Cases: Suitable for industrial equipment data collection and scenarios needing numerous columns.
Key Differences from DynamoDB:
Primary Key Structure: Cassandra supports multi-column partition keys and multi-column sort keys (called clustering columns in Cassandra), offering more flexibility.
Item Size: Cassandra allows up to 2 GB per column value (performance is best when values stay within a few MB), versus DynamoDB's limit of 400 KB per item.
Column Limit: Cassandra supports a virtually unlimited number of columns, whereas DynamoDB’s attributes must fit within its item size limit.
Advantages: Offers more flexibility in terms of primary key structure and item size.
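A minimal sketch of the primary key difference, using plain Python structures rather than a Cassandra driver. The table layout (machine_id and day as the partition key, hour and sensor as the ordering columns) is a hypothetical example in the spirit of the industrial equipment use case.

```python
from collections import defaultdict

# Hypothetical sensor table modeled on a Cassandra-style primary key:
#   partition key    = (machine_id, day)  -> two columns
#   ordering columns = (hour, sensor)     -> two columns, define row order
table = defaultdict(dict)

def insert(machine_id, day, hour, sensor, value):
    """Store a reading under its multi-column partition and ordering keys."""
    table[(machine_id, day)][(hour, sensor)] = value

def read_partition(machine_id, day):
    """Return one partition's rows sorted by the ordering columns."""
    return sorted(table.get((machine_id, day), {}).items())

insert("press-7", "2024-01-01", 9, "temp", 71.5)
insert("press-7", "2024-01-01", 8, "temp", 70.2)
insert("press-7", "2024-01-01", 8, "rpm", 1200)
insert("press-7", "2024-01-02", 8, "temp", 69.9)

for key, value in read_partition("press-7", "2024-01-01"):
    print(key, value)
```

In DynamoDB the same model would need the two partition columns concatenated into one partition key attribute and the two ordering columns concatenated into one sort key attribute.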
Amazon DocumentDB
MongoDB API Compatibility: Designed to be compatible with MongoDB APIs.
Challenges: There have been reports of compatibility issues with newer MongoDB versions.
Future Roadmap Uncertainty: Potential incompatibility between future versions of MongoDB and DocumentDB.
Summary
Amazon Managed Cassandra and Amazon DocumentDB provide cloud migration options for Cassandra and MongoDB users, respectively.
Managed Cassandra offers significant flexibility and scalability benefits.
DocumentDB's compatibility with MongoDB is a crucial feature, but future compatibility remains uncertain.
254. Amazon ElastiCache - Usage Example, Features
Introduction
Amazon ElastiCache is a distributed in-memory data store offering sub-millisecond latency.
It's primarily used as a database cache and for capturing frequently changing information.
Use Cases
Ideal for applications like game leaderboards, user session management, product reviews and ratings, and geospatial applications.
Clusters are deployed inside a VPC, which provides network isolation and security.
ElastiCache Engines
Offers two engine choices: Memcached and Redis.
Memcached
Simple key-value store.
Scales up to 20 nodes and 12 TB of data.
Provides sub-millisecond read/write operations.
Suitable for basic caching requirements.
Redis
Feature-rich, supporting advanced data structures like lists, sorted sets, hashes, and bitmaps.
Enables implementation of in-memory queues, automatic maintenance of leaderboards, and geospatial data processing.
Scales up to 250 nodes and 170 TB of data.
Provides pub-sub capabilities for building chatrooms, server communication, and tracking social media feeds.
Supports read replicas across multiple Availability Zones for read scalability, with automatic promotion of a replica to primary if the primary fails.
Offers backup support to S3 and data export to other regions.
Includes Lua scripting support.
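To make the leaderboard use case concrete, here is a tiny in-memory stand-in for a sorted set. It is not a Redis client and the class and method names are hypothetical. The comments name the Redis commands each method loosely resembles. The tie-breaking rule is a simplification, and unlike Redis this version sorts on read rather than keeping members ordered on every write.

```python
class Leaderboard:
    """Tiny in-memory stand-in for a Redis sorted set (not a Redis client)."""

    def __init__(self):
        self._scores = {}  # member -> score

    def add(self, member, score):
        """Set a member's score, similar in spirit to ZADD."""
        self._scores[member] = score

    def increment(self, member, delta):
        """Add to a member's score, similar in spirit to ZINCRBY."""
        self._scores[member] = self._scores.get(member, 0) + delta
        return self._scores[member]

    def top(self, n):
        """Return the n highest scores, similar in spirit to ZREVRANGE WITHSCORES."""
        # Ties are broken by member name here, which is a simplification.
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]

board = Leaderboard()
board.add("alice", 120)
board.add("bob", 95)
board.add("carol", 60)
board.increment("bob", 40)
print(board.top(2))  # [('bob', 135), ('alice', 120)]
```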
Integration Example
An application that uses DynamoDB for durable storage can add ElastiCache for real-time data such as gameplay state, leaderboards, and session data.
This reduces the number of read/write requests sent to DynamoDB, so less database capacity is needed.
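One common way to wire a cache in front of a database is the cache-aside pattern, sketched below. Plain dictionaries stand in for both ElastiCache and DynamoDB, and the class name, TTL value, and key format are assumptions for illustration. The sketch covers reads and cache invalidation on write only.

```python
import time

class CacheAside:
    """Cache-aside reads with a TTL; dicts stand in for the cache and the database."""

    def __init__(self, backing_store, ttl_seconds=60):
        self.backing_store = backing_store   # stand-in for the database
        self.ttl_seconds = ttl_seconds
        self.cache = {}                      # key -> (value, expiry time)
        self.db_reads = 0

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                  # cache hit: no database read
        self.db_reads += 1                   # cache miss: go to the database
        value = self.backing_store.get(key)
        self.cache[key] = (value, now + self.ttl_seconds)
        return value

    def put(self, key, value):
        """Write to the database and invalidate the cached copy."""
        self.backing_store[key] = value
        self.cache.pop(key, None)

store = {"session:42": {"user": "alice"}}
sessions = CacheAside(store, ttl_seconds=60)
for _ in range(100):
    sessions.get("session:42", now=0)
print(sessions.db_reads)  # 1
```

In this run, 100 reads of the same session cost a single database read; the other 99 are served from the cache until the entry expires or is invalidated.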
Benefits of ElastiCache
High Performance: Delivers sub-millisecond response times, improving user experience in real-time applications.
Reduced Database Load: Decreases the volume of requests to the backend database, enabling efficient scaling with fewer resources.
Flexibility and Scalability: Offers various data structures and scaling options to fit diverse application needs.
Conclusion
ElastiCache, with its choice of Memcached and Redis, improves performance and reduces database load, making it a versatile solution for high-throughput applications.
255. Amazon Redshift
Introduction
Amazon Redshift is AWS's petabyte-scale data warehouse service.
It uses a distributed cluster architecture: a leader node coordinates queries, and one or more compute nodes store the data and execute them.
Key Features
Distributed Cluster
Redshift's architecture allows for scalability by adding more compute nodes to the cluster.
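The division of work between the leader and the compute nodes can be sketched as a scatter-gather computation. This is a simplified illustration, not Redshift's actual execution engine: the function names are hypothetical, and rows are assigned round-robin, which is only one possible distribution scheme.

```python
def distribute(rows, num_nodes):
    """Assign rows to compute nodes round-robin (one of several possible schemes)."""
    slices = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        slices[i % num_nodes].append(row)
    return slices

def compute_node_partial(rows):
    """Work done on one compute node: a partial SUM and COUNT over its rows."""
    return sum(r["amount"] for r in rows), len(rows)

def leader_average(rows, num_nodes):
    """Leader role: fan the query out, then merge the partial results."""
    partials = [compute_node_partial(s) for s in distribute(rows, num_nodes)]
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count if count else None

sales = [{"amount": a} for a in range(1, 101)]   # amounts 1..100
print(leader_average(sales, num_nodes=2))  # 50.5
print(leader_average(sales, num_nodes=4))  # 50.5
```

The result is the same with two or four nodes; adding nodes only shrinks the share of rows each node has to scan.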
Columnar Storage
Unlike traditional row-based storage, Redshift uses columnar storage.
This approach is efficient for analytic queries, which often involve a subset of columns across many rows.
Each block in columnar storage contains data of a single column, allowing optimized compression based on data type.
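A minimal sketch of why the columnar layout helps. The sample rows are made up, and run-length encoding is used here only as one simple example of a compression scheme that benefits from storing a single column's values together.

```python
rows = [
    {"order_id": 1, "country": "US", "amount": 20},
    {"order_id": 2, "country": "US", "amount": 35},
    {"order_id": 3, "country": "US", "amount": 15},
    {"order_id": 4, "country": "DE", "amount": 50},
    {"order_id": 5, "country": "DE", "amount": 10},
]

def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    return {name: [r[name] for r in rows] for name in rows[0]}

def run_length_encode(values):
    """Compress consecutive repeats into [value, count] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded

columns = to_columns(rows)

# An analytic query such as SUM(amount) touches only one column.
print(sum(columns["amount"]))                  # 130
# A low-cardinality column compresses well with run-length encoding.
print(run_length_encode(columns["country"]))   # [['US', 3], ['DE', 2]]
```

In a row-based layout the same sum would have to read every field of every row, and the repeated country values would be interleaved with other data, making them harder to compress.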
Analytics and SQL Support
Redshift supports SQL with advanced analytics functions.
Ideal for analytic queries that scan many rows but read only a few columns.
Redshift Spectrum Integration with S3
Allows querying data across Redshift tables and files stored in Amazon S3.
Lets queries reach large volumes of file-based data without first loading it into Redshift, extending the data warehouse beyond the cluster's own storage.
Benefits
Optimized for Analytics: Columnar storage makes it suitable for data warehousing and complex analytical queries.
Scalable Architecture: Ability to scale by adding more compute nodes.
Data Compression: Efficient data storage due to compression algorithms tailored for specific data types.
SQL and Advanced Analytics: Supports a wide range of analytics use cases with familiar SQL syntax.
Integration with S3: Extends querying to structured and semi-structured data files stored in S3.
Conclusion
Amazon Redshift offers a highly scalable, columnar storage-based data warehouse solution, making it a powerful tool for businesses needing advanced data analytics capabilities.
The integration with AWS services like S3 further enhances its utility in diverse data processing scenarios.