AWS Kinesis

  • AWS research workshops
  • What is Kinesis
    • Amazon Kinesis is a managed, scalable, cloud-based service that allows real-time processing of streaming large amount of data per second. It is designed for real-time applications and allows developers to take in any amount of data from several sources, scaling up and down that can be run on EC2 instances.
  • Tutorial point
  • ETL
    • ETL stands for “extract, transform, load,” which is the process of loading business data into a data warehousing environment, testing it for performance, and troubleshooting it before it goes live.
  • Data stream -> set of shards->sequence of data records
  • Shards
    • Data stream is made up of one or more shards
    • Read
      • Up to  5 transactions per second for reads
      • Up to a maximum total data read rate of 2 MB per second
      • GetRecords
        • call GetRecords in a loop. Use GetShardIterator to get the shard iterator to specify in the first GetRecords call. GetRecords returns a new shard iterator in NextShardIterator. Specify the shard iterator returned in NextShardIterator in subsequent calls to GetRecords
    • Write
      • Up to 1,000 records per second for writes
      • Up to a maximum total data write rate of 1 MB per second
    • GetShardIterator
  • Data Records
    • Sequence Number
    • Partition Key
    • Data blob up to 1 MB
  • Data retention period default to 24 hr.  could be increased to 365 days
  • Producer, such as Twilio, puts record into AWS data stream
  • Consumer gets data from AWS data stream and process them
  • Using AWS Lambda with Amazon Kinesis
  • How to retry using info in DLQ using SNS
  • 6 common pitfalls AWS lambda with Kinesis trigger
  • AWS kinesis code sample
  • Batch size: The number of records to send to the function in each batch, up to 10,000.
  • The default value for getRecord is 10,000.  See document here
  • Example data in DLQ
    {
        "requestContext": {
            "requestId": "f01fef26-d010-4253-b997-d7dacb59452b",
            "functionArn": "arn:aws:lambda:us-west-2:999:function:example-datastream",
            "condition": "RetryAttemptsExhausted",
            "approximateInvokeCount": 6
        },
        "responseContext": {
            "statusCode": 200,
            "executedVersion": "$LATEST",
            "functionError": "Unhandled"
        },
        "version": "1.0",
        "timestamp": "2021-10-29T15:33:51.712Z",
        "DDBStreamBatchInfo": {
            "shardId": "shardId-00000001635520907907-798533d1",
            "startSequenceNumber": "4178381200000000045390851482",
            "endSequenceNumber": "4178381200000000045390851482",
            "approximateArrivalOfFirstRecord": "2021-10-29T15:32:47Z",
            "approximateArrivalOfLastRecord": "2021-10-29T15:32:47Z",
            "batchSize": 1,
            "streamArn": "arn:aws:dynamodb:us-west-2:999:table/example-table/stream/2020-01-06T22:39:48.687"
        }
    }