2022-02-16: About the Use of Amazon Rekognition and the Installation of Associated AWS CLI

 



Amazon Rekognition is a cloud-based image analysis service from Amazon that, among other things, can extract text from images. It can find and recognize the text in an image and also return additional information, such as the location of each piece of text within the image. In this blog post I'd like to share my hands-on experience installing and working with it. It can be used from both Windows and Linux.


Part 1: Prerequisites


Step 1: Sign up for AWS

Follow the instructions to sign up for an AWS account.

Step 2: Create an IAM user account

Sign in to the IAM console and set up a user and its permissions. You can follow the official instructions under 'Step 2: Create an IAM user account'. In the IAM console you can add users, create groups, and set up access permissions.


Step 3: Create an access key ID and secret access key

The access key ID and secret access key are needed for AWS CLI (Command Line Interface) access. In the IAM console, choose "Users", choose the name of the user, and then choose the "Security credentials" tab. In the "Access keys" section, choose "Create access key".


Your credentials will look something like this:

Access key ID: AKIAIOSFODNN7EXAMPLE

Secret access key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

To download the key pair, choose "Download .csv file". Store the keys in a secure location.
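After you install the CLI (Part 2), running aws configure and pasting these two keys stores them in the shared credentials file, where both the CLI and the SDKs can find them. If you prefer to pass them to the Python SDK (boto3, installed in Part 3) explicitly, a minimal sketch looks like this; the region and the key values below are placeholders, not real credentials:

import boto3

# Placeholder credentials for illustration only; in practice, prefer
# "aws configure" or environment variables over hard-coding keys.
session = boto3.Session(
    aws_access_key_id="AKIAIOSFODNN7EXAMPLE",
    aws_secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    region_name="us-east-1",  # assumed region; use your own
)

# Clients created from this session will use the credentials above.
rekognition = session.client("rekognition")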


Part 2: Set up the AWS CLI

I basically followed this tutorial to install the AWS Command Line Interface (CLI). Below I describe the installation on Windows and on Linux in detail.


Installation of AWS CLI on Windows


Download and run the AWS CLI MSI installer for Windows (64-bit), which is linked from the AWS CLI installation documentation.



Then open a command prompt window from the Start menu and run:

aws --version

The installation is correct if the response looks something like this:

aws-cli/2.4.5 Python/3.8.8 Windows/10 exe/AMD64 prompt/off


Installation of AWS CLI on Linux


Use the curl command to download:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"

The -o option specifies the file name for the downloaded package.

Unzip the package:

unzip awscliv2.zip

Run the install program:

sudo ./aws/install

You can also choose where the CLI goes with the -i (install directory) and -b (directory for the aws symlink) options; sudo is not needed if you have write permission to both directories. The defaults are shown below:

./aws/install -i /usr/local/aws-cli -b /usr/local/bin

Confirm the installation with the following command:

aws --version

It will return something like this if the installation succeeded:

aws-cli/2.4.5 Python/3.8.8 Linux/4.14.133-113.105.amzn2.x86_64 botocore/2.4.5



Part 3: Set up AWS SDKs

We can install the AWS SDK for Python (boto3) with this command:

$ pip install boto3

You can refer to the AWS documentation for more details about the AWS SDKs.
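As a quick sanity check that boto3 and your credentials are set up correctly, you can ask AWS who you are. A minimal sketch, assuming the access keys from Part 1 have already been configured (for example with aws configure):

import boto3

# STS (Security Token Service) reports which account and user the
# configured credentials belong to; an error here usually means the
# keys or the region are not set up correctly.
sts = boto3.client("sts")
identity = sts.get_caller_identity()
print(identity["Account"], identity["Arn"])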


Part 4: Working with Amazon S3 buckets

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. Customers can use Amazon S3 to store any amount of data. In order to use the text extraction services of Amazon Rekognition, we have to upload the images to an S3 bucket; the operations on the bucket can then be run from the local machine through the CLI.

More details about Amazon S3 can be found in this introduction: "What is Amazon S3?". 

These are the instructions on how to create a bucket.  

This is another tutorial about "Analyzing images stored in an Amazon S3 bucket". It talks about how to interact with an S3 bucket via the AWS SDKs, which are available for different languages such as Java, Python, and so on.
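As a small illustration of the SDK route, here is a sketch of creating a bucket and confirming it exists with boto3; the bucket name and region are placeholders, and bucket names must be globally unique:

import boto3

s3 = boto3.client("s3", region_name="us-east-1")  # assumed region

# Bucket names are global across all AWS accounts, so pick a unique one.
bucket_name = "your-unique-bucket-name"

# In us-east-1 no location constraint is needed; in other regions, pass
# CreateBucketConfiguration={"LocationConstraint": region_name}.
s3.create_bucket(Bucket=bucket_name)

# List your buckets to confirm the new one is there.
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])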


I will give an example of uploading and analyzing images using an Amazon S3 bucket as shown below. First of all, you have to upload the images to the S3 bucket. Below is a list of commands you can use to manage images in the bucket (a boto3 equivalent follows the list).

go to a convenient working directory (once installed, the aws command can be run from any directory):

$ cd aws

check all the contents of your bucket:

$ aws s3 ls s3://your_bucket_name

upload a local folder of images (sync recursively copies new and updated files to the bucket):

$ aws s3 sync path_to_your_images s3://your_bucket_name/your_folder_name

remove an image file:

$ aws s3 rm s3://your_bucket_name/your_folder_name/your_image_name.png
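The same upload and delete operations can also be done from Python with boto3; a minimal sketch with placeholder bucket, folder, and file names:

import boto3

s3 = boto3.client("s3")

bucket = "your_bucket_name"
key = "your_folder_name/your_image_name.png"

# Upload a single local image to the bucket under the given key.
s3.upload_file("path_to_your_images/your_image_name.png", bucket, key)

# Delete the object again when it is no longer needed.
s3.delete_object(Bucket=bucket, Key=key)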




Then use the code provided in the tutorial "Analyzing images stored in an Amazon S3 bucket" to analyze the images. This sample detects image labels (objects and scenes) rather than text, but it shows the overall structure of a Rekognition call on an S3 object; the sample code looks like this:


import boto3

def detect_labels(photo, bucket):

    client=boto3.client('rekognition')

    response = client.detect_labels(Image={'S3Object':{'Bucket':bucket,'Name':photo}},
        MaxLabels=10)

    print('Detected labels for ' + photo) 
    print()   
    for label in response['Labels']:
        print ("Label: " + label['Name'])
        print ("Confidence: " + str(label['Confidence']))
        print ("Instances:")
        for instance in label['Instances']:
            print ("  Bounding box")
            print ("    Top: " + str(instance['BoundingBox']['Top']))
            print ("    Left: " + str(instance['BoundingBox']['Left']))
            print ("    Width: " +  str(instance['BoundingBox']['Width']))
            print ("    Height: " +  str(instance['BoundingBox']['Height']))
            print ("  Confidence: " + str(instance['Confidence']))
            print()

        print ("Parents:")
        for parent in label['Parents']:
            print ("   " + parent['Name'])
        print ("----------")
        print ()
    return len(response['Labels'])


def main():
    photo=''
    bucket=''
    label_count=detect_labels(photo, bucket)
    print("Labels detected: " + str(label_count))


if __name__ == "__main__":
    main()

Fill in the photo (the object key of the image in the bucket) and bucket values in main(), put the code above in a Python script, and run it with the commands:


$ cd path_of_python_script

$ python3 name_of_the_script.py


The script prints the labels in a readable form; the raw JSON response returned by the DetectLabels API looks like this (truncated):

{
    "Labels": [
        {
            "Name": "Vehicle",
            "Confidence": 99.15271759033203,
            "Instances": [],
            "Parents": [
                {
                    "Name": "Transportation"
                }
            ]
        },
        {
            "Name": "Transportation",
            "Confidence": 99.15271759033203,
            "Instances": [],
            "Parents": []
        },
        {
            "Name": "Automobile",
            "Confidence": 99.15271759033203,
            "Instances": [],
            "Parents": [
                {
                    "Name": "Vehicle"
                },
                {
                    "Name": "Transportation"
                }
            ]
        },
        ...
    ]
}


It gives the detected labels (and, with the text-detection code below, the extracted text) along with a confidence score for each. More importantly, the coordinates of the bounding box are also given, which can be used for subsequent analysis. If there are multiple drawings and text labels in one image file, the location of each of them is reported separately.
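Note that the bounding box values returned by Rekognition are ratios of the overall image width and height, not pixel values. A short sketch of converting them to pixel coordinates, assuming example image dimensions:

# Bounding box values are fractions of the image size, so multiply by the
# real dimensions to get pixel coordinates.
image_width, image_height = 2560, 3300  # assumed example dimensions

box = {"Top": 0.6617, "Left": 0.2344, "Width": 0.0969, "Height": 0.0264}

left_px = int(box["Left"] * image_width)
top_px = int(box["Top"] * image_height)
width_px = int(box["Width"] * image_width)
height_px = int(box["Height"] * image_height)

print(left_px, top_px, width_px, height_px)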

For example, this is the image we want to analyze: a sheet of drawings containing the labels FIG. 5, FIG. 6, FIG. 7, and FIG. 8.

Run the following code as a whole in Python. It is adapted from the sample code given above, but it calls the DetectText operation, which is the Rekognition API that actually extracts text and reports the bounding box of each detected line:

import boto3

def detect_labels(photo, bucket):

    client=boto3.client('rekognition')

    # DetectText is the Rekognition operation that extracts text from
    # the image stored in the S3 bucket.
    response = client.detect_text(Image={'S3Object':{'Bucket':bucket,'Name':photo}})

    count = 0
    for detection in response['TextDetections']:
        # Each detection is either a whole LINE of text or a single WORD;
        # here we only keep the lines.
        if detection['Type'] == "LINE":
            box = detection['Geometry']['BoundingBox']
            print ("Label: " + detection['DetectedText'])
            print ("Confidence: " + str(detection['Confidence']))
            print ("Top: " + str(box['Top']))
            print ("Left: " + str(box['Left']))
            print ("Width: " + str(box['Width']))
            print ("Height: " + str(box['Height']))
            print()
            count += 1

    return count


def main():
    photo=''
    bucket=''
    label_count=detect_labels(photo, bucket)
    print("Labels detected: " + str(label_count))


if __name__ == "__main__":
    main()


Put the code above in a Python script and run it with the commands:


$ cd path_of_python_script

$ python3 name_of_the_script.py


Label: FIG. 5
Confidence: 99.76611328125
Top: 0.661735475063324
Left: 0.23441961407661438
Width: 0.09687193483114243
Height: 0.02643181011080742

Label: FIG. 6
Confidence: 99.77725219726562
Top: 0.6624100804328918
Left: 0.6570111513137817
Width: 0.09466367214918137
Height: 0.024769658222794533

Label: FIG. 7
Confidence: 99.44810485839844
Top: 0.8001129627227783
Left: 0.45073583722114563
Width: 0.0940646380186081
Height: 0.02555014006793499

Label: FIG. 8
Confidence: 99.40226745605469
Top: 0.9668554663658142
Left: 0.44961944222450256
Width: 0.09739556908607483
Height: 0.026392830535769463

Labels detected: 4


From the output we obtain "FIG. 5", "FIG. 6", "FIG. 7", and "FIG. 8", the correct labels for the drawings in this image. Amazon Rekognition accurately extracted the text in this image.


-- Xin Wei
