Amazon Rekognition is a cloud-based image analysis service from Amazon. Among other things, it can find the text in an image and recognize it, and it also reports other useful information, such as the location of each piece of text within the image. I'd like to share my hands-on experience in installing and working with it in this blog post. It can be used on both Windows and Linux.
Part 1: Prerequisites
Step 1: Sign up to AWS
Step 2: Create an IAM user account
Sign in to the IAM console and set up a user with the appropriate permissions, following the AWS documentation for creating an IAM user. The IAM console is shown in the image below. You can add users, create groups, and manage access in this console.
Step 3: Create an access key ID and secret access key
The access key ID and secret access key are needed for access through the AWS CLI (Command Line Interface). In the IAM console, choose "Users", choose the name of the user, and then choose the "Security credentials" tab. In the "Access keys" section, choose "Create access key".
Your credentials will look something like this:
Access key ID: AKIAIOSFODNN7EXAMPLE
Secret access key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
To download the key pair, choose the "Download .csv" file. Store the keys in a secure location.
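If you later configure the AWS CLI with the aws configure command, these keys are stored in a credentials file. With the example keys above, `~/.aws/credentials` looks like this:

```ini
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```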
Part 2: Set up the AWS CLI
I basically followed this tutorial to install the AWS Command Line Interface (CLI). Below I describe the installation on Windows and on Linux in detail.
Installation of AWS CLI on Windows
Download and run the AWS CLI MSI installer for Windows. Then open a command prompt window from the Start menu, and input:
$ aws --version
The installation is correct if the response looks like this:
aws-cli/2.4.5 Python/3.8.8 Windows/10 exe/AMD64 prompt/off
Installation of AWS CLI on Linux
Use the curl command to download:
$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
The -o option specifies the file name for the downloaded package.
Unzip the package:
$ unzip awscliv2.zip
Run the install program:
$ sudo ./aws/install
You can also install to a custom location by specifying the installation and binary directories (choose directories you have write access to if you want to avoid sudo):
$ ./aws/install -i /usr/local/aws-cli -b /usr/local/bin
Confirm the installation with the following command:
$ aws --version
It will return something like this if successfully installed:
aws-cli/2.4.5 Python/3.8.8 Linux/4.14.133-113.105.amzn2.x86_64 botocore/2.4.5
Finally, configure your credentials so the CLI can authenticate:
$ aws configure
When prompted, enter the access key ID and secret access key from Part 1, along with a default region and output format.
Part 3: Set up AWS SDKs
Amazon Rekognition can be called from Python through the AWS SDK for Python (Boto3). Install it with pip:
$ pip install boto3
You can refer to the AWS SDK documentation for more details about the AWS SDKs.
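To sanity-check the SDK setup, a minimal sketch for creating a Rekognition client follows. The region name here is an assumption for illustration; substitute the region where your S3 bucket lives. Credentials are picked up automatically from the AWS CLI configuration or environment variables.

```python
def make_rekognition_client(region_name="us-east-1"):
    """Create an Amazon Rekognition client.

    region_name is an assumption for illustration; use the region
    where your images and S3 bucket are located.
    """
    import boto3  # installed above with `pip install boto3`
    # boto3 reads credentials from ~/.aws/credentials or from the
    # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables.
    return boto3.client("rekognition", region_name=region_name)
```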
Part 4: Working with Amazon S3 buckets
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. Customers can use Amazon S3 to store any amount of data. To use the text extraction services of Amazon Rekognition, we first upload the images to an S3 bucket and then run commands from the CLI on the local machine for the various operations.
I will give an example of uploading and analyzing images using an Amazon S3 bucket, as shown below. First of all, you have to upload the images to the S3 bucket. Below is a list of commands you can use to do so.
Go to the installation directory of AWS (this step is optional; once installed, the aws command works from any directory):
$ cd aws
Check all the contents in your bucket:
$ aws s3 ls s3://your_bucket_name
Upload a folder of images (sync copies any files that are new or have changed):
$ aws s3 sync path_to_your_images s3://your_bucket_name/your_folder_name
Remove an image file:
$ aws s3 rm s3://your_bucket_name/your_folder_name/your_image_name.png
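The same bucket operations can also be scripted with boto3 instead of the CLI. A minimal sketch, where `list_image_files` and `upload_images` are hypothetical helpers and the bucket and folder names are placeholders:

```python
import os

# Image extensions Rekognition commonly works with (PNG and JPEG).
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}

def list_image_files(filenames):
    """Keep only filenames with a recognized image extension."""
    return [f for f in filenames
            if os.path.splitext(f)[1].lower() in IMAGE_EXTENSIONS]

def upload_images(filenames, bucket, folder):
    """Rough boto3 equivalent of `aws s3 sync` for the selected files."""
    import boto3
    s3 = boto3.client("s3")
    for name in list_image_files(filenames):
        s3.upload_file(name, bucket, folder + "/" + os.path.basename(name))

print(list_image_files(["fig5.png", "notes.txt", "fig6.JPG"]))
# ['fig5.png', 'fig6.JPG']
```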
Then analyze the uploaded images with the following Python script (the AWS sample code for Rekognition's DetectLabels operation):

import boto3

def detect_labels(photo, bucket):
    client = boto3.client('rekognition')
    response = client.detect_labels(
        Image={'S3Object': {'Bucket': bucket, 'Name': photo}},
        MaxLabels=10)

    print('Detected labels for ' + photo)
    print()
    for label in response['Labels']:
        print("Label: " + label['Name'])
        print("Confidence: " + str(label['Confidence']))
        print("Instances:")
        for instance in label['Instances']:
            print("  Bounding box")
            print("    Top: " + str(instance['BoundingBox']['Top']))
            print("    Left: " + str(instance['BoundingBox']['Left']))
            print("    Width: " + str(instance['BoundingBox']['Width']))
            print("    Height: " + str(instance['BoundingBox']['Height']))
            print("  Confidence: " + str(instance['Confidence']))
            print()
        print("Parents:")
        for parent in label['Parents']:
            print("  " + parent['Name'])
        print("----------")
        print()
    return len(response['Labels'])

def main():
    photo = ''   # name of the image file in the bucket
    bucket = ''  # name of your S3 bucket
    label_count = detect_labels(photo, bucket)
    print("Labels detected: " + str(label_count))

if __name__ == "__main__":
    main()
Put the code above in a Python script and run it with the commands:
$ cd path_of_python_script
$ python3 name_of_the_script.py
The DetectLabels API itself returns a JSON response like this (truncated here):
{
    "Labels": [
        {
            "Name": "Vehicle",
            "Confidence": 99.15271759033203,
            "Instances": [],
            "Parents": [
                {
                    "Name": "Transportation"
                }
            ]
        },
        {
            "Name": "Transportation",
            "Confidence": 99.15271759033203,
            "Instances": [],
            "Parents": []
        },
        {
            "Name": "Automobile",
            "Confidence": 99.15271759033203,
            "Instances": [],
            "Parents": [
                {
                    "Name": "Vehicle"
                },
                {
                    "Name": "Transportation"
                }
            ]
        },
        ...
The response gives each detection (a label, or an extracted piece of text) along with a confidence score. More importantly, the coordinates of each bounding box are also returned, which can be used for subsequent analysis. If one image file contains multiple drawings or labels, the location of each is reported separately.
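To work with such a response programmatically, you can walk the Labels list directly. A small sketch using a trimmed copy of the JSON above; `summarize_labels` is a hypothetical helper that filters labels by confidence:

```python
# Trimmed copy of the DetectLabels response shown above.
sample_response = {
    "Labels": [
        {"Name": "Vehicle", "Confidence": 99.15271759033203,
         "Instances": [], "Parents": [{"Name": "Transportation"}]},
        {"Name": "Transportation", "Confidence": 99.15271759033203,
         "Instances": [], "Parents": []},
        {"Name": "Automobile", "Confidence": 99.15271759033203,
         "Instances": [],
         "Parents": [{"Name": "Vehicle"}, {"Name": "Transportation"}]},
    ]
}

def summarize_labels(response, min_confidence=90.0):
    """Return (name, confidence) pairs at or above a confidence threshold."""
    return [(label["Name"], round(label["Confidence"], 2))
            for label in response["Labels"]
            if label["Confidence"] >= min_confidence]

print(summarize_labels(sample_response))
# [('Vehicle', 99.15), ('Transportation', 99.15), ('Automobile', 99.15)]
```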
For example, this is the image we want to analyze:
Run the following code as a whole in Python. It is modified from the sample code given above to call Rekognition's DetectText operation, which returns each piece of detected text together with its bounding box:
import boto3

def detect_text(photo, bucket):
    client = boto3.client('rekognition')
    response = client.detect_text(
        Image={'S3Object': {'Bucket': bucket, 'Name': photo}})

    line_count = 0
    for text in response['TextDetections']:
        # Rekognition reports both whole lines and individual words;
        # keep only the lines.
        if text['Type'] == 'LINE':
            box = text['Geometry']['BoundingBox']
            print("Label: " + text['DetectedText'])
            print("Confidence: " + str(text['Confidence']))
            print("Top: " + str(box['Top']))
            print("Left: " + str(box['Left']))
            print("Width: " + str(box['Width']))
            print("Height: " + str(box['Height']))
            print()
            line_count += 1
    return line_count

def main():
    photo = ''   # name of the image file in the bucket
    bucket = ''  # name of your S3 bucket
    label_count = detect_text(photo, bucket)
    print("Labels detected: " + str(label_count))

if __name__ == "__main__":
    main()
Put the code above in a Python script and run it with the commands:
$ cd path_of_python_script
$ python3 name_of_the_script.py
Label: FIG. 5
Confidence: 99.76611328125
Top: 0.661735475063324
Left: 0.23441961407661438
Width: 0.09687193483114243
Height: 0.02643181011080742
Label: FIG. 6
Confidence: 99.77725219726562
Top: 0.6624100804328918
Left: 0.6570111513137817
Width: 0.09466367214918137
Height: 0.024769658222794533
Label: FIG. 7
Confidence: 99.44810485839844
Top: 0.8001129627227783
Left: 0.45073583722114563
Width: 0.0940646380186081
Height: 0.02555014006793499
Label: FIG. 8
Confidence: 99.40226745605469
Top: 0.9668554663658142
Left: 0.44961944222450256
Width: 0.09739556908607483
Height: 0.026392830535769463
Labels detected: 4
With this output we obtain "FIG. 5", "FIG. 6", "FIG. 7", and "FIG. 8", the correct labels for the drawings in this image. Amazon Rekognition accurately extracted the text in this image.
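Note that the BoundingBox values are ratios of the overall image dimensions rather than pixels, so to crop or highlight these regions you multiply them by the image width and height. A small sketch using the "FIG. 5" box from the output above; the 1000x800 pixel image size is an assumption for illustration:

```python
def box_to_pixels(box, image_width, image_height):
    """Convert Rekognition's ratio-based BoundingBox to integer pixel
    coordinates (left, top, width, height)."""
    return (round(box["Left"] * image_width),
            round(box["Top"] * image_height),
            round(box["Width"] * image_width),
            round(box["Height"] * image_height))

# Bounding box of "FIG. 5" from the output above.
fig5 = {"Top": 0.661735475063324, "Left": 0.23441961407661438,
        "Width": 0.09687193483114243, "Height": 0.02643181011080742}
print(box_to_pixels(fig5, 1000, 800))
# (234, 529, 97, 21)
```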
-- Xin Wei