2022-01-03: googletrans Python Library, The Unofficial Google Translate API


Enhancing communication between people around the world has become especially important over the last decade or so because of unparalleled growth in globalization, international eCommerce, social networks, internet usage, and other services. As a result, the demand for tools that can remove language barriers has increased and led to the development of cloud translation services (a.k.a. translation APIs). There are several translation APIs, but the giant, Google Translate, is the one I will be talking about in this post. In 2016, Google translated more than 100 billion words a day, and after 10 years of service, more than 500 million people were using Google Translate.

While I am not a translation expert and prefer to use dictionary.com to get an accurate definition of individual words, I have used Google Translate in the past to get a list of synonyms for a word (in English or Arabic) or to quickly translate a long article that I do not need to read word for word. For Arabic translation, Google's accuracy is 85% according to a Google survey. This suggests that Google Translate is reasonable for non-critical documents, but not for business contracts, medical documents, research papers, etc. Accuracy also drops when the input is longer than a few words or a single sentence.

I needed a machine translation API for my research, and since Google's official Translate API costs money, I went looking for alternatives. I decided to try to crawl and scrape Google Translate to extract the translation (e.g., crawling and scraping https://translate.google.com/?sl=en&tl=ar&text=inhaling%20oxygen%20is%20good%20for%20you&op=translate will give you the Arabic translation of "inhaling oxygen is good for you"). While checking whether someone else had already done this, I found the googletrans library for Python. It uses the Google Translate Ajax API to auto-detect the language of a text and translate it. I started using it, and it is definitely easy to use (easier than the official Google Translate API).
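The example URL above follows a simple pattern: source language, target language, percent-encoded text, and an operation name. Here is a minimal sketch of how such a URL could be assembled with the standard library. The function name build_translate_url is my own, and the query parameters (sl, tl, text, op) are taken only from the example URL above, not from any documented API.

```python
from urllib.parse import quote, urlencode


def build_translate_url(text, src="en", dest="ar"):
    """Build a translate.google.com URL shaped like the example in the post.

    The parameter names are copied from that example URL (an assumption,
    not a documented interface).
    """
    query = urlencode(
        {"sl": src, "tl": dest, "text": text, "op": "translate"},
        quote_via=quote,  # encode spaces as %20 rather than +
    )
    return "https://translate.google.com/?" + query
```

Calling build_translate_url("inhaling oxygen is good for you") reproduces the URL quoted above.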

Python script using googletrans library
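A minimal sketch of what such a script can look like. Translator.translate() with the src and dest keywords, and the .src and .text attributes of its result, come from googletrans. The helper name translate_text is hypothetical, and the translator is passed in as an argument so the function can be tried without network access.

```python
def translate_text(translator, text, src="auto", dest="en"):
    """Translate `text` and return (source_language, translated_text).

    `translator` is expected to be a googletrans.Translator instance;
    with src="auto" the library detects the source language itself.
    """
    result = translator.translate(text, src=src, dest=dest)
    return result.src, result.text


# Usage (needs `pip install googletrans` and network access):
#   from googletrans import Translator
#   lang, arabic = translate_text(
#       Translator(), "inhaling oxygen is good for you", src="en", dest="ar"
#   )
#   print(lang, arabic)
```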

 There is a disclaimer on the project's GitHub repository page:

DISCLAIMER: this is an unofficial library using the web API of translate.google.com and also is not associated with Google.

Due to limitations of the web version of google translate, this API does not guarantee that the library would work properly at all times (so please use this library if you don't care about stability).

Important: If you want to use a stable API, I highly recommend you to use Google's official translate API.

After I had used it successfully for a while, one day it broke and raised AttributeError: 'NoneType' object has no attribute 'group':

AttributeError when running python script using googletrans library

After scouring the internet, reading various issues on GitHub, and trying different methods, I found that the problem occurs when Google sends the raw token directly. This is a recurring issue that had been fixed once before. I kept looking and found this solution, which lists several ways to fix the problem. This is the one that worked for me:

pip uninstall googletrans
pip install googletrans==3.1.0a0 

Uninstalling googletrans and installing googletrans==3.1.0a0 library
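To confirm that the pinned version is the one actually installed in the active environment, a quick check along these lines can help. It uses importlib.metadata from the standard library (Python 3.8+). The helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Usage:
#   if installed_version("googletrans") != "3.1.0a0":
#       print("googletrans is missing or not pinned to 3.1.0a0")
```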

 After that, everything worked:

Running python script to translate text using googletrans library

It is worth mentioning that Google may ban your client IP address if you send too many requests. This is something I must keep in mind because, eventually, I will need to translate hundreds of thousands of news stories from Arabic to English. A practical way to reduce the risk of being banned or of receiving an HTTP 429 Too Many Requests response is to add a delay between requests in your script. If you do receive an HTTP 429 Too Many Requests or an HTTP 503 Service Unavailable response, check the Retry-After header of the response object and sleep for that long before retrying. The header value is either a number of seconds or an HTTP date.

Example:

HTTP/1.1 429 Too Many Requests
Content-Type: text/html
Retry-After: 3600

Python script using googletrans library

Running python script to translate text using googletrans library with a one hour delay between the first and second request
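The Retry-After handling described above can be sketched as follows. This is a generic illustration, not googletrans code: send is a hypothetical callable that performs one request and returns (status_code, headers, body), and the names retry_after_seconds and call_with_retry are my own. The default of 60 seconds used when the header is missing is also an assumption.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers, default=60.0):
    """Return the delay requested by a Retry-After header, in seconds.

    `headers` is a mapping; a plain dict must use the exact key
    "Retry-After". The value may be a number of seconds or an HTTP date.
    """
    value = headers.get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparsable value: fall back to the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it stops returning 429/503 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in (429, 503):
            return body
        if attempt < max_attempts - 1:
            sleep(retry_after_seconds(headers))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

With the example response above, the first call would sleep for 3600 seconds before the second attempt. Passing sleep as a parameter makes it easy to try the logic without actually waiting an hour.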


--- 

Hussam Hallak

