To call a Google Cloud API from your app, you need to create an intermediate REST API that handles authorization and protects secret values such as API keys. You then need to write code in your mobile app to authenticate to and communicate with this intermediate service.
One way to create this REST API is by using Firebase Authentication and Functions, which gives you a managed, serverless gateway to Google Cloud APIs that handles authentication and can be called from your mobile app with pre-built SDKs.
This guide demonstrates how to use this technique to call the Cloud Vision API from your app. This method gives all authenticated users access to Cloud Vision billed services through your Cloud project, so consider whether this auth mechanism is sufficient for your use case before you proceed.
Before you begin
If you have not already added Firebase to your app, do so by following the steps in the getting started guide.
Use Swift Package Manager to install and manage Firebase dependencies.
- In Xcode, with your app project open, navigate to File > Add Packages.
- When prompted, add the Firebase Apple platforms SDK repository: https://github.com/firebase/firebase-ios-sdk.git
- Choose the Firebase ML library.
- Add the -ObjC flag to the Other Linker Flags section of your target's build settings.
- When finished, Xcode will automatically begin resolving and downloading your dependencies in the background.
Next, perform some in-app setup:
- In your app, import Firebase (if you followed the getting started guide, your app also configures Firebase at launch; see the sketch below):
Swift
import FirebaseMLModelDownloader
Objective-C
@import FirebaseMLModelDownloader;
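As a reminder, app initialization typically looks like the following sketch; this is covered by the getting started guide, and a UIKit app delegate is assumed here.
Swift
import FirebaseCore
import UIKit

@main
class AppDelegate: UIResponder, UIApplicationDelegate {
  func application(_ application: UIApplication,
                   didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
    // Initialize Firebase before using any other Firebase API.
    FirebaseApp.configure()
    return true
  }
}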
A few more configuration steps, and we're ready to go:
- If you have not already enabled Cloud-based APIs for your project, do so now:
  - Open the Firebase ML APIs page of the Firebase console.
  - If you have not already upgraded your project to the Blaze pricing plan, click Upgrade to do so. (You will be prompted to upgrade only if your project isn't on the Blaze plan.) Only Blaze-level projects can use Cloud-based APIs.
  - If Cloud-based APIs aren't already enabled, click Enable Cloud-based APIs.
- Configure your existing Firebase API keys to disallow access to the Cloud Vision API:
  - Open the Credentials page of the Cloud console.
  - For each API key in the list, open the editing view, and in the Key Restrictions section, add all of the available APIs except the Cloud Vision API to the list.
Deploy the callable function
Next, deploy the Cloud Function you will use to bridge your app and the Cloud Vision API. The functions-samples repository contains an example you can use.
By default, this function allows only authenticated users of your app to access the Cloud Vision API. You can modify the function for different requirements.
To deploy the function:
- Clone or download the functions-samples repo and change to the Node-1st-gen/vision-annotate-image directory:
  git clone https://github.com/firebase/functions-samples
  cd functions-samples/Node-1st-gen/vision-annotate-image
- Install dependencies:
  cd functions
  npm install
  cd ..
- If you don't have the Firebase CLI, install it.
- Initialize a Firebase project in the vision-annotate-image directory. When prompted, select your project in the list.
  firebase init
- Deploy the function:
  firebase deploy --only functions:annotateImage
Add Firebase Auth to your app
The callable function deployed above will reject any request from non-authenticated users of your app. If you have not already done so, you will need to add Firebase Auth to your app.
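For example, if anonymous sign-in is enabled in the Firebase console, a minimal sketch of authenticating the user before calling the function could look like this; your app may use any other Firebase Auth sign-in method instead.
Swift
import FirebaseAuth

// Sign the user in anonymously; the callable function only checks that
// the caller is authenticated, not how they signed in.
Auth.auth().signInAnonymously { authResult, error in
  if let error = error {
    print("Sign-in failed: \(error.localizedDescription)")
    return
  }
  // The user is now authenticated and can call the function.
}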
Add necessary dependencies to your app
Use Swift Package Manager to install the Cloud Functions for Firebase library.
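After the package is added, import the module in any file where you invoke the function (a minimal sketch):
Swift
import FirebaseFunctions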
Now you are ready to start recognizing text in images.
1. Prepare the input image
In order to call Cloud Vision, the image must be formatted as a base64-encoded string. To process a UIImage:
Swift
guard let imageData = uiImage.jpegData(compressionQuality: 1.0) else { return }
let base64encodedImage = imageData.base64EncodedString()
Objective-C
NSData *imageData = UIImageJPEGRepresentation(uiImage, 1.0f);
NSString *base64encodedImage =
[imageData base64EncodedStringWithOptions:NSDataBase64Encoding76CharacterLineLength];
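Requests to Cloud Functions are limited in size, so for large photos you may want to downscale the image before encoding it. A minimal sketch using UIKit follows; the 1024-point maximum dimension is an arbitrary choice, not a Cloud Vision requirement.
Swift
import UIKit

/// Downscales an image so its longest side is at most `maxDimension` points,
/// then returns it as a base64-encoded JPEG string for the Vision request.
func base64EncodedImage(_ image: UIImage, maxDimension: CGFloat = 1024) -> String? {
  let longestSide = max(image.size.width, image.size.height)
  let scale = min(1, maxDimension / longestSide)
  let targetSize = CGSize(width: image.size.width * scale,
                          height: image.size.height * scale)
  let resized = UIGraphicsImageRenderer(size: targetSize).image { _ in
    image.draw(in: CGRect(origin: .zero, size: targetSize))
  }
  return resized.jpegData(compressionQuality: 0.8)?.base64EncodedString()
}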
2. Invoke the callable function to recognize text
To recognize text in an image, invoke the callable function, passing a JSON Cloud Vision request.
First, initialize an instance of Cloud Functions:
Swift
lazy var functions = Functions.functions()
Objective-C
@property(strong, nonatomic) FIRFunctions *functions;
Create the request. The Cloud Vision API supports two types of text detection: TEXT_DETECTION and DOCUMENT_TEXT_DETECTION. See the Cloud Vision OCR docs for the difference between the two use cases.
Swift
let requestData = [
"image": ["content": base64encodedImage],
"features": ["type": "TEXT_DETECTION"],
"imageContext": ["languageHints": ["en"]]
] as [String: Any]
Objective-C
NSDictionary *requestData = @{
@"image": @{@"content": base64encodedImage},
@"features": @{@"type": @"TEXT_DETECTION"},
@"imageContext": @{@"languageHints": @[@"en"]}
};
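If you are scanning dense text or documents, you would use DOCUMENT_TEXT_DETECTION instead; the request is otherwise identical. A sketch of the same request with the type swapped (Swift shown; the Objective-C dictionary changes the same way):
Swift
let requestData = [
  "image": ["content": base64encodedImage],
  "features": ["type": "DOCUMENT_TEXT_DETECTION"],
  "imageContext": ["languageHints": ["en"]]
] as [String: Any]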
Finally, invoke the function:
Swift
do {
  let result = try await functions.httpsCallable("annotateImage").call(requestData)
  print(result)
} catch {
  if let error = error as NSError? {
    if error.domain == FunctionsErrorDomain {
      let code = FunctionsErrorCode(rawValue: error.code)
      let message = error.localizedDescription
      let details = error.userInfo[FunctionsErrorDetailsKey]
    }
    // ...
  }
}
Objective-C
[[_functions HTTPSCallableWithName:@"annotateImage"]
    callWithObject:requestData
        completion:^(FIRHTTPSCallableResult * _Nullable result, NSError * _Nullable error) {
  if (error) {
    if ([error.domain isEqualToString:@"com.firebase.functions"]) {
      FIRFunctionsErrorCode code = error.code;
      NSString *message = error.localizedDescription;
      NSObject *details = error.userInfo[@"details"];
    }
    // ...
  }
  // Function completed successfully
  // Get information about recognized text
}];
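The error's code can help you decide how to recover. For example, here is a sketch of what could replace the // ... placeholder in the Swift catch block above; the FunctionsErrorCode cases shown come from the FirebaseFunctions SDK.
Swift
if let error = error as NSError?, error.domain == FunctionsErrorDomain,
   let code = FunctionsErrorCode(rawValue: error.code) {
  switch code {
  case .unauthenticated:
    // The function rejects unauthenticated calls; sign the user in first.
    print("Sign in with Firebase Auth before calling annotateImage.")
  case .resourceExhausted:
    print("Cloud Vision quota exceeded for this project.")
  default:
    print("annotateImage failed: \(error.localizedDescription)")
  }
}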
3. Extract text from blocks of recognized text
If the text recognition operation succeeds, a JSON response of BatchAnnotateImagesResponse will be returned in the task's result. The text annotations can be found in the fullTextAnnotation object.
You can get the recognized text as a string in the text field. For example:
Swift
let annotation = (result.data as? [String: Any])
  .flatMap { $0["fullTextAnnotation"] as? [String: Any] }
guard let annotation = annotation else { return }
if let text = annotation["text"] as? String {
  print("Complete annotation: \(text)")
}
Objective-C
NSDictionary *annotation = result.data[@"fullTextAnnotation"];
if (!annotation) { return; }
NSLog(@"\nComplete annotation:");
NSLog(@"\n%@", annotation[@"text"]);
You can also get information specific to regions of the image. For each block, paragraph, word, and symbol, you can get the text recognized in the region and the bounding coordinates of the region. For example:
Swift
guard let pages = annotation["pages"] as? [[String: Any]] else { return }
for page in pages {
  var pageText = ""
  guard let blocks = page["blocks"] as? [[String: Any]] else { continue }
  for block in blocks {
    var blockText = ""
    guard let paragraphs = block["paragraphs"] as? [[String: Any]] else { continue }
    for paragraph in paragraphs {
      var paragraphText = ""
      guard let words = paragraph["words"] as? [[String: Any]] else { continue }
      for word in words {
        var wordText = ""
        guard let symbols = word["symbols"] as? [[String: Any]] else { continue }
        for symbol in symbols {
          let text = symbol["text"] as? String ?? ""
          let confidence = symbol["confidence"] as? Float ?? 0.0
          wordText += text
          print("Symbol text: \(text) (confidence: \(confidence))\n")
        }
        let confidence = word["confidence"] as? Float ?? 0.0
        print("Word text: \(wordText) (confidence: \(confidence))\n\n")
        let boundingBox = word["boundingBox"] as? [Float] ?? [0.0, 0.0, 0.0, 0.0]
        print("Word bounding box: \(boundingBox.description)\n")
        paragraphText += wordText
      }
      print("\nParagraph: \n\(paragraphText)\n")
      let boundingBox = paragraph["boundingBox"] as? [Float] ?? [0.0, 0.0, 0.0, 0.0]
      print("Paragraph bounding box: \(boundingBox)\n")
      let confidence = paragraph["confidence"] as? Float ?? 0.0
      print("Paragraph Confidence: \(confidence)\n")
      blockText += paragraphText
    }
    pageText += blockText
  }
}
Objective-C
for (NSDictionary *page in annotation[@"pages"]) {
  NSMutableString *pageText = [NSMutableString new];
  for (NSDictionary *block in page[@"blocks"]) {
    NSMutableString *blockText = [NSMutableString new];
    for (NSDictionary *paragraph in block[@"paragraphs"]) {
      NSMutableString *paragraphText = [NSMutableString new];
      for (NSDictionary *word in paragraph[@"words"]) {
        NSMutableString *wordText = [NSMutableString new];
        for (NSDictionary *symbol in word[@"symbols"]) {
          NSString *text = symbol[@"text"];
          [wordText appendString:text];
          NSLog(@"Symbol text: %@ (confidence: %@)\n", text, symbol[@"confidence"]);
        }
        NSLog(@"Word text: %@ (confidence: %@)\n\n", wordText, word[@"confidence"]);
        NSLog(@"Word bounding box: %@\n", word[@"boundingBox"]);
        [paragraphText appendString:wordText];
      }
      NSLog(@"\nParagraph: \n%@\n", paragraphText);
      NSLog(@"Paragraph bounding box: %@\n", paragraph[@"boundingBox"]);
      NSLog(@"Paragraph Confidence: %@\n", paragraph[@"confidence"]);
      [blockText appendString:paragraphText];
    }
    [pageText appendString:blockText];
  }
}
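Putting the pieces together, a complete call from Swift might look like the following sketch; the function name annotateImage matches the callable function deployed earlier, and the response parsing mirrors the fullTextAnnotation handling shown above.
Swift
import FirebaseFunctions
import UIKit

/// Sends a UIImage to the deployed annotateImage callable function and
/// returns the recognized text, or nil if nothing was recognized.
func recognizeText(in uiImage: UIImage) async throws -> String? {
  guard let imageData = uiImage.jpegData(compressionQuality: 1.0) else { return nil }
  let requestData = [
    "image": ["content": imageData.base64EncodedString()],
    "features": ["type": "TEXT_DETECTION"],
    "imageContext": ["languageHints": ["en"]]
  ] as [String: Any]

  // Invoke the callable Cloud Function deployed earlier.
  let result = try await Functions.functions()
    .httpsCallable("annotateImage")
    .call(requestData)

  // Read the aggregated text out of fullTextAnnotation.
  let annotation = (result.data as? [String: Any])?["fullTextAnnotation"] as? [String: Any]
  return annotation?["text"] as? String
}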