System Overview
VisionIQ is a powerful image recognition platform. This article peeks “under the hood” to help you understand what the system can do and how to get the best results from your application.
VisionIQ Mobile SDK
VisionIQ software runs on mobile phones and in the cloud. We provide iOS and Android SDKs that handle on-device visual scans for image recognition, QR codes and barcodes. On-device scans are fast (under 1 second) and work best with a small image dataset.
VisionIQ Server API
For larger datasets, the mobile SDKs communicate with the server through an API. The server API lets your app send images to the server and retrieve metadata that identifies the visual content of those images. It’s pretty simple: send a photo, get back an ID. You can improve visual search speed and accuracy by training the computer vision engine to recognize your own private dataset. To get started, download the mobile SDKs and read the API documentation. Grab some coffee! It’s a good read.
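To make “send a photo, get back an ID” concrete, here is a minimal Python sketch of the client side of a server query. It packages a photo together with the API key and timestamp that the SDK sends, then pulls the ID out of a reply. The JSON field names (`api_key`, `timestamp`, `image`, `image_id`), the base64 encoding and the response shape are all hypothetical placeholders, not the documented VisionIQ API, and no network call is made.

```python
import base64
import json
import time
from typing import Optional


def build_query_payload(image_bytes: bytes, api_key: str) -> str:
    """Package a photo for a server-side recognition query as JSON.

    The field names ("api_key", "timestamp", "image") and the base64
    encoding are hypothetical placeholders, not the documented VisionIQ API.
    """
    payload = {
        "api_key": api_key,
        "timestamp": int(time.time()),  # seconds since the Unix epoch
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(payload)


def extract_image_id(response_body: str) -> Optional[str]:
    """Return the matched image ID, or None when the server reports no match.

    Assumes a hypothetical response shape: {"image_id": ..., "metadata": {...}}.
    """
    return json.loads(response_body).get("image_id")


if __name__ == "__main__":
    body = build_query_payload(b"raw image bytes", api_key="YOUR_API_KEY")
    print(sorted(json.loads(body)))  # ['api_key', 'image', 'timestamp']
    reply = '{"image_id": "poster-042", "metadata": {"title": "Concert poster"}}'
    print(extract_image_id(reply))  # poster-042
```

Keeping payload construction separate from transport means the same function works whichever HTTP client your app uses.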
VisionIQ image processing modules
OK, now for the fun part. Ever wonder how VisionIQ works its magic? The answer is a set of powerful image processing modules. Since each module tackles a different piece of the puzzle, you can mix and match them to meet your application’s image recognition needs.
| Image processing module | Description |
| --- | --- |
| QR code | QR codes are easy. They’re read locally and contain an embedded URL or text string. No image training is needed. A QR code reader comes with our SDK. |
| UPC code | UPC codes (barcodes) are read on-device. VisionIQ takes care of the UPC product code lookup and translates the number string into the product name. |
| Object recognition (on-device) | Super fast (under 1 second) on-device object recognition. Load up to 100 training images into the SDK and the mobile phone will scan for those objects. Recommended for small image datasets. |
| Object recognition (server API) | Fast server-side object recognition performed through the API. Scales to billions of objects. Works best for logos, packaged products, books, CDs, DVDs, posters and other flat media. |
| Color | A color histogram is calculated and included in the returned JSON string. Helps provide information to blind users. |
| Human crowdsourcing | Uses real people to tag images that computer vision cannot identify. Average response time is 30-60 seconds. Turning on this module provides "any image" recognition. |
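The UPC row says VisionIQ translates the scanned number string into a product name. Before any lookup, a client can cheaply reject misreads with the standard UPC-A mod-10 check digit. The Python sketch below shows that validation; `PRODUCT_NAMES` and `lookup_product` are hypothetical stand-ins for the lookup VisionIQ performs, not its real service.

```python
from typing import Optional

# Hypothetical stand-in for VisionIQ's product lookup service.
PRODUCT_NAMES = {"036000291452": "Example product"}


def is_valid_upc_a(code: str) -> bool:
    """Verify the mod-10 check digit of a 12-digit UPC-A string."""
    if len(code) != 12 or any(c not in "0123456789" for c in code):
        return False
    digits = [int(c) for c in code]
    # Digits in odd positions 1, 3, ..., 11 (indices 0, 2, ..., 10) weigh 3.
    odd_sum = sum(digits[0:11:2])
    # Digits in even positions 2, 4, ..., 10 (indices 1, 3, ..., 9) weigh 1.
    even_sum = sum(digits[1:11:2])
    expected = (10 - (3 * odd_sum + even_sum) % 10) % 10
    return expected == digits[11]


def lookup_product(code: str) -> Optional[str]:
    """Reject misread codes before doing the (hypothetical) name lookup."""
    if not is_valid_upc_a(code):
        return None
    return PRODUCT_NAMES.get(code)


print(lookup_product("036000291452"))  # Example product
print(lookup_product("036000291453"))  # None (check digit mismatch)
```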
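The Color row mentions a color histogram returned in a JSON string. As a rough illustration of what such a histogram can contain, this Python sketch quantizes 8-bit RGB pixels into coarse bins and reports the fraction of pixels in each bin. The bin layout and JSON keys are hypothetical; VisionIQ’s actual output format is not described here.

```python
import json
from collections import Counter
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (red, green, blue)


def color_histogram_json(pixels: Iterable[Pixel], bins_per_channel: int = 4) -> str:
    """Quantize RGB pixels into coarse bins and return a normalized
    histogram as a JSON string.

    The bin layout and JSON keys are hypothetical; this is not
    VisionIQ's actual output format.
    """
    counts: Counter = Counter()
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in 0..bins_per_channel-1.
        key = "-".join(str(v * bins_per_channel // 256) for v in (r, g, b))
        counts[key] += 1
    total = sum(counts.values())
    histogram = {k: n / total for k, n in sorted(counts.items())}
    return json.dumps({"bins_per_channel": bins_per_channel, "histogram": histogram})


# Two reddish pixels and one blue pixel.
print(color_histogram_json([(255, 0, 0), (250, 10, 5), (0, 0, 255)]))
```

An app for blind users could, for example, announce the bin with the largest share as the dominant color.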
How VisionIQ processes visual scans
When you visually scan an object with your mobile phone, image recognition is performed sequentially in these steps:
1. On-device image recognition
First, visual processing is performed on the device. Our mobile SDKs look for QR codes, UPC codes and images that are stored locally. On-device scans are fast (less than 1 second) and reliable: the phone’s camera scans for the object continuously, which produces more accurate visual scans. Use our iOS or Android SDKs to enable this feature.
2. Server API image recognition
For objects that the on-device scan does not recognize, the mobile SDK sends a photo to the server, along with your API key and a timestamp. There, computer vision compares the submitted photo against your private trained dataset and looks for a match. If a match is found, the server returns the image ID and the metadata you specified when you trained the dataset. You can also choose to search IQ Engines’ public database of millions of trained objects.
3. Crowdsourcing
Human crowdsourcing is a good option when computer vision does not return an ID from on-device or server API image recognition. Our human taggers can return a general description (e.g., bird, car, plant) for any image in 30-60 seconds. We’re also working on expert tagging modules, in which people with specific expertise provide more detailed tags for your images than the general crowd can. We use statistical analysis to ensure that crowdsourcing tags are accurate. You can even use crowdsourcing to train your private dataset by enabling real-time learning (contact us to find out how).
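The three steps above form a fallback cascade: each stage runs only when the previous one returned no ID. This minimal Python sketch captures that control flow. The stage functions `on_device`, `server_api` and `crowd` are hypothetical stubs standing in for the SDK, the server API and human taggers; they are not real VisionIQ calls.

```python
from typing import Callable, Optional, Sequence, Tuple

# A recognition stage takes image bytes and returns an ID/label, or None.
Stage = Callable[[bytes], Optional[str]]


def recognize(image: bytes, stages: Sequence[Tuple[str, Stage]]) -> Tuple[Optional[str], Optional[str]]:
    """Try each stage in order; return (stage_name, result) for the first match."""
    for name, stage in stages:
        result = stage(image)
        if result is not None:
            return name, result
    return None, None


# Hypothetical stubs standing in for the SDK, the server API and the crowd.
def on_device(image: bytes) -> Optional[str]:
    return "qr:https://example.com" if image.startswith(b"QR") else None


def server_api(image: bytes) -> Optional[str]:
    return "poster-042" if b"poster" in image else None


def crowd(image: bytes) -> Optional[str]:
    return "bird"  # humans can describe any image, but slowly (30-60 s)


PIPELINE = [("on-device", on_device), ("server", server_api), ("crowd", crowd)]

print(recognize(b"a poster photo", PIPELINE))  # ('server', 'poster-042')
```

Ordering the stages from fastest to slowest means most scans never pay the latency of the later stages.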
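The crowdsourcing step mentions statistical analysis to keep tags accurate, without saying which method is used. One simple technique of that kind is a majority vote with an agreement threshold, sketched below in Python. This is an illustrative assumption, not VisionIQ’s actual method; `consensus_tag` and its `min_agreement` parameter are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def consensus_tag(tags: List[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common tag if strictly more than `min_agreement`
    of taggers chose it; otherwise None (no reliable consensus)."""
    if not tags:
        return None
    # Normalize case and whitespace so "Bird" and "bird " count as one vote.
    normalized = [t.strip().lower() for t in tags]
    tag, votes = Counter(normalized).most_common(1)[0]
    return tag if votes / len(normalized) > min_agreement else None


print(consensus_tag(["Bird", "bird ", "plane"]))  # bird
print(consensus_tag(["bird", "car"]))  # None (a 1-1 tie is not a majority)
```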
Optimize VisionIQ for your application
The speed and accuracy of IQ Engines’ image recognition service depend on several factors. To get the best results from your application, follow these guidelines:
1. Train: upload training images to your private dataset
Computer vision works best when it has a database of properly labeled images to compare queries against. For best results, train your own dataset: upload images along with the metadata (labels, URLs) you want returned on recognition. The better the training, the better computer vision will perform. For details, check out our training console.
2. Use on-device image recognition if you have a small dataset
For small datasets of fewer than 100 images, you don’t need the server API. In fact, it’s faster and more reliable to configure the mobile SDK for on-device recognition. With this approach, image recognition scans run as fast as barcode scans.
3. Improve human crowdsourcing results with a custom question
Our default crowdsourcing question is “describe the image using 2-4 words.” For best results, tailor the question to your needs. A wine app, for example, might ask for the “vineyard, vintage and varietal.” Contact us for more details on this setting.
4. Turn on real-time learning to let the crowd train your dataset
Turning on real-time learning allows images tagged by the crowd to train your computer vision dataset. Leveraging the power of the crowd helps your visual scans get faster and better over time.
5. Choose the VisionIQ module that works best
Understand the strengths of each VisionIQ module and choose the one that best fits your application.
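The guidelines above can be folded into one app-side decision: match images on-device when the dataset has fewer than 100 images, use the server API otherwise, and carry the custom crowd question and the real-time learning switch alongside. This Python sketch is hypothetical throughout; `RecognitionPlan`, `plan_recognition` and their fields are illustrative names, not the VisionIQ SDK’s configuration API.

```python
from dataclasses import dataclass

# From the guideline on small datasets: fewer than 100 images suits on-device recognition.
ON_DEVICE_LIMIT = 100
DEFAULT_QUESTION = "describe the image using 2-4 words."


@dataclass
class RecognitionPlan:
    """Hypothetical app-side settings; the field names are not the VisionIQ SDK API."""
    on_device_images: bool  # load the training images into the mobile SDK
    server_api: bool  # send unrecognized photos to the server
    crowd_question: str = DEFAULT_QUESTION
    real_time_learning: bool = False


def plan_recognition(dataset_size: int, crowd_question: str = DEFAULT_QUESTION,
                     real_time_learning: bool = False) -> RecognitionPlan:
    """Pick on-device or server matching by dataset size (QR/UPC are always on-device)."""
    small = dataset_size < ON_DEVICE_LIMIT
    return RecognitionPlan(
        on_device_images=small,
        server_api=not small,
        crowd_question=crowd_question,
        real_time_learning=real_time_learning,
    )


print(plan_recognition(5000, crowd_question="vineyard, vintage and varietal",
                       real_time_learning=True))
```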
What VisionIQ does best
What makes VisionIQ so powerful is that it uses both automated computer vision technology and human crowdsourcing to recognize things, so you can pick the method that works best for your application. The following are examples of what computer vision does well and what works better with human crowdsourcing.
| Computer Vision (with trained dataset) | Human Crowdsourcing |
| --- | --- |
Cool computer vision modules we are working on
Similarity sorting: This new computer vision module will be available soon. With similarity sorting, when a query comes in that computer vision doesn’t recognize, the module sorts your trained dataset and returns the most likely matches. This lets computer vision provide a 100% response rate, and our testing shows it handles difficult 3D objects as well.
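The similarity sorting description does not say how similarity is measured. As one common way to rank a dataset against a query, this Python sketch scores every trained item by cosine similarity between feature vectors and returns the top matches, so a non-empty dataset always yields an answer. The feature vectors, `cosine` and `similarity_sort` are assumptions for illustration, not VisionIQ’s algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_sort(query: Sequence[float], dataset: Dict[str, Sequence[float]],
                    k: int = 3) -> List[Tuple[str, float]]:
    """Rank every trained item by similarity to the query, best first, and keep k."""
    scored = [(image_id, cosine(query, vec)) for image_id, vec in dataset.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional feature vectors for a tiny trained dataset.
DATASET = {"mug": [1.0, 0.0, 0.0], "bottle": [0.8, 0.6, 0.0], "chair": [0.0, 0.0, 1.0]}
print(similarity_sort([1.0, 0.1, 0.0], DATASET, k=2))
```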
Optical Character Recognition: We will also soon add a module that uses state-of-the-art OCR techniques to read text robustly from your mobile phone.
Expert tagging: This concept improves on our crowdsourcing module with expert human taggers who have specific knowledge of products, places or things. Contact us to find out how expert tagging can provide more detailed labels for your images than the general crowd.