Google’s Open Images corpus for computer vision tasks has received a boost with new features: image-level labels, new visual relationships, and a new form of multimodal annotation called localized narratives. Google believes that localized narratives, which are designed to study how people describe images, will open up major avenues for research. It expects this work to yield insights for interface design across platforms such as mobile apps, desktop, and the web.
Google launched Open Images, a data set consisting of millions of labeled images spanning many categories, in 2016. Major updates followed in 2018 and 2019. Jordi Pont-Tuset, a research scientist at Google Research, said that the data set itself, together with the associated Open Images challenges, has spurred the latest advances in visual relationship detection and instance segmentation. Pont-Tuset added that Open Images is the largest annotated image data set for training the latest deep convolutional neural networks for computer vision tasks.
As Pont-Tuset explains, one of the motivations behind localized narratives is to study and leverage the connection between vision and language. This is typically done through image captioning, but image captions lack visual grounding: they do not indicate which part of the image each word refers to. Some researchers are trying to come up with new solutions to mitigate this issue.
In Open Images, the localized narratives were generated by annotators, who described each image aloud while hovering a computer mouse over the regions they were describing. The Google researchers then aligned the annotators' manual transcriptions of their descriptions with automatic speech transcriptions, ensuring that the mouse trace, speech, and text were synchronized and correct.
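To make the idea of synchronizing speech, text, and mouse trace more concrete, the following is a minimal Python sketch of how timed words might be matched to the mouse positions recorded while they were spoken. It is an illustration only, not Google's alignment method: the `TracePoint` and `Utterance` types, their field names, the normalized coordinates, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class TracePoint:
    """One mouse sample (hypothetical layout): position plus timestamp."""
    x: float  # horizontal position, assumed normalized to [0, 1]
    y: float  # vertical position, assumed normalized to [0, 1]
    t: float  # seconds since the start of the recording


@dataclass
class Utterance:
    """One transcribed span of speech with its time window (hypothetical layout)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def ground_utterances(utterances, trace):
    """Pair each utterance with the trace points recorded while it was spoken."""
    grounded = []
    for u in utterances:
        points = [p for p in trace if u.start <= p.t <= u.end]
        grounded.append((u.text, points))
    return grounded


def bounding_box(points):
    """Return (x_min, y_min, x_max, y_max) for the points, or None if there are none."""
    if not points:
        return None
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    trace = [
        TracePoint(0.10, 0.20, 0.0),
        TracePoint(0.15, 0.25, 0.4),
        TracePoint(0.60, 0.70, 1.2),
        TracePoint(0.65, 0.75, 1.6),
    ]
    utterances = [
        Utterance("a dog", 0.0, 0.5),
        Utterance("on the grass", 1.0, 2.0),
    ]
    for text, points in ground_utterances(utterances, trace):
        print(text, bounding_box(points))
```

Matching by time window is the simplest possible choice here. It gives each phrase a rough image region, which is the kind of visual grounding that plain captions lack.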
Pont-Tuset explained that pointing and speaking at the same time is very intuitive, which meant the annotators did not need explicit instructions about the task they were performing. Pont-Tuset added that the team believes the new version of Open Images is a significant qualitative and quantitative step toward improving the unified annotations for image classification, object detection, visual relationship detection, and instance segmentation, and that it will give users a fuller understanding of the scene. The data set is freely available.