Patricia Tang, Sue Xu, Managing Partner, Amino Capital
With recent, rapid advancements in computer vision technologies and their increasing presence in everyday life, it’s unsurprising that the computer vision domain is top-of-mind both for investors looking to deploy capital in a growing space and for startup founders looking to advance the space through entrepreneurship. The growth of computer vision has been of particular interest for Amino Capital: as a data-focused venture firm with many successful portfolio companies in the space, including Orbeus (acquired by Amazon), Grokstyle (acquired by Facebook), Daedalean.ai, AIFI.io, Voyage.auto, BrainKey, and Wyze, Amino seeks to identify new, innovative technologies and help them grow by leveraging the firm’s expertise and resources.
Additionally, Amino’s team of technologists is doubly interested in the growth of computer vision because of its expertise: partner Dr. Huican Zhu, for instance, is a pioneer of computer vision as the inventor of Google Image Search. I recently sat down for a conversation with Zhu to learn more about his thoughts on the growth of computer vision over the years, emerging applications in the space, and his advice for potential startup founders in computer vision.
This interview has been lightly edited for clarity.
You Invented Google Image Search In 2000. Could You Talk About The Gap You Saw In Google’s Functionality At The Time, How Google Image Search Dealt With That, As Well As About The Evolution Of Google Search Over Time?
When I started Google Image Search, and when we launched the product, it was very straightforward compared to what you see today. Back then, Google had only web search, which was text-based. You could only search for HTML pages and some text. There was no image search, so I made Image Search.
But our image search engine at the time was also text-only, in the sense that you could only type in something, like “cat” or some celebrity’s name, and you would get a list of images related to your query. You’d get images of cats back, or pictures of Jennifer Lopez. The images returned by a search came to you because they were on webpages that happened to mention the keywords you searched. So, the first iteration of Google Image Search was keyword-based.
Today, Google Image Search has definitely evolved a lot, especially with advancements in AI, deep learning, and neural networks. It can do so many things: for example, similar image search and reverse image search. It’s not directly part of Google Image Search, but Google Goggles can even do OCR (Optical Character Recognition) for you: if you give it an image, it can extract all the text in that image. For example, if you take a picture of a restaurant, it can give you the restaurant’s name and its opening hours.
It can do all kinds of other image classification. It can recognize objects for you: if you give it an image, it can tell you whether it is a dog, a cat, or a flower. Sometimes it can even describe images: for instance, it can process a picture and be able to say, “This is a kid flying a kite at the beach.” So, overall, Google Image Search is doing much more complicated stuff.
As An Early Adopter Of Computer Vision, How Have You Seen The Computer Vision Domain Change In The Past 20+ Years? What Have Been Some Notable Applications Of Computer Vision Technology?
When you say computer vision, I think you mean two things: the first is image classification, and the second is object detection. For image classification, you give the computer an image, and it’ll tell you whether the image is a flower, an animal, or something else. You give the computer an image of a person, and it’ll tell you which celebrity it is. So, the technology categorizes images. Object detection technologies in self-driving cars, for example, need to detect whether there are pedestrians nearby, what the color of the traffic light ahead is, things like that. That’s object detection: you detect what’s in the image, especially in the typical case in which multiple objects are present in an image. I think AI and deep learning have helped both domains improve quite a bit.
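To make the object-detection side of this distinction concrete: whereas a classifier outputs one label per image, a detector outputs a set of labeled bounding boxes, and those boxes are typically compared using intersection-over-union (IoU). The sketch below is a minimal, illustrative implementation; the `(x1, y1, x2, y2)` box format is an assumption for illustration, not specific to any system discussed here.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Identical boxes overlap perfectly; disjoint boxes not at all.
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))   # 1.0
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # 0.0
```

A detector is usually scored by whether its predicted box overlaps a ground-truth box above some IoU threshold (0.5 is a common convention in benchmarks).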
Such technologies have been applied everywhere in everyday life. All smartphones have facial recognition technology built in today. Some companies deploy facial recognition outside of personal use to see, for instance, whether or not you’re an employee of the company.
Grokstyle, one of our portfolio companies, has been in the news lately for its GrokNet technology powering Facebook’s new AI shopping tool, following its acquisition by Facebook last year. Grokstyle’s technology helps you identify objects in images. Given a picture, it can tell you what the furniture in the picture is, or what kind of bag a person is carrying and the bag’s brand. Grokstyle was working with IKEA, the Scandinavian furnishings company, to deploy its technology. If you want to buy a piece of furniture from IKEA, for instance, you would want to know what it looks like in your home. So, Grokstyle uses AR technology to show you what a table would look like when it’s put in your family room. That’s all based on computer vision.
With All These Applications In Mind, Including Grokstyle, What Do You Think Have Been Some Important Breakthroughs In This Field That Enable These Technologies And Where Do You See Room For Growth?
For image classification, the breakthroughs have mostly come from neural networks. Object detection used to be a complicated problem because there are many different types of objects, and they may appear in different parts of the image. Nowadays, there are advances like YOLO (You Only Look Once), a deep learning technique that makes object detection very efficient. Because of this, object detection can now be applied everywhere.
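Part of what makes single-shot detectors like YOLO efficient is that they predict many candidate boxes in one pass and then prune overlapping duplicates with non-maximum suppression (NMS). Below is a minimal, pure-Python sketch of that pruning step, under the assumption that candidates are `(x1, y1, x2, y2, score)` tuples; real implementations are vectorized and applied per class.

```python
def nms(boxes, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    boxes: list of (x1, y1, x2, y2, score) candidate detections.
    Returns the kept boxes, highest score first.
    """
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest-scoring candidate wins...
        kept.append(best)
        # ...and suppresses near-duplicate boxes that overlap it heavily.
        remaining = [b for b in remaining if iou(best, b) < iou_threshold]
    return kept

# Two overlapping candidates for one object plus one distant object:
# NMS keeps the stronger of the pair and the distant box.
detections = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8), (50, 50, 60, 60, 0.7)]
print(nms(detections))
```

The greedy loop is why a detector can emit thousands of raw candidates per frame yet report only one clean box per object.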
Besides the applications I’ve already mentioned, there’s also process automation based on computer vision. The next breakthroughs in computer vision that we can anticipate will be in healthcare, where we want to see more progress. Today, we rely on doctors to read X-ray images, to understand CT scans, and to diagnose patients based on what they see. But doctors can make mistakes, and different doctors classify diseases differently. Computer vision can make good doctors better.
How Do You Think Amino Leads With Regard To Investing In Computer Vision Technologies, And How Does Amino Use Its Platform To Be A Leader In This Space?
At Amino, our partners are all tech-savvy, and we feel very comfortable writing the first check in technology companies, especially in domains like big data and machine learning. That’s our advantage.
Also, we are based in Silicon Valley, which is where technological trends come from. We want to take advantage of both our location and our areas of expertise. By investing in technologies like computer vision and AI, we can do much better than other investors in this regard.