I’m new to computer vision, and a lot of the basic concepts are very interesting. As an iOS developer, my interest comes from using CoreML and Apple’s Vision framework in apps to improve the user experience.
Two common tasks are classification and object detection. Classification identifies the dominant object present in an image. For example, classification can tell you that a photo is probably of a car.
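With Vision wrapping a Core ML model, a classification request only takes a few lines. Here’s a minimal sketch, assuming Apple’s sample MobileNetV2 model has been added to the project (Xcode generates the `MobileNetV2` class from the model file):

```swift
import CoreML
import Vision

// A minimal classification sketch. Assumes Apple's sample MobileNetV2
// model has been added to the project.
func classify(_ image: CGImage) {
    guard let mobileNet = try? MobileNetV2(configuration: MLModelConfiguration()),
          let model = try? VNCoreMLModel(for: mobileNet.model) else { return }

    let request = VNCoreMLRequest(model: model) { request, _ in
        guard let results = request.results as? [VNClassificationObservation],
              let top = results.first else { return }
        // Prints something like "car (0.87)": the dominant object and its confidence.
        print("\(top.identifier) (\(top.confidence))")
    }

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try? handler.perform([request])
}
```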
Object detection is a much harder problem: it not only recognizes which objects are present, but also locates them within the image. This means that object detection can tell you that there is probably a car within a specific bounding box of the image.
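Vision reports detections as observations that carry both a label and a bounding box. Here’s a sketch, assuming `detector` wraps an object detection model (such as Apple’s YOLOv3 sample) whose outputs Vision surfaces as `VNRecognizedObjectObservation` (iOS 12+):

```swift
import Vision

// A sketch of reading detection results, assuming `detector` wraps an
// object detection model whose outputs Vision can surface as
// VNRecognizedObjectObservation (iOS 12+).
func makeDetectionRequest(for detector: VNCoreMLModel) -> VNCoreMLRequest {
    return VNCoreMLRequest(model: detector) { request, _ in
        guard let results = request.results as? [VNRecognizedObjectObservation] else { return }
        for object in results {
            guard let label = object.labels.first else { continue }
            // boundingBox is normalized (0...1) with a lower-left origin,
            // so convert it before drawing over UIKit views.
            print("\(label.identifier) (\(label.confidence)) at \(object.boundingBox)")
        }
    }
}
```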
What’s important is that the machine learning model runs in an acceptable amount of time, either asynchronously in the background or in real time. Apple provides a listing of sample models for classification at https://developer.apple.com/machine-learning/.
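For the background case, one way to keep the UI responsive is to perform the request off the main thread. A hedged sketch using a global dispatch queue:

```swift
import Vision

// A sketch of running a Vision request off the main thread so the UI
// stays responsive while the model does its work.
func classifyInBackground(_ image: CGImage, with request: VNCoreMLRequest) {
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(cgImage: image, options: [:])
        try? handler.perform([request])
        // The request's completion handler runs on this queue too;
        // dispatch any UI updates back to DispatchQueue.main from there.
    }
}
```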
For real-time object detection, TinyYOLO is an option, even though its frame rate is nowhere near 60 fps on current hardware. Heavier detection models like full YOLO or R-CNN are not going to provide an acceptable experience on mobile devices today.
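For the real-time case, you’d feed camera frames from `AVCaptureVideoDataOutput` into the same kind of request. A sketch, assuming `request` was built as above around a small model like TinyYOLO:

```swift
import AVFoundation
import Vision

// A sketch of per-frame detection for a live camera feed. Assumes
// `request` wraps a small detection model such as TinyYOLO.
final class FrameHandler: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let request: VNCoreMLRequest
    init(request: VNCoreMLRequest) { self.request = request }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        // This delegate is called on the capture queue; perform(_:) blocks
        // until the model finishes, which naturally drops frames it can't keep up with.
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform([request])
    }
}
```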
One other interesting thing I came across is the PASCAL Visual Object Classes (VOC) challenge, a benchmark of common object categories used to evaluate classification and detection models.
For 2012, the twenty selected object classes were:
- Person: person
- Animal: bird, cat, cow, dog, horse, sheep
- Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
- Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
These common, everyday objects are what many off-the-shelf models are trained to recognize.
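For reference, here are those classes as a Swift constant. Note that the exact label strings vary from model to model (some exports use VOC’s compact forms like "diningtable" and "tvmonitor"), so treat these spellings as illustrative:

```swift
// The twenty PASCAL VOC 2012 classes, grouped by category. Exact label
// strings vary by model export, so these spellings are illustrative.
let vocClasses: [String: [String]] = [
    "Person":  ["person"],
    "Animal":  ["bird", "cat", "cow", "dog", "horse", "sheep"],
    "Vehicle": ["aeroplane", "bicycle", "boat", "bus", "car", "motorbike", "train"],
    "Indoor":  ["bottle", "chair", "dining table", "potted plant", "sofa", "tv/monitor"]
]
```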
Computer vision paired with machine learning has a tremendous amount of potential. Whether used for AR or other use cases, it can provide compelling user experiences well beyond Not Hotdog.