There is a well-known anecdote among Computer Vision researchers about how difficult CV is. You can hear it repeated in university labs and lecture theatres across the world. I myself have heard it several times.

Computer Vision was originally a summer project given to an undergraduate student

If you look around you can find various versions of the same anecdote, and sometimes it even goes so far as to suggest that the project was to solve the whole Computer Vision problem. Everyone has smiled and reflected that it's probably true. There are a lot of things like this that we accept unquestioningly. One day I asked myself, "how true is this story?"

After fifteen minutes of searching with Google, I found that the majority of web pages attribute the remark to Marvin Minsky and name Gerald Sussman as the student. According to most of these accounts, in 1966 Minsky asked Sussman to "connect a camera to a computer and do something with it".

They may indeed have had that conversation, but the original Computer Vision project referred to above was actually set up by Seymour Papert at MIT and given to Sussman, who was to co-ordinate a group of 10 students including himself.

The original document outlined a plan to do some kind of basic foreground/background segmentation, followed by a subgoal of analysing scenes with simple non-overlapping objects, with distinct uniform colour and texture and homogeneous backgrounds. A further subgoal was to extend the system to more complex objects.

So it would seem that Computer Vision was never a summer project for a single student, nor did it aim to make a complete working vision system. Maybe it was too ambitious for its time, but it's unlikely that the researchers involved thought that it would be completely solved at the end. Finally, Computer Vision as we know it today is vastly different to what it was thought to be in 1966. Today we have many topics that have been derived from CV or use CV-based approaches, such as inpainting, novel view generation, gesture recognition and computational photography.

Stop Saying That Computer Vision Is Easy

We laugh at the above anecdote, but I have often heard Computer Vision based systems explained to non-CV people in the style of

Uhhh, we segment the car from the video, find the disparities and then fit a model and render a reconstruction.

This has got to stop! You just described decades of Computer Vision research in a few sentences as if it were the easiest thing in the world. The fact is, those advances took a lot of sweat, tears and hard work, and we can only make such (still imperfect) systems by standing on the shoulders of those who did so much foundation work before us. So stop writing it off as if it is nothing. You've got to remember that non-CV people don't realise that tasks like image recognition are easy for human vision systems, which have evolved over millions of years, but difficult for Computer Vision systems, which we have only recently started work on (in comparison).