First Days of Computer Vision with Python

Ever wondered how computers and robots see the things around them? That is what computer vision is all about. So what is computer vision? According to Wikipedia:

"Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do"

Computer vision can be applied in almost every aspect of our lives. In a sense, even human vision is a kind of computer vision; after all, our body can be thought of as a computer.

It is amazing how we have given computers the human ability to see things, process them, and produce an output. This technology is very important because it has been applied widely and has been shown to make life easier. Tesla, one of the best-known companies in the world right now, has brought self-driving features into everyday driving. Have you ever asked yourself how these cars are able to perceive what is around them? Computer vision.

Self-driving cars use cameras, and in many systems LiDAR sensors, to see their environment and decide whether to turn right or left, accelerate or decelerate, and so on. This is coupled with deep learning, another field that has made the world a better place.

Amazon also uses computer vision for the small robots in its warehouses. Robots like these can follow lines on the floor, and those lines can be detected using edge detection.

In security, too, computer vision is used to detect faces in camera feeds during investigations. Once faces are captured, AI can then be used to match each face against a database and identify who it belongs to.

Photo by PhotoMIX Company from Pexels

Agriculture, too: imagine a large farm that has to tell good plants from bad ones quickly. Computer vision has us covered. Just train a computer vision model on samples of good and bad plants, let the robots do the work, and watch the efficiency increase.

There are far too many applications for me to cover them all here.

I started learning computer vision with Python using OpenCV.

After 21 days of learning, I came up with a small project. Python is a real pleasure to use: the project takes only a few lines of code, yet it does a lot of work. S/O to the creator of Python, Guido van Rossum. My small project opens the webcam (the primary camera) and starts a video stream. Once the stream is running, any face that is detected is captured and saved.

First, we import the cv2 package as cv. I prefer the shorter name cv because it is easier to type.

#importing opencv
import cv2 as cv

Then we connect to the camera with cv.VideoCapture(0). The 0 means the feed is taken from the primary connected camera.

cap = cv.VideoCapture(0) # connecting the camera to the variable cap
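
cv.VideoCapture does not raise an error when the camera is missing or busy; it returns an object whose isOpened() method is False. As a small sketch, a helper like the one below (the name ensure_opened is made up for this example) fails early with a clear message instead of letting later reads fail silently.

```python
def ensure_opened(capture, source=0):
    """Return capture unchanged, or raise if the video source could not be opened."""
    if not capture.isOpened():
        raise RuntimeError("Could not open video source " + str(source))
    return capture

# Intended use with OpenCV (needs a connected camera, so it is left as a comment):
# import cv2 as cv
# cap = ensure_opened(cv.VideoCapture(0))
```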

Next, we get the width and height of the video frames:

width = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))

Then we load the Haar cascade XML file into a classifier and assign it to a variable:

faceCascade = cv.CascadeClassifier('haarcascades/haarcascade_frontalface_default.xml')

Now we use cv.VideoWriter to save the recorded video. The mp4v codec matches the .mp4 container, and 24 is the frame rate.

writer = cv.VideoWriter('1stvideo.mp4', cv.VideoWriter_fourcc(*"mp4v"), 24, (width, height))
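
The four-character code passed to cv.VideoWriter_fourcc has to name a codec that the output container accepts. As a rough sketch, a small lookup like the hypothetical fourcc_for below pairs common file extensions with codes that usually work. The pairings are an assumption based on typical OpenCV builds and depend on the codecs installed on your machine.

```python
import os

# Commonly used pairings (assumption: availability depends on the local OpenCV build)
FOURCC_BY_EXTENSION = {".mp4": "mp4v", ".avi": "XVID"}

def fourcc_for(filename):
    """Return a four-character codec code suited to the file extension."""
    extension = os.path.splitext(filename)[1].lower()
    try:
        return FOURCC_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError("No codec known for extension: " + repr(extension))

print(fourcc_for("1stvideo.mp4"))  # mp4v
```

With OpenCV imported as cv, the result can be unpacked straight into the real call: cv.VideoWriter_fourcc(*fourcc_for('1stvideo.mp4')).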

Next, I created a function to detect faces, so that detection can be reused anywhere. Calling detectFace(img) on an image (a frame, not a file name) returns a copy with every detected face outlined.

def detectFace(img):
    # x, y, w, h are global so the main loop can read the last detected face
    global x, y, w, h

    faceImg = img.copy()  # draw on a copy so the original frame stays untouched
    # detectMultiScale returns one (x, y, w, h) rectangle per detected face
    faceRect = faceCascade.detectMultiScale(faceImg)

    for (x, y, w, h) in faceRect:
        # draw a white rectangle, 8 pixels thick, around each detected face
        cv.rectangle(faceImg, (x, y), (x + w, y + h), (255, 255, 255), 8)

    return faceImg  # returning the image copy with the detected faces marked

Now we use a while loop to continuously read the video feed, detect faces, and save each detected face as a JPG file.

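n = 0  # counter used to number the saved face images

while True:
    ret, frame = cap.read()  # reading from the camera
    if not ret:  # stop if no frame could be read
        break

    x = y = w = h = 0  # reset so a face from an earlier frame is not saved again
    marked = detectFace(frame)  # using the detectFace function to find the face on each frame

    cv.imshow("record", marked)
    writer.write(marked)  # add the frame to the recorded video

    if w > 0 and h > 0:  # save the crop only when a face was detected
        cv.imwrite("detected" + str(n) + ".jpg", frame[y:y+h, x:x+w])
        n += 1

    if cv.waitKey(2) & 0xFF == ord('q'):  # waits up to 2 milliseconds; press "q" to quit
        break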
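
cv.waitKey returns an integer key code, or -1 if no key was pressed within the wait time. Masking the result with 0xFF keeps only the lowest 8 bits, which is the usual way to compare it with ord('q') on platforms that set extra high bits. The helper name is_quit_key is made up for this sketch.

```python
def is_quit_key(key_code, quit_char="q"):
    """Return True when the low byte of key_code matches quit_char."""
    return (key_code & 0xFF) == ord(quit_char)

print(is_quit_key(ord("q")))             # True
print(is_quit_key(0x100000 | ord("q")))  # True: the high bits are masked away
print(is_quit_key(-1))                   # False: -1 means no key was pressed
```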

Saving a cropped face for every frame takes up a lot of disk space, but it also leaves a detailed record that could be useful, for example, when investigating a theft.
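
As a back-of-the-envelope sketch of that storage cost, the snippet below multiplies frames per second, recording time, and an average file size. All three numbers are hypothetical placeholders; real crops vary with face size and JPEG quality.

```python
def estimate_storage_mb(frames_per_second, seconds, kb_per_image):
    """Estimate disk usage in MB when one image is saved for every frame."""
    total_kb = frames_per_second * seconds * kb_per_image
    return total_kb / 1024

# Hypothetical numbers: 24 saved crops per second for one hour at about 20 KB each
print(estimate_storage_mb(24, 3600, 20))  # 1687.5
```

Saving only every tenth frame would cut that estimate by a factor of ten.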

Lastly, we release the camera and the video writer, then close all windows.

cap.release() #releasing the camera
writer.release() #finalizing the saved video file

cv.destroyAllWindows()
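
If the script crashes partway through, the release calls above are never reached and the camera can stay locked. One possible pattern, sketched here with a made-up helper named released, is a context manager that calls release() on every resource even when an error occurs. It works with any object that has a release() method, such as cv.VideoCapture or cv.VideoWriter.

```python
from contextlib import contextmanager

@contextmanager
def released(*resources):
    """Yield the resources, then call release() on each one, even after an error."""
    try:
        yield resources
    finally:
        for resource in resources:
            resource.release()

# Intended use with OpenCV (needs a camera, so it is left as a comment):
# with released(cv.VideoCapture(0)) as (cap,):
#     ret, frame = cap.read()
```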

Process Screenshots from the video feed:

Screenshot from Video1.png

Screenshot from Video2.png

Detected faces from the frames of the video.

This is my first project. It is a small one, but I am looking forward to exploring the world of computer vision and making life easier for us all.

Thank you.