Video Analytics Algorithm For Summarization Based Surveillance System as a Final year project for University students

Today we share a complete final year project report, together with the project proposal, on a surveillance system. The project was carried out by engineering university students and suits students of electrical engineering, signals and systems, embedded systems and computer networking. Many students now complete their degrees through distance learning at online universities, so we also share this report to help them choose a good final year project idea.

Security has become a major concern throughout the world. Security cameras are used to monitor premises and to prevent crimes, including terrorist attacks. The video they record consumes space on storage servers every day, which not only increases storage requirements but also makes video browsing and searching cumbersome. There is therefore a need for an algorithm that summarizes the daily videos, supports efficient video searching and minimizes storage space. Most companies currently use motion detection cameras for surveillance. A typical surveillance system has storage servers that keep daily CCTV footage as videos for a specific retention period, so storage use keeps growing, and when someone needs to browse the footage for a past event there is no video summarization algorithm to make the search efficient. The main issues we handle are extracting and removing frames that contain the motion of a cat, the flight or motion of a Lays wrapper, and the motion of plants in containers. We have designed a dynamic video skimming technique in which our algorithm detects the objects in the video and draws a bounding box around every object in the frame. We fix an area threshold after studying and recording the area of each box in each frame of several videos, and we eliminate from the video those frames whose bounding box area is less than that threshold, in order to minimize storage and make searching the video frames easy.


  • Why is this final year project important?

Security has become a major concern throughout the world. Security cameras are used to monitor premises and to prevent crimes, including terrorist attacks. The video they record consumes space on storage servers every day, which not only increases storage requirements but also makes video browsing and searching cumbersome. There is therefore a need for an algorithm that summarizes the daily videos, supports efficient video searching and minimizes storage space. Surveillance video can be summarized in two main ways: static video summarization and dynamic video skimming. Static video summarization gives the summary in the form of static frames, while dynamic video skimming produces a shortened version of the original video after removing certain frames. The University of Engineering and Technology Taxila uses motion detection cameras for surveillance. The university's surveillance system has storage servers that keep daily CCTV footage as videos for a specific retention period, so storage use keeps growing, and when someone needs to browse the footage for a past event there is no video summarization algorithm to make the search efficient. A few months ago there was a robbery in the Software department of the university. The surveillance room in-charge and his team watched and searched the videos for evidence of the incident and found it very difficult because the videos were not summarized. To obtain a summarized version of the surveillance video, I studied and implemented, under my supervisor, solutions to certain important issues which, if handled properly, not only reduce the storage server capacity required but also enable efficient video searching. The main issues are:

  • To extract and remove frames that contain the motion of a cat
  • To extract and remove frames that contain the flight or motion of wrappers (Lays or biscuit packets)
  • To remove frames that were recorded due to the motion of plants in plant containers

That was the inspiration for the project “Video Analytics Algorithm For Summarization Based Surveillance System”: to develop an algorithm that minimizes storage and makes searching video frames easy.

What work has already been done related to this final year project?

Surveillance comes from the French for ‘watching over’ and means keeping a close watch over a particular area. Surveillance cameras provide public safety and help to capture criminals; they support security and crime prevention. Very little work directly addresses surveillance video summarization with a solution that both summarizes the video and minimizes storage space. A large number of solutions focus only on summarization, extracting key frames based on the detection of particular objects and generating an event such as an alarm [1]. Other solutions work on the privacy and security of stored surveillance video and are not concerned with efficient searching or storage minimization [2].
Researchers have proposed different solutions in the domain of video summarization. One method comes from BriefCam [10], a company that provides video summarization solutions. BriefCam uses Video Synopsis, which detects and analyzes moving objects, converts the video into small objects and their routines, and stores them in a database. When a video summary is needed, all the objects and routines are taken from the database and shown as video activity.
The technique we took for guidance was proposed in [1]: an algorithm that extracts key frames from the video and compiles a new video consisting only of those key frames. Its preprocessing phase has two linked stages, background estimation followed by background subtraction. Preprocessing produces an image that contains only the objects that are not part of the background, which makes human detection easy. Human detection is the next phase: dilation is applied to the image to detect the moving component, and as a result the moving human is detected. Object detection is the final phase of the proposed system; the techniques used for it are convolution and normalization.

What is the background of this project?

The main objective of the project is to provide efficient searching and to minimize storage space by extracting and dropping targeted frames on the basis of “cat motion detection”, “plant motion detection”, “wrapper motion/flight detection” and other events in the recorded video that do not include human beings.

The project comprises three major phases:

  •   Study of object detection techniques in video frames of surveillance video
  •   Study of software language and tool to implement object detection techniques
  •   Study and implementation of methodology of object detection and video summarization

First Phase:
In the first phase we studied different object detection techniques, such as “Lateral Histogram Projection”. It is a robust technique for object detection, but it has certain drawbacks, which are discussed in detail later in this report.

The second technique I studied for detecting objects in a surveillance video frame is the “Running Weighted Average” of the video. First a background estimate is made, then the difference between the weighted average frame and the current frame is calculated. If there is a difference between the two frames, further calculations are made to detect and track the object. The background estimate is updated continuously as every new frame is taken. Based on this information the motion is detected and a bounding box is drawn around each object in the current frame. This is the technique we used in our video analytics algorithm, and it is discussed in detail later in this report.
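As an illustration of the running weighted average idea, here is a minimal sketch in plain Python. It is not the project's code, which was written with OpenCV in C/C++. It assumes frames are already grayscale 2D lists, and the blending weight `alpha` and the function names are hypothetical choices made for this example only.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background estimate.

    alpha is an assumed weight: larger values make the background
    adapt faster to changes in the scene.
    """
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def abs_difference(background, frame):
    """Per-pixel absolute difference between the estimate and the frame."""
    return [
        [abs(f - b) for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


# Tiny 2x3 example: one pixel becomes much brighter in the new frame.
background = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
frame = [[10, 10, 200], [10, 10, 10]]

diff = abs_difference(background, frame)       # large value where motion occurred
background = update_background(background, frame, alpha=0.5)
```

The difference is taken before the update so that the current frame is compared against a background that does not yet contain it.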

  • Second Phase

In the second phase I studied programming languages and tools for implementing object detection and video summarization. The major technical challenge was choosing the language and the tool. Matlab gives very good results but runs slowly. To use OpenCV[1] from C#, we would need a wrapper such as EmguCV[2], OpenCVDotNet[3], OpenCvSharp[4] or CodeProject. Problems can then arise because some wrappers support only specific versions of tools and operating systems. For example, CodeProject only supports Visual Studio 2005 and .NET Framework 3, and EmguCV has different versions for different operating systems, so it is difficult to use the OpenCV library through C# wrappers. The best option is to use OpenCV with Visual C or Visual C++, since C and C++ are the native languages in which OpenCV is written; it runs on multiple platforms and performs far better than Matlab. So I decided to use the OpenCV computer vision library, which also provides a rich collection of built-in functions for image and video processing. The tool I selected was Visual Studio 2010 Ultimate edition, running on a 64-bit machine, as described in the implementation section later. I then studied the OpenCV library and its basic functions for image and video processing.

I studied the CvMat matrix structure and the IplImage structure and decided to use IplImage, which can be passed to the same CvArr-based functions as CvMat. I also used some functions that operate on the CvMat matrix structure.

  • Third Phase

Thirdly, I studied the methodology used in my video analytics algorithm, which consists of the following major steps:

  • Read the video from a file stored on the hard disk and start the video.
  • Query a frame from the video file, find the weighted average frame and convert the frame to binary; this first frame acts as the background image, in which there is no motion.
  • Start a loop that takes each frame, converts it to binary and finds the difference between frames.
  • Apply a binary threshold to the difference image, tuned as required, for rough blob extraction of the objects.
  • Erode and dilate the difference image to separate connected objects, so that the objects in the frame can be found.
  • Draw a bounding box around each object in the frame and, on the basis of its area, decide whether to retain or discard the frame in the final summarized video file.
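The erosion and dilation step in the list above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the project itself used OpenCV in C/C++, while this sketch uses a fixed 3x3 window, 0/1 pixel values and hypothetical function names, and it ignores pixels outside the image border.

```python
def neighbourhood(img, r, c):
    """Yield the in-bounds pixel values in the 3x3 window centred on (r, c)."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(img) and 0 <= nc < len(img[0]):
                yield img[nr][nc]


def erode(img):
    """A pixel stays 1 only if every pixel in its window is 1."""
    return [[1 if all(neighbourhood(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


def dilate(img):
    """A pixel becomes 1 if any pixel in its window is 1."""
    return [[1 if any(neighbourhood(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


# A 3x3 blob plus one isolated noise pixel on the right edge.
noisy = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
]

# Erosion removes the noise pixel (and shrinks the blob); dilation grows the blob back.
opened = dilate(erode(noisy))
```

Applying the two operations in the opposite order (dilate, then erode) fills small holes inside a blob instead of removing small specks outside it.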

Finally, the project will provide:

  • Video analytics algorithm for summarization based surveillance system in OpenCv C++
  • A CD containing code and software tools for running code
  • Final year project report
  • Project Specification

The development environment and tools I used are listed below:

• Visual Studio 2010 Ultimate Edition
• OpenCV 2.3.1, 64-bit version for 64-bit machines
• Visual C++ and Visual C languages

What is the scope of this project and how can this thesis be used?

As discussed in the introduction and previous work sections, the project is unique in that it deals directly with our university environment. Its solution differs from other similar algorithms because it addresses issues that are common at our university. The university is located in a rural area surrounded by fields, so a large number of cats and dogs wander around the departments by day and by night. Cats also move inside the department during the day, and the wind is often strong enough to move the plants inside the department, which results in unnecessary video capture. The project therefore differs from existing summarization algorithms, which are often designed for government offices or business premises located in highly populated areas.

What are the limitations of this project?

There are certain limitations regarding this project:

  • Motion detection CCTV cameras work well for objects close to the camera but often fail to detect the motion of small objects and plant leaves at the far edge of the camera's range.
  • The algorithm divides each frame into two parts on the basis of the height of the video frame; in future the frame can be divided into more than two parts, with a separate bounding box area threshold for each region.

The project is divided into multiple implementation phases. The first phase is blob extraction (object detection or region extraction), for which I studied two different techniques: “Lateral Histogram Projection” and “Background Subtraction/Running Average”. Then comes video summarization, which is based on the bounding box around each blob or object.

The following are the research papers which I studied under the guidance of my supervisor:

  • Research Area And Proposed Methodology:

This paper is about detecting unintentional falls, which may have various causes such as a slip of the foot or a heart attack. Falls cause severe injuries and health threats to patients and elderly people. The system proposed in the paper detects the fall and raises an alarm.

  • Research Area:

Machine learning and computer vision, along with digital image processing, are major fields working for human well-being and providing different kinds of solutions. Video cameras are now so cheap that ordinary people can buy them, and every house in urban areas has at least one computer. So we can build a system on the principles of computer vision and machine learning that records video of different areas inside or outside a house or building. This video is useful not only for security but also for other purposes. One major advantage of daily video is that it can be used to detect unintentional falls, which have different causes and very often lead to injuries. People injured in a fall need immediate help and treatment. The video algorithm discussed in the paper was developed for this purpose; it involves posture analysis and fall classification, and its major stages are the detection and tracking of the person.

  • Proposed Methodology:

First, the motion history is maintained at regular intervals and the motion is monitored closely. The system looks for large motion, which generally happens when a person sits down or runs quickly. The motion gradient is used to find the speed and orientation of the motion. The foreground blob or silhouette of the person is detected, the motion history is maintained and the upcoming motion is estimated. If there is a large difference between the estimated and the actual motion, the speed and orientation are noted. During a fall the person moves quickly, and at the end there is little motion of the person's body.
At the next stage we extract the human shape to decide whether the large motion detected is normal (such as a person sitting down) or abnormal (such as a person falling).
The system then converts the image into a binary image and, using the lateral histogram projection technique, finds the top-left and bottom-right points of the human blob from the horizontal and vertical histograms. A bounding box can easily be drawn from these two points, and the region of interest, i.e. the human body image, is saved in memory using OpenCV's region of interest (ROI) support and cvSaveImage().

After that, the human body image is analyzed: its width, height and aspect ratio are calculated and compared with a threshold value that differentiates a fall from normal daily activities.
Finally, if the system notices no body movement after the fall, the alarm system notifies the relevant people.
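The aspect ratio comparison described above can be illustrated with a short Python sketch. The threshold of 1.0 and the function name are assumptions for illustration only; the text does not state the value used in the paper.

```python
def looks_like_fall(width, height, ratio_threshold=1.0):
    """Return True when the bounding box is wider than it is tall.

    ratio_threshold is a hypothetical value: a standing person gives a
    width/height ratio well below 1, a person lying down a ratio above 1.
    """
    if height <= 0:
        raise ValueError("height must be positive")
    return (width / height) > ratio_threshold


standing = looks_like_fall(width=40, height=120)   # tall, narrow box
lying = looks_like_fall(width=130, height=45)      # short, wide box
```

On its own this check cannot tell a fall from someone lying down deliberately, which is why the described system also looks at the preceding large motion and the lack of movement afterwards.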

  • Research Area And Proposed Methodology:

This paper is about a system that generates lecture handouts from captured video lectures. In the proposed system, the instructor is first removed from the video lecture, then the whiteboard region is extracted and scanned for text. At the end, lecture handouts are generated, which can be printed on paper.

  • Research Area:

It is often difficult for students to take notes with pen and paper while the teacher is delivering a lecture. Especially at university level, teachers are very experienced and write the important points of the lecture on the whiteboard very quickly. Computer vision and machine learning provide a solution to this problem. With the help of these two fields we can develop a system that generates handouts from video lectures efficiently and intelligently. The paper describes such a technique.

  • Proposed Methodology:

In the proposed methodology, a static camera records the video lectures. The whiteboard is captured first, which is not an easy task because the image suffers from noise, reflection and the motion of people in the background. Canny edge detection is then used to extract the whiteboard on the basis of its four prominent edges. The image is converted into a binary image by a binary threshold, and the whiteboard boundary is detected and marked. Scanning then fills the whiteboard boundary, which results in a reference image used for further comparison.

The system then enhances the whiteboard by passing the image through low-pass and average filters, to reduce noise and to smooth the image respectively.

The instructor is then detected through background subtraction: the algorithm subtracts the current frame from a reference image taken earlier, which removes the background and leaves the foreground, i.e. the instructor. Using the lateral histogram projection technique, the width and height of the instructor's blob or silhouette are measured and used to draw the bounding box, which helps in tracking the instructor's motion. The image is then divided into segments of 20×200 pixels. The reference image of the whiteboard, taken earlier to find the whiteboard boundaries, is compared with the current image in the segments where the instructor does not occlude the whiteboard, in order to get the text. Finally, text is also obtained from the blocks that contained the instructor, once the instructor moves away from the whiteboard. A block (segment) tagging technique is used to build the text frame.

The text frame obtained in this way is then compared with each upcoming frame by counting the black pixels in the frame; if the count increases or decreases by 5%, the upcoming frame is considered an updated frame.
Lecture handouts can then be generated easily from the sequence of text frames.
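The 5% rule for deciding whether a text frame has been updated can be sketched as below. This is a plain Python illustration, not the paper's code; it assumes binary frames in which text pixels have the value 0 (black), and the helper names are hypothetical.

```python
def count_black(binary_frame):
    """Count pixels with value 0 (assumed to be text strokes)."""
    return sum(row.count(0) for row in binary_frame)


def is_updated(prev_frame, next_frame, tolerance=0.05):
    """True when the black pixel count changes by at least `tolerance` (5%)."""
    prev_black = count_black(prev_frame)
    next_black = count_black(next_frame)
    if prev_black == 0:
        # Nothing was written before, so any text at all counts as an update.
        return next_black > 0
    return abs(next_black - prev_black) / prev_black >= tolerance


def make_frame(black, total=200):
    """Build a one-row test frame with `black` black pixels out of `total`."""
    return [[0] * black + [255] * (total - black)]


base = make_frame(100)
more_text = is_updated(base, make_frame(110))     # 10% more black pixels
small_change = is_updated(base, make_frame(102))  # only 2% more
```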

  • Research Area And Proposed Methodology:

This paper is about summarizing surveillance video on the basis of detecting human beings who carry a specific object, such as a PC, projector or laptop, out through the entry gate of the department. The proposed system not only detects the human but also alerts the security officials to any illegal activity with a beep.

  • Research Area:

Cameras are now much cheaper than in the past, so they are widely used for security purposes to monitor daily activities. Daily security video consumes gigabytes of storage, and searching it for a particular event is cumbersome and difficult. The proposed methodology summarizes daily surveillance video at runtime by detecting the human beings in it. Detection is based on a special event: a human being carrying a special object, such as a projector or PC, out of the department. The system then notifies the security officials by sounding the alarm.

  • Proposed Methodology:

In the system proposed in the paper, an image is first acquired from the video and then processed. A background image that contains no objects is acquired for later comparison and named the “reference image”. The current frame is then subtracted from the reference image to find the difference between the two images. This gives the foreground object, if any object is present.

After the foreground object is obtained, dilation is applied to the difference image, which enlarges the edges of the foreground object and fills in areas.

After dilation, a filled silhouette or blob of the human is formed, and a bounding box is drawn around the human blob.

After the human is detected, the object in the human's hands needs to be detected. For this purpose, convolution and normalization are applied to the bounding box around the human, which generates a graph of the bounding box image. The double mean of the graph values is then calculated, and the difference between the two values indicates whether an object is likely to be present in the bounding box. The double mean value should lie between 0.95 and 1.00 of the maximum graph value; if it is less than 0.95, there is a high chance that an object is present in the bounding box.

From the convolution values, another bounding box is drawn around the object. There are now two bounding boxes, one around the human and the other around the object.

Finally, the object in the second bounding box is matched against images already stored in the database using template matching. If the object matches, the alarm beeps to signal the security officials.

  • Using Histograms to Detect and Track Objects in Color Video

This methodology was proposed by Michael Mason and Zoran Duric of the Department of Computer Science, George Mason University.

Object detection and tracking is a wide and interesting area of research and development. Researchers are working towards solutions that are robust, fast and efficient; one such solution is discussed in this paper.

The proposed methodology detects and tracks foreground objects by dividing each video frame into small portions (cells) and comparing the histogram of each cell with the background model. The background model is calculated beforehand and maintained for further calculation and comparison. The comparison results for all cells are then used to indicate human activity in the video sequence, and on this basis the human is detected and tracked.
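The cell-by-cell histogram comparison can be illustrated with the following plain Python sketch. It is not the paper's implementation: it works on grayscale values rather than color, and the cell size, number of bins, the use of histogram intersection as the similarity measure and the 0.5 cutoff are all assumptions chosen to keep the example small.

```python
def cell_histogram(frame, top, left, size, bins=4, max_value=256):
    """Histogram of the grey values in one size x size cell."""
    hist = [0] * bins
    for r in range(top, top + size):
        for c in range(left, left + size):
            hist[frame[r][c] * bins // max_value] += 1
    return hist


def histogram_intersection(h1, h2):
    """Fraction of pixels shared by two histograms with equal totals."""
    total = sum(h1)
    return sum(min(a, b) for a, b in zip(h1, h2)) / total if total else 1.0


def foreground_cells(frame, background, size=2, min_similarity=0.5):
    """Return (top, left) of every cell whose histogram differs from the background."""
    cells = []
    for top in range(0, len(frame) - size + 1, size):
        for left in range(0, len(frame[0]) - size + 1, size):
            h_now = cell_histogram(frame, top, left, size)
            h_bg = cell_histogram(background, top, left, size)
            if histogram_intersection(h_now, h_bg) < min_similarity:
                cells.append((top, left))
    return cells


# 4x4 dark background; a bright object appears in the bottom-right cell.
background = [[20] * 4 for _ in range(4)]
frame = [row[:] for row in background]
for r in (2, 3):
    for c in (2, 3):
        frame[r][c] = 220

changed = foreground_cells(frame, background)
```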

  • Scenario of the Video Analytics System:

The word “surveillance” comes from the French for ‘watching over’ and means monitoring a particular area closely. Surveillance cameras provide public safety and help to capture criminals; they support security and crime prevention. The University of Engineering and Technology Taxila uses closed-circuit television (CCTV) cameras for surveillance video. The cameras work on motion detection and record video to the storage servers. This not only consumes a lot of hard disk space every day but also makes searching the stored video for a particular event difficult and cumbersome. This system provides a video summarization solution that makes searching efficient and reduces the storage space occupied. Surveillance video can be summarized in two main ways: static video summarization and dynamic video skimming. Static video summarization gives the summary in the form of static frames, while dynamic video skimming produces a shortened version of the original video after removing certain frames.

A few months ago there was a robbery in the Software department of the university. The surveillance room in-charge and his team watched and searched the videos for evidence of the incident and found it very difficult because the videos were not summarized. To obtain a summarized version of the surveillance video, I studied and implemented, under my supervisor, solutions to certain important issues which, if handled properly, not only reduce the storage server capacity required but also enable efficient video searching. The main issues are:

  • To extract and remove frames that contain the motion of a cat
  • To extract and remove frames that contain the flight of wrappers
  • To remove frames that were recorded due to the motion of plants

Selection of a Particular Camera:
I selected one particular camera for studying and implementing video summarization. The camera is located inside the software engineering department, in a straight line with the surveillance room door at a distance of 2-3 meters. This camera suffers from changing lighting conditions and other noise factors, so I studied and recorded results by dividing the camera frame into two equal parts on the basis of height.

  • Software and Hardware

All the implementation is done with the OpenCV library, whose native language is C and which also supports C++. The implementation tool is Visual Studio 2010 Ultimate edition, and the hardware is an HP 630 machine with a Core i5 processor and 4 GB of RAM. The version of OpenCV used for this project is 2.3.1 for 64-bit Windows. The code can run on any 64-bit machine with specifications better than an ordinary Pentium IV.

Implementation Technique and Its Details:
Step 1:
First I studied how OpenCV reads a video file from the hard disk, which functions are useful for reading a video file, and how to query frames from it. For that I used the cvQueryFrame() function in a loop that runs for as many frames as the video contains.

With that in place the actual work started: detecting objects in an acquired frame. I used two techniques and studied their results:

  • Lateral Histogram Projection Technique:

Lateral histogram projection is a well-known technique for object detection. In this technique, I first converted the color image frame into a binary image using a binary threshold function, then dilated and eroded the binary image to get the complete blob.

From this blob I obtained a 2D matrix containing the binary values of the image blob.
From this matrix we count the number of ones along the X-axis and the Y-axis; that is, we add all the entries in each column and in each row. After the addition, we scan the column sums along the X-axis. The first non-zero sum marks X1 (in the example, X1=3). We then look for the last non-zero sum before a zero value, or the last non-zero value overall, and mark it as X2 (in this case X2=6). We do the same with the row sums along the Y-axis to get the Y1 and Y2 points of the blob. This gives the starting and ending points of the silhouette along the X and Y axes, and therefore a pair of points: one at the top left of the blob and one at the bottom right. Here the starting point is P1=(X1,Y1)=(3,2) and the ending point is P2=(X2,Y2)=(6,5). We pass these two points to the rectangle function in OpenCV to draw the bounding rectangle around the object. In this way the object is detected through the lateral histogram technique.
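The projection procedure can be sketched in plain Python as follows. The matrix below is a hypothetical one built to reproduce the example values X1=3, X2=6, Y1=2 and Y2=5 with zero-based indices; the original matrix is not available, and the project's own code used OpenCV in C/C++ rather than this helper.

```python
def first_run(sums):
    """Return (start, end) of the first run of non-zero sums, or None."""
    start = None
    for i, s in enumerate(sums):
        if s > 0 and start is None:
            start = i                      # first non-zero sum: X1 or Y1
        elif s == 0 and start is not None:
            return start, i - 1            # last non-zero sum before a zero: X2 or Y2
    return (start, len(sums) - 1) if start is not None else None


def lateral_projection_box(binary):
    """Bounding box ((X1, Y1), (X2, Y2)) of a single blob in a 0/1 matrix."""
    col_sums = [sum(col) for col in zip(*binary)]   # projection onto the X-axis
    row_sums = [sum(row) for row in binary]         # projection onto the Y-axis
    x_run, y_run = first_run(col_sums), first_run(row_sums)
    if x_run is None or y_run is None:
        return None                                  # no foreground pixels at all
    return (x_run[0], y_run[0]), (x_run[1], y_run[1])


# Hypothetical 8x10 matrix with ones in rows 2..5 and columns 3..6.
blob = [[1 if 2 <= r <= 5 and 3 <= c <= 6 else 0 for c in range(10)]
        for r in range(8)]

p1, p2 = lateral_projection_box(blob)
```

The two returned points are the corners that would be handed to a rectangle drawing routine.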

  • Drawback of the Lateral Histogram Technique:

The lateral histogram technique is good for object detection when there is only one object in the video frame. It may also work for more than one object in some scenarios, but it fails when two or more objects are arranged so that their projections overlap along one axis.
In such a case with three objects, the technique fails because adding the columns gives only four points along the X-axis, whereas there should be six because three objects are present in the frame. Adding the rows gives six points along the Y-axis, which further shows that the result of the lateral histogram projection technique is inconsistent.
So this technique is not applicable to more complex, real-time scenarios, and I studied and implemented another technique for object detection.
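The failure can be reproduced with a small Python sketch. The layout below is a hypothetical arrangement of three blobs, chosen so that two of them share the same columns; it is not the original figure. It gives four points along the X-axis and six along the Y-axis, as in the description above.

```python
def runs(sums):
    """Return (start, end) index pairs for each run of non-zero sums."""
    found, start = [], None
    for i, s in enumerate(sums):
        if s > 0 and start is None:
            start = i
        elif s == 0 and start is not None:
            found.append((start, i - 1))
            start = None
    if start is not None:
        found.append((start, len(sums) - 1))
    return found


# Three 2x2 blobs given by (top row, left column); the first and third share columns 0..1.
objects = [(0, 0), (3, 4), (6, 0)]
grid = [[0] * 6 for _ in range(8)]
for top, left in objects:
    for r in range(top, top + 2):
        for c in range(left, left + 2):
            grid[r][c] = 1

x_runs = runs([sum(col) for col in zip(*grid)])   # column sums: only 2 runs
y_runs = runs([sum(row) for row in grid])         # row sums: 3 runs

# Crossing every X run with every Y run gives candidate boxes,
# but the projections cannot say which of them really hold an object.
candidates = [((x1, y1), (x2, y2)) for (y1, y2) in y_runs for (x1, x2) in x_runs]
```

Two X runs and three Y runs yield six candidate boxes for three real objects, so half of the candidates are empty and the projections alone cannot tell which.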

  • Weighted Image Subtraction / Running Average Technique:

It is the simplest way of detecting objects and is well suited to study purposes. Its results in a real environment are good compared with the lateral histogram projection technique, so I used this technique for object detection.
First my algorithm gets the first frame from the video file with OpenCV's cvQueryFrame() function and marks it as the first movingAverage image. Then, for the second frame through the last, our video analytics algorithm subtracts the movingAverage frame from the current frame and then replaces the movingAverage frame with the current frame.
The difference image obtained from the subtraction is converted into a grayscale image, which helps in object detection. We then convert this grayscale image into a binary image by applying a threshold: pixel values below 70 are set to zero, i.e. black, and values above 70 are set to 255, i.e. white. The resulting binary image has only two values, black for the background and white for the foreground object. The detected object still has some black holes in it, i.e. noisy pixels, so we apply erosion and dilation to the binary image to get the complete silhouette or blob.
With the complete blob available, the next step is to get the bounding box around the blob, i.e. to detect the object. For this OpenCV provides two very useful things. The first is the CvMemStorage structure, which gives us dynamic memory to store values at runtime. The second is the cvFindContours() function, which finds the contours, i.e. the sequences of 2D points that outline each silhouette or blob. With cvFindContours() we find all the blobs in the frame and store their 2D points dynamically in the CvMemStorage object declared earlier. We then run a for loop over the “n” contours found and get the bounding box around each blob. For this OpenCV provides another function, cvBoundingRect(), which computes the bounding rectangle of a blob from the 2D points stored as its contour. In this way we are able to draw the bounding box around each blob in the frame.
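The subtraction and threshold steps can be sketched in plain Python as below. This is an illustration only: it assumes the frames are already grayscale 2D lists, whereas the description above converts the difference image to grayscale afterwards. The value 70 comes from the text; treating a difference of exactly 70 as background is an assumption.

```python
THRESHOLD = 70  # value given in the text


def binary_difference(previous, current, threshold=THRESHOLD):
    """Absolute frame difference followed by a binary threshold.

    Differences above the threshold become 255 (white, foreground);
    everything else becomes 0 (black, background).
    """
    return [
        [255 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


previous = [[10, 10, 10], [10, 10, 10]]
current = [[10, 90, 80], [10, 200, 10]]

mask = binary_difference(previous, current)
previous = current  # the current frame replaces the stored frame for the next step
```

In this example the pixel that changed by exactly 70 stays black, while the pixels that changed by 80 and 190 turn white.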
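To show what the blob and bounding box step produces, here is a plain Python stand-in. It does not reproduce cvFindContours() or cvBoundingRect(); instead it groups 8-connected white pixels with a breadth-first search, which is a swapped-in technique used only so the example runs without OpenCV. The function name and the (x, y, width, height) tuple layout are assumptions.

```python
from collections import deque


def bounding_boxes(binary):
    """Return (x, y, width, height) for every 8-connected blob of non-zero pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first search over one blob, tracking its extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            min_r = max_r = r
            min_c = max_c = c
            while queue:
                cr, cc = queue.popleft()
                min_r, max_r = min(min_r, cr), max(max_r, cr)
                min_c, max_c = min(min_c, cc), max(max_c, cc)
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and binary[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            boxes.append((min_c, min_r, max_c - min_c + 1, max_r - min_r + 1))
    return boxes


binary = [
    [255, 255, 0, 0, 0],
    [255, 255, 0, 0, 0],
    [0, 0, 0, 0, 255],
    [0, 0, 0, 0, 255],
]

boxes = bounding_boxes(binary)
```

Unlike the lateral histogram approach, this kind of grouping keeps each blob separate even when several blobs share the same rows or columns.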

After getting the bounding box around each blob, we decide on the basis of the areas of these rectangles which frames to discard and which frames to move into the resulting summarized video file.

Main Summarization Technique:

This is the core of our video analytics algorithm, designed to decide which frames to retain and which to discard. For this I study the rectangles around each object. One feature of our video analytics algorithm is that it combines two or more rectangles into a single rectangle if any side of one overlaps another. If there is no object in the frame, the decision is easy: the frame is discarded, as it carries no information. If there are one or more bounding boxes, our algorithm finds the width and height of each rectangle and from them its area in pixels. We then decide about summarization on the basis of area: if the area of any rectangle is greater than or equal to the threshold area, which we chose after carefully studying the recorded areas, the frame is retained in the resulting video file. We also divide the frame into two portions on the basis of its height. If all four corner points of the bounding box around a blob are above the half line that divides the frame, one threshold value applies. A different threshold applies when one pair of points is above the line and the other pair is below it, and a different threshold again when all four points are below the half line.
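The retain-or-discard decision can be sketched in plain Python as follows. The three threshold values are hypothetical placeholders, since the actual values are not given in the text, and the function names and the (x, y, width, height) box layout are assumptions. Image coordinates are assumed to grow downwards, so "above the half line" means a smaller y value.

```python
def overlaps(a, b):
    """True when two (x, y, w, h) rectangles overlap."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
            and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])


def merge(a, b):
    """Smallest rectangle that covers both a and b."""
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)


def merge_overlapping(boxes):
    """Repeatedly combine overlapping rectangles until none overlap."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlaps(boxes[i], boxes[j]):
                    boxes[i] = merge(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes


# Hypothetical area thresholds in pixels, one per region of the frame.
THRESHOLDS = {"upper": 400, "straddling": 900, "lower": 1600}


def region(box, frame_height):
    """Classify a box by where its corners lie relative to the half line."""
    _, y, _, h = box
    half = frame_height / 2
    if y + h <= half:
        return "upper"        # all four corners above the line
    if y >= half:
        return "lower"        # all four corners below the line
    return "straddling"       # one pair above, one pair below


def keep_frame(boxes, frame_height, thresholds=THRESHOLDS):
    """Retain the frame if any merged box reaches its region's area threshold."""
    for box in merge_overlapping(boxes):
        if box[2] * box[3] >= thresholds[region(box, frame_height)]:
            return True
    return False


empty = keep_frame([], 480)                        # no objects: discard
small = keep_frame([(100, 400, 30, 20)], 480)      # small box in the lower half
person = keep_frame([(200, 150, 60, 200)], 480)    # large box across the half line
```

A frame with no boxes is discarded because the loop never runs, which matches the rule that a frame without objects carries no information.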

We take different videos from one fixed camera, since, as discussed earlier, all our study is based on that one camera.
From these videos we find a threshold area value for the rectangle around human beings.

  • CONCLUSION

It can be concluded from the above discussion that an efficient algorithm has been developed from the methodology described. Results have shown that this algorithm saves time by providing efficient searching and reduces the amount of storage as well. It is a very good algorithm for summarizing surveillance video where we only need the video frames that mainly contain human presence and information.

  • Future Work

In future we can improve this algorithm through a deeper study of the height and width of the bounding box around the detected object, which could also be used for tracking the object. It should give better results if we divide each video frame into two or more parts along the X and Y axes. Results could also improve if the connected component labeling technique is used for object detection and for drawing the bounding box around the object.
