Cluster Analysis will bolster Cyber Security in Big Data

In the first of this two-part series, our blogger shares insights into cluster analysis and the advancements that enable it to handle Big Data with ease. Read on to learn how the technology has grown and how cluster analysis can help tackle cyber-attacks and malware more effectively.

Machine learning is being touted as bigger than what social media was in the previous decade, and advances in the field are expected to transform day-to-day life. According to a study by IDC Futurescapes, two-thirds of the CEOs of Global 2000 enterprises will center their corporate strategy on digital transformation using machine learning solutions. Cluster analysis is one of the most widely used machine learning techniques.

Cluster analysis means grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to one another than to objects in other groups. Clustering data is intuitive and inherent to the human brain. However, the human capacity to cluster data is limited: the amount of data involved in modern clustering problems is far too large for people to process unaided.

Cluster analysis attempts to identify hidden structure in unlabeled data without using a pre-classified training set. It is not a single algorithm but a general problem that various algorithms solve, depending on the implementation. Common methodologies include hierarchical, density-based, centroid-based, and grid-based clustering. Cluster analysis finds application in computer science, the social sciences, medicine, business intelligence, education, and criminology, to name a few.
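To make the idea of grouping unlabeled data concrete, here is a minimal sketch of centroid-based clustering (the k-means procedure) in plain Python. It is an illustration only, not a method described in this article: the sample points, the function names, and the deterministic "farthest-first" choice of starting centroids are all assumptions made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Assumed initialisation: start from the first point, then repeatedly
    add the point that lies farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, max_iter=50):
    """Alternate between updating centroids and reassigning points
    until the labels stop changing (or max_iter is reached)."""
    centroids = farthest_first(points, k)
    labels = assign(points, centroids)
    for _ in range(max_iter):
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
        # Assignment step: relabel points against the moved centroids.
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical, unlabeled 2-D observations forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # -> [0, 0, 0, 1, 1, 1]
```

Note that no labels are supplied up front: the groups emerge from the distances between the points alone. Hierarchical, density-based, and grid-based methods reach a grouping by different routes and may suit differently shaped data better.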

Research and IP Trends

Cluster analysis became popular in the 1990s when, empowered by newly available computational capacity, work in machine learning shifted from a knowledge-driven approach to a data-driven one. Scientists started creating computer programs to analyze large amounts of data and draw conclusions, or “learn”, from the results. This trend is visible both in patent filings and in non-patent literature searches on Google Scholar.

Interest in the field grew steadily until 2008, then stagnated until 2011. Following a series of large Big Data investments by the U.S. government and major institutions, research and patent filing picked up pace again in 2012.

Figure 1: Cluster Analysis – Patent Filing Trends

Figure 2: Google Scholar retrievals using the search term “cluster analysis”: (a) for the decades 1950-1959, 1960-1969, etc., up to 2000-2009; (b) for the years 2001 to 2011.

Application Areas

Information retrieval is the application area in which cluster analysis has attracted the most attention. Search engines apply cluster analysis to retrieve relevant data quickly despite the large quantity of information stored. Beyond this, cluster analysis is also widely used in knowledge-based systems, medicine and bioinformatics, pattern recognition, and computer science applications.

Figure 3 – Application Area Segmentation of IP Assets

Since 2015, there has been a significant increase in activity in more sophisticated application areas such as cybersecurity, fuzzy systems, text processing, and knowledge-based systems. By contrast, mathematical and statistical systems, speech recognition, and computer science relied heavily on cluster analysis in the past, but recent patent filing trends show a drop in these areas.

Cyber-attacks are constantly becoming more dynamic and stealthy, which makes them extremely difficult to detect. Cluster analysis can reveal hidden patterns of compromise, and can even lead to the identification of malware that appears benign or lies dormant. Similarly, the medical histories of hundreds of people can be clustered to distinguish healthy people from those suffering from a specific condition.
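One way such hidden patterns can surface is through density-based clustering: observations that resemble many others fall into dense clusters, while isolated observations are left over as "noise" and become candidates for investigation. The sketch below is a minimal, unoptimized version of the DBSCAN idea in plain Python. The per-host traffic features (megabytes sent, connections per minute), the sample values, and the `eps` and `min_pts` settings are hypothetical; a real deployment would need carefully chosen and scaled features.

```python
NOISE = -1


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points)
            if sq_dist(points[idx], q) <= eps * eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # too isolated to start a cluster
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
        cluster += 1
    return labels


# Hypothetical per-host features: (megabytes sent, connections per minute).
traffic = [
    (10, 5), (11, 5), (10, 6), (11, 6),      # one group of similar hosts
    (50, 20), (51, 20), (50, 21), (51, 21),  # a second group of similar hosts
    (90, 2),                                 # a host unlike any other
]
labels = dbscan(traffic, eps=2, min_pts=3)
suspects = [host for host, lab in zip(traffic, labels) if lab == NOISE]
print(suspects)  # -> [(90, 2)]
```

A point flagged as noise is only unusual, not necessarily malicious. In practice it would be a lead for an analyst rather than a verdict.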

Clustering is an unsupervised machine learning approach for gaining insights from Big Data. It can offer insights that improve forecasting, production planning, sourcing decisions, profit optimization, root cause analysis, and other areas that require decision making.

In the next article of this two-part series, we will discuss advancements in cluster technologies and the top players in this segment.

(Featured image is for representational purpose and has been sourced from https://pixabay.com/en/hacker-attack-mask-internet-1872291/)

Vipan Bavoria

Vipan trains the eagle’s eye when conducting patentability tests and invalidity searches. She enjoys the challenge of connecting the dots and making intelligible knowledge out of sparse data. Her intrinsic interests in reading contemporary fiction, classics, history, philosophy and psychology exercise her mind towards hidden IP aspects.