Recent Trends Of Knowledge Extraction And Decision-Making Process Using Web Mining

With the growth of the huge information repository known as the Web, information is no longer confined to one computer; it can be stored, accessed and updated from a computer anywhere in the world. Information on the Web does not follow a single standard format, so preprocessing and mining this enormous volume of data is a necessity. The data is accessible, but it contains many hidden patterns that cannot be identified at a glance. Data mining is therefore required: it automates the analysis of the information from a specific point of view in order to solve a given problem.

In brief (Highlights)


  • Data extraction: It is important to know whether the data being mined is structured or unstructured, so that suitable machine learning and automatic extraction strategies can be chosen. Some data will also be incorrect or incomplete and must be inspected with great care. Personal data on the Web must be properly protected from unauthorized access.
  • Detecting noise: Very often, the main content of a web page goes unnoticed because of the surplus of hyperlinks, advertisements, copyright notices and so on in the page. Extracting the valuable information is therefore a tedious but essential procedure.
  • Information integration & schema matching: Many websites provide different information, so their data must be classified and categorized by similarity before it can be integrated, for example in a data warehouse.
  • Opinion extraction: It is not easy to interpret the tone of opinions gathered from chat rooms, discussion forums and blogs; misinterpreting the collected data leads to a completely different result on analysis.
  • Knowledge synthesis: The aim here is to organize the bits and pieces of information scattered around the web and obtain something meaningful from them.
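The noise-detection point above can be illustrated with a small sketch. It uses only Python's standard `html.parser` module and drops text that sits inside tags commonly used for navigation, scripts and footers. The set `NOISE_TAGS`, the class `MainTextExtractor` and the sample page are assumptions made for illustration; real pages usually need more elaborate heuristics than a fixed tag list.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as noise. This list is an assumption,
# not a standard; adjust it to the pages being mined.
NOISE_TAGS = {"script", "style", "nav", "footer", "aside"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any noise tag."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # how many noise tags are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.noise_depth == 0:
            self.chunks.append(text)


def extract_main_text(html):
    """Return the page text with noise-tag contents removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page used only for demonstration.
sample = """<html><body>
<nav><a href="/">Home</a> <a href="/ads">Sponsored</a></nav>
<p>Web mining finds patterns in web data.</p>
<footer>Copyright 2019</footer>
</body></html>"""

print(extract_main_text(sample))
```

A tag-based filter like this is only a starting point: it cannot recognize advertisements or link blocks that are marked up with generic tags.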

Types of Web Mining


Current progress in each of the three types of web mining is surveyed under the categories of web content mining, web usage mining, and web structure mining. For each research work, we look at key issues such as the web mining process, methods, applications, data sources, and software used. In contrast to past research, we separate the web mining process into five subtasks:

  • Resource finding and retrieving
  • Visualization
  • Patterns analysis and recognition
  • Information selection and preprocessing
  • Validation and interpretation

Background


Data on the web is classified into two types: shallow web and deep web. Data in the deep web is often inaccessible to search engines and can only be accessed through a website's interface, while data in the shallow web can be accessed by search engines (Purandare, 2008). In the data mining community, there are three kinds of mining: data mining, web mining and text mining. Web mining lies between data mining and text mining, and deals mostly with semi-structured and/or unstructured data. It can be divided into three categories: web content mining, web usage mining and web structure mining. Web content mining is performed by extracting important information from the content of a web page or website (Zhang & Segall, 2008). Web mining is a part of both information extraction and information retrieval. Web mining supports machine learning since it improves the classification of text. The principal aim of web mining is to extract information. Web mining integrates information gathered by traditional data mining techniques with information gathered over the World Wide Web (Johnson & Kumar Gupta, 2012). Research on web optimization covers parameters such as third-party optimization algorithms, storage devices and dynamic search results; to improve search results, techniques such as the K-Nearest Neighbor approach have been used for data extraction (Raghavendra & Mohan, 2019). Extracting data generates huge volumes of data, so there is a need for effective techniques, algorithms and patterns for mining it (Nazir, Asif, & Ahmad, 2019).

Taxonomy of Web Mining

Web Structure Mining


The process of discovering structure information from the web is referred to as web structure mining. This mining may be performed at either the document level or the hyperlink level. Hyperlinks offer a clear route between pages and indicate their purpose. This can be used to recover important data in structured form.
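As a minimal sketch of hyperlink-level mining, the following Python code builds a link graph from a few pages and counts how many pages link to each target (its in-degree). The class `LinkCollector`, the helper functions and the `example.org` pages are hypothetical and exist only for this illustration; in-degree is used here as one simple structural signal, not as the method of any work cited in this article.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def build_link_graph(pages):
    """Map each page URL to the sorted, de-duplicated URLs it links to."""
    graph = {}
    for url, html in pages.items():
        collector = LinkCollector(url)
        collector.feed(html)
        graph[url] = sorted(set(collector.links))
    return graph


def in_degree(graph):
    """Count, for every target URL, how many pages link to it."""
    counts = Counter()
    for targets in graph.values():
        counts.update(targets)
    return counts


# Hypothetical three-page site used only for demonstration.
pages = {
    "http://example.org/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.org/b": '<a href="/c">C</a>',
    "http://example.org/c": '<a href="/a">A</a>',
}

graph = build_link_graph(pages)
print(in_degree(graph).most_common(1))
```

In this toy graph, page `c` is linked from both `a` and `b`, so it has the highest in-degree.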

Web Content Mining


The information handled by web content mining may be structured, unstructured or semi-structured, even though the web as a whole is unstructured. It is the process of retrieving information from the web into a more structured form and classifying the data so that it can be retrieved quickly, or of finding valuable information in web pages or web documents. Web page mining covers web documents that contain text, HTML and multimedia such as images, audio and video. Search result mining covers the results returned by web searches, which may be structured or unstructured documents.

Web Usage Mining


Web usage mining is used to discover usage patterns from usage data. This includes web server data (IP addresses), application server data (e.g. WebLogic), and application-level data (events). Put differently, it is the discovery of meaningful patterns from data generated by client-server transactions on one or more websites. The data sources are access logs, referrer logs, agent logs, and client-side cookies.
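To make the use of access logs concrete, here is a small sketch that parses log lines and summarizes page hits and per-client page sequences. It assumes the logs are in the widely used Common Log Format; the regular expression `LOG_PATTERN`, the function names and the sample lines are illustrative assumptions, and real server logs may use a different layout.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: Common Log Format, e.g.
# 10.0.0.1 - - [10/Oct/2019:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_log(lines):
    """Yield one dict per line that matches the assumed log layout."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.groupdict()


def usage_summary(lines):
    """Return (hits per path, ordered paths per client IP) for 2xx requests."""
    hits = Counter()
    paths_by_ip = defaultdict(list)
    for record in parse_log(lines):
        if record["status"].startswith("2"):
            hits[record["path"]] += 1
            paths_by_ip[record["ip"]].append(record["path"])
    return hits, dict(paths_by_ip)


# Hypothetical log lines used only for demonstration.
sample_log = [
    '10.0.0.1 - - [10/Oct/2019:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2019:13:55:40 +0000] "GET /products.html HTTP/1.1" 200 1045',
    '10.0.0.2 - - [10/Oct/2019:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2019:13:56:09 +0000] "GET /missing.html HTTP/1.1" 404 512',
]

hits, paths = usage_summary(sample_log)
print(hits.most_common())
print(paths)
```

Grouping by IP address is a rough stand-in for user sessions; cookies or session identifiers, where available, usually separate visitors more reliably.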

Web Challenges and Future Work


Web usage mining has numerous advantages that make the technology attractive to corporations and government agencies. It has enabled e-commerce businesses to do personalized marketing, which eventually results in higher trade volume. The technology is criticized mostly for privacy concerns: privacy is considered lost when information about an individual is obtained, used or disseminated without that individual's knowledge or consent. As the use of data on the web grows rapidly, future research is expected to focus more on process mining, web mining, privacy, and threat and fraud analysis.

Web mining is often applied to understand the behaviour of web services, and the extracted data can be useful for many kinds of optimization. The successful application of web mining to predictive prefetching of pages by a browser has been demonstrated. Analysing weblogs is necessary for improving the performance of web services. Further research is needed to develop web mining techniques that improve other aspects of web services.
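One simple way to picture predictive prefetching is a first-order transition model: count which page most often follows each page in past sessions, and prefetch that page. The sketch below is an illustrative assumption, not the technique used in the work referred to above; the function names and session data are hypothetical.

```python
from collections import Counter, defaultdict


def train_transitions(sessions):
    """Count how often each page is followed directly by each other page."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, page):
    """Return the most frequent successor of `page`, or None if unseen."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical click sequences, e.g. reconstructed from access logs.
sessions = [
    ["/index.html", "/products.html", "/cart.html"],
    ["/index.html", "/products.html", "/contact.html"],
    ["/index.html", "/about.html"],
    ["/products.html", "/cart.html"],
]

model = train_transitions(sessions)
print(predict_next(model, "/index.html"))
print(predict_next(model, "/cart.html"))
```

A browser or proxy could use such a prediction to fetch the likely next page in advance; pages with no recorded successor yield `None`, in which case nothing is prefetched.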

References

  1. Johnson, F., & Kumar Gupta, S. (2012). Web Content Mining Techniques: A Survey. International Journal of Computer Applications, 47(11), 44–50.
  2. Nazir, S., Asif, M., & Ahmad, S. (2019). The Evolution of Trends and Techniques used for Data Mining. 2019 2nd International Conference on Advancements in Computational Sciences.
  3. Purandare, P. (2008). Web Mining: A Key to Improve Business on Web.
  4. Raghavendra, T. S., & Mohan, K. G. (2019). Web Mining and Minimization Framework Design on Sentimental Analysis for Social Tweets Using Machine Learning. Procedia Computer Science, 152, 230–235.
  5. Zhang, Q., & Segall, R. S. (2008). Web Mining: A Survey Of Current Research, Techniques, And Software. International Journal of Information Technology & Decision Making, 07(04), 683–720.