journal6 ›› 2011, Vol. 32 ›› Issue (6): 55-58.

• Information and Engineering •

Representative Structures from XML Documents Based on Clustering Techniques


  1. (Software & Outsourcing Institute, Jishou University, Zhangjiajie 427000, Hunan, China)
  • Online: 2011-11-25 Published: 2012-03-22

Abstract: Since an XML document can be represented as a tree structure, the problem of clustering a collection of XML documents can be treated as the problem of clustering a collection of tree-structured documents. The author used a SOM (Self-Organizing Map) with the Jaccard coefficient to cluster XML documents. Then, an efficient sequential mining method called GST was applied to find the maximum frequent sequences. Finally, the author merged the maximum frequent sequences to produce the common structures in each cluster.

Key words: XML document, tree-structured, clustering, sequential pattern mining, common structure
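
As an illustration of the Jaccard coefficient named in the abstract, the following minimal sketch compares two XML documents by the overlap of their tag paths. The abstract does not say how documents are encoded before clustering, so representing each document as a set of root-to-node tag paths is an assumption made here, and the helper names tag_paths and jaccard are hypothetical. The SOM and GST steps are not shown.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths in an XML document.

    Assumption: a path set is one plausible structural encoding; the
    source abstract does not specify the representation actually used.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        # Extend the parent's path with this node's tag and record it.
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|; taken as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two small hypothetical documents that share two of four distinct paths.
doc1 = "<book><title/><author/></book>"
doc2 = "<book><title/><year/></book>"
similarity = jaccard(tag_paths(doc1), tag_paths(doc2))
```

Here the two documents share /book and /book/title out of four distinct paths, so the coefficient is 0.5. A value closer to 1 indicates more similar structure, which is the kind of signal a clustering step could use.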
