Semantic Web Content Analysis: A Study in Proximity-Based Collaborative Clustering
Contents
(1) Semantic Web
(2) Proximity-Based Collaborative Clustering
(3) Experimental Studies
(4) Semantic Web in Web Intelligence
Semantic Web
Semantic Web – Definition
The Semantic Web “is an evolving extension of the World Wide Web in which the semantics of information and services on the web is defined, making it possible for the web to understand and satisfy the requests of people and machines to use the web content.”
The ultimate goal is to create a global medium for information exchange in which data is available for processing by both humans and machines.
Semantic Web – Architecture
• URI: a string that identifies a web resource
• XML: a user-defined syntax for web resources (carries no semantics)
• RDF: represents information about web resources and the relations between them
• OWL: extends and enhances RDF with more features
• Logic / Proof: rules that define semantic relations between data from the lower layers, and from which conclusions are derived
• Trust: information reliability checks (digital signatures, etc.)
Semantic Web – RDF document
[RDF file listing; only the literal values survive: Bob Dylan, USA, Columbia, 10.90, 1985, ….]
Semantic Web – Triple Form
Subject – Predicate – Object
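The triple form can be illustrated with the literal values from the RDF document slide. This is only a sketch: the subject identifier and the `cd:` property names are hypothetical placeholders, since the slide's markup is not available; only the literal values come from the slide.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# CD and the "cd:" predicate names are hypothetical, chosen for illustration.
CD = "http://example.org/cd/item1"

triples = [
    (CD, "cd:artist", "Bob Dylan"),
    (CD, "cd:country", "USA"),
    (CD, "cd:company", "Columbia"),
    (CD, "cd:price", "10.90"),
    (CD, "cd:year", "1985"),
]

def objects_of(subject, predicate, store):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in store if s == subject and p == predicate]

artist = objects_of(CD, "cd:artist", triples)
print(artist)
```

Read as a graph, each triple is one labelled edge from the subject node to the object node.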
Semantic Web – Graph Representation
Proximity Based Collaborative Clustering
Proximity-Based Collaborative Clustering
Collaborative clustering, in keywords:
• Several data sets (same objects, different features)
• Each data set is processed separately
• Collaboration takes place at the level of the results (information granules)
Proximity measure
$$\mathrm{Prox}_{i,j}[ii] = \sum_{k=1}^{C} \min(u_{ki}, u_{kj}), \qquad i, j = 1, \dots, N$$
The result is an N × N matrix whatever the value of C, so this mechanism allows us to use a different number of clusters in the processing of each data set.
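The proximity measure above can be sketched in a few lines of NumPy. The function name `proximity` and the toy partition matrix are assumptions made for illustration; the partition matrix is taken to be C × N (clusters by objects).

```python
import numpy as np

def proximity(U):
    """Prox[i, j] = sum over clusters k of min(U[k, i], U[k, j]).

    U is a C x N partition matrix (C clusters, N objects).
    """
    U = np.asarray(U, dtype=float)
    # Broadcast to C x N x N, take the element-wise minimum, sum over clusters.
    return np.minimum(U[:, :, None], U[:, None, :]).sum(axis=0)

# Toy partition matrix: 2 clusters, 3 objects (each column sums to 1).
U = np.array([[0.9, 0.8, 0.1],
              [0.1, 0.2, 0.9]])
P = proximity(U)
print(P)
```

Objects 1 and 2 share most of their membership in the first cluster, so their proximity is high; object 3 sits mostly in the second cluster, so its proximity to the other two is low.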
Proximity-Based Collaborative Clustering (2)
The Algorithm Process
Data sets: [ {X1, C1}, {X2, C2}, …, {Xp, Cp} ]
1) Compute
   X = [X1|X2|…|Xp]
   U = fcm(X, max(C1, C2, …, Cp))
   Prox(U)
2) For each { X[ii], C[ii] }
   U[ii] = fcm(X[ii], C[ii])
   Prox(U[ii])
3) Repeat: optimization of the index V → min
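Steps 1 and 2 can be sketched as follows. The `fcm` below is a minimal textbook fuzzy c-means written for this example (fuzzifier m = 2, fixed iteration count, random initialisation are all assumptions), not the implementation used in the study, and the toy data are random.

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means. X is N x d; returns a c x N partition matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                       # memberships of each object sum to 1
    for _ in range(iters):
        W = U ** m
        protos = (W @ X) / W.sum(axis=1, keepdims=True)     # prototypes, c x d
        # Distances of every object to every prototype, c x N.
        D = np.linalg.norm(X[None, :, :] - protos[:, None, :], axis=2) + 1e-12
        U = D ** (-2.0 / (m - 1.0))
        U /= U.sum(axis=0)
    return U

def proximity(U):
    """N x N proximity matrix induced by a C x N partition matrix."""
    return np.minimum(U[:, :, None], U[:, None, :]).sum(axis=0)

# Toy data (illustrative only): 6 objects seen through two feature spaces.
rng = np.random.default_rng(1)
X1 = np.vstack([rng.normal(0, 0.1, (3, 2)), rng.normal(1, 0.1, (3, 2))])
X2 = np.vstack([rng.normal(0, 0.1, (3, 3)), rng.normal(1, 0.1, (3, 3))])
datasets = [(X1, 2), (X2, 3)]                # [{X1, C1}, {X2, C2}]

# Step 1: global structure on the concatenated features.
X = np.hstack([Xi for Xi, _ in datasets])
U = fcm(X, max(Ci for _, Ci in datasets))
prox_global = proximity(U)

# Step 2: one partition matrix and proximity matrix per data set.
prox_local = [proximity(fcm(Xi, Ci)) for Xi, Ci in datasets]
```

Although the two data sets are clustered with different numbers of clusters, every proximity matrix has the same N × N shape, which is what makes them comparable in step 3.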
Proximity-Based Collaborative Clustering (3)
$$V = \|\mathrm{Prox}(U) - \mathrm{Prox}(U[1])\| + \|\mathrm{Prox}(U) - \mathrm{Prox}(U[2])\| + \dots + \|\mathrm{Prox}(U) - \mathrm{Prox}(U[p])\|$$
We require that Prox(U) be made as close as possible to the matrices Prox(U[1]), Prox(U[2]), …, Prox(U[p]).
The optimization of V is carried out using a standard gradient-based mechanism:
$$u_{ij}(\text{iteration} + 1) = u_{ij}(\text{iteration}) - \alpha \frac{\partial V}{\partial u_{ij}}$$
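The index V and one update step can be sketched as below. Several choices here are assumptions, not taken from the slides: the distance between matrices is the Frobenius norm, the gradient is approximated by forward finite differences rather than derived analytically, and the updated matrix is clipped and renormalised so that each object's memberships still sum to 1.

```python
import numpy as np

def proximity(U):
    """N x N proximity matrix induced by a C x N partition matrix."""
    return np.minimum(U[:, :, None], U[:, None, :]).sum(axis=0)

def index_V(U, local_prox):
    """Sum of distances between Prox(U) and each Prox(U[ii]) (Frobenius norm assumed)."""
    P = proximity(U)
    return sum(np.linalg.norm(P - Pi) for Pi in local_prox)

def gradient_step(U, local_prox, alpha=0.05, eps=1e-6):
    """One update u_ij <- u_ij - alpha * dV/du_ij with a finite-difference gradient."""
    base = index_V(U, local_prox)
    grad = np.zeros_like(U)
    for idx in np.ndindex(U.shape):
        U_shift = U.copy()
        U_shift[idx] += eps
        grad[idx] = (index_V(U_shift, local_prox) - base) / eps
    U_new = np.clip(U - alpha * grad, 1e-9, None)
    return U_new / U_new.sum(axis=0)         # assumption: keep columns summing to 1

# Toy example: a global partition matrix and one local proximity matrix.
U = np.array([[0.6, 0.5, 0.2],
              [0.4, 0.5, 0.8]])
local = [proximity(np.array([[0.9, 0.8, 0.1],
                             [0.1, 0.2, 0.9]]))]
U_next = gradient_step(U, local)
print(index_V(U, local), index_V(U_next, local))
```

Because `min` is not differentiable everywhere, a fixed step size does not guarantee that V decreases at every iteration; in practice the step would be repeated and α tuned.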
Experimental Studies
Experimental Studies
Data Formulation
70 Semantic Web Documents (SWDs, RDF syntax), grouped according to their main topic:
• 1-16: documents with information about phone devices
• 17-34: personal homepages
• 35-51: people and information about their workplace
• 52-70: Semantic Web area
Experimental Studies
Data Formulation (cont…)
Two feature spaces:
• Semantic: a parser extracts the most relevant metadata (12 features)
• Content-based: a parser extracts the most meaningful words that represent the values assumed by the metadata, i.e. the text enclosed by the meta-tags (10 features)
Experimental Studies (2)
Data Formulation (cont…)
To start the clustering, the two feature spaces are expressed as two data matrices:
• “Semantic” data matrix: rows = 70 SWDs, columns = 12 semantic features
• “Content-based” data matrix: rows = 70 SWDs, columns = 10 content-based features
Each entry of these matrices is the number of occurrences of the corresponding feature in the corresponding document.
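An occurrence-count matrix of this kind can be built as sketched below. The slides do not describe the parser, so counting opening tags with `str.count` is an assumption, and the two documents are made-up strings; the three feature names are taken from the metadata feature list.

```python
import numpy as np

def count_matrix(documents, features):
    """Rows = documents, columns = features, entries = occurrences of the opening tag."""
    return np.array([[doc.count("<" + f) for f in features] for doc in documents])

features = ["foaf:Person", "foaf:knows", "dc:title"]

# Made-up documents, for illustration only.
documents = [
    "<foaf:Person><foaf:knows/><foaf:knows/></foaf:Person>",
    "<dc:title>Example</dc:title>",
]

X = count_matrix(documents, features)
print(X)
```

With 70 documents and the 12 semantic features, the same function would produce the 70 × 12 “semantic” matrix; the content-based matrix would count words instead of tags.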
Experimental Studies (3)
Metadata features:
(1) airport:Airport (2) contact:nearestAirport (3) foaf:Person (4) foaf:knows (5) dc:title (6) foaf:Document (7) prf:NetworkCharacteristic (8) prf:HardwarePlatform (9) foaf:homepage (10) foaf:interest (11) prf:CcppAccept (12) foaf:Project
Content-based features:
(1) 2004-XX-XX (2) semantic web (3) web (4) network (5) internet (6) paper/document (7) project (8) UTF-8 (9) technology (10) information
[Figure annotations: particular cluster; distribution of the membership grades of certain docs]
Comparison Issues
• A single data set with the 70 SWDs and all 22 features (semantic and content-based): X[70x22] → standard FCM with C = 4
• Two different data sets and Proximity-Based Collaborative Clustering: X[70x12] and Y[70x10], with a global structure U
Comparison Issues
“Proximity-Based vs. Standard FCM”
1. The distributions of documents in Clusters 1 and 2 are similar.
2. In the prototype of Cluster 2, the representative features (project, information) have higher values with Proximity-Based than with FCM: 1.40 > 0.54 and 2.35 > 0.79.
3. FCM weakness → it is unable to discriminate the remaining documents in Clusters 3 and 4; their membership values are close to each other.
   - FCM: docs in range [35-60] → similar membership distribution for each component (Cluster 3, [< 0.44]); docs in range [48-70] → same effect (Cluster 4, [max = 0.44])
   - Proximity-Based → (Cluster 3, [35-60], [> 0.44, max = 0.72]) and (Cluster 4, [48-70], [> 0.44])
Comparison Issues (2)
“Proximity-Based vs. Standard FCM”
4. Documents in range [61-70]: the contribution of metadata clustering
   - Proximity-Based → they appear in Cluster 2
   - FCM → they do NOT appear in Cluster 2
Proximity-Based collaborative clustering better reflects the partitioning realized in the individual clusterings.
Prototypes, Proximity-Based and FCM
[Figure: Proximity-Based Prototypes and FCM Prototypes]
[Figure: membership values for clusters (1) to (4). Standard FCM: values < 0.44. Proximity-Based: values > 0.44]
Semantic Web in Web Intelligence
Semantic Web in Web Intelligence
Web until now:
• Data: “refers to a collection of natural phenomena descriptors, including the results of experience, observation or experiment, or a set of premises.”
• Information: “is the interpretation of the results that come from data processing”
Semantic Web:
• Knowledge: “well, there is more than one definition”
“We have to extract the hidden knowledge from the web and build an extension. Make the “new” web understandable not only for humans but also for machines”
Semantic Web in Web Intelligence (2)
• Web Intelligence: “exploits Artificial Intelligence (AI) and advanced Information Technology (IT) on the Web and Internet”
• The Semantic Web needs standards for both syntactic and semantic content → ontologies are a solution
• Ontologies will enable Web-based knowledge processing, sharing, and reuse between applications. They will also play a major role in supporting information exchange processes.
Semantic Web in Web Intelligence (3)
The roles of ontologies for Web intelligence:
• communication between Web communities
• agent communication based on semantics
• knowledge-based Web retrieval
• understanding Web contents in a semantic way
• Web community discovery (implicitly defined communities)
Conclusions
Many types of algorithms from the field of Computational Intelligence have been used for problem solving.
The web is expanding at great speed, and with it the search for new techniques to organize information and extract knowledge from it.
So why not apply classical algorithms from the field of Computational Intelligence to solve some of the existing problems?
Welcome to the world of Web Intelligence.
Thank you!