Norbert Fuhr
PROBABILISTIC MODELS IN INFORMATION RETRIEVAL
Introduction
The intrinsic uncertainty of IR. Two approaches: relevance models and the proof-theoretic model.
Relevance models
A user assigns relevance judgments to documents w.r.t. his/her query. The IR system yields an approximation of the set of relevant documents. Some models: the BIR model, the BII model, the DIA model, etc.
Relevance models: Binary Independence Retrieval model (BIR)
A document d_m is composed of a set of terms and represented as a vector. Assumption ("cluster hypothesis"): terms are distributed differently within relevant and non-relevant documents. A query q_k is also a set of terms.
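Under the term-independence assumption, documents can be ranked by the standard BIR log-odds term weights. A minimal sketch, where p[t] and q[t] are toy estimates of the term occurrence probabilities in relevant and non-relevant documents (the data is illustrative only):

```python
import math

def bir_score(doc_terms, query_terms, p, q):
    """BIR retrieval status value: sum of log-odds weights over the
    query terms present in the document (term-independence assumption).
    p[t] = Pr(t occurs | relevant), q[t] = Pr(t occurs | non-relevant)."""
    score = 0.0
    for t in query_terms & doc_terms:
        score += math.log((p[t] * (1 - q[t])) / (q[t] * (1 - p[t])))
    return score

# Toy probability estimates (illustrative only)
p = {"rain": 0.8, "snow": 0.5}
q = {"rain": 0.3, "snow": 0.4}
query = {"rain", "snow"}
docs = {"d1": {"rain", "snow"}, "d2": {"rain"}, "d3": {"snow"}}
ranking = sorted(docs, key=lambda d: bir_score(docs[d], query, p, q),
                 reverse=True)
```

The document containing both query terms ranks first, matching the binary-vector ranking on the next slide.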
An example
Ranking of the binary document representations: (1,1), (1,0), (0,1), (0,0).
The probability ranking principle
Let C be the cost of retrieving a relevant document and C̄ the cost of retrieving a non-relevant document. Retrieve first the document for which the expected cost of retrieval is minimal.
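The decision rule can be illustrated with a toy computation: the expected cost is EC(d) = C · Pr(relevant | d) + C̄ · (1 − Pr(relevant | d)), and with C < C̄, minimizing expected cost is equivalent to ranking by relevance probability (cost values and probabilities here are hypothetical):

```python
def expected_cost(p_rel, cost_rel, cost_nonrel):
    # EC(d) = C * Pr(rel | d) + C_bar * (1 - Pr(rel | d))
    return cost_rel * p_rel + cost_nonrel * (1 - p_rel)

# Hypothetical costs with C < C_bar: retrieving relevant documents is cheaper
C, C_bar = 1.0, 5.0
probs = {"d1": 0.9, "d2": 0.2, "d3": 0.6}   # toy relevance probabilities
order = sorted(probs, key=lambda d: expected_cost(probs[d], C, C_bar))
```

The resulting order is the descending order of relevance probability, as the principle predicts.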
Proof-theoretic model
IR is interpreted as uncertain inference. A generalization of deductive databases: queries and document contents are treated as logical formulas. The query has to be proved from the document's formulas. A document is an answer for a query iff the logical formula is true.
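A minimal propositional sketch of this view (a toy fact-and-rule representation assumed for illustration, not Fuhr's actual formalism): the document's content is a set of facts plus rules, and the document answers the query iff every query term can be derived:

```python
def provable(query_terms, facts, rules):
    """Forward-chain rules (premises -> conclusion) over a document's facts;
    the document is an answer to the query iff every query term is derivable."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and premises <= derived:
                derived.add(conclusion)
                changed = True
    return query_terms <= derived

# Hypothetical document content: two facts and one rule
facts = {"sailing", "boats"}
rules = [({"sailing"}, "water-sports")]
answer = provable({"water-sports"}, facts, rules)
```

The uncertain-inference part of the model would attach probabilities to such proofs; this sketch shows only the deterministic core.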
Jane Cleland-Huang, Raffaella Settimi, Oussama BenKhadra, Eugenia Berezhanskaya, Selvia Christina
GOAL-CENTRIC TRACEABILITY FOR MANAGING NON-FUNCTIONAL REQUIREMENTS
Non-Functional Requirements (NFRs) are difficult to trace: global impact upon a software system; extensive network of interdependencies and trade-offs.
Goal-centric traceability (GCT) approach: NFRs are modeled as goals and operationalizations within a SIG. Traces are dynamically established from impacted functional design elements to elements in the SIG.
Softgoal Interdependency Graph
GCT Model
Impact detection in GCT
Documents, queries, index terms.
The relevance of a document d to a query q is pr(d, q).
Jane Cleland-Huang, Raffaella Settimi, Chuan Duan, Xuchang Zou
UTILIZING SUPPORTING EVIDENCE TO IMPROVE DYNAMIC REQUIREMENTS TRACEABILITY
Introduction
Current work: recall level close to 90%, precision from 10% to 45%. Target: maintain recall of at least 90% with precision of at least 20%.
Introduction
Three strategies to improve the performance of dynamic requirements traceability: hierarchical modeling, logical clustering of artifacts, and semi-automated pruning of the probabilistic network.
Enhancement strategies
Motivation Example
Hierarchical Enhancement
R3's label is "De-icing". Using the hierarchical information from R3, R5 describes the de-icing service. Similarly, C4 describes the truck maintenance service. The link between C4 and R5 is therefore incorrect!
Hierarchical Enhancement
Solution: build a DAG to represent the direct relationships between artifacts.
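One way to sketch this idea (a simplification, not the paper's exact algorithm): keep the artifact hierarchy as a parent map forming a DAG, and enrich each artifact's term set with its ancestors' labels before matching, so that R5 under the "De-icing" node inherits that label:

```python
# Hypothetical artifact hierarchy: R5 is a child of R3; C4 is a root.
parents = {"R5": "R3", "R3": None, "C4": None}
labels = {"R3": {"de-icing"}, "R5": {"service"},
          "C4": {"truck", "maintenance"}}

def enriched_terms(artifact):
    """Union an artifact's own labels with those of all its ancestors."""
    terms = set()
    node = artifact
    while node is not None:   # walk up toward the root
        terms |= labels.get(node, set())
        node = parents.get(node)
    return terms
```

With the enrichment, R5's terms include "de-icing" while C4's do not, so a scorer using these term sets would no longer confuse the two domains.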
Results
Clustering Enhancements
Links tend to occur in clusters: if q links to d_j, there is a higher probability that q also links to a sibling document d; if q_i links to d, there is a higher probability that a sibling query q also links to d. So take the relationships of sibling artifacts into account.
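The sibling-evidence observation above can be sketched as a probability boost (the alpha factor and the update rule are illustrative assumptions, not the paper's formula):

```python
def boosted_prob(base_prob, query, doc, links, siblings, alpha=0.2):
    """Raise the candidate-link probability for (query, doc) when sibling
    artifacts of doc already link to the same query.
    alpha is a hypothetical boost factor, not the paper's value."""
    support = sum(1 for s in siblings.get(doc, ()) if (query, s) in links)
    prob = base_prob
    for _ in range(support):
        # each supporting sibling closes alpha of the remaining gap to 1.0
        prob += alpha * (1 - prob)
    return prob

# Toy data: d2 and d3 are siblings of d1 and both already link to q1
links = {("q1", "d2"), ("q1", "d3")}
siblings = {"d1": ["d2", "d3"]}
p = boosted_prob(0.4, "q1", "d1", links, siblings)
```

With two supporting siblings the probability rises from 0.4 toward 1.0 while never exceeding it, which is the kind of monotone adjustment the clustering observation calls for.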
Clustering Enhancements
Solution
Clustering Enhancements
Evaluation
Graph Pruning Enhancement
Observation: the word "schedule" is used for both the de-icing schedule and the truck maintenance schedules. A query containing "schedule" will return artifacts from both domains, lowering precision.
Graph Pruning Enhancement
Solution: utilize initial decisions made by the analyst to place constraints and improve precision in "problematic" areas. Rules for placing constraints:
1. One or more links between two groups have all been rejected by the analyst.
2. The basic retrieval algorithm generated candidate links between the two groups.
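A possible sketch of these two rules (the data model here is an illustrative assumption): a group pair becomes "problematic" when the analyst has reviewed links between the groups and rejected all of them, given that the retrieval algorithm did generate candidates there; remaining candidates in such pairs are pruned.

```python
from collections import defaultdict

def prune(candidates, accepted, rejected, group_of):
    """Drop remaining candidate links between two artifact groups once every
    link reviewed so far between those groups was rejected (rule 1), given
    that the algorithm generated candidates between them (rule 2)."""
    pair = lambda link: (group_of[link[0]], group_of[link[1]])
    reviewed = defaultdict(lambda: [0, 0])   # group pair -> [accepts, rejects]
    for link in accepted:
        reviewed[pair(link)][0] += 1
    for link in rejected:
        reviewed[pair(link)][1] += 1
    blocked = {gp for gp, (acc, rej) in reviewed.items()
               if rej > 0 and acc == 0}
    return {link for link in candidates
            if link not in rejected and pair(link) not in blocked}

# Hypothetical data: q1 and d1 are de-icing artifacts, d2/d3 maintenance ones.
group_of = {"q1": "deicing", "d1": "deicing", "d2": "maint", "d3": "maint"}
candidates = {("q1", "d1"), ("q1", "d2"), ("q1", "d3")}
kept = prune(candidates, set(), {("q1", "d2")}, group_of)
```

Rejecting the single cross-group link (q1, d2) blocks the whole (deicing, maint) pair, so (q1, d3) is pruned as well, while the within-group link survives.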
Graph Pruning Enhancement Evaluation