Appendix A – Some basic IR definitions

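Two basic metrics used to quantify success in an IR task are recall and precision. They are defined here because several references use them without explanation. Recall is the fraction of the relevant items that were retrieved, and precision is the fraction of the retrieved items that are relevant:

Recall = |relevant ∩ retrieved| / |relevant|
Precision = |relevant ∩ retrieved| / |retrieved|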
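As a minimal sketch of these two metrics, the following Python snippet computes them from sets of item identifiers. The function name, the document identifiers, and the choice to return 0.0 for an empty set are assumptions made for illustration only and do not come from the cited references.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of item identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were actually retrieved
    # Returning 0.0 for an empty set is an illustrative convention, not a standard.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 documents truly relevant, 2 in common.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found.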

Indices for evaluating Ontology coverage are presented in "Extracting Ontologies from Software Documentation: a Semi-Automatic Method and its Evaluation", where they are used to compare a semi-automatically created Ontology against a manually created one (the gold standard).

The lexical overlap (LO) equals the ratio of the number of concepts shared by both Ontologies to the number of concepts we wish to extract:

LO = |LO1 ∩ LO2| / |LO2|

Here LO1 is the set of all the concepts extracted by the tested method and LO2 is the set of concepts of the gold standard.
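The lexical overlap can be sketched in Python using set intersection. The function name and the sample concept sets below are hypothetical and only illustrate the ratio described above.

```python
def lexical_overlap(extracted, gold):
    """Share of gold-standard concepts that the tested method also extracted."""
    extracted, gold = set(extracted), set(gold)
    if not gold:
        raise ValueError("the gold standard must contain at least one concept")
    return len(extracted & gold) / len(gold)


# Hypothetical concept sets: LO1 (extracted) and LO2 (gold standard).
lo1 = {"service", "protein", "sequence", "tool"}
lo2 = {"protein", "sequence", "alignment", "database", "structure"}
lo = lexical_overlap(lo1, lo2)  # 2 shared concepts out of 5 gold-standard concepts
```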

The Ontology improvement (OI) equals the ratio of the number of new concepts extracted by the tested method (expressed as the set difference between the extracted concepts and the gold standard concepts) to the number of all concepts in the gold standard Ontology:

OI = |LO1 \ LO2| / |LO2|
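The Ontology improvement can be sketched the same way with a set difference. As before, the function name and the sample concept sets are hypothetical, and the sketch counts every extracted concept that is absent from the gold standard as new.

```python
def ontology_improvement(extracted, gold):
    """Extracted concepts absent from the gold standard, relative to its size."""
    extracted, gold = set(extracted), set(gold)
    if not gold:
        raise ValueError("the gold standard must contain at least one concept")
    return len(extracted - gold) / len(gold)


# Hypothetical concept sets: three extracted concepts are not in the gold standard.
extracted = {"service", "tool", "format", "protein", "sequence"}
gold = {"protein", "sequence", "alignment", "database", "structure"}
oi = ontology_improvement(extracted, gold)  # 3 new concepts over 5 gold-standard concepts
```

Because the numerator counts concepts outside the gold standard, OI can exceed 1 when the tested method extracts more new concepts than the gold standard contains.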

The Salton Index is an important measure of co-occurrence that is not biased by the naturally high occurrence of certain keywords. It is defined as:

S_xy = Cxy / sqrt(Cx * Cy)

where:
Cxy - the number of co-occurrences of x and y.
Cx - the number of occurrences of x.
Cy - the number of occurrences of y.
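A minimal Python sketch of the Salton Index follows, assuming the usual form in which the co-occurrence count is divided by the geometric mean of the two occurrence counts. The function name and the counts are hypothetical.

```python
import math


def salton_index(c_xy, c_x, c_y):
    """Co-occurrence count normalized by the geometric mean of the occurrence counts."""
    if c_x <= 0 or c_y <= 0:
        raise ValueError("occurrence counts must be positive")
    return c_xy / math.sqrt(c_x * c_y)


# Hypothetical counts: the same 6 co-occurrences, with y rare and then very common.
rare_partner = salton_index(6, 9, 16)     # sqrt(9 * 16) = 12
common_partner = salton_index(6, 9, 400)  # sqrt(9 * 400) = 60
```

The same number of co-occurrences yields a lower index when one keyword is very frequent, which illustrates how the normalization counters the bias toward naturally common keywords.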

