O - Algorithms

 
OFields are compared by counting the parents that the two paths have in common and dividing that count by a path length. Which path length is used depends on the algorithm.
 
The root item (always with Id = 0 in the database) is not taken into account.
 
Example:
 
Using the following data set:
| Id | Record name | OField full path, including the record Id |
|---|---|---|
| 0 | Root (not in the database, but the root of all records in memory) | - |
| 1 | Europe | 0.1 |
| 2 | North America | 0.2 |
| 3 | South America | 0.3 |
| 4 | Asia | 0.4 |
| 7 | Belgium | 0.1.7 |
| 8 | Germany | 0.1.8 |
| 10 | USA | 0.2.10 |
| 43 | Brussels region | 0.1.7.43 |
| 44 | Walloon region | 0.1.7.44 |
| 52 | Brussels city | 0.1.7.43.52 |
| 65 | Namur province | 0.1.7.44.65 |
| 148 | Gesves | 0.1.7.44.148 |
 
Comparing Europe with North America gives a similarity of 0. The path of Europe is "0.1", whereas the path of North America is "0.2". Once the root value 0 is removed, no common value remains.
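The root removal and the counting of common values described above can be sketched in Python as follows. This is only an illustration, not the BioloMICS implementation: the helper names `path_values` and `common_count` are hypothetical, and "common values" is assumed here to mean the shared leading values of the two paths.

```python
def path_values(full_path):
    """Split a path such as "0.1.7" into its values and drop the root "0"."""
    values = full_path.split(".")
    # Assumption: the root, when present, is always the first value and equals "0".
    if values and values[0] == "0":
        return values[1:]
    return values


def common_count(source, reference):
    """Count the leading values shared by two root-less paths."""
    count = 0
    for source_value, reference_value in zip(source, reference):
        if source_value != reference_value:
            break
        count += 1
    return count


europe = path_values("0.1")         # ["1"]
north_america = path_values("0.2")  # ["2"]
print(common_count(europe, north_america))  # prints 0
```

With the root stripped, Europe and North America share no value, which matches the similarity of 0 stated above.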
 
There are three algorithms available:
  • Classification algorithm (Similarity = common values / longest path)
  • Proportional identification algorithm (Similarity = common values / shortest path)
  • Identification algorithm (Similarity = presence of the most detailed source value in the path of the reference)
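The three formulas listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the BioloMICS code: the function names are hypothetical, paths are passed as dotted strings with the root already removed, "common values" is taken to be the shared leading values, and an empty path is assumed to give a similarity of 0.

```python
def _split(path):
    """Turn a root-less dotted path such as "1.7.44" into a list of values."""
    return path.split(".") if path else []


def _common(source, reference):
    """Count the leading values shared by two lists of path values."""
    count = 0
    for source_value, reference_value in zip(source, reference):
        if source_value != reference_value:
            break
        count += 1
    return count


def classification(source_path, reference_path):
    """Common values divided by the length of the longest path."""
    source, reference = _split(source_path), _split(reference_path)
    longest = max(len(source), len(reference))
    return _common(source, reference) / longest if longest else 0.0


def proportional_identification(source_path, reference_path):
    """Common values divided by the length of the shortest path."""
    source, reference = _split(source_path), _split(reference_path)
    shortest = min(len(source), len(reference))
    return _common(source, reference) / shortest if shortest else 0.0


def identification(source_path, reference_path):
    """1.0 if the most detailed (last) source value occurs in the reference path."""
    source, reference = _split(source_path), _split(reference_path)
    return 1.0 if source and source[-1] in reference else 0.0


# Gesves (source) compared with Belgium (reference), root removed.
print(classification("1.7.44.148", "1.7"))               # prints 0.5
print(proportional_identification("1.7.44.148", "1.7"))  # prints 1.0
print(identification("1.7.44.148", "1.7"))               # prints 0.0
print(identification("1.7", "1.7.44.148"))               # prints 1.0
```

Only the identification function depends on which record is the source and which is the reference. The other two give the same result in both directions.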
 
The table below shows a few possible comparisons.

| Source record | Ref record | Source path (no root) | Ref path (no root) | Classification algorithm (Similarity = common values / longest path) | Proportional identification algorithm (Similarity = common values / shortest path) | Identification algorithm (Similarity = presence of the most detailed source value in the path of the reference) |
|---|---|---|---|---|---|---|
| Gesves | Belgium | 1.7.44.148 | 1.7 | 2/4 = 0.5 | 2/2 = 1.0 | 148 is not in 1.7, so similarity = 0.0 |
| Belgium | Gesves | 1.7 | 1.7.44.148 | 2/4 = 0.5 | 2/2 = 1.0 | 7 is in 1.7.44.148, so similarity = 1.0 |
| USA | Europe | 2.10 | 1 | 0 | 0 | 0 |
| Belgium | Germany | 1.7 | 1.8 | 1/2 = 0.5 | 1/2 = 0.5 | 7 is not in 1.8, so similarity = 0.0 |
| Village 2 | Village 2 | 1.7.44.108.215.98.132 | 1.7.44.108.117 | 4/7 ≈ 0.57 | 4/5 = 0.8 | 132 is not in 1.7.44.108.117, so similarity = 0.0 |
 
In the proportional identification algorithm, the similarity is the number of identical parents divided by the possible number of parents, which is the length of the shorter of the two paths being compared. The classification algorithm divides by the length of the longer path instead.
 
The root is always ignored in the comparison, as it exists for all records.