
G - Algorithms

 
Seven different algorithms can be used to compare series of molecular weight values.
 
In all cases, the values are compared one by one.
 
The number of values in the source and the reference must be identical.
 
For the first three algorithms (SYM, ID and NEILI), consider two series of values: s1, s2, ... s10 for the source and r1, r2, ... r10 for the reference. The comparison starts with s1 and r1.
If s1 < r1 - tolerance, the similarity between s1 and r1 (Ss1r1) is 0 and s2 is then compared with r1.
If s1 > r1 + tolerance, the similarity Ss1r1 is 0 and r2 is then compared with s1.
If s1 >= r1 - tolerance and s1 <= r1 + tolerance, s1 is close enough to r1 to be considered identical and Ss1r1 = 1.
The following values si and rj are then compared in the same way.
Finally, the sum of the similarities is divided by a value that depends on the selected algorithm, which gives the global similarity coefficient.
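The pairwise walk described above can be sketched in Python as follows. This is only an illustration, not the BioloMICS implementation: the function name `count_matches` is hypothetical, the values are assumed to be sorted in ascending order, and both series are assumed to advance after a match.

```python
def count_matches(source, reference, tolerance):
    """Count the values considered identical (similarity = 1).

    Assumptions (not stated in the manual): both series are sorted in
    ascending order, and after a match both series move to their next value.
    """
    s = sorted(source)
    r = sorted(reference)
    i = j = 0
    matches = 0
    while i < len(s) and j < len(r):
        if s[i] < r[j] - tolerance:
            i += 1          # similarity 0: next source value against the same reference value
        elif s[i] > r[j] + tolerance:
            j += 1          # similarity 0: next reference value against the same source value
        else:
            matches += 1    # within [r - tolerance, r + tolerance]: similarity 1
            i += 1
            j += 1
    return matches


print(count_matches([100, 200, 300], [102, 250, 299], 5))  # 2
```

The returned count is the sum of the similarities, which is then divided by the algorithm-specific value.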
 
The SYMPRO and IDPRO algorithms work exactly like the SYM and ID algorithms, respectively, except that the tolerance is a percentage of the reference molecular weight rather than an absolute value.
As a result, SYMPRO and IDPRO allow a larger absolute tolerance for high molecular weights and a smaller one for low molecular weights.
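To illustrate how a percentage tolerance scales with the molecular weight, here is a small Python sketch. The function names are hypothetical, and the tolerance `t` is assumed to be expressed as a fraction (0.05 for 5 %); the manual does not say how the value is entered.

```python
def relative_window(r, t):
    """Return the (low, high) acceptance window around the reference value r.

    t is assumed to be a fraction of r (0.05 means 5 %).
    """
    return r * (1 - t), r * (1 + t)


def is_match_relative(s, r, t):
    """True if the source value s falls inside the relative window of r."""
    low, high = relative_window(r, t)
    return low <= s <= high


# The same absolute difference of 30 is accepted for a heavy band
# and rejected for a light one.
print(is_match_relative(1030, 1000, 0.05))  # True
print(is_match_relative(130, 100, 0.05))    # False
```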
 
For the CLOSE algorithm, every value of the source S is compared with all the values of the reference R to find the shortest distance.
That difference is then divided by the higher of the two values: abs(S - R) / max(S, R).
For every band, only the best (closest) match is kept.
When all the comparisons have been performed, the values are averaged to obtain a global similarity coefficient.
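A minimal Python sketch of the CLOSE computation is given below. The function names are hypothetical, all values are assumed to be positive, and both series are assumed to be non-empty. Note that, taken literally, abs(S - R) / max(S, R) is a normalised distance (0 for identical values); the sketch returns the value exactly as the formula gives it and does not guess at any further conversion done by the software.

```python
def closest_ratio(value, others):
    """Normalised distance between value and its nearest neighbour in others."""
    nearest = min(others, key=lambda other: abs(value - other))
    return abs(value - nearest) / max(value, nearest)


def close_coefficient(source, reference):
    """Average, over the source values, of the distance to the closest reference value."""
    return sum(closest_ratio(s, reference) for s in source) / len(source)


# Swapping source and reference changes the result: CLOSE is not symmetric.
print(close_coefficient([100], [100, 500]))   # 0.0
print(close_coefficient([100, 500], [100]))   # 0.4
```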
 
In the CLOSESYM algorithm, the same comparison (as in the CLOSE algorithm) is performed for both the source and the reference values.
This makes the CLOSESYM algorithm symmetrical.
If S and R are the values of the source and of the reference respectively, then:
Smax: number of data points/values in the source
Rmax: number of data points/values in the reference
Siclose: distance between Si and the closest value Rj in the reference, divided by max(Si, Rj)
Riclose: distance between Ri and the closest value Sj in the source, divided by max(Ri, Sj)
N: number of identical values
Nmax: total number of values (Smax + Rmax)
T: tolerance defined by the user
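Using the definitions above, the CLOSESYM coefficient listed in the algorithm table, Sum (Siclose + Riclose) / Nmax, could be sketched in Python as follows. The function names are hypothetical, and positive values and non-empty series are assumed.

```python
def closest_ratio(value, others):
    """Distance to the nearest value in others, divided by the larger of the two."""
    nearest = min(others, key=lambda other: abs(value - other))
    return abs(value - nearest) / max(value, nearest)


def closesym_coefficient(source, reference):
    """Sum of all Siclose and Riclose terms divided by Nmax = Smax + Rmax."""
    s_terms = sum(closest_ratio(s, reference) for s in source)   # Sum(Siclose)
    r_terms = sum(closest_ratio(r, source) for r in reference)   # Sum(Riclose)
    n_max = len(source) + len(reference)
    return (s_terms + r_terms) / n_max


# Swapping the two series gives the same value.
a = closesym_coefficient([100], [100, 500])
b = closesym_coefficient([100, 500], [100])
print(abs(a - b) < 1e-12)  # True
```

Because every value of both series contributes one term, exchanging source and reference leaves the sum and the denominator unchanged.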
 
 
| Algorithm | Tolerance | Coefficient | Symmetric |
|-----------|-----------|-------------|-----------|
| SYM | [R - T, R + T] | N / (Nmax - N) | Yes |
| SYMPRO | [R * (1 - T), R * (1 + T)] | N / (Nmax - N) | Yes |
| ID | [R - T, R + T] | N / Smax | No |
| IDPRO | [R * (1 - T), R * (1 + T)] | N / Smax | No |
| NEILI | [R - T, R + T] | 2N / Nmax | Yes |
| CLOSE | none | Sum (Siclose) / Smax | No |
| CLOSESYM | none | Sum (Siclose + Riclose) / Nmax | Yes |
| Pearson | none | | Yes |
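As a worked illustration of the tolerance-based coefficients in the table, the Python sketch below computes SYM, ID and NEILI from a match count N and the series sizes Smax and Rmax. The function names are hypothetical, and non-empty series are assumed. The same formulas apply to SYMPRO and IDPRO, which differ only in how N is counted.

```python
def sym_coefficient(n, s_max, r_max):
    """SYM / SYMPRO: N / (Nmax - N), with Nmax = Smax + Rmax."""
    return n / (s_max + r_max - n)


def id_coefficient(n, s_max):
    """ID / IDPRO: N / Smax (depends on the source size only, hence not symmetric)."""
    return n / s_max


def neili_coefficient(n, s_max, r_max):
    """NEILI: 2N / Nmax, with Nmax = Smax + Rmax."""
    return 2 * n / (s_max + r_max)


# Two matching values out of three in each series.
print(sym_coefficient(2, 3, 3))  # 0.5
```

With N = 2 and Smax = Rmax = 3, ID and NEILI both give 2/3, and all three coefficients reach 1 when every value matches (N = 3).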