BioloMICS menu

G - Algorithms

The algorithms listed in the table below can be used to compare series of molecular weight values.

In all cases, the values are compared one by one.

The number of values in the source and the reference must be identical.

For the SYM, ID and NEILI algorithms, when comparing two series of numbers, s1, s2, s3 ... s10 for the source and r1, r2 ... r10 for the reference, s1 is first compared with r1.

If s1 < r1 - tolerance, the similarity between s1 and r1 is 0 (Ss1r1 = 0) and s2 is then compared with r1.

If s1 > r1 + tolerance, the similarity between s1 and r1 is 0 (Ss1r1 = 0) and s1 is then compared with r2.

If r1 - tolerance <= s1 <= r1 + tolerance, s1 is close enough to r1 to be considered identical, and the similarity between s1 and r1 is 1 (Ss1r1 = 1).

The following values (Si and Ri) are then compared in the same way until one of the series is exhausted.

The similarities are summed and the sum is divided by a value that depends on the selected algorithm (see the table below) to obtain a global similarity coefficient.
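The pairwise walk described above can be sketched in Python as follows. This is a minimal illustration, not BioloMICS code: the function name `count_matches` is hypothetical, both series are assumed to be sorted in ascending order, and after a match both series are assumed to move on to their next value. The function returns N, the number of matched pairs.

```python
from typing import Sequence


def count_matches(source: Sequence[float], reference: Sequence[float], tolerance: float) -> int:
    """Return N, the number of source/reference pairs within +/- tolerance.

    Hypothetical helper; assumes both series are sorted in ascending order.
    """
    i = j = n = 0
    while i < len(source) and j < len(reference):
        s, r = source[i], reference[j]
        if s < r - tolerance:
            i += 1  # s is below the window: compare the next source value with r
        elif s > r + tolerance:
            j += 1  # s is above the window: compare s with the next reference value
        else:
            n += 1  # s lies in [r - T, r + T]: similarity 1
            i += 1  # assumption: both series advance after a match
            j += 1
    return n
```

For example, `count_matches([10, 20, 30], [10.5, 25, 30.2], 1.0)` matches 10 with 10.5 and 30 with 30.2, giving N = 2.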

The SYMPRO and IDPRO algorithms work exactly like the SYM and ID algorithms respectively, except that the tolerance is a percentage of the reference molecular weight rather than an absolute value.

Therefore, these two algorithms allow a larger absolute tolerance for high molecular weights and a smaller one for low molecular weights.
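A sketch of the same walk with the proportional window [R * (1 - T), R * (1 + T)] used by SYMPRO and IDPRO. Here T is assumed to be given as a fraction (0.05 for 5 %), both series are assumed to be sorted in ascending order, and the function name `count_matches_pro` is hypothetical.

```python
from typing import Sequence


def count_matches_pro(source: Sequence[float], reference: Sequence[float], tolerance_fraction: float) -> int:
    """Return N using a tolerance proportional to each reference value.

    Hypothetical helper; tolerance_fraction = 0.05 means +/- 5 % of R.
    """
    i = j = n = 0
    while i < len(source) and j < len(reference):
        s, r = source[i], reference[j]
        low = r * (1 - tolerance_fraction)   # lower bound R * (1 - T)
        high = r * (1 + tolerance_fraction)  # upper bound R * (1 + T)
        if s < low:
            i += 1  # compare the next source value with r
        elif s > high:
            j += 1  # compare s with the next reference value
        else:
            n += 1  # s is inside the window: counted as identical
            i += 1
            j += 1
    return n
```

With T = 0.05, 100 matches 103 and 1000 matches 1030, whereas an absolute tolerance of 5 would match only the first pair.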

For the CLOSE algorithm, each value of the source S is compared with all the values of the reference R, and the difference is divided by the higher of the two values: abs(S - R) / max(S, R).

For every source value (band), only the best match, i.e. the shortest relative distance, is kept.

When all the comparisons have been performed, these values are averaged to obtain a global similarity coefficient.
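A minimal sketch of the CLOSE computation, following the table's formula Sum(Siclose) / Smax. Assumptions: all values are positive (so max(S, R) is never 0), both series are non-empty, and the function name `close_coefficient` is hypothetical. As written, the formula averages relative distances, so 0 means every source band has an exact counterpart; whether BioloMICS reports this value directly or converts it into a similarity is not stated here.

```python
from typing import Sequence


def close_coefficient(source: Sequence[float], reference: Sequence[float]) -> float:
    """Sum(Siclose) / Smax, where Siclose is the smallest relative distance
    abs(Si - Rj) / max(Si, Rj) between Si and any reference value Rj.

    Hypothetical helper; assumes positive, non-empty series.
    """
    total = 0.0
    for s in source:
        # keep only the best match (shortest relative distance) for this band
        total += min(abs(s - r) / max(s, r) for r in reference)
    return total / len(source)
```

For source [100, 200] and reference [100, 250], the closest relative distances are 0 and 0.2, so the result is 0.1.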

The CLOSESYM algorithm performs the same comparison as the CLOSE algorithm in both directions: each source value is compared with the reference values, and each reference value is compared with the source values.

This makes the CLOSESYM algorithm symmetrical.
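CLOSESYM can be sketched by applying the closest-value step in both directions and dividing by the total number of values, as in the table's formula Sum(Siclose + Riclose) / Nmax. This assumes positive, non-empty series; both function names are hypothetical.

```python
from typing import List, Sequence


def closest_relative_distances(values: Sequence[float], others: Sequence[float]) -> List[float]:
    """For each value v, the smallest abs(v - o) / max(v, o) over all o in others."""
    return [min(abs(v - o) / max(v, o) for o in others) for v in values]


def closesym_coefficient(source: Sequence[float], reference: Sequence[float]) -> float:
    """Sum(Siclose + Riclose) / Nmax with Nmax = Smax + Rmax (hypothetical helper)."""
    s_close = closest_relative_distances(source, reference)  # Siclose values
    r_close = closest_relative_distances(reference, source)  # Riclose values
    return (sum(s_close) + sum(r_close)) / (len(source) + len(reference))
```

Swapping the source and the reference gives the same value, which is what makes the algorithm symmetric.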

If S and R are the values of the source and of the reference respectively, then:

Smax: number of data points/values in the source

Rmax: number of data points/values in the reference

Siclose: distance between Si and the closest value Rj in the reference, divided by max(Si, Rj)

Riclose: distance between Ri and the closest value Sj in the source, divided by max(Sj, Ri)

N: number of identical values, i.e. source/reference pairs considered identical within the tolerance

Nmax: total number of values (Smax + Rmax)

T: tolerance defined by the user

Algorithm	Tolerance	Coef.	Symmetric
SYM	[R - T, R + T]	N / (Nmax - N)	Yes
SYMPRO	[R * (1 - T), R * (1 + T)]	N / (Nmax - N)	Yes
ID	[R - T, R + T]	N / Smax	No
IDPRO	[R * (1 - T), R * (1 + T)]	N / Smax	No
NEILI	[R - T, R + T]	2N / Nmax	Yes
CLOSE	none	Sum(Siclose) / Smax	No
CLOSESYM	none	Sum(Siclose + Riclose) / Nmax	Yes
Pearson	none		Yes
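The coefficients of the tolerance-based algorithms follow directly from N, Smax and Rmax. The sketch below (hypothetical function name, assuming a non-empty source) evaluates them; SYMPRO and IDPRO use the same formulas as SYM and ID and differ only in how N is counted. Written this way, SYM has the form of the Jaccard coefficient and NEILI that of the Dice coefficient (also known as the Nei and Li coefficient).

```python
from typing import Dict


def tolerance_coefficients(n: int, s_max: int, r_max: int) -> Dict[str, float]:
    """Coefficients from the table, given N matched pairs, Smax and Rmax.

    Hypothetical helper; assumes a non-empty source.
    SYMPRO and IDPRO reuse the SYM and ID formulas with their own N.
    """
    n_max = s_max + r_max  # Nmax: total number of values
    return {
        "SYM": n / (n_max - n),  # N / (Nmax - N), Jaccard form
        "ID": n / s_max,         # N / Smax, not symmetric
        "NEILI": 2 * n / n_max,  # 2N / Nmax, Dice (Nei and Li) form
    }
```

For N = 2 matches between two series of 3 values, SYM = 2 / 4 = 0.5, while ID and NEILI both give 2/3; ID and NEILI diverge only when Smax and Rmax differ.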