The text is taken from here.
No Free Lunch for Supervised Machine Learning
Hume (1739–1740) pointed out that ‘even after the observation of the frequent or constant conjunction of objects, we have no reason to draw any inference concerning any object beyond those of which we have had experience’. More recently, and with increasing rigour, Mitchell (1980), Schaffer (1994) and Wolpert (1996) showed that bias-free learning is futile.
Wolpert (1996) shows that in a noise-free scenario where the loss function is the misclassification rate, if one is interested in off-training-set error, then there are no a priori distinctions between learning algorithms.
More formally, where
d = training set;
m = number of elements in training set;
f = ‘target’ input-output relationships;
h = hypothesis (the algorithm's guess for f made in response to d); and
C = off-training-set ‘loss’ associated with f and h (‘generalization error’)
all algorithms are equivalent, on average, by any of the following measures of risk: E(C|d), E(C|m), E(C|f,d), or E(C|f,m).
How well you do is determined by how ‘aligned’ your learning algorithm P(h|d) is with the actual posterior, P(f|d).
Wolpert's result, in essence, formalizes Hume, extends him and calls the whole of science into question.
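For reference, the "on average" in the quoted claim is an average over targets: Wolpert (1996) shows that, uniformly averaged over all f, E(C|f,m) (and likewise E(C|f,d)) takes the same value for every learning algorithm P(h|d), and under a uniform prior P(f) the same holds for E(C|d) and E(C|m). The snippet below is a toy check of that statement, not something from the text; the five-point input space, the fixed training inputs, and the two learners (one that ignores the data, one that predicts the training-set majority label) are arbitrary illustrative choices.

```python
import itertools

# Toy illustration of the quoted "no free lunch" claim (not from the text):
# enumerate every binary target f on a small finite input space, fix the
# training inputs, and compare two arbitrary learning algorithms by their
# off-training-set (OTS) misclassification rate, averaged uniformly over f.

X = list(range(5))                           # finite input space
train_X = [0, 1, 2]                          # fixed training inputs
ots_X = [x for x in X if x not in train_X]   # off-training-set inputs


def algo_constant_zero(train_pairs):
    """Ignores the data entirely and always predicts 0."""
    return lambda x: 0


def algo_majority(train_pairs):
    """Predicts the majority label seen in the training set (ties -> 1)."""
    ones = sum(y for _, y in train_pairs)
    label = 1 if 2 * ones >= len(train_pairs) else 0
    return lambda x: label


def ots_error(h, f):
    """Zero-one loss of hypothesis h, measured only off the training set."""
    return sum(h(x) != f[x] for x in ots_X) / len(ots_X)


def average_ots_error(algo):
    """Uniform average over all 2^|X| binary targets f of the OTS error,
    with the training labels supplied by each f."""
    errors = []
    for labels in itertools.product([0, 1], repeat=len(X)):
        f = dict(zip(X, labels))
        train_pairs = [(x, f[x]) for x in train_X]
        errors.append(ots_error(algo(train_pairs), f))
    return sum(errors) / len(errors)


print(average_ots_error(algo_constant_zero))  # 0.5
print(average_ots_error(algo_majority))       # 0.5
```

Both averages come out at exactly 0.5, and swapping in any other learner gives the same number, because when f is drawn uniformly its off-training-set labels are independent of the training labels the learner sees.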
Can someone explain how it is possible that "all algorithms are equivalent, on average, by E(C|f,d) or E(C|f,m)"?
Correct me if I am wrong, but E(C|f,d) should be interpreted as an average over all learning algorithms, given a fixed dataset and a fixed problem (the labeling function f).
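For what it's worth, here is a rough sketch (my notation, zero-one loss assumed, with π denoting the sampling distribution renormalised over the inputs outside d_X) of how the expectation in E(C|f,d) is usually unpacked in Wolpert's framework:

$$
E(C \mid f, d) \;=\; \sum_{h} P(h \mid d) \sum_{q \notin d_X} \pi(q)\,\bigl[1 - \delta\bigl(f(q),\, h(q)\bigr)\bigr]
$$

If that reading is right, f and d are both held fixed and the average runs over the hypothesis h returned by the algorithm and over the off-training-set query point q, not over learning algorithms; the "equivalent on average" claim then comes from additionally averaging this quantity uniformly over all targets f.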
submitted by /u/Seiko-Senpai