AI and the Law of Diminishing Marginal Returns
Economics
The Law of Diminishing Marginal Returns is a fundamental mechanism and essential knowledge for anyone who has studied economics. Simply put: the higher your total investment already is, the smaller the additional return you get from the same additional investment.
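A tiny numerical sketch makes this concrete. The concave return curve below is purely illustrative (a square root, not any real-world production function), chosen only because each extra unit of investment adds less than the previous one:

```python
import math

def total_return(investment: float) -> float:
    # Illustrative concave return curve; any concave function shows the same effect.
    return 100 * math.sqrt(investment)

previous = total_return(0)
for investment in (10, 20, 30, 40, 50):
    current = total_return(investment)
    # Marginal return: what the last 10 units of investment added.
    print(f"investment={investment:3d}  total={current:6.1f}  marginal={current - previous:6.1f}")
    previous = current
```

Each extra block of 10 units buys a smaller and smaller increase in total return.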

Generalization
This "law" holds in many cases. Take any ability. When you start from scratch, progress is fast. Every extra hour of training shows in your mastering of the ability. After some time, progress seems to come to a halt, if the trend is not declining. It takes weeks to improve further. Even if your progress model is more like a staircase, the flat parts of it tend to become longer and longer, and the step up lower and lower.
Statistics
Statistics shows a similar phenomenon, and there it has a solid theoretical foundation. It often baffles people that the sample size required for good predictions hardly depends on the size of the population. With good, unbiased sampling, predictions become quite decent quite quickly.
The easy way to understand this is that the quality of the prediction depends on how accurately you determine the parameters of the distribution. Take the Gauss curve as an example. You only need two parameters to pin down the curve, and with it any probability you would like to calculate: the mean and the standard deviation. It does not matter whether the curve covers 100 or 100 million points. How many points do I need to give you before you have a good idea of what the curve will look like? With 30 points you could probably draw something close: the computed values for the mean and the standard deviation will already be near their true values.
Suppose you have 100 samples and draw the Gauss curve. How much do you expect it to change if I add another 100 samples? Not noticeably. A point is reached where adding more data does (almost) nothing to improve what you already have.
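A minimal sketch of that claim, using NumPy and an arbitrary "real" mean and standard deviation (the numbers below are purely illustrative, not taken from any real dataset):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
TRUE_MEAN, TRUE_STD = 50.0, 10.0   # illustrative "real" distribution

first_100 = rng.normal(TRUE_MEAN, TRUE_STD, size=100)
extra_100 = rng.normal(TRUE_MEAN, TRUE_STD, size=100)

for label, data in (("100 samples", first_100),
                    ("200 samples", np.concatenate([first_100, extra_100]))):
    # The two numbers that fully determine the fitted Gauss curve.
    print(f"{label}: mean={data.mean():.2f}  std={data.std(ddof=1):.2f}")
```

The second estimate barely moves the curve; the first hundred samples already did most of the work.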
Even more important: any extra precision you gain is irrelevant, because the result is used to estimate (!) probabilities, which by definition come with some uncertainty.
Will AI hit a ceiling?
The current success of AI is built on massive amounts of data, processed with an approach that boils down to deriving probabilities. That is why some people say it is just statistics (I agree), and that is the link with my previous point.
The fantastic progress AI has made relied on processing never-before-seen quantities of data. For most topics in most languages, adding more text will not change things substantially. We have hit the flat part of the curve that the law of diminishing returns predicts.
Voice recognition and generation have also reached the high-performance region; further improvements will not stupefy people anymore. Let us not forget to thank Fourier for his contribution.
For other things, like music or pictures, improvements are becoming harder and harder to achieve, yet there may still be some room left, for now.
It seems to be dawning on the AI community that the law they have hit is not a small barrier on the road to AI world domination. The amount of processing needed to reach fair results for most topics has been reached, or soon will be. A significant step forward demands something extra.
One way to look at it: to move beyond average knowledge about a topic, you need specialist information. That information will likely correct or even contradict the majority vote on crucial aspects, so its input may look like outliers. Including outliers is a slippery path to take.
Specialist information is valuable knowledge. Attempts to include it may also heat up the debate on intellectual property rights.
Addendum: statistics and sampling
The figures show 3 curves:
- Orange: the "real" distribution
- Green: the computed distribution
- Blue: the actual sample data
Notice the improvement going from 10 samples to 30 samples. Also note that the computed distribution with only 10 samples is not that bad. This example was selected after a number of trials because it shows larger differences than is typical.

Beyond 30 samples, the improvement of the computed distribution with respect to the actual one becomes nearly invisible:

This is a specific situation: we deal with numerical data, and we know (or assume) an important property, namely that the data follows a Gauss curve distribution.
Even if we did not know the kind of distribution, its shape would start to appear relatively fast. That effect is likely at work in AI learning as well.
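A sketch of that distribution-free view, again with purely illustrative parameters (this is not the experiment behind the figures above): instead of fitting a Gauss curve, we simply bin the samples into a histogram and measure how little the shape changes once the sample is moderately large.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
TRUE_MEAN, TRUE_STD = 50.0, 10.0          # illustrative "real" distribution
bins = np.linspace(10, 90, 17)            # fixed bins so the histograms are comparable

def empirical_shape(n: int) -> np.ndarray:
    """Normalized histogram of n samples: a distribution estimate with no Gauss assumption."""
    samples = rng.normal(TRUE_MEAN, TRUE_STD, size=n)
    counts, _ = np.histogram(samples, bins=bins)
    return counts / counts.sum()

reference = empirical_shape(100_000)      # stand-in for the "real" shape
for n in (10, 30, 100, 1000):
    shape = empirical_shape(n)
    # Total variation distance: how far the empirical shape is from the reference.
    distance = 0.5 * np.abs(shape - reference).sum()
    print(f"n={n:5d}  distance from reference shape: {distance:.3f}")
```

The distance drops quickly at first and then barely moves: the same diminishing-returns pattern as before, now without any assumption about the type of distribution.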