Feeding the AI Beast

To learn and keep improving, AI needs large data sets. Of text. Images. Videos. Whatever. But the supply is already running short, writes Rahul Matthan:

“The trouble is that the availability of high-quality content needed for training these models is fast dwindling.”

There are multiple reasons for this. One, the cost of storage media keeps falling, so you can store more data at the same cost; and the processing power of CPUs/GPUs keeps growing, so they can process more data in the same time. This combo means that the rate at which training data is fed to the AI far exceeds the rate at which humans produce new content (that could be used as fresh training material).

Two, now that the importance of data sets for AI is widely understood, Internet sites (which were the biggest source of such data) have started restricting how much of their content can be “scraped” (a small robots.txt sketch follows after the third reason below).

Three, content producers have started filing copyright-infringement lawsuits, which further shrinks the data available to feed the AI. Privacy concerns impose restrictions of a similar kind.
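In practice, much of the scraping restriction in reason two is expressed through a site’s robots.txt file, which well-behaved crawlers are expected to honour. Here is a minimal sketch of how a scraper would check those rules in Python; the crawler names and the rules themselves are purely illustrative:

```python
# Minimal sketch: checking a site's robots.txt before scraping.
# The robots.txt content and crawler names below are illustrative only.
from urllib.robotparser import RobotFileParser

robots_txt = """
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(robots_txt)

# A crawler identifying itself as "GPTBot" (an AI-training crawler) is blocked
# site-wide, while an ordinary crawler is still allowed.
print(parser.can_fetch("GPTBot", "https://example.com/article.html"))        # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/article.html"))  # True
```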

The obvious solution was to create “synthetic” data to feed the AI, i.e., let one AI create content from which a second AI learns, whose output in turn trains a third AI, and so on.

“If successful, not only does this give us a virtually infinite supply of training data, it suffers from none of the intellectual property and data protection concerns that scraped content must contend with.”
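Purely as an illustration of that chain (my own toy sketch, not from Matthan’s article), here is what the loop looks like when a crude unigram word sampler stands in for a real generative model: each “generation” is trained only on the previous generation’s output.

```python
# Toy sketch of the recursive synthetic-data pipeline: each generation's
# "model" is trained only on text generated by the previous generation.
# A unigram word-frequency sampler stands in for a real generative model.
import random
from collections import Counter

random.seed(0)

def train(words):
    # "Training" here is just counting word frequencies.
    return Counter(words)

def generate(model, n_words):
    # "Generation" samples words in proportion to their training frequency.
    vocab = list(model.keys())
    weights = list(model.values())
    return random.choices(vocab, weights=weights, k=n_words)

human_text = "the quick brown fox jumps over the lazy dog".split() * 50

corpus = human_text
for generation in range(1, 4):
    model = train(corpus)                  # AI number `generation`
    corpus = generate(model, len(corpus))  # its output feeds the next AI
    print(f"gen {generation}: sample output -> {' '.join(corpus[:6])}")
```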

Unfortunately, synthetic data doesn’t work as intended.

“(Already) both the quality and diversity of these underlying models have shown evidence of substantial degradation, a phenomenon the authors call Model Autophagy Disorder (or MAD for short).”

Apparently, there is no substitute for “fresh, real data”, i.e., human-created content, if we want the AI to keep improving.
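A toy way to see the degradation (again my own sketch, not from the article): fit the simplest possible model, a one-dimensional Gaussian, to a finite sample, generate new samples from the fit, refit, and repeat. The fitted spread, a crude stand-in for “diversity”, tends to drift toward zero as the generations pile up.

```python
# Toy illustration of generation-on-generation degradation: each generation
# fits a Gaussian to a finite sample drawn from the previous generation's
# Gaussian. Sampling noise compounds, and the fitted spread (a crude proxy
# for "diversity") tends to drift toward zero.
import random
import statistics

random.seed(0)

mu, sigma = 0.0, 1.0   # generation 0: the "real" data distribution
sample_size = 50       # finite training set per generation

for generation in range(1, 501):
    samples = [random.gauss(mu, sigma) for _ in range(sample_size)]  # synthetic data
    mu, sigma = statistics.mean(samples), statistics.pstdev(samples) # "train" the next model
    if generation % 100 == 0:
        print(f"gen {generation:3d}: fitted std = {sigma:.3f}")
```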

If you thought AI was moving at an insane speed and impacting multiple fields, well, it also seems to have hit a roadblock on how much further it can go. Even the roadblocks arrive at a ridiculously fast pace.
