AI: What's the Hype?
February 28, 2025
AI, specifically deep learning, has unsurprisingly been dominating the tech scene for the last few years. It has led to rapid growth in almost every data-driven industry, advancing technology at a massive scale. To understand why companies like NVIDIA have skyrocketed to success, and why "ChatGPT" is on its way to becoming a verb, we must first understand the underlying architecture that allows AI to function.
The broader field of machine learning has been developing since the 1950s and '60s. Early algorithms included decision trees, nearest neighbors, and perceptrons, all of which remain widely used today. Yet in that early era, few anticipated that the technology would seep into nearly every aspect of life. Deep learning, responsible for the now monumental rise of AI, was widely seen as a dead end until around 2010, when computing power caught up to the technology itself. Until that point, neural networks were effective but so computationally expensive that they could not solve tasks efficiently. LeNet, one of the first groundbreaking neural networks for image classification on the famous MNIST dataset, could take days at a time to train. Given the simplicity of both the model and the dataset, it was an important discovery, but it was simply not practical at the time to apply the approach to more realistic, compute-heavy tasks. The most complex version of LeNet, LeNet-5, has about 60,000 trainable parameters. For reference, GPT-3, the model behind the original ChatGPT and the holy grail of AI today, has around 175 billion trainable parameters. So when powerful GPUs started to come around, the deep learning sphere expanded tremendously.
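To put that 60,000 figure in perspective, here is a rough back-of-envelope count of LeNet-5's trainable parameters. This is only a sketch: it simplifies the original architecture (the C3 layer is treated as fully connected to all six maps below it, and the subsampling layers' trainable coefficients are ignored), and the helper functions are purely illustrative, so the total lands slightly above the published figure.

```python
# Back-of-envelope parameter count for a simplified LeNet-5.

def conv_params(num_filters, kernel_h, kernel_w, in_channels):
    """Convolutional layer: weights plus one bias per filter."""
    return num_filters * (kernel_h * kernel_w * in_channels) + num_filters

def dense_params(in_features, out_features):
    """Fully connected layer: weights plus one bias per output unit."""
    return in_features * out_features + out_features

c1  = conv_params(6, 5, 5, 1)         # 156
c3  = conv_params(16, 5, 5, 6)        # 2,416 (overcounts the sparse connections)
c5  = dense_params(16 * 5 * 5, 120)   # 48,120
f6  = dense_params(120, 84)           # 10,164
out = dense_params(84, 10)            # 850

total = c1 + c3 + c5 + f6 + out
print(f"Approximate LeNet-5 trainable parameters: {total:,}")  # ~61,706
```

Even with the simplifications, the result is in the tens of thousands, roughly seven orders of magnitude smaller than a 175-billion-parameter language model.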
A critical question that many seem to discount is why GPUs matter in this process. Yes, they greatly increase efficiency, but a deeper understanding of how they do it may also provide more insight into why NVIDIA's marketing has become so centered on AI. Training a machine learning model, especially a deep neural network, takes enormous amounts of time and computation. None of the mathematical operations that underlie neural networks are individually complex. However, the sheer volume of these operations becomes very time-consuming. Neural networks are all about matrix multiplication. Each layer of neurons has a weight matrix associated with it, which is multiplied by the input entering that layer. When a model has thousands of these neurons and thousands or hundreds of thousands of inputs, all iterated over many times during training, time, not innovation, becomes the limiting factor to success. Unlike a CPU, which can only work through a handful of operations at a time, a GPU can run thousands of simple operations in parallel, which can mean dramatically faster results for this kind of math. This revelation allowed deep learning to blossom and to create the tools it was previously gated from. With the floodgates open, the past 15 years alone have seen countless revolutionary advancements in deep learning, with many more to come.
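To make that concrete, here is a minimal NumPy sketch of a single layer's forward pass. The layer sizes are arbitrary and the snippet is only illustrative; it is not how any particular framework implements its layers.

```python
import numpy as np

# Minimal sketch of why neural networks reduce to matrix multiplication.
# Sizes are arbitrary; a real model stacks many such layers and repeats
# this work over millions of examples and many training passes.

batch_size, in_features, out_features = 64, 1024, 512

x = np.random.randn(batch_size, in_features)    # a batch of inputs
W = np.random.randn(in_features, out_features)  # the layer's weight matrix
b = np.zeros(out_features)                      # the layer's biases

# One layer's forward pass: a single matrix multiplication plus a bias,
# followed by a nonlinearity. Each of the batch_size * out_features
# outputs is an independent dot product, which is exactly the kind of
# work a GPU can spread across thousands of cores at once.
activations = np.maximum(0, x @ W + b)          # ReLU(xW + b)

print(activations.shape)  # (64, 512)
```

On a CPU, those dot products are computed largely one after another; a GPU can compute huge numbers of them simultaneously, which is where the speedup comes from.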