Generative AI - Session 1 - How did we get here?

Research into AI dates back to the 1950s, but the current paradigm of deep learning, in which neural networks are trained using a large-scale statistical approach, only started in earnest around 2010.

Before 2010, most approaches were either rule-based expert systems or logic-based symbolic AI systems. They are too involved to cover here, but suffice it to say that the key difference between these approaches and the current Deep Learning paradigm is the role of data: instead of humans explicitly telling the machine what the rules of the world are, the machine derives the patterns from data by itself.
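To make the difference concrete, here is a minimal sketch on a hypothetical toy task (flagging an email as spam from its number of links). The function names, the data and the single-threshold rule are all illustrative assumptions, not a real spam filter: in the rule-based version a human hard-codes the threshold, while in the data-driven version the machine picks the threshold that best fits labeled examples.

```python
# Hypothetical toy task: flag an email as spam based on how many links it contains.


def rule_based_is_spam(num_links: int) -> bool:
    # Rule-based approach: a human expert hard-codes the threshold.
    return num_links > 3


def learn_threshold(examples: list[tuple[int, bool]]) -> int:
    """Data-driven approach: pick the threshold t that makes the rule
    'is_spam = num_links > t' misclassify the fewest labeled examples."""
    best_t, best_errors = examples[0][0], len(examples) + 1
    for t in sorted({num_links for num_links, _ in examples}):
        errors = sum((num_links > t) != label for num_links, label in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Labeled examples: (number of links, is it spam?)
labeled = [(0, False), (1, False), (2, False), (5, True), (6, True), (8, True)]
threshold = learn_threshold(labeled)
print(f"Learned threshold: {threshold}")  # Learned threshold: 2
print(rule_based_is_spam(3), 3 > threshold)  # False True
```

The two approaches disagree on an email with 3 links: the human-written rule lets it through, while the threshold derived from the data flags it. Modern deep learning applies the same idea at vastly larger scale, learning millions or billions of parameters instead of one threshold.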

While many of the ideas that underlie the current Deep Learning paradigm were conceived decades ago, it was the confluence of three key factors that fuelled the wave of AI developments beginning in the 2010s:

Data: large amounts of data became available in digital form (thanks to the internet and mobile phones, images and text could be freely collected and used to train models)
Compute: increased processing power, capable of handling the training of large, complex models with many layers of neurons
Architecture: new ideas about how to connect neurons and how to set the models' optimization objectives, which helped to unlock performance

Between 2012 and 2016: Deep Learning for Domain-Specific Applications

During this time, much progress was made in creating large structured datasets, which involved collecting data from the internet and then having humans label it consistently so that machines could learn from it. For example, Fei-Fei Li led the ImageNet project, which collected about 14 million images and labeled each one with its correct class out of thousands of classes (e.g. dog, cat, cup, chair).
These datasets also gave AI researchers an objective way to measure the performance of their models and compare results across research teams. This led to competitions, such as the annual ImageNet Large Scale Visual Recognition Challenge, which drove successive waves of innovation toward ever greater accuracy on these well-specified tasks and evaluation methods.
During this time, AI researchers tended to focus on specific tracks of development: for example, computer vision (image processing) to make sense of images, natural language processing to make sense of text, and reinforcement learning for playing games and controlling self-driving cars. Notable milestones include:

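Convolutional Neural Networks (CNNs) for Image Recognition 2012: AlexNet, a deep CNN developed by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton, won the 2012 ImageNet competition by a wide margin, a result widely seen as the start of the deep learning era in computer vision.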
Generative Adversarial Networks (GANs) 2014: Introduced by Ian Goodfellow and his colleagues, GANs train two networks against each other: a generator that produces synthetic samples and a discriminator that tries to tell them apart from real data. This idea transformed generative modeling, and GANs have since been instrumental in generating realistic images, with wide use in computer vision, image synthesis and data augmentation.
Residual Networks (ResNets) 2015: ResNets introduced skip connections, which add a block's input directly to its output. These shortcuts let gradients flow through the network more easily, alleviating the vanishing gradient problem and making it practical to train extremely deep networks (the original paper trained networks with over 100 layers). ResNets have since become a standard building block in many deep learning models.
Reinforcement Learning 2016: AlphaGo, developed by Google's DeepMind and powered by deep reinforcement learning, defeated Lee Sedol, one of the world's strongest Go players, by 4 games to 1. This landmark event showed that deep learning could tackle complex decision-making tasks well beyond image classification and outperform human experts in strategic games, and it sparked further research in reinforcement learning.
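The skip connection behind ResNets can be illustrated with a minimal sketch in plain Python. The helper names (`dense_relu`, `residual_block`) and the tiny 2x2 weights are hypothetical choices for illustration; a real ResNet block uses convolutions and batch normalization. The key line is `output = F(x) + x`: because the identity term always contributes a gradient of 1, a signal can pass through many stacked blocks even when the learned layers contribute little.

```python
from functools import partial
from typing import Callable

Vector = list[float]


def dense_relu(weights: list[list[float]], x: Vector) -> Vector:
    """One fully connected layer followed by ReLU (bias omitted for brevity)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def residual_block(layer: Callable[[Vector], Vector], x: Vector) -> Vector:
    """Skip connection: output = F(x) + x, where F is the wrapped layer."""
    fx = layer(x)
    return [a + b for a, b in zip(fx, x)]


x = [1.0, 2.0]

# With all-zero weights F(x) = 0, so each block simply passes its input through.
zero_layer = partial(dense_relu, [[0.0, 0.0], [0.0, 0.0]])
out = x
for _ in range(50):  # stack 50 residual blocks
    out = residual_block(zero_layer, out)
print(out)  # [1.0, 2.0]

# With non-zero weights the block adds a learned correction on top of x.
layer = partial(dense_relu, [[1.0, 0.0], [0.0, -1.0]])
print(residual_block(layer, x))  # [2.0, 2.0]
```

A plain (non-residual) stack of 50 zero-weight layers would output all zeros and lose the input entirely; with skip connections, each block only has to learn a correction to its input, which is part of why very deep ResNets remain trainable.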

From 2017 to Now: Convergence and the Rise of Foundation Models

In 2017, a seminal paper from Google, “Attention Is All You Need” by Vaswani et al., introduced the “Transformer” architecture. This architectural innovation gave rise to GPT (Generative Pre-trained Transformer) models, the family on which ChatGPT is based.

These models, often referred to as “foundation models”, serve as powerful starting points for a wide range of AI tasks. Instead of building a separate architecture and training it on separately labeled data for every task, a single massive model is pre-trained on vast amounts of unlabeled data (text pulled off the internet) and then fine-tuned on specific tasks using labeled data.
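Why can pre-training use unlabeled text at all? GPT models are pre-trained to predict the next word, so raw text supplies its own labels: the "answer" for each word is simply the word that follows it. The sketch below shows this with a bigram word counter. This is a deliberately drastic simplification chosen only to illustrate the idea (a real foundation model is a Transformer with billions of parameters), and the function names and the tiny corpus are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


def training_pairs(text: str) -> list[tuple[str, str]]:
    """Turn raw, unlabeled text into (context, next word) pairs.
    The 'label' for each word is simply the word that follows it."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def train_bigram(text: str) -> dict[str, Counter]:
    """Count how often each word follows each other word."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in training_pairs(text):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts: dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent next word seen in training, if any."""
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None


corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # cat
print(predict_next(model, "dog"))  # None (never seen in training)
```

No human labeled anything here; the training signal came entirely from the order of words in the text. Fine-tuning then adjusts such a pre-trained model using a much smaller set of human-labeled examples for a specific task.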

Building on this approach, OpenAI further enhanced its models by scaling them to more and more parameters and by using a technique called “Reinforcement Learning from Human Feedback” (RLHF) to steer the model's outputs toward what human trainers approve of. In RLHF, human raters rank alternative model outputs, a separate reward model learns to predict those preferences, and the language model is then optimized to earn a higher reward. RLHF has been likened to clicker training a dog: every time the dog does something good, the clicker is pressed as a reward signal that reinforces that behavior.
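In the RLHF setup popularized by OpenAI's InstructGPT work, human preferences are first turned into a reward model: raters compare pairs of outputs, and the reward model is trained with the pairwise loss -log(sigmoid(r_chosen - r_rejected)), which is small when the preferred output already scores higher. The sketch below is a heavily simplified, hypothetical version: instead of a neural network scoring text, each response ("A", "B", "C") gets a single learned number, and the later reinforcement learning step that optimizes the language model against this reward (e.g. with PPO) is not shown.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)): small when the
    human-preferred output already receives the higher reward."""
    d = r_chosen - r_rejected
    # Equals log(1 + e^-d), rearranged to avoid overflow for large |d|.
    return math.log1p(math.exp(-d)) if d >= 0 else -d + math.log1p(math.exp(d))


def train_rewards(
    comparisons: list[tuple[str, str]], lr: float = 0.1, steps: int = 200
) -> dict[str, float]:
    """Learn one scalar reward per (hypothetical) response from human
    comparisons of the form (preferred, rejected) by gradient descent."""
    rewards = {r: 0.0 for pair in comparisons for r in pair}
    for _ in range(steps):
        for chosen, rejected in comparisons:
            # dLoss/dr_chosen = -sigmoid(-d) and dLoss/dr_rejected = +sigmoid(-d)
            g = sigmoid(-(rewards[chosen] - rewards[rejected]))
            rewards[chosen] += lr * g  # reinforce the preferred output (the "click")
            rewards[rejected] -= lr * g  # discourage the rejected one
    return rewards


# Raters preferred response A over B, A over C, and B over C.
rewards = train_rewards([("A", "B"), ("A", "C"), ("B", "C")])
print(sorted(rewards, key=rewards.get, reverse=True))  # ['A', 'B', 'C']
```

The learned rewards recover the raters' ranking without anyone writing down an explicit rule, which is the role the reward model plays before the language model itself is tuned.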

GPT-2:

Released: 2019
Parameters: Up to 1.5 billion
Capabilities: GPT-2 demonstrated impressive language generation, producing coherent and contextually relevant text. It received attention for the quality of its human-like writing.

GPT-3:

Released: 2020
Parameters: 175 billion (considered one of the largest models at the time)
Capabilities: GPT-3 was a substantial leap in size and performance over GPT-2, offering stronger language understanding and more coherent, contextually appropriate responses. It impressed with its versatility across a wide range of tasks and its few-shot and even zero-shot learning: performing a new task from just a handful of examples given in the prompt, or from an instruction alone, without any retraining.
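Few-shot and zero-shot learning happen entirely inside the prompt: the model's weights are not changed, and the "training examples" are just text placed before the question. The sketch below only assembles such prompts; it does not call any model or API, and the `build_prompt` helper and the "Input:/Output:" layout are hypothetical conventions chosen for illustration.

```python
def build_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a text prompt. With examples it is a few-shot prompt;
    with an empty example list it is a zero-shot prompt."""
    lines = [instruction, ""]
    for text, answer in examples:
        lines += [f"Input: {text}", f"Output: {answer}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


instruction = "Classify the sentiment of each review as positive or negative."
examples = [
    ("I loved every minute of it.", "positive"),
    ("A total waste of money.", "negative"),
]
few_shot = build_prompt(instruction, examples, "The acting was superb.")
zero_shot = build_prompt(instruction, [], "The acting was superb.")
print(few_shot)
```

In both cases the model is expected to continue the text after the final "Output:". The few-shot version shows the desired format and labels by example, which earlier models needed task-specific training to learn.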

Future Developments:

While GPT-4 (released in March 2023) is already very impressive, two of its major deficiencies are a lack of explainability (it is hard to tell how it arrives at an answer) and the difficulty of controlling hallucinations, i.e. the generation of false statements. This makes it hard for humans to fully trust its output.

Some areas of ongoing development where we can expect progress in the coming months and years include:

Multi-modality: the ability to accept prompts that include images, audio, video and text, and to produce output in these forms too.
Real-time connectivity: ChatGPT currently has a knowledge cut-off in 2021 and has no knowledge of anything that happened after that date. Future models are expected to be more closely integrated with real-time events.
Sources / references: to increase trust in the reliability of their outputs, future models are expected to do a better job of linking to references.
Execution via plugins: on its own, ChatGPT only outputs text and cannot “do” anything. With the release of plugins, it can integrate with third-party services to take actions on a user’s behalf, for example finding prices for a hotel room and booking it.
Greater explainability (“explainable AI”): a stream of research is investigating how to build powerful AI that is more explicit about how it works, giving its human creators greater control.