A Future Of Deep Learning Engineering
Deep learning models have evolved in design, in performance, and in how much data they ingest and process. Yet the leading-edge models have felt like laboratory experiments, with only a few wide-reaching use cases (e.g. self-driving cars). Then, in late 2022, we got to experience ChatGPT. It was the first wide-reaching technology powered by deep learning that everyone intuitively knew was transformative. You'd sign in, start a chat, and instantly begin getting value from it. It's a model you can see being applied to almost every area of our lives: a technology where you can get contextual answers to specific questions. It seems like the closest thing we've gotten to artificial general intelligence… or at least it feels that way. I personally don't believe the model has genuine human-like intelligence; it just reflects it very well.
ChatGPT had massive results because the model that powers it (GPT-3) is massive: it was trained on a massive dataset (45TB), enabled by a massive budget (an estimated $4.6M). Because the dataset was so large and diverse, GPT-3 was able to capture and generalise a huge space of patterns. Natural language is the surface structure of the deep structure of our thoughts, ideas, beliefs, feelings and memories. Most natural language models learn surface-structure patterns (grammar, syntax, etc.). GPT-3's transformer architecture, however, was large enough to also learn deep-structure patterns within the massive dataset. In other words, it was able to dive deeper than language itself and learn the patterns of human thought. If this is true, then GPT-3 contains abstract thought patterns very similar to our own. I don't mean to say it's sentient or anything like that. Instead, it's 'just' an artefact of a large 45TB dataset that contained human thought encoded as text. If that's what makes GPT-3 stand out, then could these 'thought patterns' be abstracted out and refined into composable 'micro-model' components?
ChatGPT may have anecdotally proven that computers can emulate human thought processes. Mental processes we perform implicitly, such as parsing, interpreting, reasoning, validating, critiquing, analysing, planning, calculating and creating, can be encoded as patterns at various levels of abstraction within a model. Could each 'mental process' be separated out into its own model?
GPT-3 cost a lot of money to train and required a lot of data, which suggests that only large tech companies will be able to develop cutting-edge models. That might not be true. What we're really interested in are the abstract 'thought patterns' that emerged from training. It's these patterns that can be applied generically across many contexts and then fine-tuned. The question is: can we abstract these 'cognitive' patterns and package them into component parts? Or can we design and train new models with the objective of obtaining a subset of these patterns? In other words, can we break a massive monolithic model like GPT-3 down into smaller, interoperable, abstract components? If so, we could fine-tune each component and compose them into composite models that suit our specific use cases. After all, we don't need our models to do everything at a mediocre calibre; we want them to do a few things very well.
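To make the idea of composition concrete, here is a minimal sketch in plain Python. The stages are stubs standing in for real fine-tuned neural components, and the names (`MicroModel`, `compose`, the example stages) are hypothetical, purely for illustration of how small components could chain into a composite model:

```python
from typing import Callable, List

class MicroModel:
    """A 'micro-model' here is just a named callable that transforms text.
    In practice each stage might wrap a small fine-tuned neural network;
    these stub implementations only illustrate the composition pattern."""

    def __init__(self, name: str, fn: Callable[[str], str]):
        self.name = name
        self.fn = fn

    def __call__(self, text: str) -> str:
        return self.fn(text)

def compose(stages: List[MicroModel]) -> Callable[[str], str]:
    """Chain micro-models into a composite model: each stage's output
    becomes the next stage's input."""
    def composite(text: str) -> str:
        for stage in stages:
            text = stage(text)
        return text
    return composite

# Hypothetical components: a 'parser' that normalises the input and a
# 'summariser' that keeps only the first sentence.
parser = MicroModel("parser", lambda t: t.strip().lower())
summariser = MicroModel("summariser", lambda t: t.split(".")[0] + ".")

pipeline = compose([parser, summariser])
print(pipeline("  Deep learning is evolving. Rapidly.  "))
# → deep learning is evolving.
```

The appeal of this shape is that any stage can be swapped out, fine-tuned, or re-ordered independently, which is exactly the flexibility a monolithic model can't offer.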
The proliferation of this approach could be achieved within the open source ecosystem. The number of components could be vast, akin to the large selection of electrical components an electrical engineer can choose from. The large private tech companies simply can't produce the range and diversity of micro-models that the wider community can. Furthermore, a large repository of open source models would empower businesses, communities and individuals to leverage this new era of computing to solve problems they couldn't before. It's a freedom that open source provides to the world, unlike paywalled products. This is already happening (e.g. Hugging Face), but not yet at the abstract, fine-grained level we may see in the future.
This feels like a whole new territory of computing we've finally stepped into… and it's very exciting! Sure, there will be misuse, just as with everything else, but the wider community tends to buffer against the threats posed by the few. It's an ongoing evolution.
So, if we're able to distil 'cognitive patterns' out of datasets and into small (yet effective) models, then we could see technologies we've only ever dreamed of. On top of that, we'll see an explosion of novel use cases and solutions emerge from these new capabilities.