ChatGPT, NLP, & AI: Where is this train headed, and who is steering?
Few people grasp the significance of the unbelievable advances made in mere days from GPT-3.5 to GPT-4, not to mention everything that came before the beginning of 2023, just three short months ago. But for those of us who do have an inkling of a clue, it is borderline terrifying and wonderfully exciting at the same time.
I have been obsessed with this field since I was a VERY small child. It all started with Star Wars, Star Trek, and Johnny 5 (from the original Short Circuit film, although I admittedly loved both the original and the sequel). I never thought I would actually be working with the tools that make all this possible. And to be clear: it is not the concept of consciousness within AI that I find all that intriguing, although I do feel that we are currently crossing a boundary that we naive humans did not believe even existed aside from the fantasies of Hollywood films. I am more interested in what this all means for how data is utilized, distributed, and digested.
I was a little late jumping onto the ChatGPT train. It caused such a HUGE uproar that I felt surely it could not be as earth-shattering as all the hype made it sound. I am just that kind of girl. If everyone is in love with something, I assume it will just come off as lame to me and will disappoint me. But in many ways, I have come to find out that ChatGPT is actually as amazing as many people say it is. It is not perfect, not at all. I get plenty of nonsense in response to my queries. When I ask about ways to configure my code, I almost ALWAYS get responses that include parameters the modules in question do not actually recognize. But next month, this technology will most likely be light-years beyond what it is today, that is, if we keep going at our current pace.
The downside: it took me less than a day of working with the ChatGPT API to expend my complimentary grant of $18. And when I looked at the fees for continued API use (I prefer to do things in code...), I decided to get ChatGPT Plus, work outside the API for now, and deal with the whole internet-browser experience. But that cuts me off from the research I could do with unlimited API access. Is that how it should be? Should this technology only be available to those who have thousands of dollars per month, or even per week depending on the magnitude of the work and research, to spend on it?
I personally do not believe that only the rich and powerfully backed should get access to this technology, so I have spent a good fraction of my time since getting on this train working out alternatives to the ChatGPT API. And I have found some hope. I will share one such glimmer of shining light: Dalai, LLaMA, and Alpaca.
For full details on these advancements, check out the following:
- Stanford Alpaca - Stanford's answer to ChatGPT, reported to rival it despite being fine-tuned for under $600
- Meta's LLaMA - Unfortunately almost as exclusionary as ChatGPT, but the weights have been leaked, so you no longer have to prove your research's viability to Meta in order to work with the technology.
- Dalai - Some really great folks who decided Meta and OpenAI shouldn't keep us from working with these models.
Here is what I have come up with so far, after a day or so of working on this predicament. It is, again, far from perfect, but I do feel I am getting somewhere.
If you visit the Dalai link above, you will find the GitHub repository that makes it possible to work with LLaMA and Alpaca locally. I decided to take this approach and create my own mini-API that I can run on my local machine. This is just the beginning, but I am posting now because that is how excited I am about the advancements to come, and I want to document the whole process.
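To give a concrete flavor of what such a mini-API can look like, here is a minimal Python sketch, with some loudly labeled assumptions: it assumes you have already installed a model through the Dalai repository and have a local Dalai server running (`npx dalai serve`, which listens on port 3000 by default), and the socket.io event names (`request`/`result`), payload fields, and `<end>` marker are based on how Dalai's bundled web UI talks to that server at the time of writing, so check the repository for the current interface. The `build_request` and `query_alpaca` names are my own, not part of Dalai.

```python
# Minimal sketch of a local "mini-API" for Alpaca 7B via a running Dalai
# server (`npx dalai serve`, default http://localhost:3000).
# The socket.io event names, payload fields, and "<end>" marker below are
# assumptions based on Dalai's web UI -- verify against the Dalai repo.

import threading

DALAI_URL = "http://localhost:3000"  # default `npx dalai serve` address


def build_request(prompt, model="alpaca.7B"):
    """Assemble a generation request for the Dalai server.

    The field names mirror common llama.cpp-style sampling parameters;
    exact names accepted by Dalai are an assumption here.
    """
    return {
        "prompt": prompt,
        "model": model,
        "n_predict": 128,      # max tokens to generate
        "temp": 0.8,           # sampling temperature
        "top_k": 40,
        "top_p": 0.9,
        "repeat_penalty": 1.3,
    }


def query_alpaca(prompt, timeout=120.0):
    """Send a prompt to the local Dalai server and collect streamed tokens."""
    # Lazy import so the rest of this module needs only the stdlib;
    # requires: pip install "python-socketio[client]"
    import socketio

    sio = socketio.Client()
    tokens = []
    finished = threading.Event()

    @sio.on("result")
    def on_result(data):
        # Dalai streams one token per "result" event (payload shape assumed).
        token = data.get("response", "")
        if "<end>" in token:   # assumed end-of-generation marker
            finished.set()
        else:
            tokens.append(token)

    sio.connect(DALAI_URL)
    sio.emit("request", build_request(prompt))
    finished.wait(timeout)     # block until generation ends or times out
    sio.disconnect()
    return "".join(tokens)
```

With the server running, a call like `query_alpaca("Explain what a llama is.")` should return the generated text as a single string; `build_request` is kept separate so the sampling parameters can be tweaked in one place as I experiment.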
The code above makes it possible to query the Alpaca 7B model after installing it from the Dalai GitHub repository. For now I am simply exploring how these models are configured and how to work with them in the projects I am currently involved in. Next will come fine-tuning and adapting the models as needed for my research and data.
The exciting part is that I can get my hands...and more importantly, my CODE...on this technology. Interacting in a browser window is fine for most people, but it is slow and not good enough for someone who wants to build this technology into research and projects. This is my first step in that direction.
Check back soon for more! I am working tirelessly on this every single day. I look forward to sharing more very soon!