Llama 2 release, GPT-4 Performance, ChatGPT custom instructions and more
Happy Monday!
This week we’ve got:
🔥Top 3 news: Llama 2 - the free commercial model, GPT-4 Performance, OpenAI releases
🗞️Interesting reads - AI regulation in different countries, Fall of Customer Service, Coming war for on-device LLMs and more
🧑‍🎓Learning - Llama 2 resources: the easiest way to use Llama 2 on Windows, Mac and Ubuntu, training Llama 2 on a local machine, and deploying it on an M1/M2 Mac
Let’s get started.
🔥Top 3 AI news in the past week
1. Open-source Llama Model
After weeks of waiting, Llama-2 finally dropped.
Salient Features:
Llama 2 was trained on 40% more data than LLaMA 1 and has double the context length.
Three model sizes are available - 7B, 13B and 70B - pretrained on 2 trillion tokens with a 4096-token context length.
Outperforms other open-source LLMs on various benchmarks, including the popular HumanEval coding benchmark.
Partnership with Microsoft. It seems Microsoft has a finger in every LLM pie.
This announcement has its share of controversies.
First, despite what Meta says, the model isn't open source: the license restricts certain users and uses.
Another perspective is that Meta's loose use of "open source" is confusing but harmless, since laypeople wouldn't notice the difference between open weights and open source.
Second, the restriction on using Llama 2’s output. Meta doesn’t want anyone to use Llama 2’s output to train and improve other LLMs. This is hypocritical and impossible to track.
The hypocrisy comes from the fact that Meta has been using others' data to train its LLM but doesn't want others to do the same.
The tracking challenge: how do they ensure no one is training on synthetic data generated by Llama 2? Unless there is a whistleblower, this is practically impossible to detect.
This matters because even Microsoft and OpenAI acknowledge there is a limit to the human-generated data available for training LLMs.
The main takeaway: open-source, commercially usable models are the future. Llama 2 is a step in the right direction. Hopefully, OpenAI will release its own version soon.
2. GPT-4 Performance
Discussion of GPT-4's performance has been on everyone's mind. A lot of people keep saying it has gotten dumber, but they either have no proof or their proof doesn't hold up because GPT-4's responses are non-deterministic: there is always a chance that one response is worse than another.
Last week a study tried to quantify GPT-4's performance over the past four months. While the study is titled "How is ChatGPT's behavior changing over time?", many took it as proof that GPT-4 has deteriorated.
To measure GPT-4's performance, the authors used snapshots. OpenAI maintains two snapshots of GPT-4 - a March version and a June version - and the authors ran a fixed set of questions against both to measure how the behavior changed.
The authors ran 500 math problems with chain-of-thought prompts on both versions. The March version got 488 questions right while the June version got only 12 right - 97.6% in March vs 2.4% in June.
They also used 50 coding questions from LeetCode to measure programming performance, counting how many GPT-4 answers ran without any changes. For the March version, 52% of the generated code ran as-is; for the June version, only 10% did.
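If you want to poke at this yourself, here is a minimal sketch of the kind of snapshot comparison the study describes. It assumes the openai Python package (v1+) with an OPENAI_API_KEY set in your environment; the dated snapshot names and the sample question are illustrative stand-ins - the study used its own 500-problem dataset, and older snapshots may no longer be available on every account.

```python
# Minimal sketch: ask the same chain-of-thought question to two dated
# GPT-4 snapshots and compare the answers. Assumes `pip install openai`
# and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

SNAPSHOTS = ["gpt-4-0314", "gpt-4-0613"]  # March vs June snapshots (availability may vary)

# Illustrative question in the spirit of the study's math set (17077 is prime).
QUESTION = "Is 17077 a prime number? Think step by step and then answer [Yes] or [No]."

for model in SNAPSHOTS:
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduces, but does not eliminate, run-to-run variability
        messages=[{"role": "user", "content": QUESTION}],
    )
    print(f"{model}: {response.choices[0].message.content}\n")
```

Run this over a larger batch of questions and you get the same kind of accuracy comparison the authors report, though with a different question set the absolute numbers will obviously differ.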
A lot of people took this as proof that GPT-4's performance has gone down, but the study itself makes no claim of overall degradation.
What the study does show is that the two snapshots behave differently. On the math problems, where the worst drop was seen, one version tends to answer that every number is prime while the other tends to answer that every number is composite - and since the test questions were primes, the second version scores terribly. So overall capability might not be worse; the behavior has simply drifted.
3. OpenAI Releases and Announcements
Last week we saw important announcements from OpenAI:
First, the Android app is on the way. You can pre-register to download it as soon as it is available.
Second, custom instructions for ChatGPT. Have you ever wanted ChatGPT to respond in a particular way - say, every response should include a pros and cons list, be formatted as bullet points, or use a particular tone? Instead of telling ChatGPT "respond in X voice" or "respond with bullet points" in every chat, you can now make it a default setting.
You can enable this by clicking your user name and going to Settings. Once enabled, a Custom instructions option appears when you click your user name, and you can fill in your instructions there.
Third, an increased message limit for GPT-4. You can now send 50 messages every 3 hours, a 2x increase over the previous 25-message limit. Though if people believe performance is going down, you have to wonder what the point is.
🗞️10 AI news highlights and interesting reads
The White House reached an agreement with tech giants on managing the risks from AI. It is a voluntary commitment, and it underscores how differently regions and countries are approaching regulation: the US leans toward self-regulation, the EU focuses on consumer protection and safety, and China wants state control.
The first domino to fall in the AI race seems to be customer service, with companies simply replacing customer service teams with chatbots. First it was the Indian startup Dukaan, which replaced 90% of its team with chatbots. Now Shopify is doing the same, but with NDAs to avoid bad press. And the same goes for call center workers, who are battling with AI.
Apple is testing an internal chatbot dubbed "Apple GPT", built on an internal framework called Ajax. New technology always leads to people writing frameworks and re-inventing the wheel, so this will be interesting to watch.
In the meantime, Meta is also working with Qualcomm to enable on-device use of Llama 2. Models currently run in the cloud, which raises privacy and security concerns; on-device AI is more secure, more private, and leaves more room for personalization.
An entirely AI-made South Park episode was created. GPT-4 generated the dialogue and text, a diffusion model generated the characters, and voice cloning provided the voices. It is an achievement in combining multiple AI techniques into a unified pipeline and product. It is both exciting and dangerous.
Who's next on the AI chopping block? News writers, maybe. Google showcased an AI tool that can write news articles, internally called Genesis (a not-so-subtle nod to Terminator: Genisys?). News companies have been under a lot of pressure already, and this doesn't help - it has left people unsettled. While many chose not to comment, it is concerning, especially when executives are pushing for more AI-generated content.
Open source is digesting AI research results quickly. Researchers now need to learn how to balance performance against the practicality of their solutions.
LLMs might pose a threat to digital conversations: researchers found Stack Overflow contributions are down 16%. Though you have to remember that Stack Overflow has long been a difficult place for newbies. With ChatGPT providing an easier answer, why would they go to Stack Overflow?
🧑‍🎓3 Learning Resources
The easiest way I found to run Llama 2 locally is to use GPT4All. Here are the short steps (a minimal Python sketch using the gpt4all bindings follows the list):
Download the GPT4All installer
Download the GGML version of the Llama model - for example, the 7B model (other GGML versions are also available).
For local use it is better to download a more heavily quantized (lower-bit) model. This saves RAM and makes the experience smoother.
Go to the installation directory. Place the downloaded file into the “models” folder.
Start GPT4All; at the top you should see a dropdown to select the model.
Keep in mind that Llama 2's chat prompt format is a bit odd, so check the prompt template.
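If you prefer to script this instead of using the GPT4All app, here is a minimal sketch using the gpt4all Python bindings. The model filename and folder path below are assumptions - point them at whatever GGML file you actually downloaded - and the [INST] ... [/INST] wrapper is the Llama 2 chat prompt format mentioned above.

```python
# Minimal sketch, assuming `pip install gpt4all` and a Llama 2 GGML file
# already downloaded into your GPT4All models folder.
from gpt4all import GPT4All

MODEL_FILE = "llama-2-7b-chat.ggmlv3.q4_0.bin"   # assumption: your downloaded GGML file
MODEL_DIR = "/path/to/gpt4all/models"            # assumption: your GPT4All models folder

# allow_download=False keeps everything local, matching the manual steps above.
model = GPT4All(MODEL_FILE, model_path=MODEL_DIR, allow_download=False)

# Llama 2 chat models expect the [INST] ... [/INST] prompt format,
# which is why the default instructions can look odd.
prompt = "[INST] Explain in two sentences what a quantized GGML model is. [/INST]"
print(model.generate(prompt, max_tokens=200, temp=0.7))
```

Setting allow_download=False makes the bindings fail fast if the file isn't where you put it, rather than silently pulling a different model from the internet.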
That’s it folks. Thank you for reading and have a great week ahead.