GPT-4’s New Role: Content Moderation

PLUS: The upcoming death of content marketing? And more

Happy Monday!

This week we’ve got:

  • 🔥Top 3 news: GPT-4’s New Role, Is Content Marketing dying?, LLM Evaluation

  • 🗞️Interesting reads - You don’t need to finetune LLMs, Open Challenges in LLM Research and more

  • 🧑‍🎓Learning - How Llama.cpp works, the mathematics of training LLMs, and “Attention and the Transformer”

Let’s get started.

🔥Top 3 AI news in the past week

1. GPT-4’s New Role: Content Moderation

Content moderation is the internet’s secret sauce. It keeps our digital platforms in check; without it, spam would rule the roost.

Content moderation requires a lot of empathy and an understanding of context, and moderators have to keep up as policies change. The work can be slow and quite stressful. ML models have been built to tackle parts of the problem.

OpenAI is now exploring the use of GPT-4 for content moderation. The way it works:

  1. You feed GPT-4 the policy document and a set of examples.

  2. GPT-4 reads the policy document and labels these examples based on its understanding.

  3. You then compare the human labels with GPT-4’s labels. Where they disagree, you can ask GPT-4 to explain its reasoning, then update the policy document to make it clearer.

GPT-4 can then be used to create classifiers that moderate content at scale. A rough sketch of the labeling loop is below.
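
Here’s a minimal sketch of what that loop could look like in code. This is not OpenAI’s pipeline: the policy text, example posts, and labels are invented, and it assumes the OpenAI Python client’s chat completions API.

```python
# Toy sketch of the GPT-4 labeling loop described above -- not OpenAI's
# actual moderation pipeline. Assumes the OpenAI Python client (openai>=1.0)
# with OPENAI_API_KEY set; the policy and examples are invented.
from openai import OpenAI

client = OpenAI()

POLICY = """\
K1: Content that encourages violence is disallowed.
K2: Ordinary criticism or insults are allowed.
"""

def label(post: str) -> str:
    """Ask GPT-4 to label one example post against the policy."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # reduce (but not eliminate) run-to-run variation
        messages=[
            {
                "role": "system",
                "content": "You are a content moderator. Apply this policy:\n"
                + POLICY
                + "Reply with the matching rule (K1 or K2) and a one-line reason.",
            },
            {"role": "user", "content": post},
        ],
    )
    return response.choices[0].message.content

# Step 3: compare GPT-4's labels against human labels and inspect disagreements.
examples = {"example post text": "K2"}  # hypothetical human-labeled examples
for post, human_label in examples.items():
    gpt_label = label(post)
    if not gpt_label.startswith(human_label):
        print(f"Disagreement on {post!r}: human={human_label}, gpt={gpt_label}")
        # Read GPT-4's reasoning, clarify the POLICY wording, and rerun.
```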

As per OpenAI, this will help produce consistent labels, removing a lot of individual discretion from moderation. It should also reduce the mental burden on human moderators.

There is an obvious drawback to this approach: the base model might have inherent biases introduced during training, in which case a human in the loop is required to check the outputs.

Why does this matter? People have treated content moderation as both a boon and a curse. Musk bought Twitter in part because he didn’t agree with its content moderation policies. Companies are already using ML classifiers to moderate content, so it’ll be interesting to see how a GPT-4 based approach compares.

2. Content Marketing is dead?

Google’s AI-powered search (SGE) has a few new updates.

First, in AI-generated responses, you can now hover over unfamiliar terms and definitions will pop up. There might even be some diagrams or images to help you wrap your head around things.

Second, AI-generated code suggestions will now be color-coded, with syntax highlighting that makes keywords, comments, and strings pop.

Third, the “SGE while browsing” feature. On some pages there will be an option to summarize an article’s key points, with links that take you straight to those points. Plus, there’s an “Explore on page” feature that helps you dive even deeper into the article by showing you the questions it answers and jumping you to the relevant bits. A toy sketch of this kind of key-point extraction is below.
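
Google hasn’t published how SGE implements this, but the key-point extraction is easy to picture. Here’s a toy sketch, where ask_llm is a hypothetical helper that sends a prompt to any chat LLM and returns its text reply:

```python
# Toy illustration of "summarize key points with jump links" -- not Google's
# SGE implementation. ask_llm() is a hypothetical helper that sends a prompt
# to any chat LLM and returns its text reply.
import json
from typing import Callable

def key_points(article_text: str, ask_llm: Callable[[str], str]) -> list[dict]:
    """Extract key points plus a verbatim quote that locates each one."""
    prompt = (
        "Summarize this article's key points as a JSON list of objects with "
        '"point" (one sentence) and "quote" (a short verbatim snippet from '
        "the article where that point is made).\n\n" + article_text
    )
    points = json.loads(ask_llm(prompt))
    # Each quote can become a text-fragment jump link, e.g.
    # https://example.com/article#:~:text=<quote>
    return points
```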

Why does this matter? The first two updates are going to affect content ranking for sites that rely on a Q&A format, so Stack Overflow, Quora, and the like are going to take the hit.

The larger impact will come from the “SGE while browsing” feature. One of the most popular marketing concepts has been content marketing: you write a blog post about a pressing issue in your market and drop hints about your product within it. For example, a video editing company might write about how to splice videos, listing many free options while dropping subtle hints about its own product.

Now imagine this happening with the “SGE while browsing” option turned on. Readers can skim the summary, find what they want, and close the window without ever reaching the pitch. That is going to hit content marketing hard.

3. LLM Evaluation

The rapid rise of LLMs has fueled optimism about their transformative potential across sectors. But there has also been increased scrutiny of LLM outputs, with many news outlets focusing on model misbehavior and black-box behavior.

Concerns about model misbehavior will likely lead to policies on LLM outputs. On that front, Scale has launched Test & Evaluation for LLMs.

As per Scale, Test & Evaluation will happen on parameters like:

  • Instruction Following—meaning: how well does the model understand what is being asked of it?

  • Creativity—meaning: within the context of the model’s design constraints and the prompt instructions it is given, what is the creative (subjective) quality of its generation?

  • Responsibility—meaning: how well does the model adhere to its design constraints (e.g. bias avoidance, toxicity)?

  • Reasoning—meaning: how well does the model reason and conduct complex analyses which are logically sound?

  • Factuality—meaning: how factual are the results from the model? Are they hallucinations?

These tests will use both model and human inputs to grade models; a toy sketch of model-graded scoring is below.
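
Scale hasn’t shared implementation details, so the following is only a rough sketch of what model-graded scoring along these axes might look like. The rubric wording and the ask_llm helper are assumptions:

```python
# Rough sketch of model-graded evaluation on the five axes above -- not
# Scale's actual methodology. ask_llm() is the same hypothetical chat-LLM
# helper; in practice, scores would be spot-checked by human reviewers.
from typing import Callable

RUBRIC = {
    "instruction_following": "How well does the response do what the prompt asked?",
    "creativity": "Within the prompt's constraints, how creative is the response?",
    "responsibility": "Does the response avoid bias, toxicity, and other violations of its design constraints?",
    "reasoning": "Is the analysis logically sound?",
    "factuality": "Are the claims factual, or hallucinated?",
}

def grade(prompt: str, response: str, ask_llm: Callable[[str], str]) -> dict[str, int]:
    """Score one (prompt, response) pair from 1 (worst) to 5 (best) per axis."""
    scores = {}
    for axis, question in RUBRIC.items():
        reply = ask_llm(
            f"Prompt: {prompt}\nResponse: {response}\n\n"
            f"{question} Answer with a single integer from 1 (worst) to 5 (best)."
        )
        scores[axis] = int(reply.strip())
    return scores
```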

Why does this matter? LLMs are notoriously hard to test and evaluate: GPT-4’s outputs are non-deterministic, and recent research shows that LLM jailbreaks can be generated automatically. It will be interesting to see whether Scale can work around these issues and provide proper LLM evaluation.

🗞️10 AI news highlights and interesting reads

  1. A common misconception is that you need to finetune LLMs for your use case. Probably, you don’t need to.

  2. List of Open Challenges in LLM Research. Some of these challenges, like multi-modal models and LLM context length, are being actively tackled.

  3. The economics of generative AI. TIL: OpenAI might be spending $700k each day to keep the lights on.

  4. Ethan Mollick has been a strong voice for AI. Even so, this article surprised me: he bats for AI helping with innovation and creativity.

  5. Self-hosting provides control over model architecture, customization, and long-term integration. A case for why you should host your own LLM.

  6. Google is working on life-advice AI apps. I hope they plan to use Scale’s T&E to ensure they don’t give some horrible life tips.

🧑‍🎓3 Learning Resources

  1. How does Llama.cpp work

  2. Mathematics of training LLMs

  3. “Attention and the Transformer”

That’s it folks. Thank you for reading and have a great week ahead.