OpenAI is Going to Kill Startups

PLUS: Teaching Language Rules is Fair Use and Google Cloud Next Updates

Happy Monday!

This week we’ve got:

  • 🔥Top 3 news: OpenAI is going to kill startups, Teaching Language Rules is Fair Use, and Google Cloud Next Updates

  • 🗞️Interesting reads - Hype-Free LLM Reading List, Google Gemini will beat GPT-4, LLMs in the Browser, Fun Game predicting GPT-4 Outputs, and more

  • 🧑‍🎓Learning - Fine tune GPT3.5 for Natural Language to SQL, Optimal chunk size for building AI Summary bots, Options for solving Hallucinations

New - I am sharing all the news stories I collected this week as a Notion document. Please see the end of the newsletter.

Let’s get started.

🔥Top 3 AI news in the past week

1. OpenAI is Going to Kill Startups

OpenAI announced its plans to release ChatGPT Enterprise.

The salient features are:

  1. Unlimited access to GPT-4 (no cap of 25 messages per 3 hours)

  2. Unlimited access to advanced data analysis (formerly known as Code Interpreter)

  3. Large context windows for more data

  4. Customer prompts and company data are not used for training OpenAI models.

  5. Encryption at rest and in transit

  6. SOC 2 compliance

Upcoming features include a way to feed enterprise data into ChatGPT. This will create a “chat with data” experience on enterprise data inside ChatGPT.

Why does this matter? Enterprise customers do not like to share data. They need SOC 2 compliance. They need to know their data is secure at rest and in transit. This announcement assures enterprise customers that they can use ChatGPT without blowback.

There are tons of LLM apps out there, including hundreds of “chat with data” apps, each offering a data wrapper on top of OpenAI. These apps have surely realized that enterprise customers are sensitive about data. Now that ChatGPT itself offers SOC 2 compliance, these apps will feel the impact. Many “chat with data” apps are going to either pivot or die.

Additionally, with this move OpenAI is now competing with Microsoft’s Azure enterprise offerings. The immediate impact on Microsoft is going to be low. An enterprise might still want to use Azure for two reasons. First, it may already have data stored in Azure blobs. That makes creating data pipelines much easier, and it can connect to tools like Excel and Power Automate. Second, enterprises can still control the architecture. They can pick a cheaper model like Llama 2 for some tasks and use GPT-4 for others. They can even control the look and feel of the chatbot.
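To make the second point concrete, here is a minimal sketch of what “a cheaper model for some tasks, GPT-4 for others” could look like. Everything in it is an assumption for illustration: the model identifiers, the task names, and the `pick_model` helper are hypothetical and do not come from Azure or OpenAI.

```python
from dataclasses import dataclass

# Hypothetical model identifiers, used only for illustration.
CHEAP_MODEL = "llama-2-13b"
PREMIUM_MODEL = "gpt-4"

# Assumed split of task types; a real team would pick its own.
SIMPLE_TASKS = {"classification", "extraction", "short_summary"}


@dataclass
class Request:
    task: str    # e.g. "classification" or "contract_review"
    prompt: str  # the text that would be sent to the chosen model


def pick_model(request: Request) -> str:
    """Return the cheaper model for simple tasks and the premium model otherwise."""
    if request.task in SIMPLE_TASKS:
        return CHEAP_MODEL
    return PREMIUM_MODEL


if __name__ == "__main__":
    examples = [
        Request("classification", "Is this ticket about billing?"),
        Request("contract_review", "List the risky clauses in this contract."),
    ]
    for req in examples:
        print(req.task, "->", pick_model(req))
```

The sketch only chooses a model name. Actually calling the model, and deciding which tasks count as simple, is where the real work would be.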

A lot is going to depend on connectors OpenAI will build for feeding enterprise data.

2. Teaching Language Rules is Fair Use

OpenAI responded to the lawsuits filed by Sarah Silverman and other authors. According to OpenAI, the authors misunderstand the scope of copyright law. Their books are a tiny part of a large dataset. The copyrighted works are being used to teach its model the rules of human language.

OpenAI also contested the claim that all ChatGPT outputs are copyright infringement. The examples cited included outputs like the word “Yes” and the name of the President of the United States.

The authors had also claimed that ChatGPT outputs strip out copyright management information (CMI), which would violate the DMCA. OpenAI contends that any such violation is unintentional.

Why does this matter? LLMs need a large amount of data to train. The easiest way to teach a model the underlying rules of human language is written work, like news articles and books. So, the Silverman case is going to be very important and interesting.

Benedict Evans has taken a position similar to OpenAI’s: no particular book matters in aggregate, and it isn’t exactly piracy. What do you think? I am interested to know.

This case will also inform the law on training other LLMs on ChatGPT outputs. If it is okay to use copyrighted works to teach a model the “rules of human language”, then it should also be okay to use ChatGPT outputs to teach other models, provided those outputs are a “tiny” part of the training data.

3. Google Cloud Next and GenAI

Google Cloud Next was held last week. Some of the key AI highlights were:

  1. The PaLM LLM has expanded context length.

  2. Llama 2, Claude 2, and Falcon are now available in the model garden.

  3. Partnerships with medical companies to use Med-PaLM 2

  4. Duet AI for Workspace is now available.

The last update is quite interesting to me. Duet AI promises to summarize emails and create presentations. That is one feature I want to try. I have been avoiding summarization plugins due to privacy issues.

Why does this matter? Google Docs and Microsoft Office 365 are the two main office productivity suites. And Microsoft has been ahead in the AI integration game for far too long. At one point it looked like Google was going to repeat history. Now it is good to see Google finally catching up.

🗞️10 AI news highlights and interesting reads

  1. A lot of training and learning resources on LLMs come from tool vendors as part of their content marketing. But if you want to go beyond tools, then there is the Hype-Free LLM Reading List.

  2. We know that LLMs need large GPUs to work. The Web LLM project aims to explore the possibility of running LLMs inside the browser. There is no server inference at all. You can try the demo here (warning: it is currently slow). This requires Chrome 113 with WebGPU. Earlier Chrome versions (≤ 112) are not supported.

  3. Put your understanding of LLM capabilities to the test. See if you know enough about what GPT-4 can do.

  4. OpenAI has approached the startup process in a contrarian way: raising tons of money, focusing on R&D, and not finding product-market fit. It even makes Sam Altman feel bad about his earlier advice.

  5. OpenAI has released a guide on how to utilize AI for teaching.

  6. The US Copyright Office is seeking public comments on copyright in Generative AI.

  7. Multiplayer games are full of toxicity and hate speech. So, this is going to be interesting. Activision is implementing “in-game voice chat moderation” using an AI technology called ToxMod, in partnership with a company called Modulate. This tool will analyze voice chat conversations to identify toxic behavior. Human moderators will make the final decisions.

🧑‍🎓3 Learning Resources

That’s it, folks. Thank you for reading and have a great week ahead.