- GPT Weekly
Multimodal ChatGPT is here
PLUS: Bard Extensions, Microsoft’s AI companion
Happy Monday!
This week we’ve got:
🔥Top 3 news: Multimodal ChatGPT, Bard Extensions, Microsoft’s AI companion
🗞️Interesting reads - DALL·E 3, OpenAI fine-tuning, the Reversal Curse, and more
🧑🎓Learning - Fine-tuning comparison between Llama 2 and GPT-3.5, RAG explained (again), and lessons from building a GPT-4 summary app
New - I am adding all the news stories I collected this week as a Notion document. It will be free for the next two editions; after that it will be available only to subscribers with at least one referral. Please see the end of the newsletter.
Let’s get started.
🔥Top 3 AI news in the past week
1. Multimodal ChatGPT
As we discussed last week, Google has been working on Gemini. One of the features of this model is going to be multimodal input; that is, the model will respond to audio and visual inputs as well.
The Information reported that OpenAI was trying to release their multimodal input feature before Google.
Today OpenAI announced that it is adding voice and image capabilities to ChatGPT. Users can now engage with ChatGPT using voice and image inputs.
As per the announcement, the voice feature will be available on the mobile app. To enable it, go to Settings → New Features in the mobile app and opt into voice conversations. There will be five voices to choose from.
The image option will be available on both web and mobile apps.
OpenAI is rolling out the feature gradually. Plus and Enterprise users will get it over the next two weeks.
The company says the gradual rollout is to ensure safety. Voice models could be misused for fraud and impersonation, so the audio technology will be available only for voice chat. Other companies, like Spotify, are going to use the same technology for voice translation. The image model, meanwhile, has been tested in a red-team environment.
The multimodal input has limitations: the model does not work well for non-English inputs, so OpenAI advises against using it for those use cases.
2. Bard Extensions
Google announced an array of features for Bard.
First, to address the hallucination issue, there is now a “Google it” option. Click the “G” icon in Bard and it will check the answers against information available on the web. Statements with supporting evidence are highlighted in green; statements without are highlighted in red. You can click each highlight to get a better idea.
Second, you can now share chats. Anyone can take your chat and continue from where you left off.
Third, Extensions. Bard can now pull information from Gmail, Flights, Youtube etc. Google says that if you opt to use this feature your data will not be used by Bard to show ads or used to train Bard.
This announcement drives another nail into the coffin of “chat with your data” apps. As Bard gets smarter and Google integrates it more deeply, the “chat with your data” market will shrink.
3. Microsoft’s AI companion
In its September event, Microsoft had promised a rollout of its AI companion. This companion is now called Microsoft Copilot, and it will replace Cortana. At least Cortana was a better name; if I encounter an issue with “Copilot for GitHub,” it is going to be annoying to keep repeating the full name.
Copilot will integrate across multiple apps. It'll use context to provide better help to users.
It will begin rolling out as a free update to Windows 11, starting Sept. 26. Some of the integrations include:
Copilot in Windows - Replacing Cortana. It will be available using the Win + C button.
Photos - Inclusion of AI features to make photo editing easier.
Clipchamp - with auto-compose features.
Bing will also include integration with DALL·E 3. See the announcement below.
🗞️10 AI news highlights and interesting reads
DALL·E 3 got an announcement of an announcement. It will be integrated into ChatGPT and will be better at following instructions.
OpenAI’s fine-tuning feature is now live.
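For anyone planning to try fine-tuning, the main preparation step is building a training file. Below is a minimal sketch, assuming the chat-formatted JSONL that OpenAI’s fine-tuning guide describes (one JSON object with a `messages` list per line); the example content and the `write_jsonl` helper are illustrative, not from the announcement.

```python
import json

# Each training example is one JSON object per line ("JSONL"), in the
# chat message format used for fine-tuning chat models. The content
# here is made up for illustration.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a concise news summarizer."},
            {"role": "user", "content": "Summarize: OpenAI added voice and image input to ChatGPT."},
            {"role": "assistant", "content": "ChatGPT is now multimodal: it accepts voice and images."},
        ]
    },
]

def write_jsonl(path, rows):
    """Write one JSON object per line, as the file-upload step expects."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

write_jsonl("train.jsonl", examples)

# Sanity-check before uploading: every line must parse and contain
# a "messages" list with role/content pairs.
with open("train.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        assert isinstance(record["messages"], list)
        assert all("role" in m and "content" in m for m in record["messages"])
```

After validating locally, you would upload the file and start a fine-tuning job via the API or dashboard.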
OpenAI Devday - OpenAI’s first developer conference. While the in-person option is closed you can sign up for livestream.
This is an interesting paper on LLM knowledge - The Reversal Curse. A model trained on “A is B” often cannot answer “B is A”. For example, if a model is trained on “Tom Cruise's mother is Mary Lee Pfeiffer,” it can answer “Who is Tom Cruise's mother?” (A is B), but it fails when asked “Who is Mary Lee Pfeiffer's son?” (B is A). GPT-4 makes the forward deduction 79% of the time.
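A toy way to build intuition for this (my analogy, not the paper’s experiment): if facts are stored only in the A → B direction, the reverse question fails unless the reversed fact is added explicitly, much like a lookup table with no inverse index.

```python
# Facts stored only in the trained "A is B" direction.
facts = {"Tom Cruise's mother": "Mary Lee Pfeiffer"}

def ask(question_subject):
    """Answer only if the fact exists in the stored direction."""
    return facts.get(question_subject, "I don't know")

print(ask("Tom Cruise's mother"))      # forward query: "Mary Lee Pfeiffer"
print(ask("Mary Lee Pfeiffer's son"))  # reverse query: "I don't know"

# The reversal only works if the reversed fact is stated explicitly,
# analogous to including "B is A" sentences in the training data.
facts["Mary Lee Pfeiffer's son"] = "Tom Cruise"
print(ask("Mary Lee Pfeiffer's son"))  # now: "Tom Cruise"
```

The analogy is loose (an LLM is not a dictionary), but it captures why “A is B” in training data does not automatically grant “B is A” at inference time.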
As per Ethan Mollick, the future of work is AI. He co-authored this paper. To quote him - “Consultants using AI finished 12.2% more tasks on average, completed tasks 25.1% more quickly, and produced 40% higher quality results than those without.”
OpenAI is calling for domain experts for Red Teaming exercises.
Youtube is rolling out a host of AI features for creators.
We have talked about how Apple’s use of the Transformer architecture for texting is a great idea. Here is a technical deep dive into how it works.
A good read on AI regulation.
🧑🎓3 Learning Resources
That’s it folks. Thank you for reading and have a great week ahead.