GPT Weekly
AI Copyright and Adobe’s Safety Net
PLUS: Open-Source AI Context Lengths, OpenAI's World Tour and more
Happy Monday!
This week we’ve got:
🔥Top 3 news - AI Copyright and Adobe’s Safety Net, Context length in open source AI and OpenAI’s world tour
🗞️Interesting reads - Companies slam the EU Act, Khanmigo is doing “okay” in limited runs, Facebook’s ongoing openness and more.
🧑‍🎓Learning - What is Langchain? And 3 Guides on Embeddings.
Let’s get cracking.
🔥Top 3 AI themes and news in the past week
1. AI Copyright and Adobe’s Safety Net
Last week’s theme seemed to be AI and copyright. It's like mixing oil and water - chaotic, slippery, and, honestly, a bit of a hot mess.
First off, there's this game developer who tried releasing his game on Steam. The guy used AI to whip up some art assets, and guess what? Steam said, "no way." His game got the boot because the copyright laws around AI-generated art are about as clear as mud right now. Platforms like Steam are playing it safe, and who can blame them? No one wants to wade into the murky waters of copyright infringement, especially when AI's the one stirring the pot.
Now, let's talk about the lawsuit against Microsoft and OpenAI. Sixteen pseudonymous people have come forward, alleging that the tech giants took their personal info without asking and let their AI models have a field day with it. They claim this breaches their privacy. Though as you read the complaint, you wonder whether they can win. Reddit posts and the like are essentially public data; you should expect them to be scraped and used. I fully expect someone to write GPT summaries of this newsletter.
So, what's the way forward? Adobe has stepped up and said, "We've got your back." At least for people using Adobe Firefly. They're promising to cover any copyright claims that come up. It's a bold move and a massive relief for anyone nervous about dipping their toes into AI-generated art.
Why is it important? Ownership of the data behind AI has been a big challenge. This will only get murkier as people start using generative AI for complex, enterprise purposes.
What’s next? We need clear guidelines on both ends: who owns AI-generated content, and how training data may be used. Transparency is no longer just a buzzword; it's a necessity.
Adobe is making strides, but it's only the tip of the iceberg. Other companies might have to offer similar clauses to ensure that companies can use their product without concerns.
2. Open-Source AI: Expanded Context Lengths
Last week we also saw Hugging Face’s CEO, Clement Delangue, address the US Congress. Delangue's testimony comes amid ongoing concerns about the potential misuse of powerful AI models, such as the leaked LLaMA model.
Delangue underscored how open science and open-source AI are critically tethered to American values and interests. He highlighted how today’s AI evolution wouldn't be possible without open-source tools like PyTorch, TensorFlow, Keras, transformers, and diffusers, all made in the U.S.
He also explained Hugging Face's approach to ethical openness: open documentation, community-driven moderation, opt-in/opt-out datasets, and respect for copyrights.
Why is it important? Open Source has been taking large strides in improving access and bringing technological advancements to AI.
The latest addition to open-source AI's accomplishments is the expansion of context lengths. While Anthropic and OpenAI have pushed for larger and larger contexts, open-source models based on LLaMA have been limited to a 2K context length.
LongChat-7B and LongChat-13B have managed to extend the context length to 16K tokens. These models showcase impressive retrieval accuracy, marking a significant stride toward closing the gap between open models and proprietary long-context models such as Claude-100K and GPT-4-32K. It's a long road, but it's a start.
What’s next? It's clear that the open-source AI community will continue to drive cutting-edge innovations. It is not just about creating the next best model, but about doing it openly and ethically, fostering an environment that encourages learning, collaboration, and progress for all.
3. OpenAI - The World Tour
Sam Altman traveled to 22 countries to discuss OpenAI and its impact. OpenAI published a list of insights and how it will incorporate the feedback.
To me, the most important aspect was where people see the biggest impact: healthcare and education seem to top the list.
Other major topics were:
- The major concerns: misinformation, economic impact, and safety.
- Building a framework to safeguard against harms from AI systems.
- Data usage by OpenAI.
What’s next? Based on the feedback, OpenAI has created a list of things to work on:
- Improving the models’ non-English performance to enable wider use.
- Promoting best practices for foundational models.
- Improving AI literacy.
🗞️10 AI news highlights and interesting reads
“Companies developing these foundation AI systems would be subject to disproportionate compliance costs and liability risks, which may encourage AI providers to withdraw from the European market entirely”. As I have been saying for the last two editions, the EU AI Act makes things difficult for foundational models. Now companies are threatening to withdraw from the EU.
Khanmigo is at the forefront of LLM and AI in education. How’s it working so far? “Could use improvement.”
Meta is taking transparency and openness seriously. They explain how AI influences what you see on Facebook and Insta.
One way AI is killing the web is to make it easy to create fake reviews. Now those fake reviews might be illegal.
Another way AI is killing the web is by supercharging link farms.
The emerging role in IT will be the “AI Engineer”: someone who understands the underlying foundational models and APIs and implements them into a product.
Migrate code from one language/framework to another. I think this is very cool, though you have to ensure all copyrights are in place. You cannot simply take non-commercially licensed code, convert it to something else, and use it.
🧑‍🎓3 Learning Resources
People seem to have a love-hate relationship with Langchain: some think it is useless, while others find it genuinely useful. In case you have wondered: what is Langchain, and why should I care as a developer?
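If you're new to the idea, the core pattern Langchain popularizes is simply composing steps: fill a prompt template, call a model, parse the output. Here's a minimal sketch of that pattern in plain Python. Note that `fake_llm` is a hypothetical stand-in for a real model API, and none of these function names are the actual Langchain API.

```python
# Sketch of the "chain" pattern: prompt template -> model call -> parser.
# Everything here is illustrative; it is not the Langchain library itself.

def prompt_template(product: str) -> str:
    # Step 1: turn structured input into a prompt string.
    return (
        f"Suggest a name for a company that makes {product}. "
        "Reply with the name only."
    )

def fake_llm(prompt: str) -> str:
    # Step 2: a real chain would call an LLM API here; we return a
    # canned response so the sketch runs without any API key.
    return "  BrightSocks  "

def parse_output(raw: str) -> str:
    # Step 3: clean up the raw model output into something usable.
    return raw.strip()

def run_chain(product: str) -> str:
    # The "chain": each step's output feeds the next step's input.
    return parse_output(fake_llm(prompt_template(product)))

print(run_chain("colorful socks"))  # -> "BrightSocks"
```

Frameworks like Langchain add value on top of this core idea with retries, streaming, retrieval, and swappable model backends, which is roughly where the love-hate debate starts.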
LLMs rely on embeddings to work. What are embeddings? Did you know embeddings also power recommendation engines? Another guide on embeddings.
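To make the idea concrete, here's a toy sketch of how embeddings enable similarity search: texts become vectors, and nearby vectors mean related texts. The 3-dimensional vectors below are invented for illustration; real embedding models return vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-made embeddings for three phrases.
embeddings = {
    "cat": [0.90, 0.10, 0.00],
    "kitten": [0.85, 0.15, 0.05],
    "spreadsheet": [0.00, 0.20, 0.95],
}

query = embeddings["cat"]
# Rank the other phrases by similarity to "cat".
ranked = sorted(
    (k for k in embeddings if k != "cat"),
    key=lambda k: cosine_similarity(query, embeddings[k]),
    reverse=True,
)
print(ranked)  # "kitten" ranks above "spreadsheet"
```

A recommendation engine works the same way: embed items and users into the same vector space, then recommend whatever sits closest to the user's vector.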
That’s it folks. Thank you for reading and have a great week ahead.