Markets by Grant | Frameworking the Future of Content
Hello and welcome back to Markets by Grant.
Before we get started, don’t worry about how small the scroll bar is / how long Medium says it’s going to take you to read this. This piece is mostly pictures, so it’ll breeze by. And I don’t like Medium underestimating your reading speed like that.
In today’s piece, I’ll analyze the history of the content industry to try to predict its future — and close by highlighting some interesting startups in that potential future.
In order to do this, we need to get framework-y. And getting framework-y with this industry isn’t that straightforward.
In fact, my biggest struggle with the state of writing on content today is the lack of structure around how we define the space. Specifically, I think we’re lacking perspective around product format in the content industry.
What is a content product? What products belong to what categories? What characteristics define these categories?
Let’s start with the current state. Nothing is MECE (mutually exclusive, collectively exhaustive). Here’s a smattering of the frameworks I got when I searched for visual representations of the content industry:
Result: pitiful word clouds. Even the ‘periodic table’, which has the visual appearance of a framework, has little logical structure to it whatsoever. Conclusions?
- Maybe I’m bad at prompt engineering on Google Image search
- Maybe nobody bothers to map the content industry because…
- It’s too big
- It’s too tough
- They have day jobs
- They don’t feel there’s a point
- Some combination of the above
Ok — time for me to stop hating and start framework-ing:
As a guiding principle, I want to abstract away from today’s definitions as much as possible, to try to comment on new areas of innovation without being bogged down in old ways of thinking.
Also, I’m going to try to be mutually exclusive — ie: if it doesn’t fit in my framework, it ain’t content. I’m not going to try to be collectively exhaustive, because:
- It’s too big
- It’s too tough
- I have a day job
- I don’t feel there’s a point
So, instead of MECE, this will be all about ME.
Grant’s girlfriend: “Typical”
Let’s start with the essential.
How, physically, do we ingest information? With our five senses, of course.
Reader: “Wow, Grant really is starting high level”
I’ve got to draw the line somewhere, so we’ll focus on sight and sound. With the exception of Braille for touch, we don’t really use our other senses to encode and decode information. And while we theoretically could, that’s not the direction I want to take this piece.
And how do we interact with sight and sound stimuli? The most basic and common way is passively, with static stimuli. The content is there, and we consume it as we please. If we go back to it later, it’ll be the same. This is the first level of content product formats:
Example product formats:
- Texts
- Illustrations
- Photographs
- 3D images
- Sound tones
- …and most other forms of ‘old-school’ content
Conceptually, text, illustrations, photos, paintings, etc, are actually very similar. They’re static visual images that encode information, and you decode them by looking at them. Folks may be surprised to see 3D images listed here, but I’ll stand by it. Using a program to explore a LIDAR visualization isn’t that different from turning a page in a book. You may be moving around to consume the content, but the content itself is still static. For sound, you could argue that single tones or short repeating sound patterns fall under this static category, like Morse code.
... . -.-. - .. --- -. / -.. .. ...- .. -.. . .-.
(that’s ‘section divider’ in Morse code)
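For the curious, the ‘encoding’ here really is just a lookup table. A minimal Python sketch (the table only covers the letters we need):

```python
# Morse as 'static' encoded content: a lookup table plus a join.
# Partial table; just the letters in our phrase.
MORSE = {
    "C": "-.-.", "D": "-..", "E": ".", "I": "..", "N": "-.",
    "O": "---", "R": ".-.", "S": "...", "T": "-", "V": "...-",
}

def encode(text: str) -> str:
    """Spaces between letters, '/' between words."""
    return " / ".join(
        " ".join(MORSE[ch] for ch in word) for word in text.upper().split()
    )

print(encode("section divider"))
# ... . -.-. - .. --- -. / -.. .. ...- .. -.. . .-.
```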
The next level of content depth is time-based, or temporal, consumption. For these product formats, you hit ‘run’, and they change before your eyes and ears. Recorded/created audio and video live in this category. They’re not static; they play out over a defined span of time. At different moments, you experience different content. When finished, you can repeat them and they’ll be exactly the same. In this category, we start to see layering of both audio and video for even more depth and richness.
Example product formats:
- TV / film
- Music
- Podcasts
- …and most other forms of modern content
Although they may seem quite different, both of these categories of content product formats (static and temporal) have something in common. The content consumer is passive. There’s nothing they can do to shape the content they experience. The products are what they are, and they can only be absorbed as created.
The next categories of content product formats involve active interaction with the content consumer. In these categories, the consumer shapes her content experience.
I’ve dubbed this experiential paradigm the ‘environment’. These are little realms of content which the user can experience in a rules-based way. Instead of hitting ‘run’ just once, users get many opportunities to ‘run’, with multiple options of what to ‘run’ each time. The content that can be experienced in one rules-based environment is finite, but it’s quite large and often actively growing:
Example product formats:
- Audio/video games
- Social media
- Virtual spaces
- …and most other forms of post-modern, “metaverse-y” content
Gaming and social are the key product format evolutions here. In gaming, social, and metaverse (simply defined as a combination and extension of the two), you can define your own experience, to some extent. You operate ‘freely’, but within the rules and limitations set by the environment.
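To make ‘rules-based’ concrete, here’s a toy Python sketch. Everything in it (the rooms, the rules) is hypothetical; the point is that the consumer chooses what to ‘run’ each turn, but only from the moves the environment allows:

```python
# A toy content 'environment': finite states, rules limiting what you can do,
# and a consumer shaping their own path through it. Entirely hypothetical.
ROOMS = {
    "lobby":   ["gallery", "arcade"],   # from the lobby you can go two ways
    "gallery": ["lobby"],               # the gallery is a dead end
    "arcade":  ["lobby", "gallery"],
}

def run(state: str, choice: str) -> str:
    """Advance one step, but only if the environment's rules allow it."""
    return choice if choice in ROOMS[state] else state

state = "lobby"
for choice in ["arcade", "lobby", "gallery"]:   # the consumer picks each 'run'
    state = run(state, choice)
    print(state)   # arcade -> lobby -> gallery
```

Finite states, free-ish movement: that’s gaming, social, and the metaverse in ten lines.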
The last category I have is ‘the sandbox’, for generative content. We’re on the cusp of this happening. This is consuming content and creating it at the same time: actively shaping the content you want to consume. Another way of thinking of this is fully defining the content environment, without boundaries or rules. What if I could define the song I want to listen to? The video I want to watch? The game I want to play?
(Early) example product formats:
- Text-to-text: ChatGPT, Cohere, etc
- Text-to-image: Dall-E, Midjourney, Stable Diffusion, Runway, etc
- …and most other generative tech application interfaces
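To make the sandbox less abstract, here’s a minimal sketch of ‘defining the content you want’, using OpenAI’s image-generation endpoint. The prompt, size, and function name are my own illustrative choices:

```python
import os
import requests

# One prompt in, one piece of content out: creating and consuming in a single
# gesture. Assumes an OPENAI_API_KEY environment variable is set.
API_URL = "https://api.openai.com/v1/images/generations"

def define_the_content_you_want(prompt: str) -> str:
    """Describe the image you want to 'consume'; get back a URL to it."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"prompt": prompt, "n": 1, "size": "512x512"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["data"][0]["url"]

print(define_the_content_you_want("a cozy reading nook at sunset, watercolor"))
```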
But hang on, is this a product format, or just a new way of creating/interacting with existing product formats? Ie: Dall-E, ChatGPT, etc, all create familiar things, which I’ve already categorized above. Shouldn’t this be a creation shift, rather than a product format and consumption shift?
In other words, shouldn’t we start a new cycle in our framework — one for AI-generated content rather than human-generated content? Like this?
In short, no.
This is my core point. In this piece, I’m not interested in opining on how generative AI will improve upon or replace human production of traditional content like articles, TV, film, etc.
Instead, I’m interested in how the generative content experience itself will be productized and consumerized.
My thesis is that human interaction with generative tech applications will become a new format — a boundless creative content environment personalized to each consumer.
The bet here is on creation as consumption. Generative tech has massively accelerated the already rapid democratization of content creation, and that democratization will keep shaping consumer preferences (I’ll explain this better later on). As content creation becomes more gamified, personalized, and immersive, it will solidify as a consumer product category, not just a technology.
So, back to this framework:
To be fair, although the generative product format is new, it’s also kind of a restart of a familiar cycle. To understand this future perspective, let’s look to the past.
Ancient History:
I want to elaborate with a historical analysis of technologies we’ve used for content capture and consumption.
I know that nobody asked for this, but:
1) it’s interesting
…(for me),
and, 2) I think this will be useful to set the stage for the future, as I highlight trends and cyclical similarities.
Before we dive in, a quick orientation. Back in the day, the technology with which content was captured/created was often quite different from the technology on which it was consumed.
- Example: You go back in time and take a photo of your great-grandparents using an old-school camera. Days later, after a harrowing journey in a darkroom, the content manifests as a piece of paper called a photograph. There’s a bunch of processing and intermediate technologies between the paper photo (consumption technology) and the camera (capture technology). You get the point.
In the chart below, capture technologies (in dotted lines) are paired with, but separate from, consumption technologies (in solid lines).
Ok, good luck:
As we can see, the last ~200 years of content innovation have yielded some interesting trends!
*entire third grade class falls asleep*
The first is technology consolidation. Old inventions were rolled into and replaced by new ones, effectively culminating in the modern personal computer, which can do everything these inventions could, combined, all in one place.
As part of this consolidation, we found that previously separate technologies for capture of content and consumption of content became more and more proximal.
- In other words, we moved from old-school cameras and darkrooms, to disposable cameras with rolls that could be developed at your local pharmacy, to digital cameras where you could view photos on a screen immediately after taking them, to phones where you could take, view, edit, and share the content all in one place.
So what?
The result of this was a democratization of content creation: as content capture technologies became more affordable and accessible, more content got created.
In parallel, the accessibility and centralization of consumption technology accelerated the popularization of content consumption, as consumers could increasingly access more content in one place.
I’ll pause to congratulate myself for mapping ancient history using publicly available information — a real feat of the visionary venture investor intellect.
Ok, time for a quick break.
~Recess~
*Third grade students cheer*
~Back to class with Mr. Grant~
*Third grade students sigh despondently. One student starts to cry*
Let’s now characterize more recent history. As you have likely figured out by now, this is just a more detailed, technology-focused, historical lens of the framework I introduced at the beginning of my piece. From the 1800s to 2000, we moved from static content to temporal content, to the beginnings of experiential content.
The computer, and the consumerization of programming languages that followed, shifted the paradigm from hardware-focused innovation to software (another remarkable insight from this gifted mind), and rules-based programs introduced rules-based content experiences:
The broad trends brought about by this move to the content ‘environment’ product format are accelerated democratization and popularization. But unlike the hardware consolidation of yesteryear, we now see application fragmentation within our unified hardware. The experience has gone virtual, and there’s a glut of virtual experiences to be had. I’ve highlighted two prominent examples:
Gaming
…is the most obvious. Gaming is an easy-to-understand extension of image and video. We started with hardware-driven gaming innovations via consoles, and have slowly repeated cyclical technology consolidation trends with mobile gaming and other embedded gaming experiences.
So gaming has gotten closer and closer to the consumer. Meanwhile, software advances have created increasingly rich environments. The trend in gaming has thus been a move toward immersive, expansive content experiences.
Social Media
…is also worth a mention. Social media follows the same rough timing of the content inventions of yesteryear — like a little microcosm of the past 200 years. We start with social media focused on text (ie: blogging), and slowly introduce other forms of static visual content, like photos, and finally land squarely in today’s audio-video-focused social media environment.
Social media is the ultimate result of democratization of content creation, and accelerated popularization of new content formats. Like gaming, social media is gamified, and players win and lose social capital. Also like gaming, social media is expansive and immersive, and the environments, though finite in the content they contain, are constantly shifting and expanding. In particular, the rise of social media has firmly set the consumer appetite for user-generated, socially-gamified content experiences.
TLDR:
Thanks to gaming and social media, consumer tastes have crystallized toward content experiences which are immersive, expansive, social, gamified, and user-directed (I’ll define all these ten-dollar words later). A few years ago, we started talking about this thing called ‘the metaverse’, which brought all of these characteristics under one roof.
Then, all of a sudden, AI barged in
…seemingly interrupting the nice thing we had going with gaming and social. And we kind of pivoted our interests from keeping the good times rolling to going back in time and modernizing all previous content creation technologies with generative tech.
Unsurprisingly, generative tech innovations have arrived in basically the same order as the inventions of yesteryear, and as my framework. The difference is that they’ve moved at an incredibly accelerated pace:
(note: this representation is illustrative, but not even close to exhaustive)
The reason for the similarity and adherence to my framework is almost too obvious to mention. As we move from static to temporal content formats, the content itself carries much more data, so generative tech cracked the lighter formats (text, then images) first, and is only now catching up on the heavier ones (audio and video).
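A back-of-envelope sketch of that gap (every figure below is a rough, uncompressed assumption, not a measurement):

```python
# Rough, uncompressed orders of magnitude per content format. All assumptions.
text_page = 500 * 6                    # ~500 words x ~6 bytes per word
image     = 1024 * 1024 * 3            # one 1024x1024 RGB image
audio_min = 44_100 * 2 * 2 * 60        # CD-quality stereo, one minute
video_min = 1920 * 1080 * 3 * 30 * 60  # raw 1080p at 30 fps, one minute

for name, size in [("page of text", text_page), ("image", image),
                   ("minute of audio", audio_min), ("minute of video", video_min)]:
    print(f"{name}: ~{size:,} bytes")
```

A few thousand bytes of text versus ~11 billion bytes of raw video: no wonder text generation matured first.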
Ok, let’s pause and summarize where we’ve been. We started with a high-level visual as a framework for trends in content innovation. And we filled it in with actual trends and innovations, up to today — for a more detailed, historically-validated view:
So, what do we expect to happen next?
In short, I expect us to return to where we left off…kind of. We can think of generative tech as a little AI-focused detour, but consumer preferences are still where we left them. Now, instead of approaching those preferences with the tech of yesteryear, we’ll be addressing them with generative tech.
In the short term, once video generation tech reaches maturity, we expect it to merge with audio generation tech, for accompanying music or voice.
Soon thereafter (and in parallel), we’ll get into the fun stuff. We’ll start to see gamified, rules-based interactivity with our generative environments, like in gaming.
Fanboying a16z’s Connie Chan:
“Not only will AI propel the creation of more games, but it will advance a new type of game that is more dynamic and personalized to the preferences of each gamer. Eventually, this may expand to entire virtual worlds you can create from scratch.”
I love the characterization “a new type of game”. There are VCs out there who believe gamification is the future of every consumer experience, and I kind of agree. An abstraction of the term gamification is, for me, the idea that an environment is created where there are rewards and punishments — carrots and sticks. If they’re meaningful and elegant enough, consumer behavior happily follows. Patron is a new-ish fund which shares this belief.
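Abstract it even further and the carrot-and-stick loop fits in a dozen lines of Python. A toy sketch, with every action and reward invented by me:

```python
import random

# Carrots and sticks: the environment pays out rewards, and consumer behavior
# drifts toward whatever gets rewarded. All names and numbers hypothetical.
REWARDS = {"post": 5, "lurk": 0, "comment": 2}    # the environment's carrots
prefs = {action: 1.0 for action in REWARDS}       # the user's starting habits

for _ in range(1000):
    # Pick an action in proportion to current preference...
    action = random.choices(list(prefs), weights=list(prefs.values()))[0]
    # ...and reinforce it in proportion to its reward.
    prefs[action] += REWARDS[action] * 0.01

print(max(prefs, key=prefs.get))   # almost always 'post': behavior followed
```

Make the rewards meaningful and elegant enough, and the loop does the rest.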
So, let’s dissect the consumer trends which I expect to lead us into the future. In short, I expect some to continue, and some new ones to emerge. Here’s my take, high-level:
And here are my definitions of the ten-dollar words:
Technology consolidation is happening again.
- Runway, Catbird.ai, and others have started to consolidate generative technologies and offer them in the same platform.
We’re seeing blisteringly fast democratization and popularization of generative content platforms.
- ChatGPT got 1M users in 5 days and 100M users in 2 months. ‘Nuff said.
Social gamification is here to stay.
- This was a key insight we left off with in the metaverse. There’s no use buying virtual ‘land’ if others can’t see it, enjoy it, or compete for it. The consumer value of generative content products, too, is vastly limited if it isn’t shared socially (as I discussed in my previous piece).
User immersion is the future.
- Immersion isn’t about having a ‘rich’ or ‘hi-def’ content experience. Conceptually, it’s about getting closer and closer to the content you’re experiencing, until you’re in the content itself, both informing it and experiencing it simultaneously, at ‘runtime’. As AI learns us better, our actions, thoughts, and desires will tie more closely to, and translate better into, the content we experience. Simultaneously creating and consuming will become more intuitive and elegant.
AI enables massive expansiveness and extensibility of content.
- This is another ‘duh’ one. Generative tech can do in seconds what would have taken years. A fun example from a16z:
“Microsoft Flight Simulator enables players to fly around the entire planet Earth, all 197 million square miles of it. How did Microsoft build such a massive game? By letting an AI do it. Microsoft partnered with blackshark.ai, and trained an AI to generate a photorealistic 3D world from 2D satellite images.”
AI enables massive personalization of content.
- TikTok delivers personalized content by proxy. It finds out what you want, incentivizes others to make it for you, and feeds it to you (as I discussed in another previous piece). Now, instead of the approximation delivered by that giant human-machine orchestration, the machine can just create that content itself, for you and you alone.
Finally, here’s the big picture:
I’ll ask the question again, Grant — what do we expect to happen next?
Let’s look back to the framework. Around this time in the last cycle, a little thing called the PC came to the fore and changed the paradigm from hardware to software, from temporal content to experiential content. I think something similar is coming soon. For lack of a more creative name, I’ll call this version the “PG” (personal generator). It’s a one-stop-shop meta-environment where you both create and consume content.
It’s a natural evolution of the centralization, democratization, and popularization of generative technology and the content it produces. But it’s a paradigm down from a PC. With your PG, you create and consume environments. Or, to re-quote Connie Chan, “entire virtual worlds you can create from scratch.”
*explosion*
*mic drop*
*sunglasses*
*smug smile*
*sunset*
(in case you were wondering (and I know you were), here’s a collage of Dall-E’s renderings of said image)
In short:
Using your PG, you can build landscapes, twins of yourself, new people with personalities and virtual physicalities, games, new content: all extended and automated by algorithmic logic, personalized by an AI that knows you, for you and others to explore and enjoy.
Ok, so who is out there doing this?
In short-er:
Nobody, yet.
And this isn’t a space I expect to reach maturity anytime soon. Most startups are currently point solutions starting with very specific use cases. There is no ‘PG’, and won’t be for a while. Still, there are some promising starts which might conceivably evolve in that direction.
Here are a few:
Anything-building:
It’s already pretty consumerized, and it’s easy to see how this could get gamified and social, as users build libraries of ‘their’ content. It’s all fairly basic for now, centered on image and video tools, but there’s a clear push toward continued expansion of the toolkits.
World-building:
Both of these players have started with professional game developers as their customers, with the stated big-picture objective of democratizing world-building for anybody.
These are rudimentary, consumerized world-builders. I’m not sure if they properly use generative AI, or are just rules-based engines which place assets from a big library. At any rate, you’ve got to start somewhere.
Meta-world-building:
These players started with the metaverse already set up, and have embedded generative tools within. Your generative environment is literally the environment you’re already in.
These social players are similarly integrating generative tools into the user experience, and gamifying generative content creation in-platform.
The world within-building:
There are virtual worlds, full of modern content like images, animations, music, and sound, but there’s also the idea of building the virtual ‘world within’, full of emotional and logical landscapes. These two players have pre-built AI companions/people whom you can train on your data and customize a bit.
These three apps allow you to build flatter ‘characters’ — but you can get deeper into the algorithm to influence their aesthetic, motivations, personalities, and behaviors — a la Westworld.
Ok, I’ve gone on long enough. I’ll admit it — there has been some length-creep in my writing. To the ~0.1% of readers who made it to the end, thanks for sticking with me. The first 10 of you to DM me on LinkedIn will get a free “I Survived Grant’s Latest Longwinded Medium Piece” t-shirt.
(In a sad irony, even the t-shirt description is hopelessly longwinded. At least it’s on-brand)
Next time, I’ll give us all a break and write about something totally unrelated to content. Get pumped!
Questions/feedback/ideas? HMU at grant.demeter@av.vc
— G
References
- https://a16z.com/2022/11/16/creativity-as-an-app/
- https://a16z.com/2023/02/07/everyday-ai-consumer/
- https://a16z.com/2023/03/17/the-generative-ai-revolution/
- https://a16z.com/2022/11/17/the-generative-ai-revolution-in-games/
- https://www.cbinsights.com/research/generative-ai-funding-top-startups-investors/
- https://www.immersivewire.com/p/generative-ai-metaverse
- https://www.prweb.com/releases/2022/9/prweb18886309.htm
- https://techcrunch.com/2017/12/19/googles-tacotron-2-simplifies-the-process-of-teaching-an-ai-to-speak/
- https://techcrunch.com/2022/12/30/quickvid-uses-ai-to-generate-short-form-videos-complete-with-voiceovers/
- https://techcrunch.com/2019/03/07/wellsaid-aims-to-make-natural-sounding-synthetic-speech-a-credible-alternative-to-real-humans/
- https://techcrunch.com/2020/09/22/wellsaid-labs-research-takes-synthetic-speech-from-seconds-long-clips-to-hours/
- https://www.theverge.com/2023/3/20/23648113/text-to-video-generative-ai-runway-ml-gen-2-model-access
- https://venturebeat.com/business/resemble-ai-launches-voice-synthesis-platform-and-deepfake-detection-tool/
- https://www.wired.com/story/generative-ai-video-game-development/