# AI, Design, and the Power of Open Models

Podcast: The a16z Show
Published: Jun 17, 2026
Reading time: 22 min
Canonical: https://podbrew.app/briefs/the-a16z-show-ai-design-and-the-power-of-open-models

Yoko Li and Justine Moore join Ideogram founder and CEO Mohammad Norouzi. Their discussion centers on image generation models, creative design workflows, and the dynamic relationship between artificial intelligence and artistic endeavors.

The conversation covers Ideogram's decision to release an open-weight model, the complexities of accurately generating text and precise layouts within images, and the increasing research focus on controllability. They examine various aspects of prompting, customization, and editing, along with the trade-offs between general-purpose models and systems optimized for specific creative tasks.

This exchange is crucial for understanding how image generation models are likely to evolve as creators and businesses demand greater control over their outputs. It sheds light on the strategic importance of open-source AI, advanced design tools, and agentic workflows in shaping the future of creative industries.

## Key takeaways

- Ideogram introduced a compact 9.3 billion parameter open-weight image model designed for editable designs, prioritizing user control for specific styles, typography, and layouts.

- Ideogram pivoted to an open-weight model strategy to dedicate more resources to foundational model development and enable collaboration with inference providers, chip makers, and enterprises.

- The new open-source model facilitates photorealistic 2K image generation and precise layout control, with future plans to introduce editable text and design capabilities.

- Ideogram employs highly detailed prompts, often thousands of words long, including layout and bounding box controls, for precise manipulation of image elements, fonts, and positioning.

- A key differentiator for Ideogram is its ability to accurately render long-form text directly within images, addressing a critical need in the graphic design and storytelling sectors.

- Training innovations include novel data processing techniques that teach the model fundamental visual concepts such as bounding boxes, layering, and color palettes.

- Ideogram enhances training data quality by using an AI model to convert images into detailed text descriptions (including bounding box and element information), rather than relying on short or inaccurate alt-text.

- Their core recipe for training text-to-image models involves a multi-stage AI process: an AI model generates detailed text from images, which then trains a second AI model to create images from those text descriptions.

- Ideogram explores structured prompting using JSON representations, which function as an intermediate language allowing language models to translate vague user ideas into precise and detailed image descriptions.

- Providing detailed JSON inputs offers professionals greater transparency, control, consistency, and fine-grained editing over AI-generated images, especially for adhering to brand guidelines.

- Human designers are essential for evaluating and enhancing the "taste" of AI-generated images, as current AI systems lack the ability to effectively assess aesthetic quality.

- Ideogram secured state-of-the-art results with a 9.3 billion parameter model by prioritizing innovation and differentiation in specialized domains like graphic design and editable text, rather than competing on raw compute scale.

- Ideogram is developing small AI models optimized for consumer GPUs and on-device use, enhancing privacy and allowing artists to train them on their specific art styles and textures.

- Custom-trained AI models can effectively learn and apply an enterprise's unique "brand DNA" for design ideation and marketing, addressing the limitations of generic AI models.

- Ideogram offers multiple customization options, including open-source fine-tuning, an upcoming in-app training tool, and bespoke solutions for enterprise clients with data curation.

- Image editing and model fine-tuning are complementary methods that, when combined with prompting, significantly enhance the composability and customization possibilities within creative AI workflows.

- Creative tool development is moving towards "agentic loops," where AI automates modifications through API requests, shifting away from traditional UI-based human input to scale creativity by exploring vast design variations efficiently.

- Ideogram's model design, with minimal reinforcement learning, enables a greater diversity of artistic styles compared to other established image generation models.

- Building powerful diffusion models requires simplifying their task by providing highly detailed specifications of the desired image.

- Prompt representations may shift from JSON or natural language to HTML, leveraging LLMs' training on HTML for improved structured output.

## 00:05 - 02:06 Ideogram releases open-weight image model emphasizing control and taste for creative workflows

Ideogram launched its first open-weight image model, which is notably small at 9.3 billion parameters compared to the previous state-of-the-art 80 billion parameter models. This release aims to provide editable designs for marketing and design use cases, rather than just single flat images, achieving results comparable to larger models like Nano Banana or GPT Image.

The company focuses on building models with "taste" that artists can customize to the nuances of their individual style and canvas texture, targeting 2K output quality. This approach prioritizes giving users extensive control over generated content, including typography, layouts, editing, and customization, to integrate seamlessly into professional creative workflows.

This open-weight release is a significant departure for Ideogram, as their previous models were closed-source. The decision to make the model open reflects a recognition of rapid progress in the industry and a commitment to enabling broader customization and integration for users.

Ideogram's CEO and founder, Mohammad Norouzi, explained that the key challenge in image generation has shifted from simply creating images to giving users more control over the creative process, ensuring the models fit into professional workflows.

> It's not about how good a model is in the general sense, it's about how good is this model for my use case.

## 02:06 - 04:06 Ideogram Shifts to Open Weights to Focus on Foundational Model Development

Ideogram has made a strategic decision to release open weights for its models, shifting its focus from primarily operating its first-party app and API to concentrating more on foundational model development. This move is driven by the belief that significant potential lies in advancing the core models themselves.

By adopting an open-weight approach, Ideogram aims to extend its reach and foster collaborations with various partners. This includes working directly with inference providers, large enterprises, and chip makers, allowing for greater customization, on-premise hosting, and device optimization of their models. The company sees this as a declaration of its commitment to building robust foundational AI.

The new open-source model has already unlocked several advanced capabilities and use cases. It allows for highly photorealistic image generation, capable of producing images up to 2K resolution even with smaller models, alongside precise layout control. This release serves as a preliminary step, with Ideogram testing the waters for working with the open-source community through platforms like Hugging Face and ComfyUI.

Looking ahead, this foundational release is intended to pave the way for more exciting features. A highly anticipated development is editable text and layout control within generated designs. This feature is expected to be particularly transformative for design and marketing applications, moving beyond static outputs to fully editable content.

> this is basically us saying, "Hey, we are very serious about building the foundation model, and we would like to work with you, whoever you are, whether you're an app developer or a chip maker or an inference provider."

## 04:06 - 06:07 Ideogram's Precision Prompting and Accurate Text Generation in Images

Ideogram's model uses exceptionally detailed prompts, often thousands of words long, to control every element within an image. This includes precise layout and bounding box controls, which enable users to meticulously fix specific elements, their positioning, and fonts. This high level of versatility is essential for various design applications, allowing for fine-grained control over the final image.

A standout feature of the Ideogram model is its capability to render super long, accurate text directly within images. Users can either provide extensive text in the prompt or instruct the model to generate it, and it performs remarkably well. This capability is notable because early image generation models were infamous for producing garbled or incorrect text, which often led to humorous internet memes.

Ideogram deliberately focused on accurate text generation as a core differentiator from their very first model, released three years ago. At a time when competing models struggled with text legibility, this became a unique selling point. They recognized the significant value this feature brought to the graphic design and storytelling industries.

The precision prompting allows long, detailed instructions to dictate layout and content. This approach, combined with the ability to accurately render text, distinguishes Ideogram's model in creating images where text is not just present but legible and integrated into the design.

> image generation was synonymous with garbled text, and there were memes about DALL-E 2 generating travel posters with incorrect city names

## 06:06 - 08:07 Ideogram's Training Emphasizes Text Accuracy and Photorealism

Ideogram's brand is built on high-quality text generation within images, essential for logos, t-shirt designs, and graphic design. While previous models didn't lead in text generation, a continued focus and research breakthroughs have resulted in their current model achieving very accurate text, despite its small size.

A key innovation in Ideogram's training process involves new ways of processing data. This method allows the model to learn concepts such as bounding boxes, layering, and color palettes, which contribute to its distinguishing features.

The success of Ideogram's model is attributed to intense focus and meticulous evaluation. The company acknowledges that evaluating image models is challenging, as many existing benchmarks do not consistently correlate with desired pixel fidelity and realism. They also avoid relying on novice users for quality judgments, as such users may have uncalibrated monitors.

Ideogram continuously measures both text accuracy and photorealism throughout the entire training process. This ongoing measurement informs detailed updates to the model and data, ensuring that performance is consistently improving in these critical areas.

> throughout training, we always measure text accuracy and we update very detailed changes to the model and data and see how that results in performance.

## 08:07 - 10:09 Ideogram uses AI to generate detailed image descriptions for training and employs structured JSON prompting

Ideogram trains its text-to-image models by generating rich textual descriptions from images. They moved beyond simple, often inaccurate, alt-text by developing models that convert images into detailed text, including bounding box and element information, especially for text within images. This process creates high-quality data for training their image generation models.

The core recipe for this involves a unique two-step AI process. First, Ideogram gathers images from the internet and uses one AI model to transform these images into detailed text descriptions. Subsequently, another AI model is trained to convert these generated text descriptions back into images, effectively teaching the model to understand and create visuals from intricate textual input.
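
The two-step recipe above can be sketched as a small data pipeline. This is a minimal illustration under stated assumptions, not Ideogram's actual training code: `toy_captioner` is a hypothetical stand-in for the image-to-text model, and `TrainingPair` and `build_training_pairs` are names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class TrainingPair:
    """One (detailed description, image) example for the text-to-image model."""
    description: str
    image_path: str


def build_training_pairs(
    image_paths: Iterable[str],
    captioner: Callable[[str], str],
) -> List[TrainingPair]:
    """Step 1: run a captioning model over raw images to get detailed text."""
    return [TrainingPair(description=captioner(path), image_path=path) for path in image_paths]


def toy_captioner(image_path: str) -> str:
    # Hypothetical stand-in: a real captioner would inspect the pixels and
    # describe elements, text, and bounding boxes in detail.
    return f"Detailed description of {image_path}, including text and bounding boxes."


pairs = build_training_pairs(["poster.png", "logo.png"], toy_captioner)

# Step 2 (not shown): train the image generator on description -> image pairs.
for pair in pairs:
    print(pair.image_path, "<-", pair.description)
```

The point of the split is that the quality of the second model depends on how detailed and accurate the first model's descriptions are, which is why the recipe replaces short alt-text with generated captions.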

Ideogram also employs a distinctive JSON prompting structure, as noted in their technical blog. This approach suggests an implicit structure for prompts, moving beyond plain text. This exploration into JSON as a potential standardized representation for image models indicates a shift towards more structured and detailed input methods for advanced AI image generation.

## 10:09 - 12:10 JSON as an Intermediate Language for Image Generation

The image generation model requires inputs in a specific JSON structure for high-quality outputs, as it is trained with JSON prompting. Giving a simple, one-word prompt might result in a "safety blocked" image, not due to content, but because the input lacks the necessary JSON specification.

While users are not expected to write in JSON directly, this format acts as a crucial intermediate representation. It helps bridge the gap between a vague human idea and the precise image description needed for diffusion models.

> JSON prompting. It's the intermediate representation that we think language models can describe images in that format and then image generation can happen.
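
To make the idea of an intermediate representation concrete, here is a minimal sketch of what a structured prompt might look like. The field names (`canvas`, `palette`, `elements`, `bbox`) and the 2048-pixel canvas are assumptions made for illustration; the episode does not spell out Ideogram's actual schema.

```python
import json

# Hypothetical structured prompt. Field names are illustrative only and are
# not taken from Ideogram's documentation.
structured_prompt = {
    "canvas": {"width": 2048, "height": 2048, "style": "flat vector poster"},
    "palette": ["#0B3D91", "#FFFFFF", "#F2A900"],
    "elements": [
        {
            "id": "headline",
            "type": "text",
            "content": "Summer Jazz Festival",
            "font": "bold sans-serif",
            # Bounding box as [left, top, right, bottom] fractions of the canvas.
            "bbox": [0.1, 0.08, 0.9, 0.25],
        },
        {
            "id": "hero",
            "type": "image",
            "content": "a saxophone player silhouetted against a sunset",
            "bbox": [0.1, 0.3, 0.9, 0.85],
        },
    ],
}

# A language model would emit this text from a vague user idea; the image
# model would then consume it as its prompt.
prompt_json = json.dumps(structured_prompt, indent=2)
print(prompt_json)
```

In this framing, a user types something like "a poster for a jazz festival", a language model expands it into the detailed structure, and the image model only ever sees the expanded form.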

## 12:10 - 14:10 Advanced Editing and JSON Inputs Enhance Creative Control in Image Generation

The new frontier in image generation involves advanced editing, moving beyond simple text prompts to a sophisticated combination of JSON and image interaction. This approach allows for a much richer and more precise output than traditional methods.

Typically, a relatively simple user prompt is translated on the backend into a detailed JSON input. This internal process enables the model to generate more intricate and tailored results, going beyond what a short, basic prompt might otherwise produce.

For professional creative applications, transparency is key. Unlike some services that abstract away the detailed input, providing the actual model input in JSON format gives users critical control and consistency. This avoids unpredictable image interpretations and empowers professionals to guide the AI with precision.

Creative professionals are increasingly adopting AI, viewing it as a valuable tool to enhance their ideation process. Humans excel at providing essential context to these models, and capabilities like JSON prompting enable them to leverage AI more effectively, ultimately fostering greater innovation and creativity.

> For professional use cases, you don't want to just roll the dice and then get some other completely different image interpretation of your prompt.

## 14:10 - 16:11 JSON prompting enables precise editing and enterprise brand guideline adherence

JSON prompting ensures highly consistent output by meticulously describing every detail within a scene. This level of granular definition means that even minor alterations to one element can be made without affecting other parts of the generated image or design.

The consistency provided by JSON prompts is crucial for fine-grained editing. Users can adjust a tiny detail, like an element in the corner of an image, while maintaining the overall integrity and composition of the rest of the scene.

This foundational approach has significant implications for enterprise use cases, particularly in adhering to strict brand guidelines. Companies can define precise specifications for elements such as text size, font, and layout, allowing for automated generation that consistently meets brand standards.

> for every brand, you have brand guidelines in terms of, okay, the size of text, the, the font of text, and we think, this kind of foundation allows us to really get into a lot of the, enterprise use cases.
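
A short sketch of how a structured prompt could support both ideas in this section: changing one element while leaving the rest untouched, and checking text elements against brand rules before generation. The prompt layout, the rule names, and both helper functions are hypothetical and only illustrate the principle.

```python
import copy

# Hypothetical structured prompt and brand rules; all field names are illustrative.
prompt = {
    "elements": [
        {"id": "headline", "type": "text", "font": "Inter Bold", "font_size_px": 96},
        {"id": "footer", "type": "text", "font": "Comic Sans MS", "font_size_px": 18},
        {"id": "mascot", "type": "image", "description": "brand mascot waving"},
    ]
}
brand_rules = {"allowed_fonts": {"Inter", "Inter Bold"}, "min_font_size_px": 24}


def edit_element(spec, element_id, **changes):
    """Return a copy of the spec with one element changed and all others untouched."""
    updated = copy.deepcopy(spec)
    for element in updated["elements"]:
        if element["id"] == element_id:
            element.update(changes)
            return updated
    raise KeyError(f"no element with id {element_id!r}")


def brand_violations(spec, rules):
    """List text elements that break the font or minimum-size rules."""
    problems = []
    for element in spec["elements"]:
        if element["type"] != "text":
            continue
        if element["font"] not in rules["allowed_fonts"]:
            problems.append(f"{element['id']}: font not in brand kit")
        if element["font_size_px"] < rules["min_font_size_px"]:
            problems.append(f"{element['id']}: text smaller than the brand minimum")
    return problems


before = brand_violations(prompt, brand_rules)
fixed = edit_element(prompt, "footer", font="Inter", font_size_px=24)
after = brand_violations(fixed, brand_rules)
print(before)
print(after)
```

Because the edit touches a single entry in the description, everything else the model is asked to draw stays identical, which is the property the section credits for consistent fine-grained editing.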

## 16:11 - 18:11 Ideogram Prioritizes Taste in Graphic Design Through Human Evaluation

Ideogram places a strong emphasis on graphic design, particularly text rendering, recognizing its significance for business use cases and storytelling. They consider graphic design to be a crucial frontier, driving them to prioritize it since the release of their first model, which excelled at text rendering.

A key focus for Ideogram is ensuring their models possess "taste," an elusive quality that involves going beyond average outputs and not conforming to typical opinions. This pursuit of unique aesthetic judgment often puts them at odds with standard AI leaderboard metrics that favor conformity.

Because AI models are not yet capable of performing taste evaluation effectively, Ideogram relies on internal evaluations conducted by human designers. They perform side-by-side comparisons of various model versions and outputs from other models to continuously refine and elevate the aesthetic taste of their generated content.

> AI is not very good at doing the actual taste evaluation yet.

## 18:11 - 20:11 Ideogram's Small Model Strategy: Innovation Over Scale

Ideogram achieved state-of-the-art quality using a 9.3 billion parameter model, which is significantly smaller than many competing models. This efficiency allows it to run on a single GPU, making it more accessible to a wider range of users.

The company consciously decided not to compete with tech giants like Google on the scale of computational resources. Instead, Ideogram prioritized innovation, focusing on overlooked areas within the field.

Their strategy includes differentiating in specific domains, particularly graphic design and editable text, which many larger labs are not prioritizing. Additionally, Ideogram adopted an open-weight approach to partner with other platforms, aiming to be a strong alternative for users focused on design.

Looking ahead, Ideogram sees opportunities for future scaling with advanced architectures like mixture of experts. This approach could significantly enhance the model's power without necessarily making it slower to run, building on the foundation of their current high-quality, smaller model.

> we focused on innovation.

## 20:11 - 22:12 Ideogram Prioritizes Small, Customizable AI Models for Artists

Ideogram is focusing on developing small AI models that can run efficiently on consumer GPUs and mobile devices, rather than solely pursuing massive, multi-trillion parameter models. This strategy aims to support on-device image generation and editing, addressing growing concerns about user privacy by keeping data and processing local.

The team believes that while a general understanding of the world is necessary for AI models to be effective, true innovation lies in customization. By providing a base model, artists can fine-tune it using their own work, even with as few as fifty pieces, to capture the unique nuances of their style and canvas textures.

This customization enables artists to achieve high-quality 2K output directly reflective of their individual aesthetic. The goal is to integrate AI into existing workflows, significantly augmenting artists' productivity and creativity. One artist reported working three times faster on a comic book after incorporating these AI tools.

> This at least made me three X faster in making this comic book.

## 22:12 - 24:12 Customizing AI Models to Meet Enterprise Brand Guidelines

Generic AI models often fall short when applied to enterprise needs, particularly in visual design. Companies frequently find that off-the-shelf AI solutions fail to adhere to their specific design bars, established brand guidelines, and unique stylistic requirements. This limitation prevents businesses from effectively leveraging general-purpose AI for their distinct visual content.

To address this, custom AI models are developed and tailored for specific enterprises. Once trained on a company's unique data, these models demonstrate a deep understanding of the brand's DNA. This specialized AI can then be deployed for various applications, including design ideation and marketing content creation, significantly enhancing creative workflows.

The release of open-weight models further empowers enterprises by offering a glimpse into customization. This approach allows developers within companies to explore fine-tuning and scaling AI solutions to their particular use cases, crucial for encoding intricate styles and brand kits that are difficult to define solely through documentation.

> We tried these generic models, and they don't meet our design bar. They don't follow our style, they don't follow our brand guideline.

## 24:12 - 26:12 Ideogram's Flexible Customization Options for Its Models

Ideogram provides several pathways for users to customize its models, catering to different needs and budgets. These options range from self-service fine-tuning to comprehensive enterprise collaborations.

One method involves leveraging open-source, quantized models for fine-tuning. Additionally, Ideogram offers an in-app custom model training tool where users can upload images to personalize models, with a version 4.0 release anticipated.

For enterprise clients, Ideogram delivers bespoke solutions. This includes their annotation team working closely with client design teams to define specific prompts, keywords, and incorporate brand mascots. This process involves extensive data curation and cleaning to create highly tailored models.

The choice of customization depends on the user's budget and desired return on investment. Users can start with lower-cost open-source options and scale up to higher-budget custom model development with direct Ideogram support.

> We think depending on your size and your budget, you should still be able to customize the model, maybe use the open source at a low budget, and then you can come and talk to us so that we can build the model for you at a high budget, but then depends on really, the ROI that you have in mind for your models.

## 26:12 - 28:13 Image Editing and Fine-Tuning Offer Complementary Creative Workflows

Creative workflows in AI image generation benefit from two distinct yet complementary approaches: direct image editing and model fine-tuning. While direct editing allows for quick, iterative changes to an existing image or style, fine-tuning offers a deeper level of customization and consistency.

Image editing is highly effective for post-generation adjustments. It's a rapid process where users can take an initial output and refine specific details without the need for extensive model training. This method is ideal for iterative workflows, enabling creators to quickly fix minor elements after the first image generation.

Conversely, model fine-tuning, or customization, provides significant freedom by enabling consistent style adherence and precise character details without explicit prompting. This is particularly valuable when ideating within a general style or maintaining complex character traits, such as intricate outfits or asymmetrical features, which can be challenging to convey solely through editing inputs.

Ultimately, both image editing and model fine-tuning are powerful and are not mutually exclusive. Their composability, alongside prompting, allows for a vast array of customization options, empowering creators with flexible and robust tools for their creative processes.

> I don't think they're mutually exclusive, but they're both very powerful. With prompting, editing, and model fine-tuning, the composability aspect of the model is just huge.

## 28:12 - 30:13 AI Agentic Workflows Enhance Visual Customization

The creative tools landscape is evolving from direct human interaction through user interfaces to automated "agentic loops." This shift means AI agents can now make modifications to creative content via API requests, streamlining what was once a manual, UI-driven process.

A crucial distinction exists between language models and visual generation. Language model customization exists, but not every company adopts it. In contrast, the visual world exhibits immense diversity, allowing immediate brand recognition through distinct visual representations, a level of differentiation often missing in written communication. This inherent visual diversity presents exciting opportunities for specialized customization.

This diversity enables unique interaction methods with visual AI models. For instance, inputs can include 3D representations of joints or object positions, or stylistic variations to define the output. This capability sets visual AI apart from language models, where the primary input is consistently text.

> there is a lot more diversity in the visual world, and that's very exciting for customization.

## 30:13 - 32:14 Agents and APIs streamline iterative design for creative tasks

Agents are becoming crucial for automating creative workflows, especially when connected to APIs. For instance, to release a new feature, an agent can be prompted to use an API to generate numerous images. This allows users to quickly select the best options and have a landing page up and running in a few hours.

The API business for design is evolving beyond single-shot image generation. It's now characterized by iterative loops that involve prompting, editing, and refining. This means instead of just generating an image and calling it complete, the process includes getting an image, using an edit model to adjust it, and generating new images with JSON blobs for better control if needed.

This agentic workflow emphasizes the 'long tail of design,' where evaluation and editing are integrated into the process. It allows for scaling creativity by providing high-level direction and exploring many different approaches, potentially yielding hundreds of thousands of diverse designs.

> It's no longer just you prompt something, you get an image and call it a day, right? It's so much of a, get an image, use the edit model to, you know, edit it, see if it works well, if it doesn't, get another image with the JSON blob, which is, you know, easier for control.
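
The loop described in the quote (generate, try an edit, and regenerate if the edit does not help) can be sketched as follows. Every function here is a hypothetical stand-in: no real image API is called, and the `fake_*` helpers exist only so the control flow can be run end to end.

```python
def agentic_design_loop(spec, generate, edit, accept, max_rounds=3):
    """Generate a draft, try one edit, and regenerate if neither is accepted."""
    for attempt in range(max_rounds):
        image = generate(spec, attempt)
        if accept(image):
            return image
        edited = edit(image, "fix the headline text")
        if accept(edited):
            return edited
    return None  # No acceptable design within the round budget.


# Toy stand-ins for the generation model, the edit model, and the evaluator.
def fake_generate(spec, attempt):
    return {"spec": spec, "attempt": attempt, "headline_ok": False, "edited": False}


def fake_edit(image, instruction):
    # Pretend the edit only succeeds on later drafts.
    return {**image, "headline_ok": image["attempt"] >= 1, "edited": True}


def fake_accept(image):
    return image["headline_ok"]


result = agentic_design_loop(
    {"headline": "Launch Day"}, fake_generate, fake_edit, fake_accept
)
print(result["attempt"], result["edited"])
```

In a real workflow the `accept` step is the hard part: the brief notes elsewhere that taste evaluation still relies on human designers, so an agent would likely surface a shortlist for a person to pick from rather than decide alone.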

## 32:13 - 34:14 Integrating Evolving AI Models with User Interfaces for Creative Design

Language models enable vast exploration of creative possibilities, but effective user interfaces and experiences are crucial for refining these outputs through editing, whether regional or text-based. Designers face a complex challenge in developing user interfaces while simultaneously adapting to the rapid evolution of underlying AI models, requiring a deep understanding of how these systems function.

Ideogram's model significantly democratizes design, allowing individuals without formal training to quickly produce high-quality visuals. For instance, one user shared that they created a polished design in just two minutes despite having no prior design experience.

The model offers rich artistic capabilities, derived from its training with unique style descriptions that embed a wide array of distinct artistic styles. This varied output differentiates Ideogram's model from some other highly-ranked frontier models, which often yield less diverse design variations.

> I have no design training, and I got this design in two minutes, and it actually looked really, really nice.

## 34:14 - 36:14 Ideogram Prioritizes Diverse and Tasteful Image Styles

Ideogram's image generation model stands out because it uses very little reinforcement learning, resulting in a 'raw' model. While this requires more precise prompting, it allows for a much wider variety of artistic styles compared to other frontier models, which often produce a repetitive aesthetic.

The design philosophy behind Ideogram emphasizes enabling diverse and tasteful outputs. This is particularly valuable for the art and design community, where creations like infographics, ads, or logos need to be distinct and attention-grabbing to effectively communicate an idea.

Users have noted that Ideogram often generates images that are uniquely different from typical AI-generated art, making them stop and take notice. This ability to produce novel and engaging visuals helps designs stand out and hold an audience's attention, fulfilling the goal of communicating effectively.

The model is intentionally designed to handle many different styles without forcing a complex output, allowing for minimalist designs if desired. This flexibility ensures that the model can cater to a broad spectrum of creative needs while maintaining a high standard of taste.

> Wow, this is different than anything I've seen before coming out of an image model, and like, 'This is doing an amazing job of both communicating what I wanna communicate and also holding someone's attention.'

## 36:14 - 38:14 Evolving Prompt Representations: From JSON to HTML

To create more powerful diffusion models, the goal is to make the task as straightforward as possible for the model by specifying exact image details. While current representations like JSON abstract elements like lines or pixels, the ultimate extreme would be to provide the pixels themselves, though this is challenging for language models.

The primary constraint is that the intermediate representation must be token-based, as large language models (LLMs) are not proficient with continuous output like pixel values or very high-dimensional vectors. The representation also has to fit within a limited token budget, for instance around 4,000 tokens.

Currently, natural language is used because LLMs are trained extensively on it. However, future prompt representations for diffusion models may evolve from natural language or JSON to HTML. This is because large language models are also heavily trained on HTML and understand its structured tokens, which could offer a more detailed and consistent way to specify visual information within token limits.

> it may become more close to HTML, for example. That's okay, because again, large language models are trained with, with HTML and they know the tokens.
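
As a rough illustration of what an HTML-like prompt representation could look like, the sketch below renders a small layout spec as absolutely positioned HTML. The spec format, the `role` values, and the `spec_to_html` helper are assumptions made for this example; the episode only suggests that the representation may move closer to HTML, not how.

```python
from html import escape

# Hypothetical layout spec; boxes are [left, top, width, height] in pixels.
layout = {
    "canvas": [1024, 1024],
    "elements": [
        {"role": "headline", "text": "Jazz & Blues Night", "box_px": [64, 48, 896, 160]},
        {"role": "body", "text": "Doors open at 7 pm", "box_px": [64, 240, 896, 80]},
    ],
}


def spec_to_html(spec: dict) -> str:
    """Render a layout spec as absolutely positioned HTML elements."""
    width, height = spec["canvas"]
    lines = [f'<div style="position:relative;width:{width}px;height:{height}px">']
    for element in spec["elements"]:
        left, top, box_width, box_height = element["box_px"]
        style = (
            f"position:absolute;left:{left}px;top:{top}px;"
            f"width:{box_width}px;height:{box_height}px"
        )
        tag = "h1" if element["role"] == "headline" else "p"
        # Escape the text so characters like "&" stay valid HTML.
        lines.append(f'  <{tag} style="{style}">{escape(element["text"])}</{tag}>')
    lines.append("</div>")
    return "\n".join(lines)


html_prompt = spec_to_html(layout)
print(html_prompt)
```

The appeal of such a format is that tags and style attributes are token sequences language models have already seen at scale, so no new schema needs to be taught.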

## 38:14 - 40:14 Join Ideogram as a Craft Engineer or Partner for Open-Weight Innovation

Ideogram is actively seeking "craft engineers" to join its small team, emphasizing a high-agency environment where individual contributions have significant impact. The company positions this as an ideal opportunity for engineers who want their work to matter and to be part of the academic and open-source ecosystem.

Beyond recruitment, Ideogram also aims to collaborate with external partners, specifically creative brands, to develop designs and provocative advertisements. They are also open to partnering with other startups and companies at various levels of the tech stack.

A core tenet of these partnerships is Ideogram's use of "open weights," which allows them to offer solutions that provide companies with greater control and data privacy. This approach is designed to create win-win scenarios, offering a distinct option to organizations seeking more autonomy.

The discussion briefly touched on technical considerations, noting that language models have been trained on HTML, making it a more sensible choice for representing text elements and buttons compared to introducing a new JSON structure.

## 40:14 - 42:15 Ideogram Provides Custom Model Training and Tailored Enterprise Solutions

Ideogram offers a "model tab" feature where users can train their own custom models. This service costs $60 per month for two model trainings and requires a minimum of 15 images to get started. It is presented as a valuable tool for professionals looking to develop a unique style.

For enterprise clients seeking to fine-tune models, Ideogram recommends direct consultation. Companies can fill out sales forms to discuss their specific needs, as requirements vary significantly. Ideogram can provide tailored solutions for different use cases, such as automating marketing ads or enhancing editing processes.

> It's sixty bucks, for two model training per month, but I think for professionals that's totally worth it.
