[PIG] Visual Metaphors as the new Dynamic Storytelling Interface for Programming

Team member names

Warren Koch (Senior Programmer and Artist)
Jason Griggs (Senior Programmer and Engineer)

Short summary


(Alt Image Link)

Thanks to progress in Visual Programming, text-based code now has an easy mapping to visual graphs of function nodes. These are a significant improvement, but we think there's yet another tier. With AI, we now have the ability to map code to dynamically generated stories with interactive controls (example shown above). These could be animated, bespoke, and designed to show a metaphor of what's going on under the hood, at any level of detail and from multiple angles: a personalized interface tuning itself to best aid your understanding.

While this might seem like a huge leap, the fundamental operations are no longer complicated. AI makes this kind of bespoke recontextualization cheap, fast, and easy for any problem - the only challenge ahead is finding stable-enough tiers of quality we can reliably hit so this can be relied upon as a basic interface. This exploration takes the form of trying out different patterns of interaction to find the best ways to represent data and processes - it's a conversation format between a user and an AI. A protocol.

What is the existing target protocol you are hoping to improve or enhance?

  • We hope to improve Visual Programming to be much more lively, intuitive-at-a-glance, natural and game-like - which will have downstream improvement effects on Text-based Programming, Operating System Applications, and the way we interact with our computers (and each other!) on any technical or complex task
  • Intuition: We’ll use AI tech to dynamically combine stylish Video Game User Interface patterns of design with typically dull, dreary Visual Programming Interfaces, to hopefully make something useful AND visually intuitive

What is the core idea or insight about potential improvement you want to pursue?

  • Visual and tactile programming have proven to be some of the most accessible and enjoyable ways to interact with computers, especially for less technically-fluent people
  • Video games have demonstrated ways to preserve complex operation logic behind intuitive and fun visual metaphors for almost any kind of task
  • Interfaces like Scratch, Unreal Engine, and ComfyUI have done the heavy lifting mapping between textual and visual representations of programs (to a node-graph metaphor), but it’s time to take this further
  • Using new AI capabilities to generate bespoke images in seconds, we can dynamically adapt the visual representations of objects and processes to metaphorically represent the underlying data
  • This can be adaptive to any scale of detail or fidelity, and tunable to the user’s preferences and the wider context of the system being represented
  • Understanding and mapping out this process connects programming methods to real world visual protocols and vice-versa, opening countless avenues for new insights between the two
  • Visualizations become Actions. Actions become Visualizations

What is your discovery methodology for investigating the current state of the target protocol?

  • Examine existing workflows for the popular visual programming application ComfyUI (e.g. https://comfyworkflows.com/) and use those as primary “easy” examples for our mapping process
  • Examine existing popular github codebases/libraries in text form and apply this method to them (“hard” mode)
  • Examine existing OS applications and unexpected real world examples of complex systems and apply this method to them too (“wild” mode)
  • Aim for small prototype niche demos, then general (80%) applicability, then rigorous/exhaustive applicability

In what form will you prototype your improvement idea?

  • We aim to build a rough map of the most accessible and useful visual metaphors we can apply to various example processes. With the help of AI prompts, we may be able to make this exhaustive and auditable to high expected quality
  • Using LLM prompt-based programming, we then build metaphor selection protocols (in the form of questions and answers), and using generative image AIs we build a series of intermediate visual representations before outputting a final image (see AI QR codes for examples of how such interfaces can be dynamically shaped)
  • We will tune this general method by applying it to a wide variety of example programs/applications/problems and attempt to produce various valid visual representations which try to balance accurate representation of the underlying data with visual and metaphorical elegance
  • Deliverables will take the form of individual example demonstrations, limited-scope open source code extensions to ComfyUI, or a generally-applicable open source program
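
As a minimal sketch of what such a question-and-answer metaphor selection protocol could look like (the metaphor names, question keys, and scoring below are all illustrative assumptions, not a finished design - in practice an LLM would answer the questions from the workflow's source):

```python
# A fixed set of questions about a process, whose answers narrow a small
# library of candidate metaphors. "any" acts as a wildcard trait.
CANDIDATE_METAPHORS = {
    "art_studio":    {"produces": "image", "steps": "sequential"},
    "kitchen":       {"produces": "any",   "steps": "sequential"},
    "assembly_line": {"produces": "any",   "steps": "parallel"},
}

QUESTIONS = ["produces", "steps"]  # the protocol's question keys

def select_metaphor(answers: dict) -> list:
    """Rank candidate metaphors: exact answer matches beat wildcard matches."""
    def score(traits):
        s = 0
        for q in QUESTIONS:
            if traits.get(q) == answers.get(q):
                s += 2  # exact match
            elif traits.get(q) == "any":
                s += 1  # wildcard match
        return s
    return sorted(CANDIDATE_METAPHORS, key=lambda m: -score(CANDIDATE_METAPHORS[m]))

# A Stable Diffusion workflow produces images in sequential steps,
# so the art-studio metaphor ranks first.
print(select_metaphor({"produces": "image", "steps": "sequential"})[0])  # art_studio
```

The point is that the selection step is ordinary, auditable logic - only the answering of the questions needs an AI.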

How will you field-test your improvement idea?

  • Sharing example mappings with the SoP community for protocol-savvy feedback
  • Friends and family demos for non-tech-savvy feedback
  • Colleagues and friends for programming-savvy feedback

Who will be able to judge the quality of your output?

  • Primary aim is to make complex systems understandable and intuitive to laymen. Thus, we hope to be judged by the public, on social media, reddit, and Summer of Protocols. “That’s pretty cool” would be a good barometer.
  • The harder-core nerds might be harder to coax, having suffered so long glued to the bastions of eye strain and egotism that are complex textual git repos. It will take many more steps before they can be convinced of a softer future. One day we'll top HackerNews though - you'll see.
  • We are mostly doing the work of condensing visual programming courses and methods into a generalized protocol for AIs to dynamically start whittling down complexity and represent any problem in a visual language
  • Thus: anyone from a Visual Programming background would make a great judge
  • These folks (VPL - Visual Programming Lab)
  • Or these (Scratch)
  • Or any developers from the ComfyUI community!
  • Let’s be honest though, the real judge will be GPT5-or-whatever analyzing our solutions with a UX designer hat. You guys just get the final product

How will you publish and evangelize your improvement idea?

  • Release an open source module plugin to the popular AI programming application ComfyUI implementing dynamic metaphorical interface generation
  • Release general open source code and documentation making steps towards generalized application of this method across various operating systems and applications
  • Blog posts and articles with demonstration images and videos showing the mapping process and examples of the method, shared to social media

What is the success vision for your idea?

  • Create a scalable general protocol for mapping any complex system to a set of visual metaphors and analogies
  • Create a library showing the mapping options available for any given base process (exhaustively, if possible)
  • Create working code demonstrating the method in a popular programming ecosystem
  • Specific target: mockup or modify ComfyUI node interfaces into visual metaphors of the underlying node computation
  • Specific bonus target: make ComfyUI interfaces fluidly tune to pick a visual metaphor closest to the underlying data. For example: a classic workflow prompting “picture of a cow in a field” would be dynamically redesigned to shape the visual interface for each program node in the workflow to use a Stardew Valley-esque farming metaphor for each step in the processing logic, before outputting the original program output (in this case, a cow picture). This would entail live bespoke generation tuned to fit templating parameters and program scope - but we’re confident it’s quite doable. Just might take some work to get it high quality in the general case!

Follow-up Discussion (if interested):

Elaboration on the example:
While the labels are just layered on right now via Paint.net, the examples do represent an honest possible depiction of the current node-based workflow, and were imagined, articulated and generated almost entirely by ChatGPT with generic workflow-agnostic prompts and the source json. With a little tuning and tweaking we can do much better. Each interface control could easily become an interactive element with a small object representation within a “scene”, a context-sensitive font and visual, and a thoughtful metaphor fitting the whole scene.
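
To make that pipeline concrete, here is a minimal sketch of the json-to-scene step: walk a simplified ComfyUI-style workflow dict and emit one scene-description line per node. The node `class_type` names follow ComfyUI's default workflow; the metaphor table and scene text are our own invented assumptions, and a real version would generate them with an LLM rather than hand-write them:

```python
# Hand-written metaphor table mapping node types to scene-line templates.
# Templates may reference node inputs via str.format placeholders.
METAPHOR_TABLE = {
    "CheckpointLoaderSimple": "an artist carrying boxes of paints into the studio",
    "EmptyLatentImage": "a blank canvas of {width}x{height}",
    "CLIPTextEncode": "a scrapbook moodboard pinned with the prompt",
    "KSampler": "the artist painting, one brushstroke per step",
    "VAEDecode": "final framing and glossing before delivery",
}

def workflow_to_scene(workflow: dict) -> list:
    """Emit one scene-description line per recognized node."""
    lines = []
    for node in workflow.values():
        template = METAPHOR_TABLE.get(node["class_type"])
        if template:
            lines.append(template.format(**node.get("inputs", {})))
    return lines

demo = {
    "1": {"class_type": "CheckpointLoaderSimple", "inputs": {}},
    "2": {"class_type": "EmptyLatentImage", "inputs": {"width": 512, "height": 512}},
    "3": {"class_type": "KSampler", "inputs": {"steps": 20}},
}
for line in workflow_to_scene(demo):
    print(line)
```

The resulting lines would then be composed into a single image-generation prompt, exactly as the source json was fed to ChatGPT for the examples above.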

I show 6 images here, but those could easily be condensed into a single "video game"-like one-image scene with different objects representing the components and an animation loop showing the actual work being performed (note: animations take about 60 seconds to generate, with maybe a 50% hit/miss rate on a local machine). For now you have to imagine the animation:

  • Step 1: Model loading. An artist walks back and forth from another room delivering boxes of paints and materials to the studio. They pile up and get unpacked until the room is ready. Model loading time can be estimated well - should be able to time animations accordingly. Different model types might entail different material styles. This scene should ultimately reflect the model being loaded. Different models should be selected by clicking different boxes in the studio.
  • Step 2: The artist walks to the empty canvas, marked with "<- width ->" and "<- height ->" interactive controls as well as a "batch" control to multiply the canvas count. The artist picks up the canvas to move to the KSampler scene.
  • Step 3: A scrapbook moodboard shows the various text prompt inputs and the negative don't-include ("No") prompts. This should get redrawn as a fun mini visualization of the prompt being requested, but remain responsive to typed (or spoken) modifications.
  • Step 4: KSampler depicts the artist sitting down to paint, with increasing fidelity in every timestep. Background depicts multiplicity of painters (reflecting what happens with multiple batches in parallel)
  • Step 4A: KSampler parameters influence how the painting progresses.
    • A metronome-timer shows a “Step” countdown from 20 (settable default),
    • a “mode” enables multiple scheduler toggles
    • a “seed” shows one random permutation as a plant - the plant grows or shifts position in every cycle as the seed changes, simulating “hours” or “days” passed in virtual time.
    • “Seed: random” is a toggle for the seed change mode (random, fixed, increment (grow), decrement (shrink)), each corresponding to plant animation
    • “Sampler: Euler” is one of many different sampling styles, each represented by a paintbrush or tool. These could have signature looks reflecting their style of how they affect pictures
    • “CFG Guidance: 8.0 (Medium)” is a toggle showing how abstract the image should be from its prompt (aka scrapbook moodboard). Click it to cycle through settings, showing a wilder <=> more precise moodboard image.
  • Step 5: VAE Decode and Save - the “final framing and glossing” step before it’s sent away to be saved.
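
The two-way binding between scene objects and node parameters that these steps rely on (metronome ↔ steps, plant ↔ seed, and so on) could be sketched like this - all class and object names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class SceneControl:
    """Binds one visible scene object to one underlying node parameter."""
    scene_object: str   # what the user sees ("metronome", "plant", ...)
    node_id: str        # which workflow node it belongs to
    param: str          # underlying parameter name
    value: object       # current value (mirrors the workflow json)

    def set(self, value, workflow: dict):
        """Scene -> code: write the new value back into the workflow json."""
        self.value = value
        workflow[self.node_id]["inputs"][self.param] = value

workflow = {"3": {"class_type": "KSampler", "inputs": {"steps": 20, "seed": 0}}}
metronome = SceneControl("metronome", "3", "steps", 20)
plant = SceneControl("plant", "3", "seed", 0)

metronome.set(30, workflow)                  # user drags the metronome
print(workflow["3"]["inputs"]["steps"])      # 30
```

The reverse direction (code -> scene) would watch the same json and trigger a redraw of the bound scene object when a parameter changes.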

This was just one possible metaphorical representation of what is happening in the default ComfyUI Stable Diffusion image generation process. There are probably many different metaphors which would work. While it requires a little bit of meta overhead to generate these, it’s a one-time cost per workflow (or per setting change if we’re not caching/limiting changes) - and it would be easily shared across the internet as a new downloadable interface.

Let me emphasize that again: this is an interface. You would no longer ever need to touch the underlying graph-based code, nor the text-based code under that. Adding new nodes or modification beyond the parameters I’ve outlined would possibly trigger a new metaphor redraw (a new “scene”), but likewise would not require deeper understanding. Requests to delve into the details of any one node or source code could likewise trigger yet another metaphor mapping, this time in more detail for the code in question - and will often be preferable to reading the raw code if/when it can more quickly explain what’s going on to the viewer and detect anomalies. As we get better at this, we will make ways to ensure this is a two-way mapping with high precision and utility.

This example generation cost about 10 images, 4 rejected. Expect procedural generation to do maybe 100 and sort out the best ones behind the scenes. On a decent local computer that's still only about 1-2 minutes of overhead generation, pessimistically, with today's models and hardware. Thus, dynamic interfaces are entirely within the realm of technological possibility now - the only trick is making workflows that generate them well. Or hell - generating them all by hand and simply sharing them! No poor user should ever have to stare at an ugly screen interpreting inscrutable text again - not when it can be converted into a beautiful scene tacitly explaining what's happening in a visual language humans have built up since well before the first words were scrawled in the sand.

None of this is even considering more advanced methods, like boxing prompts into regions and rendering everything in a single big image - or even 3D object generation from prompts. Manipulating a bespoke 3D wand could very-well become a programming interface.

I will be building these this year. I expect many others will as well, even as OpenAI's SORA looms to map everything to everything else. If SoP happens to fund this venture, we'll make it very accessible and map out the whole space in a more academic and theory-minded way that others can easily build on - but otherwise expect it to be show-don't-tell, and hope to be wowed when it comes! Anyone interested in building in this direction, hit me up here or on Twitter, and we'll see how we can collab in an open source way. Cheers everyone!