Imagining the imaginary

AI images to test a text adventure

<pre>To waiter in an old hotel in the dolomites: "Where can I find the toilet?" you ask. "Oh yes, it's that door you can see there." he answers addressing the door in the dining room. "Thank you very much." you say. "You are welcome" he answers and goes away.</pre>

For some time now I have been publishing pages that form a sort of tutorial on the Inform 7 programming language, a language specialized for creating text adventures or, as they are now called, interactive fiction.

It is a genre of game born in the 1970s: stories told only through text, partly because on the computers of the time it was hard to do otherwise, in which the player moves around and handles objects to discover hidden treasures or investigate past events.

It is about solving puzzles and finding ingenious ways to open passages and containers...

I also mentioned it in an article on NOC Sensei, an interesting online photography magazine.

But what does photography have to do with works that have no illustrations at all?

Traditional photography, the kind we have known for almost two centuries, nothing. But very recently tools have appeared that produce photograph-like images from textual descriptions: why not feed them the game's descriptions and see how they might be interpreted?

<pre>Dining room A large room with many tables, you can sit on a bench that runs along the walls or on the chairs near the tables. On the table is a dinner menu.</pre>

And so the majestic dining room, worthy of a castle, takes shape.

The story I had in mind was set in an underpass of a hotel in a small village in Alto Adige; that underpass housed a collection of toy horses that I had photographed during the summer.

With one of those photos I also made the cover that can be embedded in the compiled game.

I used one of my own photographs and the final result does not seem bad to me, but passing the game's opening banner to Bing Image Creator (which uses the Dall-E engine) gives what I show below.

<pre>The house of the rocking horses A fascinating exploration in an abandoned town in the Alps by Roberto Ceccarelli</pre>

It starts to get fun, and I begin using images created this way to illustrate the tutorial I mentioned at the beginning.

That write-up, in truth, serves less to explain to others how to do things than to help me, a beginner, understand how to do them, and the images work as a test to see what kind of imagination a description can stimulate.

<pre>On the left table are ten dishes, five bibs, ten glasses (empty), a jug of water, a jug of milk, a jug of orange juice, a jug of ACE juice, a jug of clear apple juice and a jug of pear juice. >take a glass Taken. >fill it from the jug of orange juice Now you have a glass of orange juice, it is full.</pre>

I spend a lot of time generating images because I like trying different solutions: we are not all alike, and players might react differently to the same description.

For example, I take the same prompt and feed it to Stable Diffusion with the style set to cinematic: the result turns into something else entirely.

The prompt is nothing other than a sequence from the game built in the tutorial, in which the protagonist stands next to the breakfast buffet table and pours himself a glass of orange juice.

Buffet
A corner where you can pick up food from two tables, one on the left and the other on the right.

On the left table are ten dishes, five bibs, ten glasses (empty), a jug of water, a jug of milk, a jug of orange juice, a jug of ACE juice, a jug of clear apple juice and a jug of pear juice.

On the right table are ten slices of speck, a decorated canister (closed), a white bowl (in which are ten knobs of butter), a white basket (in which are eight slices of white bread) and a brown basket (in which are ten slices of pumpernickel bread).

>take a glass
Taken.

>fill it from the jug of orange juice
Now you have a glass of orange juice, it is full.

>drink orange juice
(the glass of orange juice)
You drank orange juice; pleasant.
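
To see what the game has to keep track of behind that transcript, here is a minimal Python sketch of the glass. It is not how Inform 7 implements it: the `Glass` class and the replies for an already full or an empty glass are assumptions made up for this example, and only the two successful replies are copied from the transcript.

```python
class Glass:
    """Hypothetical model of one glass from the buffet table."""

    def __init__(self):
        self.contents = None  # None means the glass is empty

    def fill_from(self, jug_contents: str) -> str:
        # Refuse to pour into a glass that already holds something (invented reply).
        if self.contents is not None:
            return "The glass is already full."
        self.contents = jug_contents
        return f"Now you have a glass of {jug_contents}, it is full."

    def drink(self) -> str:
        # Nothing to drink from an empty glass (invented reply).
        if self.contents is None:
            return "The glass is empty."
        drunk, self.contents = self.contents, None
        return f"You drank {drunk}; pleasant."


glass = Glass()
print(glass.fill_from("orange juice"))
print(glass.drink())
```

The whole exchange boils down to one piece of state, what the glass currently holds, which is the kind of bookkeeping Inform 7 lets the author describe in sentences instead.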

As I study the system I grow bolder; at the moment I am learning how to build dialogues: why not pass one of them as a prompt?

Here the difference between Dall-E and Stable Diffusion (which, from the same prompt, created the image at the top of the page) becomes noticeable; it probably lies in the fact that Dall-E can also rely on the text analysis done by ChatGPT.

On the other hand, Dall-E is also overly puritanical and sometimes refuses to generate images, as in the case where all I wanted was a closed bathroom door.

<pre>You are outside a toilet in an old hotel in the alps. The door is closed because someone is inside</pre>

Only to then generate images like the one used for this Instagram story.

<pre>a plumber installs a wc in the bathroom in an old hotel in the alps, no windows in the room, adventure game</pre>

It got everything wrong: the plumber was supposed to install the wc, not use it; the room was supposed to have no windows, and instead there is a huge one through which the plumber enjoys the view.

For this one I rewrote the prompt rather than feeding it the source code directly, where a windowless room is defined as dark, something I could not expect it to understand.

The wc-room is a dark room. "The small room is floored and covered with white tiles.[line break][if wc-lamp is lit]The light comes from a ceiling lamp[otherwise]A lamp is at the ceiling, but it's off; the light comes from the door[end if].".
The printed name is "WC room".
The wc-lamp is a scenery ceiling lamp in the wc-room. Printed name is "wc ceiling lamp".
The toilet light switch is a scenery light switch in the bathroom-antechamber.
The toilet light switch powers the wc-lamp.
The toilet bowl is an enterable supporter in the wc-room. "In front of the door there is the toilet bowl, above the button of the flush."
The flush button is scenery in the wc-room. The description is "A large white button. You should press it after using the wc."
Instead of pushing the flush button:
	say "You cleaned the bowl with the water flush.";
	now the toilet bowl is clean.

The fragment above is the actual source code that defines the toilet, with the wc and its flush, the ceiling lamp and the switch that turns it on.
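
The manual rewrite from source to prompt could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `Room` class and the `build_prompt` function are hypothetical names invented here, and turning `dark` into "no windows in the room" is the reading used in this article, not something Inform 7 itself defines.

```python
from dataclasses import dataclass


@dataclass
class Room:
    """Hypothetical stand-in for a room declared in the game source."""
    name: str
    description: str
    dark: bool = False  # mirrors "is a dark room" in the source


def build_prompt(room: Room, setting: str = "in an old hotel in the alps") -> str:
    # Start from the scene description and the recurring setting.
    parts = [f"{room.description} {setting}"]
    # Assumption: spell out what "dark" means for this room in plain words,
    # since an image generator cannot be expected to know the Inform 7 term.
    if room.dark:
        parts.append("no windows in the room")
    parts.append("adventure game")
    return ", ".join(parts)


wc_room = Room(
    name="WC room",
    description="a plumber installs a wc in the bathroom",
    dark=True,
)
print(build_prompt(wc_room))
```

With these inputs the function rebuilds the plumber prompt quoted earlier; the point is simply that game-specific terms need to be translated into plain words before they reach the generator.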

We are used to thinking of a program's source code as something like:

#include <iostream>

int main() {
    std::cout << "Hello World";
    return 0;
}

What is often described as a cat walking across the keyboard.

The example just shown is written in C++, a very widespread language available for practically any microprocessor, from the one in a washing machine to those in large datacenters; all it does is write Hello World when run.

The toilet description above instead uses the Inform 7 language, expressly dedicated to creating interactive text games, in which the equivalent is:

"Hello" by Roberto Ceccarelli

Hello World is a room.

This one, on the other hand, anyone can understand.

<pre>In an old hotel in the alps The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding</pre>

The idea of writing programs that are easily understandable to people before machines is not new at all; today we can ask ChatGPT in plain conversational language to write us a program that we can then compile and run on our PC, but in 1983 that was not even conceivable.

User
good morning

ChatGPT
Good morning! How can I help you today?

User
Could you write a c# program to write the fibonacci's numbers?

ChatGPT
Absolutely! Here's a simple C# program that generates Fibonacci numbers up to a specified limit:
using System;

class Fibonacci
{
    static void Main()
    {
        Console.WriteLine("Enter the number of Fibonacci numbers to generate:");
        int count;
        
        // Validate user input for a positive integer
        while (!int.TryParse(Console.ReadLine(), out count) || count <= 0)
        {
            Console.WriteLine("Please enter a positive integer:");
        }

        GenerateFibonacci(count);
    }

    static void GenerateFibonacci(int count)
    {
        int first = 0, second = 1, next;

        Console.WriteLine("Fibonacci Series:");

        for (int i = 0; i < count; i++)
        {
            if (i <= 1)
            {
                next = i;
            }
            else
            {
                next = first + second;
                first = second;
                second = next;
            }
            Console.Write($"{next} ");
        }
    }
}

This program prompts the user to input the number of Fibonacci numbers they want to generate and then displays that many Fibonacci numbers.

User
thank you

ChatGPT
You're welcome! If you have any more questions or need further assistance, feel free to ask. Happy coding!

There you have it, a little school exercise brilliantly solved.

In September 1983 Donald E. Knuth, a computer scientist best known for the multi-volume series The Art of Computer Programming, submitted to the academic journal The Computer Journal an essay titled Literate Programming, which argues that programs should be regarded as works of literature, comprehensible first and foremost to human beings.

The prompt of the two images above (except for the first sentence, added to keep some stylistic consistency with the other images) is taken from that very essay.

The curious thing is that the whole literate programming system is called WEB, which has nothing to do with what we call by the same name today, and which would only be proposed in 1989 by Tim Berners-Lee.

Graham Nelson, the author of the Inform language, explains in an essay in «IF Theory Reader» (p. 141) the reasons for choosing a near-natural language for writing interactive fiction programs.

Translated into an image by Bing:

<pre>IF is a form of creative writing impossible before the development of computing, but whose 30-year history has seen a flourishing of experimentation if not mainstream acceptance (except in an early commercial phase): the author creates an imaginary textual world that can actively be explored by a “reader,” or “player,” directing the actions of a protagonist. Such works have hitherto been created as if computer programs, using specially adapted programming languages, but the Inform 7 project aims to replace such syntax with natural language: specifically, a subset of English. This change proved far more radical than had initially been expected, and it became clear that semantic analysis and related branches of linguistics were of great relevance to practical issues of how design systems for IF should work.</pre>

Once again we take as a prompt (apart from the usual added opening) a quotation from Knuth's essay.

<pre>In an old hotel in the alps. I chose the name WEB partly because it was one of the few three-letter words of English that hadn’t already been applied to computers. But as time went on, I’ve become extremely pleased with the name, because I think that a complex piece of software is, indeed, best regarded as a web that has been delicately pieced together from simple materials.</pre>

The image generator offers us a hotel beside a lake with some boats and a strange winding bridge leading to it, which almost looks like a railway.

<pre>Once upon a time there was a small village in the Alps. To reach it a railway was built and an enterprising local lady had the idea of building an hotel next to the railway station to accommodate the tourists who would arrive. After years the railway should have been modernized, but the high costs and the ever-increasing diffusion of road traffic meant that it was dismantled. The road ran on the other bank of the river and one by one the various activities moved to the other</pre>

This, instead, is the introduction to the adventure The house of the rocking horses, which I had started writing and for which I had generated the image, again with Bing, some time ago.

Each prompt generates four images, and among those for the WEB prompt there was also the one below.

Nobody told it to put those horses in this time, but by now old hotel in the alps has become associated with rocking horses and it puts them in whenever it can.

Artificial intelligence evidently feeds itself on the prompts submitted to it.

Let us now try passing a piece of the source code as a prompt; it is the point where the two waiters in the room are defined.

<pre>The first waiter is a male waiter in the dining room. Hair is "blond". Eyes are "blue". Proper name is "Hans". The first waitress is a female waiter in the dining room. Hair is "auburn". Eyes are "green". Proper name is "Mila".</pre>

Snippets of source code are understood perfectly well by the image generator; this was a static situation, but Bing (that is, Dall-E) is decidedly surprising when there is action.

<pre>In an old hotel in the alps. A waiter is coming to you. >z Time passes. A waiter is at your table. "Good morning!" he says. "Good morning!" you answer. Then he asks you: "What can I serve to you?" 1 - a coffee 2 - a cappuccino 3 - a hot chocolate 4 - a tea "Ok, I will come back in soon with your cappuccino." the waiter states; then he goes to the kitchen.</pre>

So will artificial intelligence make programmers, photographers and illustrators disappear?

It will certainly push them into ever more specialized niches, but it will not take away my pleasure in programming my adventure or in photographing the things that intrigue me or that I want to remember; it does, however, let me obtain generic images to illustrate an otherwise very boring programming tutorial.

Toy horse collection and the hotel reading room

Like everything, it should be used with (human) intelligence. Take the image below, for example: it has nothing to do with the request, but what do you bet it attracts more attention than everything else?

<pre>A young woman switch off the light in the bathroom. Fascinating adventure game in an old hotel in the alps</pre>