Breaking of the Images

The pictures assembled by generative machine learning models, I've realized, lack juice in a very specific way: the total absence of any suggestion of a relationship between the object/subject of the image and the subject/object of the image-maker.

Any creation of a visual representation by a human mind and hand is the product of multiple intersecting relationships binding the depicted to the depictor, and as viewers our experience of the image incorporates our thoughts and feelings about what we infer regarding those relationships.

So much of what we discuss about visual art encompasses how the artist appears to feel about the subject and how the subject (if sentient) might feel about the artist. This tension is a source of vibrance and engagement. Without it the image feels cold, dead, empty and without purpose.

A painting, drawing or photograph of a person radiates the complexity of these emotional and material relationships. Exploitation, consent, love, hate, indifference, commercial exchange, eroticism (exhibitionistic and voyeuristic), ownership, power and submission, aesthetic axiology, alienation, etc.

Even creations which are entirely produced from the creator's mind, without a subject or model present, at minimum evoke these relational tensions between the creator and their own interiority, which is something we as an audience can all identify with.

Prompting a generative machine learning model is relationless. The object/subject being depicted exists nowhere, not even in the mind of the prompter, other than as a reactive impulse toward whatever the model has conglomerated and a decision whether or not to roll the dice again.

A prompter cannot fear the nonsubject of the depiction it iterates toward. Neither can it love, hate, be bored by or indifferent to, overcome shyness toward, boss around, be shamed by, be praised by, collaborate with, ignore, disappoint or thrill the nothing which was never anywhere.

Certainly not all visual depiction is "art". There is plenty of purely mercenary slop out there which serves no purpose other than to fill space and attract a tiny amount of attention (but not too much!). Still, we feel a pathetic emptiness even in the degraded version of this swill produced by generative machine learning.

It is this palpably total lack of relationship involved in the creation of even these images which produces a void, an anti-experience for anyone involved.

The Resolution Question: Film vs. Digital

I was recently asked if Super 16mm film was capable of a higher resolution than 4K digital capture. Or, verbatim, the question was “also is 16mm a higher resolution than 4K [?]”. So, we have an opportunity to discuss the common confusion between raster and resolution.

Resolution is a measurement or a perception of how much discernible visual information appears in an image. It involves your entire imaging and display chain, not simply your capture medium. Resolution is measured in line pairs per millimeter.

The lens you're shooting with is capable of resolving a certain amount of information, your capture medium (sensor or film stock) is capable of resolving a certain amount of information, and your display or projection format is capable of resolving a certain amount of information. All of those things contribute to the 'resolution' of the viewed image.

'4K' is not a resolution, it is a raster size (meaning pixel or photosite dimensions, like 4096 x 2160, for example, or 1920 x 1080). It is extremely common both in camera (and display) marketing and in amateur cinematography circles to conflate the idea of raster with resolution. The two ideas are independent, and a given raster can “contain” a wildly varying amount of resolution.

If you were to compare lenses on the same capture medium (let's say the Alexa 65), you can produce very different resolutions at the same raster size. That is to say, both a very soft lens and a very sharp lens can be used when imaging to a 4K raster, but both the measured and the perceived resolution will be very different. Or, let's imagine you're shooting with a Master Prime, one of the measurably sharpest lenses available, and recording a 4K raster. What's your resolution when you throw the lens completely out of focus? It’s also common to use diffusion filtration to reduce the resolution of the system, usually for aesthetic reasons.

The perceived sharpness that a film stock is capable of rendering is described by a Modulation Transfer Function which plots how many line pairs per millimeter the stock is capable of discriminating at an acceptable level of contrast between the lines (i.e., can you actually distinguish between a black and white line, or is the image a gray mush?). 50% MTF tends to be the cutoff of acceptability. The MTF of film emulsion changes independently per dye layer, with the layers tending to diverge from each other between 10 and 20 lppm. The resolution of Kodak 500T at 50% MTF starts to decline steeply after 30-50 lppm, depending on the dye layer. Here is Kodak’s MTF chart for 5219/7219:

taken from https://www.kodak.com/content/products-brochures/Film/VISION3_5219_7219_Technical-data.pdf

At 30 lppm each line is 1/60mm, or 16.7 micrometers, wide.
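
The arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration; the function name `line_width_um` is my own, and it assumes nothing beyond the definition of a line pair as one dark line plus one light line of equal width.

```python
def line_width_um(lp_per_mm: float) -> float:
    """Width of a single line (half of a line pair), in micrometers."""
    lines_per_mm = 2 * lp_per_mm   # each pair is one dark and one light line
    return 1000.0 / lines_per_mm   # 1 mm = 1000 micrometers


# The two ends of the 30-50 lppm range quoted above for 500T.
print(round(line_width_um(30), 1))  # 16.7
print(round(line_width_um(50), 1))  # 10.0
```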

The Alexa's photosites are 8.25 micrometers in diameter, or roughly 1/121mm. At first glance this would seem to say that the Alexa sensor can resolve about twice as much information as 500T film stock, but let's not forget our friend Nyquist, who tells us that we need to sample our information at double the highest frequency of that information, so this gets us back to being able to resolve lines about 1/60mm wide again. With the slight blur from the OLPF, the actual resolution at 50% MTF is probably roughly equivalent to that of 500T film negative, but in the sensor's favor that resolution should be roughly constant across the R, G and B-masked photosites, unlike the variable resolution of film's three dye layers.

But that's not all! We have to take into account the enlargement of the medium when displayed or projected. Why does 16mm film look both grainier and softer than 35mm of the same emulsion when projected? Because to fill the same size screen (or monitor), the 16mm film frame must be enlarged (blown up) much more than the 35mm film frame. The same is true when scanning film; there's still a relative size difference between a 16mm film frame and the scanner's imager that's greater than that between a 35mm film frame and the scanner's imager. A similar effect happens if you're comparing 16mm film which has been scanned to a 4K raster vs. a digital image captured at a 4K raster from a Super-35 sized sensor. When viewed on the same display the 16mm film scan will appear softer and grainier than the digitally captured image (assuming a low amount of noise in the digital image).
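
A rough sketch of the enlargement effect, in Python. The frame widths here are my assumptions (nominal camera-aperture widths of 12.52mm for Super 16 and 24.89mm for Super 35), the 30 lppm figure is the 500T estimate from earlier, and the function names are made up for illustration. The point is only that the same emulsion delivers about half as many line pairs across the picture when the frame is half as wide.

```python
# Assumed nominal camera-aperture widths, in millimeters.
SUPER_16_WIDTH_MM = 12.52
SUPER_35_WIDTH_MM = 24.89


def enlargement(frame_width_mm: float, display_width_mm: float) -> float:
    """Linear blow-up needed for the frame to fill the display width."""
    return display_width_mm / frame_width_mm


def line_pairs_per_picture_width(lp_per_mm: float, frame_width_mm: float) -> float:
    """Total line pairs the stock can hold across the full frame width."""
    return lp_per_mm * frame_width_mm


screen_mm = 10_000  # a hypothetical 10-meter-wide screen
for name, width in [("Super 16", SUPER_16_WIDTH_MM), ("Super 35", SUPER_35_WIDTH_MM)]:
    blow_up = enlargement(width, screen_mm)
    pairs = line_pairs_per_picture_width(30, width)
    print(f"{name}: enlarged {blow_up:.0f}x, {pairs:.1f} line pairs across the picture")
```

Grain is enlarged by the same factor, which is why the smaller frame reads as both softer and grainier on the same screen.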

So, to sum up, it is highly likely that a 4K-raster digitally captured image will be perceived to have a higher resolution than 16mm film scanned to a 4K raster.

The Relationship Between Focal Length and Format

The landscape of both cinematography and photography is littered with a wealth (or perhaps a glut) of choices in terms of “format”: the physical size of the imaging surface in the camera. You may be familiar with such diverse options as:

  • “Full Frame”

  • Super 35

  • APS-C

  • Micro 4/3

etc.

A tremendous amount of ambient confusion reigns regarding how differing focal lengths of lenses interact with format sizes to affect the field of view of the image. People use terms like “crop factor” to get a handle on how the expected field of view may differ between formats, but this often misleads novices into believing that one can simply use a lens “meant for” the format they’re shooting with (typically one of the smaller formats) and then they won’t have to consider the so-called crop factor and can get on with their lives.

The only way in which a lens can be “meant for” a particular format is if it has been engineered such that it projects a sufficiently large image circle across the imaging area to avoid vignetting (darkening of the sides and/or corners of the image). Focal length is a constant. A 50mm, for example, is always a 50mm, no matter what format it is projecting onto*. If we take Nikon’s naming conventions as an example, a 50mm lens sold for their DX (APS-C) system is only different from a 50mm sold for their FX (“full frame”) system in that the latter projects a larger image circle than the former.

“But!”, you may be tempted to respond, “if I put that 50mm FX lens on my DX body, I see a narrower field of view than I do on my FX body!” This is true, but it’s obvious that the lens hasn’t changed. What has changed is the area of the lens’ image circle which is being “sampled”, as it were, by the smaller-sized imager.

Consider this diagram, in which the 60mm-diameter image circle projected by the Leitz Thalia line of cinema lenses is overlaid on various common cinema formats:

lenscoverage_anno.jpg

What this illustrates is that as the format in question gets smaller, the angle of view produced by the combination of focal length and format size also gets smaller. It is clear that a Super 16mm-sized imager “sees” significantly less of the image circle than the Alexa 65’s does. Taking our Nikon example above, if you were to attach Nikon’s 50mm DX lens to an FX body, you would see the same angle of view as when you attach your FX 50mm but the image would be “portholed”: extremely heavily vignetted, like a Thalia is on a 15/70 IMAX frame.

So, whether a given lens is wide-angle or telephoto depends entirely on what size format it’s being paired with. Let’s presume the Thalia used in our diagram above has a focal length of 100mm. On a “full frame”** imager a 100mm lens produces a horizontal angle of view of 20.4°, which is fairly telephoto. When paired with the whole image area of the Alexa 65, however, a 100mm lens produces a 30.3° HAOV, which is more of a medium telephoto feel, much like what you would get if you mounted a 65mm lens on a “full frame” body.
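
Those angles come from the standard rectilinear relationship for a lens focused at infinity, HAOV = 2·atan(width / (2 × focal length)). Here is a minimal Python sketch of it. The format widths are my assumptions (36mm for “full frame”, 54.12mm for the Alexa 65's full image area), and the function name is only for illustration.

```python
import math


def horizontal_aov_deg(focal_length_mm: float, format_width_mm: float) -> float:
    """Horizontal angle of view, in degrees, of a rectilinear lens at infinity focus."""
    return math.degrees(2 * math.atan(format_width_mm / (2 * focal_length_mm)))


FULL_FRAME_WIDTH_MM = 36.0   # assumed width of a "full frame" imager
ALEXA_65_WIDTH_MM = 54.12    # assumed width of the Alexa 65's full image area

print(round(horizontal_aov_deg(100, FULL_FRAME_WIDTH_MM), 1))  # 20.4
print(round(horizontal_aov_deg(100, ALEXA_65_WIDTH_MM), 1))    # 30.3
print(round(horizontal_aov_deg(65, FULL_FRAME_WIDTH_MM), 1))   # close to the Alexa 65 figure
```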

That comparison I just made there is what people are getting at when they speak of “crop factor”. Where crop factor is practically useful is when one wishes to match angle of view across different formats. If I were shooting a scene with both an Alexa 65 and another Alexa with a Super-35 sized imager, and I wished to match HAOV on both cameras, it’s useful to know that the lens I use on my Super-35 body should have a focal length 0.46x that of the one I use on my Alexa 65.
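
As a sketch of where that 0.46x comes from: matching HAOV between two formats only requires scaling focal length by the ratio of their widths. The widths below are my assumptions (24.89mm for Super 35, 54.12mm for the Alexa 65), and the function names are hypothetical.

```python
import math


def horizontal_aov_deg(focal_length_mm: float, format_width_mm: float) -> float:
    """Horizontal angle of view, in degrees, of a rectilinear lens at infinity focus."""
    return math.degrees(2 * math.atan(format_width_mm / (2 * focal_length_mm)))


def matching_focal_length(focal_length_mm: float,
                          source_width_mm: float,
                          target_width_mm: float) -> float:
    """Focal length giving the same HAOV on the target format as on the source."""
    return focal_length_mm * target_width_mm / source_width_mm


ALEXA_65_WIDTH_MM = 54.12   # assumed
SUPER_35_WIDTH_MM = 24.89   # assumed

print(round(SUPER_35_WIDTH_MM / ALEXA_65_WIDTH_MM, 2))  # 0.46

# A 100mm on the Alexa 65 and its scaled partner on Super 35 see the same HAOV.
f_s35 = matching_focal_length(100, ALEXA_65_WIDTH_MM, SUPER_35_WIDTH_MM)
print(round(f_s35, 1))
print(round(horizontal_aov_deg(100, ALEXA_65_WIDTH_MM), 2),
      round(horizontal_aov_deg(f_s35, SUPER_35_WIDTH_MM), 2))
```

In practice you would pick the nearest focal length in your lens set rather than the exact computed value.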


*Focal length is the distance from the rear nodal point of the lens to the imaging plane when the lens is focused to infinity. The greater the focal length, the greater the magnification of the image projected onto the imaging plane.

**I keep putting that in quotes because calling it “full frame” when one’s frame can be much more “full” seems silly, but we don’t have a better name unless we want to say “35mm stills” or “8-perf 35mm”.