W. J. T. Mitchell and the image

review of

'Iconology: Image, Text, Ideology', University of Chicago Press, 1986
'Picture Theory: Essays on Verbal and Visual Representation', University of Chicago Press, 1994.


In 1986, W. J. T. Mitchell published his 'Iconology', with a sequel - an 'applied iconology' - in 1994: 'Picture theory'. His program is ambitious: in a postlinguistic, postsemiotic age, 'completely dominated by images' (1994,15-16), W. J. T. Mitchell, just like Gottfried Boehm, wants to herald an 'iconic turn': in the footsteps of (Peirce), Nelson Goodman, Derrida (Grammatology), Foucault, the Frankfurt School and Wittgenstein II (1994, 12), he wants to counter the logocentrism of the 'linguistic turn' by emphasizing the role of 'non-linguistic symbol systems' (1994,12).

In his introduction, the author promises to answer two questions: 'What is an image?' and 'What is the difference between images and words' (1986,1). But he soon confesses that he is less interested in providing a 'new or better definition' (1986,9) of the image than in examining the ideologies responsible for the opposition of image and word. Or, to put it in the terms of the introduction to 'Picture theory': his aim is 'to picture theory' - to analyse the ideology of the image - rather than to construct a 'picture theory' (1994,6). Below, we will concentrate not so much on these ideological analyses, which are often quite illuminating, as on the presuppositions on which they are built.

For, although W. J. T. Mitchell contends that he does not aim to provide a new definition of the image, that does not prevent him from adhering to a rather peculiar theory: 'The commonplace of modern studies of images is (...) that images must be understood as a kind of language'. 'Images are now regarded as the sort of sign that presents a deceptive appearance of naturalness and transparency, concealing an opaque, distorting, arbitrary mechanism of representation, a process of ideological mystification' (1986,8). To W. J. T. Mitchell, the 'iconic turn' does not amount to a restoration of 'naive mimesis, copy or correspondence theories of representation' (1994,16). That Gombrich - as a reaction to exaggerations like those of Nelson Goodman - abandoned his earlier conventionalist stance in 1981 (1994,341) is, to him, no more than a regression to a 'one-sided naturalist account' (1994,341). W. J. T. Mitchell bluntly speaks of 'aesthetics' as a theory of 'artistic signs' (1986,47) and calls himself a 'conventionalist and nominalist' (like Nelson Goodman) (1986,61).


That W. J. T. Mitchell does not aim to provide a new definition of the image does not prevent him either from having a rather precise idea of what may be called an image: 'I have tried not to rule out any widely used sense of the term' (p. 3). He gives an overview of all the 'language games' that can be played with the word 'image': there are 'graphic (pictures, statues, designs), optical (mirrors, projections), perceptual (sense data, "species", appearances), mental images (dreams, memories, ideas, fantasmata), and verbal images (metaphors, descriptions)' (1986, 10). Sometimes, literal images are distinguished from metaphorical ones, but W. J. T. Mitchell opines that both have so much in common that such a distinction is not to be recommended (1986, 14). Apart from the question of what a mirror image may have in common with something like a world view (an ideology), it is apparent that W. J. T. Mitchell behaves like someone who would like to construct a theory of wings that should apply not only to the wings of birds and insects, but also to those of airplanes, triptychs, parties, yes even music. We should also ask why such an extended concept of the image is not equally applied to its proclaimed opposite - words, or texts: why not include the world as a book, the open book of the heart, the Word that was in the beginning, and what have you...

W. J. T. Mitchell's readiness to include images in the metaphorical sense in the family of images contrasts sharply with his endeavour to hollow out the idea of the 'literal image', an endeavour that he shares with the early Gombrich, Nelson Goodman and Gottfried Boehm. There are two contradictory versions of this attack on the 'literal image'. Let us first concentrate on the first version. According to W. J. T. Mitchell, the 'natural attitude towards images, the naive conception of mimesis' has to be replaced with a 'historical account based in conventionalism' (1994, 335). He scorns 'the questionable assumption that there are certain kinds of images (photographs, mirror images) that provide a direct, unmediated copy of what they represent' (1986, 12). He speaks of the 'fallacy of the copy theory' and of 'atavistic notions' such as 'imitation', 'copy', 'correspondence' theories of truth (1994, 420). He thereby refers to the old argument of Socrates (Plato, Cratylus) that images have to differ from the original in order not to be duplicates, so that they cannot be defined by similarity (1986, 92) - an argument that has been taken over by Nelson Goodman, who states that similarity cannot be constitutive of the image because it is symmetric: the Duke of Wellington would equally be an image of his portrait. But W. J. T. Mitchell refers foremost to Nelson Goodman's 'Languages of Art' (1986, 12), where it is stated that images (and ways of rendering like perspective) are not 'natural', but conventional, just like the meaning of words, so that we have to learn how to read an image (1986, 64). The conclusion is that the image is conventional and contaminated by language (1986,42): 'The image is the sign that pretends not to be a sign, masquerading as natural immediacy and presence' (1986, 43)... And, as if that did not suffice, W. J. T. Mitchell adds that there is not even something like a 'neutral, univocal, visible world' that could be rendered in the image: 'vision itself is a product of experience and of acculturation' (1986, 37). Worse still, as with Nelson Goodman: 'There is no world prior to representation' (1994, 355). 'Against this reified nature, we must set the historicity of the body and the senses', whereby Mitchell refers to Riegl's 'Kunstwollen' (1986, 119) and to Crary (1994, 20).

Needless to say, none of these arguments is valid. Socrates justifiably asserts that the image differs from the original - otherwise it would be impossible to tell the difference. But the only difference that is constitutive of the image is sensory reduction: as Narcissus already learned by bitter experience, one cannot touch a mirror image... But such constitutive difference only highlights that sensory similarity is equally constitutive: otherwise Narcissus would not even have been tempted to kiss his image. It is a mere sophism, hence, to conclude with Socrates and Goodman that the image is not determined by likeness: it is determined by likeness (for a given sense) and difference (for the other senses). That is why Nelson Goodman's 'Duke of Wellington' is not an image of his portrait.

Whoever denies that an image is determined by likeness inevitably runs into problems when he has to concoct a term for what we see in the mirror: the very archetype of the image can no longer be called an image. Unlike Gottfried Boehm, who prefers to talk of degenerate images, W. J. T. Mitchell solves the problem by talking of 'an image which is not just a picture' (1986, 33) ...


Another sophism is the contention that the image would be 'conventional' in the same sense as the words of language. For a good understanding, we have to recall what is meant by 'conventional'. In matters of signs, it is an indirect way of saying that we are dealing with unmotivated signs, where there is no intrinsic relation between the sign and its meaning, so that there has to be a convention determining what the meaning of the sign is. The term 'conventional' is confusing, since there are other conventions - for instance styles, with their concomitant conventional choice of, or approach to, subject matter.

Let us examine, then, whether images are conventional, starting from an example that Gombrich made popular: the rendering of a face. A face can be rendered from two equally informative perspectives: frontal and in profile. Both perspectives have their drawbacks: we are deprived either of the most informative view of the eyes or of that of the nose. A compromise is the three-quarter view (or a diptych, like the mug shot). Another compromise is the construction of a new original: a face with lateral eyes, like that of a doe:

Let us suppose now that the relation between image and original is of the same kind as the relation between sign and meaning (which is not the case, see 'Semiosis and mimesis'). Does it follow from this analysis that the relation between image and original is just as unmotivated (and hence conventional) as the relation between sign and meaning? That the relation is not 'iconic', but 'symbolic'? (1994, 85) In the three upper images, the original is barely distinguishable from the model. With the Egyptian rendering, however, the model is replaced with an original that differs from the model, although the discrepancy is not as big as with centaurs or angels. But the rendering of angels and centaurs, as well as that of the doe-eyed face, is 'true to nature' nevertheless:

It is not because an original differs from the model(s) serving as a starting point that the image would not resemble the original - that its rendering would not be 'true to nature' (see: 'the mimetic triangle'). To be sure, the rendering in profile was an Italian convention, that in three-quarter view a Flemish one, and the doe-eyed one an Egyptian. But these are conventions in the choice of the original. In semiotics, 'convention' has to do with the relation between sign and meaning (the relation between image and original). Whether the original differs from the model or not, in either case the relation between image and original is determined by (sensory reduced) likeness. That the word 'angel' is 'conventional' is not because it is a conventional representation in the West, but because an angel could as well have been called a devil, and vice versa. There is no such 'convention' with regard to the relation between an image and its original: an angel cannot be rendered by a devil. In matters of signs, the relation is ideally conventional (unmotivated), and in matters of images, there is always (sensory reduced) likeness.

The likeness can be more or less suggestive. In the real world as well as in the image, we construct a world starting from sensory givens that function as object- or image-constituting signs. When we see two eyes lighting up in the dark, when we see a silhouette looming up in the mist, we read these sensory impressions as a spatial object against a background. In the image, a contour as well as a hazy stain can produce the same effect. Such imageconstituting signs differ from objectconstituting signs in the real world in the same way as different fonts differ from each other, while they are read as the same letters nevertheless. In either case, we are dealing with signs motivated in the same way as the sign 'smoke' for 'fire' (metonymy): wherever the same sensory configuration appears, there are objects that stand out from their surroundings. A contour as an imageconstituting sign is a motivated sign, hence, and there is no need of an acquired code to read it: the very fact that the contour is used universally points to that conclusion. What is conventional in the use of a contour is only the choice of one of the many techniques: a line or one of the many other ('homonymous') signs - think of the way in which, from the Renaissance onwards, the outline has been scorned (think of da Vinci's sfumato), only to be reintroduced by painters like Gauguin. An outline is not conventional, however, in the sense that we would have to agree on reading it as a sign for a three-dimensional object and that we could choose another sign at will - for instance a stain for an outline or a line for a surface... The same goes for the many ways in which rounding can be evoked: by a gradient, points, hatching, even cloisonné - but certainly not by the agreement that we would read the colour red as a sign for volume. Not so for W. J. T. Mitchell. As opposed to Gombrich (II), he states: 'contour is just one of the conventional ways of indicating the separation of things from backgrounds' (1986,86) - and there is no doubt that he is speaking of a convention about the reading of an unmotivated sign. And that is fundamentally wrong: imageconstituting signs are not unmotivated and hence conventional, but motivated and hence 'natural'.


At the other pole - that of the metaphorical images - W. J. T. Mitchell justifiably criticises the theory of the 'mental image' as the referent of words (1986, 14 ff, especially p.30). He refers to the endeavour of Wittgenstein II 'to expel the idea of mental imagery' (1986, 15), but also to Burke, who contends that only in one of twenty cases an image is formed in the mind. It must be stated with emphasis indeed that thinking does not proceed through manipulating images, but rather signs. In his fervour, Burke goes so far as to deny that poetry would conjure up images (1986, 124): words are only imitations in drama (1986, 138). He thereby obscures the fact that, in literature, words do really conjure up images in the mind - images in the literal, not in the metaphorical sense of the word. That literature conjures up images (for all the senses, also the interoceptive senses) is testified by the fact that we are mostly disappointed when we see the screen version of a novel. How much also W. J. T. Mitchell is blind to the existence of imageconjuring signs in novels and poetry, appears not only from the fact that he does not criticise Burke, but also from the fact that he states that pornographic literature is far more suggestive than images (1986,86) - as if erotic literature would consist of words and not of images...

After having reduced the image to a conventional sign, W. J. T. Mitchell now also denies the existence of imageconjuring signs, and hence the 'pictorial' character of literature. Novels and poetry are thus equated with discursive texts, on the sole ground that they both make use of words. In the same breath, painting and literature are subsumed under the sign - as 'representations' on the same footing as scientific or philosophical texts: he bluntly speaks of the domain of painting and poetry as the domain of 'reference, representation, denotation and meaning' (1986, 47).

The subsumption of the image under the sign is completed in that the same Mitchell, who overlooks the 'pictorial' nature of literature, cannot stop stressing the 'pictorial' nature of ideologies in the metaphorical sense.


In extending the concept of the image to metaphorical images; in denying that the image is determined by likeness, and in contending that the relation between image and original is determined by convention; in misunderstanding the inherent 'pictorial' nature of literature and welcoming the metaphorically 'pictorial' character of ideology, W. J. T. Mitchell succeeds in making the subsumption of the image under the sign acceptable. Henceforward, the image is merely a special kind of sign - although it remains unclear how it can be distinguished from other signs (see below).

The first consequence of this subsumption is that there is no longer a difference between art and science. The second is a reduction of the opposition between image and sign to the opposition between image and text, painting and literature. Before analysing these two consequences, we first need a brief intermezzo.


To endorse his view that the image is not determined by likeness, W. J. T. Mitchell not only invokes Socrates, but the Bible as well. Did God not create man after his likeness? Such likeness cannot be meant to be 'literal', but rather as a kind of 'spiritual likeness' - 'an image which is not just a picture' (1986,33). According to W. J. T. Mitchell 'spiritual likeness' must be understood as a list of predicates, so that 'likeness is linguistic in its inner workings' (1986,43), a new version of the idea that the image is a sign.

We will soon get to know other versions, but concentrate now on the further course of history according to W. J. T. Mitchell. From biblical times, we rush at full speed via Maimonides and Milton to the Renaissance, when, with the invention of perspective, the foundations were laid for the literal concept of the image: the 'tyranny of the picture' (1986,37), the claim that the image would be 'a natural representation' of 'the way things look' (1986,37). In the nineteenth century, this approach is reinforced by the invention of photography, which was presented as 'a natural mode of representation' (1986,37). According to W. J. T. Mitchell, this is completely mistaken: apart from the fact that perspectival rendering is a convention (Panofsky, Gombrich I), there is no such thing as a 'neutral, univocal, visible world', as we have seen.

This historical reconstruction is rather biased. Long before iconophobic priests wrote down the story of the creation, 'realistic' images were produced - just think of Aaron's golden calf, not to mention the cave paintings of Palaeolithic hunters; and long before Plato descried the shadows of ideas in his cave, Narcissus fell in love with his mirror image. From this broader perspective, the camera and geometrical perspective are only more advanced versions of the more archaic techniques of the mirror and the mould. W. J. T. Mitchell can only lend a shadow of probability to his historic relativism by not departing from this universal starting point in the development of the image, but from one of the many reactions against its magic.


Like all those who want to subsume the image under the sign, W. J. T. Mitchell comes to face the task of formulating the difference between images and other signs in terms of a new opposition - and therewith to indicate on what grounds such a distinction can be made. There are several versions.

One of W. J. T. Mitchell's favourites figures in the subtitle of 'Picture Theory': the opposition between 'verbal and visual representation' (or between 'image and word' as in 'Iconology', 1986, 42ff). The terminology is remarkable: the terms 'visual' and 'verbal' belong to different logical domains. The term 'verbal' does not belong in the series 'visual, audible, tactile...'. The verbal belongs to the audible only in so far as words are audible signs. But there are more 'aural representations' than verbal ones. Why does W. J. T. Mitchell restrict himself to the word? Conversely, the term 'verbal' can only be opposed to the domain of the 'non-verbal' - and that cannot be defined in sensory terms, let alone be restricted to the visual. It is not difficult to see, then, that the opposition visual/verbal is an avatar of the distinction between 'natural' image and 'conventional' sign. The formula 'visual and verbal representation' is a compromise between two familiar - but inadequate - attempts to grasp the difference between image and sign. (Needless to say, printed letters are visual, but that does not make them images...).

Another avatar figures in the subtitle of 'Iconology': the opposition between image and text - probably a paraphrase of Goodman's 'pictures and paragraphs', but also and foremost of Foucault's 'le visible et le dicible', and its countless echoes, among others in Rosalind Krauss' 'arts of vision' as opposed to 'arts of language' (Grids, 1985). The opposition also appears in the lofty guise of 'icon and logos' (1994,24). It is not difficult to see how this version came about. Since the image has become a sign, the place of the sign in the familiar opposition of 'image versus sign' becomes vacant, and is taken by the sign par excellence: the word (or in another variant: the text). There is no longer talk of an opposition between eye and ear: a text is printed on paper, and thus visible like a (visual) image. But such an equation makes all the more urgent the question of what the difference between image and text may be. W. J. T. Mitchell does not commit himself - he bluntly states that he will use the term 'text' as 'a foil to imagery, a "significant other" or rival mode of representation' (1986, 3). From his comments on Goodman, we can surmise that he has few problems with Goodman's approach. Why, then, not just replace the messy terminology of 'image and text' with Goodman's 'dense' as opposed to 'differentiated' sign systems (1994, 345)? All the same, the term 'text' clearly indicates that to W. J. T. Mitchell - who, as we have seen, does not distinguish ordinary texts from texts that conjure up images - there is no difference between literary texts and philosophical or scientific discourse.

The equation of literary and philosophical texts under the denominator 'text' paves the way for the resurrection of the opposition of image and sign under a third form: as the opposition of painting and literature (see part II of Iconology, 1986, 47). The remarkable thing about this third version is that the move with which the image has been subsumed under the sign is here undone by an opposite move: the subsumption of the sign under the image - of science or philosophy under literature. For, however much W. J. T. Mitchell may contend that the domain of painting and poetry is that of 'reference, representation, denotation and meaning' (1986, 47), in reality, painting and literature are two different ways of making images - by making them directly perceptible to the eye (unmediated mimesis, painting), or by conjuring them up in the mind (mediated mimesis, literature). Both ways of making images - both forms of art - are the counterpart of science and philosophy, which refer rather than presentify. Let us compare both approaches:

In our approach, (four kinds of) signs have to be distinguished from (two kinds of) images:

image (art):
  unmediated mimesis: plastic arts; music (auditory mimesis); theatre, ballet
  mediated mimesis: literature

sign (science, philosophy):
  visualisation, symbol, imagesign, discourse (see 'kinds of signs')

With W. J. T. Mitchell, scientific or philosophical discourse and literature are subsumed under the term 'text', and distinguished from painting, which is equally a sign (text):




Whereas, in our approach, there is a clear criterion to discern painting from literature, with W. J. T. Mitchell you will search in vain to find one - no wonder, since to him the image is 'linguistic in its inner workings'.

No clear distinctions with W. J. T. Mitchell, hence, as with Nelson Goodman, but a messy formation that is a condensation of the opposition between eye and ear (painting, and literature as spoken imageconjuring words) and the opposition between 'natural' image and 'conventional' sign (image versus sign), whereby the text serves as a link between sign and literature. This condensation betrays the repression of the old opposition between image and sign.

Anyway, with the reformulation of the opposition between image and sign as the opposition between painting and literature, the opposition between art and science/philosophy is replaced with the battle between the arts - da Vinci's paragone.


Once the opposition between image and sign has been reformulated as that between painting and literature, W. J. T. Mitchell analyses the ideological background of the 'battle between the arts'. Although he thereby does not analyse the opposition between image and sign, we will nevertheless have a look at his approach to the two paradigms used in this battle, to show how much W. J. T. Mitchell falls victim to the subsumption of the image under the sign. Let us first illustrate this on the basis of the paradigm of the difference between the arts of space and the arts of time.

Ever since Lessing (Laokoon, 1766) it is commonly assumed that 'poetry is an art of time, motion, and action; painting an art of space, stasis, and arrested action' (1986, 48), on the basis of the fact that poetry resorts to 'articulate sound in time' (1986, 95) which entertains a 'convenient relation' with the course of actions. Painting is better suited to depict bodies and poetry to depict actions. Painting can render actions only indirectly through bodies, and poetry renders bodies through actions (1986, 101).

W. J. T. Mitchell's approach to these contentions is instructive. To begin with, he states that only oral literature is linear, whereas 'a literary work is spatial, in so far as it is written' (1986, 99). More fundamentally, he contends that bodies cannot be rendered directly, but only indirectly through signs like 'colours and shapes' (1986, 102), so that there is only a gradual difference with the rendering of movement through bodies that are signs for movement. For literature, we could construct a parallel approach as follows: words are signs for actions, and actions are signs for bodies. To W. J. T. Mitchell, painting and literature are equally 'indirect', hence (1986, 102). The opposition is only apparent: in both cases, we are dealing with spatio-temporal constructions, of which the terms spatial and temporal arts are merely synecdoches (1986, 103).

In reality, Lessing as well as W. J. T. Mitchell overlook the fact that there is a difference between images that are directly perceptible and images that are conjured up by words (unmediated and mediated mimesis). That is why it escapes their attention that there is a difference between the spatiotemporal properties of the reading of imageconjuring signs (in spoken as well as in written texts) and the spatiotemporal properties of the images (the world) that are conjured up. In Bertolt Brecht's poem 'Das kleine Haus unter Bäumen am See./Vom Dach steigt Rauch/...' the sequence of the words is linear, but the image that is conjured up is immobile ('spatial'). No doubt, there are also images of processes, and images can be concatenated at that, but in both cases the linear structure of the verses does not correspond to the spatiotemporal structure of the world that is conjured up: each sentence conjures up a single, discrete state of affairs or an event. In both cases, the spatiotemporal characteristics of the conjured up world - discrete stills or continuous processes - unfold only on the level of the images. On the level of the images, there is no longer any difference between literature and painting. Just like images conjured up by imageconjuring words, paintings can be added up into friezes and polyptychs (from the Calvary to the comic strip). It is remarkable, however, that Mitchell, in the vein of Lessing, continues to think of the image as the single, non-moving painting. Dance and theatre have always provided moving images, where there is no question whatsoever of something like the discrepancy between the discrete words and the continuity of the movement that is conjured up: there, the continuity unfolds before our eyes. It turns out, then, that - apart from music - the paradigms of the temporal arts are dance, theatre and film, whereas literature, just like the comic strip, is always seriously handicapped by the discrete nature of verbal language. That does not prevent many artists, especially painters, but also poets and musicians, from developing a predilection for the single, non-moving image because of the numerous advantages of suspending time (as already testified by Keats' 'Ode on a Grecian Urn').

It is obvious, then, that the concept of painting and literature as spatiotemporal constructions with a different weight of the temporal or the spatial dimensions only obfuscates a totally different state of affairs.


The opposition between painting and literature - image and text - is often understood as an opposition between eye and ear. In order to avoid confusion, we have to make a clear distinction here between the senses with which painting and literature are perceived, and the sensory domains covered by the perceived or conjured up images. It is clear, then, that painting and written literature are perceived by the eye, music and oral literature by the ear, and theatre, film and opera by both. But that does not apply to the worlds that are depicted: there is no difference in principle between the image conjured up in the mind by 'Das kleine Haus unter Bäumen am See./Vom Dach steigt Rauch' and a painting with the same subject. Moreover, the subject in question is only apparently visual: the visual impressions are only imageconstituting signs that constitute the appearance of worlds that transcend the domain of the visual altogether. The same goes for music: Isolde's Liebestod is not about sound, but about love. It should be noted, however, that words - whether heard or read - can conjure up images for all the senses (the interoceptive ones included), whereas paintings can only show visual appearances and music aural ones. But the world that is disclosed by all these appearances is in essence the same: whereas painting and music have only one entrance gate, literature has many.

W. J. T. Mitchell, for his part, comments that literature is not an 'aural art', because it can be read, and because the deaf, too, can learn to read and speak (1986, 117). But he goes further and contends that the blind can know more about a painting than someone whose eye is not schooled (1986, 118). 'The innocent eye is a metaphor for a highly experienced and cultivated sort of vision' (1986, 118). 'The capacity for a purely physical vision that is supposed to be forever inaccessible to the blind, turns out to be itself a kind of blindness' (1986, 118). No doubt, there is something like the schooling of an eye, but that does not mean that what a schooled eye sees would be accessible through other channels. A blind person can be told what there is to be seen in the Mona Lisa, but not wherein it differs from a bad copy.


Above, we already described a first form of devaluing the 'literal image': the contention that the relation with the original is merely conventional. A second way of reducing the image to a sign is to state that, if there were only likeness, the image could only render the visible, not the invisible. By the invisible, Mitchell means 'abstract ideas' conveyed 'by means of allegory', states of mind and dramatic actions conveyed by 'the language of facial expression and gesture' (1986, 41) and 'the language of human body as a vehicle of narrative' (1994, 26), but also the expression of 'the setting, the compositional arrangement, and the colour scheme' (1986, 41) - the expression of 'abstract' forms and colours. The image can only render such 'invisible' things through signs, which testifies to 'the necessarily verbal character of imaging the invisible' (1986, 42).

We want to make two remarks here.

To begin with, the above applies only to what W. J. T. Mitchell calls 'expressive' images, which implies that there would be 'natural images' - images based on likeness: 'There is the natural image which looks like or "captures" what it represents, and its pictorial rival, the artificial, expressive image which cannot "look like" what it represents, because that thing can only be conveyed in words' (1986, 43-44). The image, which was first supposed to be based on convention rather than likeness, turns out to be a likeness, albeit only of the visible world. That raises the question whether it is only the 'expressive' image that is 'linguistic' 'in its inner workings'. Is there something like a visible reality that is not a 'word', not a sign for the invisible? And if so, what is that reality? To what kind of images is W. J. T. Mitchell referring with 'the natural image which looks like or "captures" what it represents' (1986, 43-44): are there still other images than allegorical, figurative or 'abstract' images - Greenberg's 'nothing but colours' (1994, 227)? W. J. T. Mitchell cannot mean 'non-literary' abstract art, since this is also 'verbal' in his view (see 'Red square and black square' of Malevich, 1994, 224). Nevertheless, he states 'How can we deny... the merely figurative status of textuality in a painting purely composed of shapes colours, without legible, arbitrary signs' (1994, 95). Elsewhere, he seems to think of 'non-literary forms: landscape painting, still life and genre' (1994, 27).

And that leads us to the deeper question whether there is such a thing as a partition of the world into a visible and an invisible realm, a partition that would coincide with the opposition between word and image. Such a partition is rather naive, because it presupposes that there are visible things, whereas even so simple an object as a cube is not visible: we only see an appearance of it. That we have the impression of seeing a cube - with a front and a rear, a surface and a content - is due to the fact that we read the optical impressions that emanate from the cube as an appearance, as a sign for the object 'cube'. That applies not only to the cube, but also to Brecht's house under the trees and da Vinci's Mona Lisa: we read the optical impressions that emanate from these subjects as the appearance, not only of objects or bodies, but above all of an inner world and a history. It is evident, then, that the visible and the invisible are entwined in perception as such (read the paragraph on 'object-constituting signs' in our text on 'mimesis and semiosis'). When the image provides the same configuration of optical impressions as the original Duke of Wellington, we equally see a body of flesh and blood with a soul. The visible, hence, is not opposed to the invisible: the world as such is invisible (here and now imperceptible); it is only accessible through countless perceptions that are read as signs. That we read the appearance of the Mona Lisa as a sign, in the real world as well as on a canvas, does not mean that the Mona Lisa herself would be a sign.

That does not imply that we cannot read (images of) objects as signs: W. J. T. Mitchell's 'allegorical' eagle may function as a sign for the abstract idea 'courage'. But there is a difference between the eagle that functions as a sign for courage, and the smile that functions as a sign for an inner state, the absence of smoke for an uninhabited house, or shadow for roundness. It is only when a configuration of optical impressions has constituted the (image of an) eagle, that the eagle can serve as a symbol. It can thereby signify not only 'abstract' ideas, like courage, but also the visible appearance of an object - as when the image of an eagle points to the eagles' cage in a zoo. The same holds for Blake's images 'riddled with ideas, making them a visible language - that is, a kind of writing' (1994, 147).

How little the presumed partition between visible and invisible has to do with the opposition between image and word becomes apparent as soon as we realise that literature uses words only to conjure up perceptible - and by implication mostly visible - images. Where words do not conjure up images, we are dealing with mere referring discourse.

Of all people, the man who advocates a sophisticated approach to reality when he contends that there is no given reality to imitate and that we have to learn to see, turns out to fall prey to a rather naive conception of perception.


After having replaced the opposition between image and sign by that between image and text, or painting and literature, W. J. T. Mitchell states that there is no such thing as a pure image or a pure text: there are only 'textual pictures' and 'pictorial texts'. Let us begin with the latter.

W. J. T. Mitchell discerns three kinds of entwining of image and text. (1) We already know that he has a 'linguistic' approach to the image: it is not based on likeness, but 'conventional' (1986, 42), 'likeness is linguistic in its inner workings' (1986, 43), and the image is 'expressive' (1986, 43-44), so that 'imaging the invisible' is 'necessarily verbal' (1986, 42). (2) The image, which is already 'linguistic' in essence, cannot do without text. That already holds for 'a picture that depicts an episode from a verbal narrative' (history painting). But it also holds for abstract painting: 'With abstract painting, the appropriate texts may well be (...) criticism, philosophy, metaphysics' (1994, 98). Or: '"Theory" is the "word" that stands in the same relation to abstract art, that traditional literary forms had to representational painting' (1994, 220). He refers to the diagram of the development of modern art by Alfred H. Barr, which he describes as 'the epic of abstract painting' (1994, 234), and also to the 'narratives of postmodernism' (1994, 241). (3) The dependence on the word also appears from the use of titles.

Let us remark ad (2) that not only images (painting and literature), but also texts cannot do without (con)text. Such dependence on a (con)text is, moreover, not a privilege of image and text alone: practically everything is only understandable in the appropriate (con)text. It is trivial, hence, to state that the image depends on a 'text', whether we are talking about the knowledge of the Bible, which was quasi universal in the Western world until recently, or about the acquaintance with the 'epic of art': art has always been dependent on 'discourse or theory' (1994, 220), so that it does not make sense to single out abstract art as 'the elaborate language game that is abstract art' (1994, 235).

Things become more complicated when artworks are isolated from their usual context (art from the past, from other cultures or destined for esoteric communities). In that case, we need explicit information about the context. It makes sense, hence, to state that 'Biblical narratives are not visible to spectators who do not already know the stories from other sources' (1994, 223). But that does not mean that the text is part of the image: as soon as the required information is provided, the image functions on its own, whereas in a cartoon or a comic strip, the text continues to be an integral part of the image. That holds equally for the painting of Beatrice Cenci by Guido Reni, of which Mark Twain contends that it can have totally different meanings than those suggested by the title. It is not correct, hence, to conclude that 'painting is a confluence of pictorial and verbal traditions' (1986, 42). Many a title belongs to this kind of texts 'hors image'. For the time being, a portrait of Hitler can do without a title. Things are totally different when the artist resorts to an idiolect when addressing a broader public: think of the 'Mariée mise à nu par ses célibataires même', which is only understandable after reading a considerable corpus of texts, which are an integral part of the work indeed. That is also the case when we have to resort to a text to know what there is to be seen in the image at all, as when Luc Tuymans paints a white stain that turns out to be the light of an empty slide projector; or when the artwork is an example of a theory, as with Robert Morris' 'Slab' as analysed by Mitchell himself (1994, 241ff). Only in the latter cases do the following generalisations hold: 'There is a necessary connection between the meaning of abstract painting and the theoretical discourse around it' (1994, 221). Theory is 'a constitutive pre-text' to this work (1994, 222). Or to quote Tom Wolfe via Mitchell: 'Modern art has become completely literary: the paintings exist only to illustrate the text' (1994, 220) and 'Without a theory going with it, I cannot see the painting' (1994, 220).

When we abstract from the general cultural background that is needed to understand an image, it appears that there are two kinds of images. On the one hand, there are self-sufficient images: think of a history painting, a landscape, a still life, abstract paintings like those of Mitchell's Malevich (1994, 224) or 'absolute' music. On the other hand, there are images that go hand in hand with explicit or implicit texts. With the latter, we have to distinguish between composite images (comic strip, dialogue in theatre and film, song and opera), and the countless combinations of images as non-verbal signs with verbal signs: think of maps, Manzoni's 'Merda d'artista', Magritte's 'Ceci n'est pas une pipe', and what have you. For a detailed overview, we refer to 'Entwining of mimesis and semiosis'.

It is only because W. J. T. Mitchell is deceived by superficial similarities concealing fundamental differences that he can conclude that there is no such thing as a purely pictorial image: 'This sort of purity (...) is both impossible and utopian' (1994, 96).


Mitchell provides far less material for the 'intrinsic pictorial character' of the 'text'. But the examples provided speak volumes. He counts the visual appearance of letters and the lay-out among the images: apparently, there is no difference between the 'materiality of writing typography' on the one hand, and 'ekphrasis' and 'description in narrative' on the other hand (1994, 107). Elsewhere, he completes the series of 'visual representations' with elements like the metaphor: 'the fabric of description, narrative "vision", represented objects and places, metaphor, formal arrangements and distinctions of textual functions, even in typography, paper, binding' (1994, 99). Needless to say, printed letters - Blake's calligraphy freighted with 'symbolic values' (1994, 147) included - and lay-out are visual indeed, but they are not images. Metaphors are not images either, but means of predicating (see: 'Mimesis and semiosis'). Only 'the fabric of description, narrative, represented objects and places' belongs to the realm of images. But literature is not the counterpart of the image, but a special kind of image: one that is not directly perceptible, but has to be conjured up by words in the mind. Literature is not a mixture of 'text' and image, hence, but nothing but image. That does not mean that there is no such thing as a 'pure text': just think of scientific or philosophical discourse - pure referring words, not words that conjure up images - no literature, hence.

That does not prevent W. J. T. Mitchell from contending that the mere fact that literature is printed suffices to consign the idea of a 'pure text' to the dustbin: 'the medium of writing deconstructs the possibility of a pure text or a pure image' (1994, 95). 'Writing, in its physical, graphic form, is an inseparable suturing of the visual and the verbal, the "imagetext" incarnate' (1994, 95).


According to W. J. T. Mitchell, the 'textual picture' demonstrates that a text cannot do without images, whereas 'pictorial texts' demonstrate that the image cannot do without words. 'The dialectic of word and image seems to be a constant in the fabric of signs that a culture weaves around itself' (1986, 43). We have seen how W. J. T. Mitchell has a predilection for the trope of the 'synthesis of thesis and antithesis', as when spatial and temporal arts are declared to be synecdoches of an in essence spatiotemporal construct. In the same vein, he also synthesises the opposition between image and text in the concept of the 'imagetext'. To him, there is no difference in principle between literature and painting, there are only different accents according to cultures and eras (1986, 49): 'The interaction of pictures and texts is constitutive of representation as such: all media are mixed media, and all representations are heterogeneous; there are no purely visual or verbal arts' (1994, 5). 'What varies is the precise nature of the weave, the relation of warp and woof' (1986, 43). Phrased somewhat differently: 'All arts are "composite" arts; all media are mixed media' (1994, 95). The purist trend that leads to the crude opposition of image and text is an 'attack against the confusion of the arts' (1994, 216). 'The attempt to grasp the unitary, homogeneous essences of painting, photography, sculpture, poetry etc., is the real aberration' (1994, 107).

Needless to say, there can be no talk of dialectics when the image is first reduced to text 'in its inner workings'. Nor can there be talk of dialectics when the opposition between image and word is understood in terms of painting and literature: in both cases, we are dealing with images. We can only speak of opposites when the whole array of signs is contrasted with the image. But that opposition is not a dialectical one: in the real world, image and sign are entwined in often complex formations. Next to the difference between signs and images, there are also diverse kinds of signs (according to degree of motivation) and diverse kinds of images (from 'saturated' to 'schematised'), which are all too easily read as continua from image to sign. Finally, there are also combinations of images for different senses ('integration of the arts', as with Blake, the comic strip, film and theatre), which are equally often misunderstood in terms of the opposition of image and sign.


W. J. T. Mitchell surely has very interesting things to tell us about the ideology of the image, if not about artworks as such. But as for the answer to the questions 'What is an image?' and 'What is the difference between images and words' (1986,1) - or more accurately, as we meanwhile understand: 'What is the difference between images and signs?' - we had better rely on a somewhat more sophisticated version of the supposedly 'naive mimesis, copy or correspondence theories of representation'. It would lead us too far here to analyse - in the best tradition of W. J. T. Mitchell himself - the rather manifest reasons why W. J. T. Mitchell prefers to resort to anti-mimetic prophets like Nelson Goodman.

© Stefan Beyst, July 2010 (translated July 2010).

* Translated into French by Maxime Boidy and Stéphane Roth: 'Iconologie : image, textes, idéologie', Les Prairies ordinaires
