Dissertation Notes

Proposal (December 2018)

Provisional Title: The pho­to­graphic im­age in the age of the neural net­work

Topic (approx. 100 words)

How does our understanding of the photographic image change when most photographs are taken by machines for machines? What happens to the photograph when it becomes one instance among millions in a machine-learning dataset, and are these datasets a modern extension of the photographic archive? How do we respond to images produced by machines (i.e. operational images and, more recently, deep fakes)?


There’s a rich body of lit­er­a­ture on our re­la­tion­ship with the dig­i­tal pho­to­graph (Steyerl, Farocki, Paglen) and the archive (Foucault, Sekula, Derrida, and more re­cent writ­ers on in­stances of dig­i­tal archives like the Enron Corpus, Facebook pro­files, and var­i­ous archive-re­lated ef­forts by Google).

I’m hoping to base this cultural analysis on readings of papers from the field of computer science, both historical (such as Rosenblatt 1958, which introduces the perceptron, an early form of the neural network) and contemporary (such as Taigman, Yang, Ranzato and Wolf 2014, which describes a face-verification model, developed by Facebook, that approaches human-level performance, and Nguyen, Yosinski and Clune 2015, which details how such models can be fooled using artificial imagery).

600-800 Word Text

If you can imagine it, there is probably a dataset of it. The “Machine Learning Repository”, which is maintained by the University of California, lists 426 datasets at the time of writing, each consisting of anywhere between a few hundred and tens of millions of instances. A set of anonymised records from the 1990 U.S. census (24 million instances) sits next to one consisting of 150 hours of Indian TV news broadcasts (12 million instances). The 371 choral works of J.S. Bach (in machine-readable form) can be found next to cases of breast cancer in Wisconsin (699 of them), and forest fires (571) just below Facebook comments (40,949) (University of California, Irvine). If we narrow the search to datasets of images, we still get countless results. There is the Stanford Dogs dataset (20,580, 110 breeds), the German Traffic Sign Detection Benchmark Dataset (900), and dozens of datasets of human faces.

Arranged in chronological order, the face datasets tell us about the shifting economic circumstances of database production. The earliest face datasets were created by research groups directly engaged in facial recognition research, and predominantly feature whoever was walking around the laboratory at the time. The Yale Face Database (1997, 165 instances) and the Carnegie Mellon Face Images Dataset (1999, 640 instances) are examples of this. In the early 2000s, we start to see targeted efforts to generate face databases, now detached from the researchers working on the algorithms themselves. The FERET database (2003, 11,338), which was funded by the U.S. Department of Defense, is perhaps the most striking example of this. Though the number of instances has jumped by two orders of magnitude, the fundamental mode of production hasn’t changed from the first phase of datasets: a professional photographer (hired for this purpose) records paid volunteers, for the sole purpose of creating material for the database.

This relationship starts to change by 2010, when image databases are increasingly sourced from public sources on the internet using automated crawlers. This shift from original production to automated extraction of images allows the number of instances to increase by orders of magnitude again: FaceScrub (2014, 107,818) was compiled using Google’s image search, while IMDB-WIKI (2015, 523,051) and the YouTube Face Database (2012, ~600,000) bear their mining-grounds in their names.

The largest face dataset whose existence has been publicly acknowledged (at 4,000,000 instances) isn’t even listed: it’s Facebook’s proprietary face dataset, which is not publicly available. By controlling the richest dataset, Facebook by extension controls the world’s most powerful facial recognition algorithm (Taigman et al., 2014).

How do we deal with the emergence of these vast datasets in cultural terms? It seems natural to place the database in the tradition of the photographic archive. But are they really the same? Sekula (1986) describes how the photographic archive of the 19th century serves to “define, regulate” and thus to control social deviance. The dataset certainly serves that function - one needn’t look very hard to find countless examples of police, governments and corporations using automated image-making to sort people along scales of likely social compliance (Paglen, 2016; Sekula, 1986).

However, in some ways the database seems fundamentally different from the archive. First, the archive usually comes with an index (or catalogue) to help whoever is accessing the archive find any particular record (Berthod 2017). The dataset, by contrast, is essentially a flat list with no means of navigation other than sorting by filenames (which are often meaningless). This leads to the larger observation that in the database, the individual record is essentially meaningless. Only the accumulation of thousands or millions of similar records makes it useful - as Halevy et al. (2009) show, the accuracy of an algorithm is directly linked to the quantity (not completeness, or even accuracy) of the training data (Steyerl, 2016).

Second, while both the archive and the dataset exert power, they do so in different ways. The archive controls primarily whoever is recorded in the archive (or conversely, whoever is left out). The dataset, by contrast, has no spatial or temporal limitations - a dataset of portraits collected in the Midwest in the 1990s might be used by a police computer on the other side of the globe, 30 years later. This is perhaps because the dataset, unlike the archive, is ultimately a means to an end: a raw, unrefined material from which algorithms might be forged. The agricultural language surrounding the creation of archives and databases (as observed by Steyerl) seems to underline this point: the archive is curated, recorded, built up, accumulated. Data is mined, harvested and crawled before truckloads of it are compressed, distributed and fed to the algorithm.


Non-print sources

Artists deal­ing with the op­er­a­tional im­age, data­bases etc:

Various Image-Databases such as:

Tutorial Notes January 15, 2019

There is also the ob­ser­va­tion that think­ing about how ma­chines see the world forces us to ques­tion how we our­selves do it.

Chapters could be:

GAN-generated images (GAN / YouTube Faces Dataset)

The main methodological idea remains to do this not just through sources from the humanities, but also by understanding the mathematics, physics, economics and logistics of machine vision.

January 16, 2018

Possible chap­ters / an­gles:

Data col­lec­tion (how im­ages are taken)

(Through sensors etc., driven by the economy; some images start life as representations and become data.) This would be where the reflection on the database vs. the archive goes. Also a good place for figures that update dynamically.

Mathematics (Network ar­chi­tec­ture and his­tory thereof)

Assumptions can be hard-coded into ar­chi­tec­ture. This sec­tion should prob­a­bly in­clude an ex­pla­na­tion of how com­mon net­work ar­chi­tec­tures work, also a good place for live demon­stra­tions. Maybe a text-to-im­age model? Or a pix2pix trained on line draw­ings.

Infrastructure (Cables, Buildings, Chips)

Google (and also Facebook) are building dedicated processing units: the TPU (third-generation Cloud TPU, Google)

Wired on the TPU

Labour (generating train­ing data, re­verse Turing test): Who looks at the im­ages

There is this narrative that machine learning models spring from the minds of genius programmers. See for instance this Fast Company article: it suggests a Google intern made this amazing model, when really the guy has a PhD and used thousands of pounds’ worth of computing power. And more broadly, the datasets, hardware, infrastructure etc. are propped up by much lower-skilled labour.

A lot of pa­pers use Amazon Mechanical Turk to val­i­date re­sults or gen­er­ate train­ing sets:

Machines gen­er­at­ing im­ages

Maybe talk about a spe­cific model: Pix2pix would seem to be a good can­di­date.

Group tu­to­r­ial notes

Keywords (6-10)

  1. Archive
  2. Database
  3. Computer Vision
  4. Machine Learning
  5. Reverse Turing Test
  6. Digital Photography
  7. Operational Images
  8. Digital Economy


a ‘current map’ of the dissertation (including key themes / writers / artists / examples)

I think it’s prob­a­bly nec­es­sary to talk about real-world ex­am­ples of com­puter vi­sion hav­ing so­cial con­se­quences, but the ul­ti­mate goal is to get to a more fun­da­men­tal ques­tion: How do we have to look at pho­tog­ra­phy in the age of the data­base? Also, the age of the data­base has been go­ing on for far longer than neural net­works have been in the pub­lic con­scious­ness.



Key Texts

New writ­ing

a short piece of writ­ing (approx. half a page / one page) that you are happy to share with the group. The writ­ing does­n’t need to be a sum­mary of your think­ing, it can be a new piece of writ­ing that is re­spond­ing to one of your key ideas… and it can be quite rough! It might be help­ful to bring some text that you would like feed­back on - whether it is the style of writ­ing, the ideas con­tained within… or a com­bi­na­tion of both.