Distant Viewing





Moving images have served as a dominant form of cultural expression in the United States since the beginning of the twentieth century. Millions flocked each week to the theaters after World War I and then tuned in to television after World War II. Today, Americans spend over 1.5 hours each day, on average, streaming digital media over the internet (Nielsen 2016). Extensive scholarship in media studies has established how formal elements of moving images (such as camera angles, sound, and the construction of narrative arcs) reflect, establish, and challenge cultural norms (Mulvey 1975, Braudy 2002). In other words, moving images offer a lens into a community's ideals and values. The study of culture in the twentieth century therefore requires treating media as a serious source of historical evidence. With currently available tools, however, the formal analysis of moving images has been restricted to sets of works small enough to be studied through close analysis. As archives of digitized time-based media grow rapidly in number and size, computational approaches are needed to take full advantage of the available archival material. Tools capable of applying algorithmic approaches to moving images stand to open exciting new avenues of research in digital humanities, cultural analytics, and media studies.

We seek to build the Distant Viewing Toolkit (DVT), an open-source software library for studying large collections of moving images. The project is a collaboration with the Digital Scholarship Lab (DSL) at the University of Richmond and the Media Ecology Project (MEP) at Dartmouth College. Funding is requested for the development and extension of cutting-edge techniques in computer vision to facilitate the algorithmic production of metadata describing the content (i.e., people/actors, dialogue, scenes, objects) and style (i.e., shot angle, shot length, lighting, framing, sound) of time-based media. This information can be used to visualize, summarize, search, and explore digitized corpora of moving images. These techniques will allow scholars to see content and style within and across moving images such as films, news broadcasts, and television series, revealing how moving images shape cultural norms.

To illustrate how the DVT might be used in practice: once the toolkit is completed, we will make a "software library" of tools available for free download. Humanities scholars, media librarians, and even students will be able to download the toolkit and install it on their personal computers. Once it is installed, the user will be able to either load their own content for analysis or import preprocessed datasets from our website. The DVT software will automatically build a report (i.e., the "metadata") describing formal stylistic and content-driven aspects of the moving images. The user can then explore the metadata using intuitive tools included in the DVT library to gain a better understanding of how the media is constructed aesthetically, and to compare how these aesthetics change along dimensions such as time or across different series. Ultimately, the software will allow for the analysis at scale of formal elements that are currently accessible only through close analysis.

Because it is critical that the DVT be useful to a broad range of scholars, the toolkit will be designed to work with a wide variety of content, including news broadcasts, dramatic series, musicals, and comedies. To ensure this, during the grant period we will test the toolkit with scholars studying different kinds of moving images. Specifically, the corpora to be examined as case studies are: early Hollywood feature films (Jenny Oyallon-Koloski, University of Illinois at Urbana-Champaign); mid-century educational films produced by the US Government (Bret Vukoder, Carnegie Mellon University); Network Era comedies (Annie Berke, Hollins University); police procedurals (Claudia Calhoun, NYU); and local television news broadcasts (Lauren Tilton, University of Richmond). These case studies will not only demonstrate the utility of the toolkit, but also provide our team with valuable feedback on how it should function.

