Adventures in the Archive: Text Mining Pan Am Periodicals


Pan American Airlines was the major international airline for much of the 20th century, the United States’ unofficial national airline, and responsible for many of the innovations in air travel that are still with us today. Following Pan Am’s bankruptcy in 1991, the University of Miami acquired its archives and, thanks to a grant from the National Historical Publications & Records Commission (NHPRC), has now been able to digitize all of Pan Am’s printed materials as Pan Am Periodicals.

This archive covers the period from the airline’s inception in the 1920s to its collapse in the 1990s, an era of vast technological, social and cultural change. The periodicals range from the chronologically specific, such as the Africa Newsletter published in 1942, to the chronologically broad, such as the Annual Reports, which ran from 1939 to 1989. Inevitably, there is also a huge geographical sweep, from London-based in-flight magazines to reports covering the Latin America, Pacific Alaska and Atlantic divisions. As such, this archive has much to offer anyone wishing to explore the 20th century from the perspective of a global American corporation.

By the time I joined this project, the labor of scanning all the material and turning it into plain text files with detailed metadata had already been accomplished by a team including Paige C. Morgan, Laura Capell, Paul Clough, Jason Cohen, Elliot Williams, and Gabriella Williams. My role, along with that of my fellow graduate student Lilianne Lugo Herrera, was to write introductions and Potential Starting Points (PSPs) for each periodical, primarily for an audience interested in text mining the material but possibly, as I was, brand new to distant reading.

For this, I used AntConc. AntConc proved a very useful text mining tool because it requires neither coding knowledge nor, once the plain text files have been downloaded, an internet connection. I used this tutorial by Heather Froehlich (while not quite up to date, it is very useful and thorough) to get to grips with the software and then practiced on the periodicals. For me, text mining works in two important ways: firstly, it allows you to quickly find useful or interesting material; and secondly, it enables you to gain an understanding of the text from an overview and/or quantitative data. My introductions focused on what each periodical was, who wrote it, its size and its intended audience, whilst the PSPs aimed to offer specific, potentially illuminating ways of text mining it. My job, therefore, was not to make arguments about the text, but to suggest topics around which interesting material could be found or quantitative data generated, both of which enable scholars and an interested non-academic public to understand and make arguments about Pan Am.

Some PSPs were easier to write than others. One of the first periodicals I looked at was the Africa Newsletter, comprising seven newsletters and thirty-one pages of text. As this dataset was very small, I read through the files on the University of Miami Digital Collections website, where they are easy to locate by object ID number. The Africa Newsletter was very upfront about both its intended audience (the family and friends of Pan Am employees based in Africa in 1942) and its purpose (reassurance). It was nonetheless fascinating to me personally: for what Pan Am could and could not say about its involvement with the British, French and American military, for the varying degrees of racist characterization of locals, and for its informal, jocular tone, which sought to present the experience of employees as an exotic vacation rather than anything dangerous or risky. It also contained an illustration showing the outline of the African continent with PAA emblazoned across northern Africa, which fed into ideas of American imperialism. This highlighted the limitations of plain text files: this illustration, like the reams of financial tables in later periodicals, simply could not be reproduced in the OCR versions of the periodical. My PSPs offered ways to reach this material quickly through text mining, suggesting searches for words such as ‘native’, ‘military’, ‘America’, ‘Britain’ or ‘play’, ‘work’ and ‘home’. I also directed readers to the illustrations, and suggested that this small dataset could be combined or compared with other World War Two periodicals.

This approach proved the exception rather than the rule, as many periodicals were simply too large to read in their entirety. For these, distant reading was not only a good option but the only option. Periodicals were also generally less clear about audience and purpose, assuming both rather than stating them explicitly. Once I had worked these out, usually by reading the first few pages of the first issue, I would search for general terms based on this information. What themes do audience and purpose dictate? Financial? Military? Commercial? How much of the periodical is fact-based, and how much is personal or human-interest material? Which cities and countries are covered? These search terms served a dual purpose: firstly, they offered statistical insight into how often these words appear and what their collocates (words they frequently appear next to) are; and secondly, they revealed the broader contexts in which these words appear, giving access to different examples of a specific word that could potentially be built into an argument.
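AntConc handles all of this through its interface, without any coding. For readers who would rather script the same kind of exploration, here is a minimal Python sketch (not AntConc’s own method) of the three operations described above: counting how often a word appears, tallying the words that occur within a few words of it, and listing each hit in its surrounding context (keyword in context, or KWIC). It assumes a simple tokenizer (lowercase runs of letters, keeping apostrophes inside words) and a folder of UTF-8 plain text files; the function names and folder layout are hypothetical. Unlike AntConc, which can rank collocates by statistical measures of association, this sketch only counts raw co-occurrences within a fixed window.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed tokenizer: lowercase letters, keeping apostrophes inside words ("Pan Am's" -> "pan", "am's").
WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return WORD.findall(text.lower())


def load_corpus(folder):
    """Read every .txt file in `folder` (a hypothetical layout) into one token list."""
    tokens = []
    for path in sorted(Path(folder).glob("*.txt")):
        tokens.extend(tokenize(path.read_text(encoding="utf-8", errors="replace")))
    return tokens


def frequency(tokens, word):
    """Number of times `word` occurs in the token list."""
    return sum(1 for tok in tokens if tok == word)


def collocates(tokens, word, window=4):
    """Count the words found within `window` tokens on either side of each hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


def kwic(tokens, word, width=5):
    """Keyword-in-context lines with `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == word:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines
```

For instance, after downloading one periodical’s files into a folder (here the hypothetical `africa_newsletter`), `tokens = load_corpus("africa_newsletter")` followed by `collocates(tokens, "native").most_common(10)` and `kwic(tokens, "native")` would approximate the kind of search suggested for the Africa Newsletter. OCR errors will split or misspell some words, so the counts are approximate.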

For example, the Annual Reports of the Pacific Alaska Division cover a short period, 1948-1951, but provide insight into how an American corporation perceived China under a Communist government. They report on the difficulty of knowing what ‘normal’ is in a region still at war, on what exactly is happening between various countries, on economic aid from the United States, and on the perceived danger of communism spreading. Beyond this insight into Asia’s role at the beginning of the Cold War, more specific moments emerge, such as the accommodation crisis in Hong Kong in 1948 caused by the number of Chinese refugees. This mattered to Pan Am because travelers could only enter Hong Kong if they had a hotel reservation, which caused a two-million-dollar backlog in business. This example points to Pan Am’s global scope and its role in transnational and transcultural interactions, all seen from a firmly US, and usually specifically New York or Miami, perspective.

The chronological scope of this archive provides many opportunities to track global social, cultural and financial changes through the prism of Pan Am. This is particularly noticeable in attitudes towards Communist countries in general, but also in developing attitudes towards women and minorities. The word ‘minorities’ begins to appear, in a neutral, professional manner, in material published in the late sixties, replacing the (albeit infrequently used) ‘colored’; and the more misogynistic characterizations of women certainly decrease over the period.
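Tracking a term such as ‘minorities’ across decades is easier when raw counts are normalized, because the amount of material varies a great deal from year to year. The sketch below is a hypothetical Python helper, not part of AntConc or of the dataset. It assumes you have already grouped the plain text files by publication year (for instance from the dataset’s metadata, if it records dates as assumed here) and joined each year’s files into one string, and it reports how often a single lowercase word occurs per 10,000 words.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase letters, keeping apostrophes inside words.
WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")


def term_rate_by_year(texts_by_year, term, per=10_000):
    """Return {year: occurrences of `term` per `per` words}, ordered by year."""
    rates = {}
    for year in sorted(texts_by_year):
        tokens = WORD.findall(texts_by_year[year].lower())
        counts = Counter(tokens)
        # Normalize so that years with more pages do not look artificially busier.
        rates[year] = counts[term] * per / len(tokens) if tokens else 0.0
    return rates
```

Comparing `term_rate_by_year(texts, "minorities")` with `term_rate_by_year(texts, "colored")` would give a rough, OCR-dependent picture of the kind of shift described above. The hits themselves still need reading in context, since a count cannot distinguish a neutral usage from a pejorative one.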

As well as tracking these developments across different contexts, from the professional internal memo to the informal in-flight magazine, I was interested in what was not referenced or mentioned. Silences were to be expected in periodicals discussing World War Two or the Vietnam War, which were actively censored, but, to my surprise, I also found no references to the assassination of President John F. Kennedy in the months immediately following it. The reference I eventually found concerned an exhibition about his life, which Pan Am were responsible for flying to Europe. Pan Am’s concerns also seem fundamentally different from those of many large corporations today. An internal Management Memo from 1970 expressed outrage that two graduate students had published articles in left-wing magazines critiquing Pan Am’s involvement in the Vietnam War. The idea of a corporation such as Amazon caring about a graduate student’s opinion is, unfortunately, highly unlikely today.

While exploring this archive and writing the PSPs, I was conscious that my suggestions were driven by my own interests and humanities training (as well as by my British education, which in particular gave me a very different perspective on World War Two). Beyond the basic suggestion to explore financial vocabulary, I could offer little further detail on some of the more economically focused datasets, whereas I was much better equipped to describe ways of exploring gender.

I have used digital archives, especially Early English Books Online, many times during my graduate work, and while I understood in theory that these archives took time to digitize and reconstruct, I never realized how much time until I worked on Pan Am Periodicals. Making as much material as possible available in a sustainable, easy-to-use manner is a crucial part of humanities infrastructure and deserves all the support it can get. This project also brought home to me that archival work is not objective, but is inevitably both strengthened and curtailed by the interests and specialisms of the individual and, more broadly, of the institution. The PSPs that Lilianne and I created provide initial guidelines to the periodicals, but do not stop researchers from pursuing their own interests. Pan Am Periodicals is an incredibly rich archive, with great potential for interesting explorations and discoveries not just about Pan Am but about the 20th century, and thanks to time, intellectual work and labor, it is now available for anyone to use and explore.

The Pan Am Periodicals Plain Text Dataset is available here

Image courtesy of University Of Miami Libraries.