
Network Visualization Tools

Develop narrative and visual navigational tools that help researchers form connections between the people in the Emma B. Andrews Diary Project. This involves creating models, performing UX testing, implementing design thinking principles and working with programmers to develop proof-of-concept prototypes.

“Abydos,” The Emma B. Andrews Diary Project

Historical Network Research Methodology

The objective of this paper is to create an evidence-based methodology for joining text mining data from the 19 volumes of the Emma B. Andrews Diary with the preexisting biographical data recorded in the Master Indices document. One level of organization the project sets out to establish is the categorization of historical figures into occupational fields in order to better understand the spheres of influence as they connected in Egypt at the outset of the twentieth century. In the words of historical sociologist Claire Lemercier, the value of this tool is the "ability to provide many complementary views and indicators, especially helping the researchers to navigate between scales."1

The chronological organization of biographical figures into fields can provide researchers with a sense of possessing narrativity rather than simply being narrative when viewed at each biography's micro scale, allowing them to form unique connections depending on their areas of study.2 The consolidation of text mining data and biographical information into a single bipartite network3 can support  researcher metacognition through recursive searching and encourage emerging ideas around "future paths to pursue and prior paths worth revisiting."4 Organizing biographical figures will not only illuminate the embeddedness, or the "intertwined nature of social, economic, political and religious"5 interactions in Egypt during this period, but also how these connections may have formed long lasting relationships that have influenced modern science, culture and industry.

Repository Characteristics

The diary, spanning 1889 to 1913, was diligently maintained by Emma B. Andrews while traveling Egypt with Theodore Davis, a prominent lawyer turned excavator and art collector, meeting and taking notes on nearly 950 different historical figures. Because the network is composed almost wholly of travelers visiting Egypt, the network is uniquely occupied by proactive participants, wherein "homogamy and endogamy are the product of choice" rather than circumstance, as would have been the case if the diary documented domestic life back in Andrews' Columbus, Ohio.6 

 

While the figures in the EBA diary nearly all share this commonality as fellow travelers, there is a great diversity in the professional, academic and economic backgrounds in the network, indicating value homophily rather than status homophily.7 Handlers, excavators and draughtsmen are mentioned alongside Gilded Age robber barons, as a lack of Western amenities at that point in Egypt disassembled class barriers which would have been adhered to in the travelers' home countries. This environment lends itself to what Ronald S. Burt would call the structural hole argument, where "social capital is created by a network in which people can broker connections between otherwise disconnected segments."8 The structural hole argument could be illuminated in this network by highlighting relationships formed between various scientific fields and titans of industry, possibly resulting in funding and research-based innovation that otherwise wouldn't have occurred.

 

The fairly contained and proactive social environment that Emma B. Andrews documents fulfills Charles Wetherell's basic principles of social networks analysis:

 

a) Actors in all social systems are viewed as interdependent rather than independent.

b) Linkages or relations among actors channel information, affection and other resources.

c) The structure of those relations or ties among actors both constrain and facilitate action.

and d) The patterns of relations among actors define economic, political and social structure.9

 

Another element that makes the repository well suited to historical network research is the author's thoroughness, attention to detail and reliability of entries. In comparison with Mary Newberry, a cousin of Theodore Davis who traveled with the couple from 1912 to 1913, Emma B. Andrews' writing can feel much more bare, reading almost like a shipping manifest, wherein the movement of guests on and off of the boat as well as notes about the crew and the weather are recorded diligently, if briefly.

 

Here is an example from the two writers recorded after George Washington's birthday in 1913 with biographical figures highlighted in red and locations highlighted in blue, to demonstrate the density of Emma B. Andrews' writing style. 

 

Mary Newberry:

 

"Washington's Birthday.-and a nice American day as befits an American on February 22 wherever she happens to be. I played dominoes all the morning and in the afternoon I crossed over to Luxor, ordered a carriage, annexed those enchanting Pier children and drove out to the Savoy to call on Mrs. Daniel Eaton. As we passed along the waterfront -"Children" said I, "do you know why all the American boats are dressed in flags today?" "No", said the sweet little voices. "You don't know what day this is?" "No", expectantly and excitedly. "Why, it's Washington's Birthday". "Oh, is it? Why, I don't believe Father and Mother know it." "Then you must tell them when we get back". Mrs. Eaton and Mrs. Van Winkle were out, and we drove back again all too soon. As we again passed the patriotic display of bunting that adorable Eleanor Pier, aged seven, said "Whose birthday is this? Did you say it was President Taft's?" (Newberry continues the entry for two additional paragraphs).

 

Emma B. Andrews:

 

"Very warm.  Mr. Whymper to breakfast - the Piers to lunch.  Mrs. Aylener and Morehouse to tea.  We have had Mr. and Mrs. Hardy to tea - Senator Aldrich also.  Mary N. and I have been over to the Winter Palace for tea - we looked very smart on Washington’s birthday with all our flags."

Definitions

Field: An area or division of an activity, subject, or profession.10

 

Occupation: An activity in which one engages.11

 

The advantages of this last definition are in the broadness of its scope, wherein more ambiguous and interchangeable titles like heir, socialite and traveler as well as ideological designations like philanthropist and activist, are included. This also simplifies the fiddly differentiation between occupation, pastime and hobby. The definition is adaptable to antiquated titles as well as figures of antiquity, such as the Pharaohs that frequently play central roles in the diaries while being long dead. 

 

An earlier model for this project's design adhered to Merriam-Webster's second definition of occupation, "the principal business of one's life," but after conducting more research on social network analysis (SNA), I found that this judgement relied too heavily on our interpretation of the biographical material. While one could follow a rule of using the one title that the historical figure drew the main source of their livelihood from, i.e. the occupation that was most responsible for bringing them to Egypt, this is time consuming to research, involves some amount of guesswork and necessarily reduces the amount of data in the final network visualization. Placing the historical figure in each of the fields they were known to occupy creates more access points for researchers and gives them the means to determine the validity of that categorization.

Interpretation

"The ability to cut away just enough data to make the network manageable, but not enough to lose information, is as much an art as it is a science."12 

Scott Weingart, Demystifying Networks

 

The qualitative aspect of the project is largely in the designation of occupational fields. Seen through Van Leeuwen's coding orientation schema, this designation is largely sociological, informed by late nineteenth century values inherent in the text and interpreted through our current lens.13 Because the work is focused on how to best use the preexisting data and make it more useful for researchers, occupations are assigned based on information in the Master Indices document, developed by Sarah Ketchley for the Emma B. Andrews Diary Project over the course of the past ten years. 

 

The designations attributed to the figures include any activity that they engaged in on a long-term basis, regardless of economic gain or success within that particular field. The figures appear in each occupation that they are listed in, without a weighted qualification. All visualizations are necessarily reductions of data and I feel this method for differentiating fields is a mindful balance between detail and the feasibility of successfully completing the network visualization tool.

 

Earlier iterations of the field designations were originally made up of fifteen categories. After discussing the proposal with the rest of the team during the weekly meeting, it was suggested that a maximum of ten fields with ten subcategories would be ideal for navigation. Using the International Standard Classification of Occupations14 and a Domain Taxonomy15 as a guide, I created the second draft seen below. The advantage of this design is that much of the military, political and royalty overlap is contained within the Institutions field, as well as the aristocrat, socialite and heir ambiguities within the Public Figure field. 


Second draft of the Fields/Occupation Taxonomy

Companions

One complex element of earlier iterations of this project involved the inclusion of wives and female partners in the occupation of their spouses. The rationale for this concept was that while couples were traveling, their level of social interaction was much stronger than under normal circumstances and, in many cases, it seems that the female figures had more proactive roles as hosts and social intermediaries than their male counterparts. One note: in Bruno Latour's Actor-Network Theory, an intermediary is designated pejoratively below mediator, whereas I am using the term more traditionally as someone who connects one group of people to another.16

 

The argument supporting this decision is that network analysis could reveal the "centrality of a woman in a particular network, or her influence in terms of the number of contacts that she helped facilitate," to quote Theresa Kemp from Accounting for Early Modern Women in the Arts.17 One counterargument I found held that including women in the fields of their partners may rob them of their historical agency, i.e. they become the output of their male partners. I found both of these arguments compelling, yet a large amount of research in both social network analysis and historical network research (HNR), a research paradigm focused on the importance of relationships among people, organizations and concepts to explain historical phenomena, yielded little preexisting writing on this question of agency.18

Voyant context tool for the word acquaintance throughout all volumes

What ultimately caused me to pivot from this approach was the text mining data from all combined volumes. Searching the Voyant context tool for words like acquaintance, host, introduce, meet and presented, I found no clear language indicating who was doing the social connecting at the many gatherings that took place in the diaries.19 This could be due to Andrews' style of writing, which tended to plainly state "someone came to dinner," or simply "----- to breakfast" rather than "I had ---- over for breakfast," or, "I hosted a party and introduced ---- to ----." It may reflect the etiquette of the time, or simply that the information would have been redundant in a diary: it was perfectly clear to the author who was the host and who was the guest, so there was no need to identify this within the entry.

Voyant context tool for the word introduced throughout all volumes

That said, because these details are not in the text itself, making this claim for all of the female figures in the EBA diaries may be inaccurate. One alternative approach that would retain the integrity of the fields while also recognizing the social influence of the female figures in the diary may be to create another tier for the wives and partners, wherein researchers could make these associations between intermediaries and the active participants in the vocation. 

Future Expansion and Refinement

Out of the nearly 950 figures in the EBA diary, only around 250 clearly apply to the designated occupational fields. Future research work can be done to fill in this information and this can be added to the network visualization tool. Also, as demonstrated in some HNR projects, Wikipedia could be used to automatically fill in designations, which may be viewed as a boon for saving time but a possible compromise in accuracy. 

 

Another concept that came up repeatedly in discussing the fields was the personal finances of individuals and how it affected when they came to Egypt, how frequently and even what occupational titles they assumed. This might be an interesting element to include for researchers and could be signified by simply expanding or reducing node size of the individual depending on their wealth. 

Conclusion

There is great potential in this project to make an excellent and valuable resource more useful and more visible to researchers. Established in the principles of SNA and HNR theory, the methodology for this network visualization can ensure the consistent selection, extraction and filtration of data to specified parameters unique to the attributes of the Emma B. Andrews repository.20

References

1 Lemercier, Claire. “12. Formal Network Methods in History: Why and How?” Social Networks, Political Institutions, and Rural Societies, edited by Georg Fertig, vol. 11, Brepols Publishers, 2015, pp. 281–310. DOI.org (Crossref), https://doi.org/10.1484/M.RURHE-EB.4.00198.

2 Venturini, Tommaso, et al. “11. How to Tell Stories with Networks: Exploring the Narrative Affordances of Graphs with the Iliad.” The Datafied Society, edited by Mirko Tobias Schäfer and Karin van Es, Amsterdam University Press, 2017, pp. 155–70. DOI.org (Crossref), https://doi.org/10.1515/9789048531011-014.

 

3 Düring, Marten. “From Hermeneutics to Data to Networks: Data Extraction and Network Visualization of Historical Sources.” Programming Historian, Feb. 2015. programminghistorian.org, https://programminghistorian.org/en/lessons/creating-network-diagrams-from-historical-sources.

 

4 Mirel, Barbara. “Building Network Visualization Tools to Facilitate Metacognition in Complex Analysis.” Leonardo, vol. 44, no. 3, June 2011, pp. 248–49. DOI.org (Crossref), https://doi.org/10.1162/LEON_a_00176.

5 Collar, Anna, et al. “Networks in Archaeology: Phenomena, Abstraction, Representation.” Journal of Archaeological Method and Theory, vol. 22, no. 1, Mar. 2015, pp. 1–32. DOI.org (Crossref), https://doi.org/10.1007/s10816-014-9235-6.

6 Lemercier, Claire. “12. Formal Network Methods in History: Why and How?” Social Networks, Political Institutions, and Rural Societies, edited by Georg Fertig, vol. 11, Brepols Publishers, 2015, pp. 281–310. DOI.org (Crossref), https://doi.org/10.1484/M.RURHE-EB.4.00198.

 

7 McPherson, Miller, et al. “Birds of a Feather: Homophily in Social Networks.” Annual Review of Sociology, vol. 27, no. 1, Aug. 2001, pp. 415–44. DOI.org (Crossref), https://doi.org/10.1146/annurev.soc.27.1.415.

 

8 Burt, Ronald S. “Structural Holes versus Network Closure as Social Capital.” Social Capital, by Nan Lin et al., 1st ed., Routledge, 2017, pp. 31–56. DOI.org (Crossref), https://doi.org/10.4324/9781315129457-2.

9 Wetherell, Charles. Historical Social Network Analysis. 2021, p. 126.

10 Field. https://www.merriam-webster.com/dictionary/field. Accessed 31 July 2021.

11 Occupation | Definition of Occupation by Merriam-Webster. https://www.merriam-webster.com/dictionary/occupation. Accessed 31 July 2021.

12 Weingart, Scott. “Demystifying Networks.” The Scottbot Irregular, 14 Dec. 2011, http://www.scottbot.net/HIAL/?p=6279.

13 Engebretsen, Martin, and Helen Kennedy, editors. Data Visualization in Society. Amsterdam University Press, 2020. DOI.org (Crossref), https://doi.org/10.5117/9789463722902.

14 “International Standard Classification of Occupations.” Wikipedia, 10 Nov. 2020. Wikipedia, https://en.wikipedia.org/w/index.php?title=International_Standard_Classification_of_Occupations&oldid=988067343.

 

15 Figure 3: Domain Taxonomy. | Scientific Data. www.nature.com, https://www.nature.com/articles/sdata201575/figures/3. Accessed 27 July 2021.

 

16 Institut d’Etudes Politiques de Paris (Sciences Po), and Bruno Latour. “On Actor-Network Theory. A Few Clarifications, Plus More Than a Few Complications.” Philosophical Literary Journal Logos, vol. 27, no. 1, 2017, pp. 173–97. DOI.org (Crossref), https://doi.org/10.22394/0869-5377-2017-1-173-197.

17 Wiesner-Hanks, Merry E. Challenging Women’s Agency and Activism in Early Modernity. p. 27.

18 Lintunen, Tiina, and Kimmo Elo. “Networks of Revolutionary Workers: Socialist Red Women in Finland in 1918.” International Review of Social History, vol. 64, no. 2, Aug. 2019, pp. 279–307. DOI.org (Crossref), https://doi.org/10.1017/S0020859019000336.

19 Voyant Tools. https://voyant-tools.org/?corpus=c51598f04d885c09ade2312f03085db9. Accessed 29 July 2021.

20 Engebretsen, Martin, and Helen Kennedy, editors. Approaching Data Visualizations as Interfaces: An Empirical Demonstration of How Data Are Imag(in)e. Amsterdam University Press, 2020. DOI.org (Crossref), https://doi.org/10.2307/j.ctvzgb8c7.

Fields Network Visualization

Once I figured out the final version of the taxonomy outlined in the week seven reflection, I needed to reformat the biographical figures within that framework. The first Excel document for the earlier iterations of the taxonomy was a fairly giant and messy database and this second approach was an opportunity to clean up the process. On reviewing the Master Indices document again, I also decided that I wanted to pull from the Research Notes section, in addition to the Occupation and Biographical Notes columns. The Research Notes seemed like they might be too informal, as the section sometimes served as a place where collaborators could leave marginalia, but I found a few examples where this information, either on its own or in concert with other fields, provided enough data to assign figures within the fields/subfields schema.


Final draft of the Fields/Occupation Taxonomy

I was also able to refine the formatting process. Initially, I assumed that the node sheet contained the most detailed information but found by generating examples through Gephi that the opposite was true. In the second iteration, the node sheet contained only id and label, whereas the edge sheet contained the source, target and type (direct reflections of those id fields). The edge sheet also contained label (name), field (e.g. Academic), subfield (e.g. Scholar), attribute (e.g. American Theologian, the most refined descriptor), volume, time_start (first appearance in the diary) and time_end (an arbitrary month after first appearance, for the purposes of the dynamic visualization).

Example of second iteration edge sheet

The following ref # column was the number of times that the figure was mentioned across all 19 volumes. This was created to serve as a weight in Gephi, indicating the frequency of appearances in the repository. Finally, the text column was created for a short biographical blurb in case there was a possibility to create an infobox within Gephi, although there does not now appear to be a way to program this into the design.

 

I also realized that I only really needed to focus on creating the edge sheets first and that the nodes can be put together fairly quickly and more accurately when assembled last. Although I didn't strictly need to, I found that it was easier for me to build out each field's edge documents independently. This helped me keep all the figures in alphabetical order and make sure that they were added into the appropriate categories if they worked in multiple disciplines. For each of the 257 figures that were applicable to the field/subfield schema, I wanted to verify their entries from the Master Indices document in the actual EBA diaries as well as add two new fields of reference: total number of mentions within all volumes and date of first appearance.
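The relationship between the two sheets can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet process used for the project: the rows are invented placeholders, and it assumes each edge runs from a figure (source) to a field (target), which the text does not state outright.

```python
import csv
import io

# Placeholder edge rows. Column names follow the edge sheet described above;
# the ids and names are hypothetical, not taken from the Master Indices.
edges = [
    {"source": "f001", "target": "academic", "type": "Undirected",
     "label": "Example Figure A", "field": "Academic", "subfield": "Scholar"},
    {"source": "f002", "target": "academic", "type": "Undirected",
     "label": "Example Figure B", "field": "Academic", "subfield": "Scholar"},
    {"source": "f001", "target": "art", "type": "Undirected",
     "label": "Example Figure A", "field": "Art", "subfield": "Painter"},
]


def build_node_rows(edge_rows):
    """Collect every id used as a source or target exactly once."""
    nodes = {}
    for row in edge_rows:
        nodes.setdefault(row["source"], row["label"])  # figure node
        nodes.setdefault(row["target"], row["field"])  # field node (assumed)
    return [{"id": node_id, "label": label} for node_id, label in nodes.items()]


def to_csv(rows):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


node_rows = build_node_rows(edges)
print(to_csv(node_rows))
```

Deriving the node sheet from the finished edge rows is one way to guarantee that every source and target has a matching id, which is the accuracy benefit of assembling the nodes last.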

 

I felt that the total number of appearances would be visually informative because it would illuminate unusual frequencies. While the highest number of references belongs to travel companion Theodore Davis, insights can be drawn by looking at patterns across all fields at a glance. Fields such as Social Science show a very high number of mentions considering their relatively small number of figures (43 figures, 121 mentions), whereas the much more populated Institutions field shows a much lower rate of references (70 figures, 124 mentions). This could reflect travel trends, the popularity of Egyptology or simply what type of people Emma Andrews enjoyed spending time with.
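As a rough check on that comparison, mentions per figure can be computed directly from the two sets of counts quoted above. The dictionary layout and variable names here are illustrative assumptions, not part of the project files.

```python
# Counts quoted in the text above; the structure is illustrative.
field_totals = {
    "Social Science": {"figures": 43, "mentions": 121},
    "Institutions": {"figures": 70, "mentions": 124},
}

# Average number of mentions per figure in each field, rounded to 2 places.
mentions_per_figure = {
    name: round(totals["mentions"] / totals["figures"], 2)
    for name, totals in field_totals.items()
}

for name, rate in mentions_per_figure.items():
    print(f"{name}: {rate} mentions per figure")
# Social Science: 2.81 mentions per figure
# Institutions: 1.77 mentions per figure
```

On these numbers a Social Science figure is mentioned roughly one and a half times as often as an Institutions figure.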


Alexander Duff, 1st Duke of Fife (1849-1912), seated with Princess Louise, Duchess of Fife (1867-1931), via the Royal Collection Trust

By isolating the date of first appearance, we can identify potential trendsetters within each field which may have drawn in others. An example is the politician, banker and landowner the Duke of Fife, whose appearance in January of 1911 preceded the greatest concentration of figures within the Business and Law field in all of the EBA volumes. Another reason I was interested in retrieving these dates was in order to generate a dynamic visualization. By tracking these first appearances, an animated dynamic visualization could more immediately demonstrate these patterns for researchers and possibly serve as an information rich, low time investment tool which might encourage further research in the text repository. 

 

Piecing together reference documents in order to verify the Master Indices information and fill in the time_start and ref# fields was a process in and of itself. To begin, I pulled all of the volumes together from separate text files into a single PDF. Second, I created a guide to demarcate time periods and volumes. This process was trickier than I anticipated. Andrews rarely noted the year of her entries outside of the first record of January and there are some portions around volumes 17-19 where the entries are so abbreviated that they seem to jump back and forth a bit around New Year's.


All Volumes Guide

The process for verifying information and creating the fields essentially boiled down to searching the all-volume document for the figure, referencing the guide, which would provide the year of the entry by page number (pp. 187-190: 1896) and then referencing the original volumes (Emma B Andrews Journal Volume 4 1896-1897) in order to make sure the correct volumes were listed in the Master Indices columns. While the workflow took a little time to get accustomed to, it was certainly more efficient than sifting through all 19 volumes individually. 
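The guide lookup at the center of that workflow amounts to matching a page number against a table of page ranges. A minimal sketch, assuming the guide is a sorted list of (first page, last page, year) rows; only the 187-190 range and its year come from the example above, and the other rows are placeholders.

```python
from bisect import bisect_right

# (first_page, last_page, year), sorted by first_page.
GUIDE = [
    (150, 186, 1895),  # placeholder row, not from the actual guide
    (187, 190, 1896),  # the example given in the text
    (191, 240, 1897),  # placeholder row, not from the actual guide
]
STARTS = [first for first, _, _ in GUIDE]


def year_for_page(page):
    """Return the year whose page range contains `page`, or None."""
    idx = bisect_right(STARTS, page) - 1  # last range starting at or before page
    if idx < 0:
        return None
    first, last, year = GUIDE[idx]
    return year if first <= page <= last else None


print(year_for_page(188))  # 1896
```

Returning None for pages outside every range makes gaps in the guide visible instead of silently assigning the nearest year.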

Biographical Figures with corrected or updated fields

By the time I completed the edge documents for all 8 fields, I had identified 36 adjustments to the Master Indices document. These included identifying missing volume appearances, documenting discrepancies between the name field and how figures are referred to in the text (Akhenaten vs. Khuenaten) and noting when figures appear in the Master Indices but were irretrievable in the text, as in the case of Diemer, Michael Z and Tyndale, Walter from the Art field. It's possible that these figures appear under aliases in the diaries that are not specified in the Master Indices. Once these revisions to the structure of the spreadsheet documents were implemented, I was able to reduce the number of tabs from 52 to 15 between the first and second iterations.

 

While my first Gephi documentation above this one may have deceived the reader into thinking I was on the other side of the mountain, I managed to find many more inclines in the implementation of this updated taxonomy. As I mentioned in that writing, the formatting of data and import settings play an outsized role in Gephi's functionality. There is also an incredible dearth of information about the many import settings and how they practically function. 


Mysterious Gephi Import Settings

Without sloshing around a bog of technical details, the key is to simply label everything as a string if it is grammatical, an integer if it is numerical and a double if it is a weighted value. These designations are general enough that you can manipulate the columns towards more specific aims further on in the process. Designating an import function as an interval allows the visualization to become dynamic and timestamp renders the visualization static. A note to Gephi designers: timestamp and interval are interchangeable words whereas dynamic data visualization and static data visualizations are two distinct and easily understandable concepts! Surprisingly, I was unable to find any resource that had a basic guide to all of the import settings (Boolean, Character, IntervalMap, etc) so I am unable to elaborate beyond advising you not to choose them. 
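That rule of thumb can be written down as a small helper. This is purely an illustration of the advice above, not anything Gephi provides: the function name and the `is_weight` flag are hypothetical, and the returned strings simply echo the three type names recommended in the text.

```python
def suggest_import_type(values, is_weight=False):
    """Suggest 'Double' for weights, 'Integer' for whole numbers, else 'String'."""
    if is_weight:
        return "Double"
    if not values:
        return "String"
    try:
        for value in values:
            int(str(value).strip())  # raises ValueError for anything non-numeric
    except ValueError:
        return "String"
    return "Integer"


print(suggest_import_type(["Academic", "Art"]))        # String
print(suggest_import_type(["4", "17", "19"]))          # Integer
print(suggest_import_type(["12", "3"], is_weight=True))  # Double
```

Interval text such as <[1893-3-1, 1894-4-1]> falls through to String here, which is the fallback the advice above recommends for anything non-numeric.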


Initial Isometric Layout

One concept that I hadn't fully realized before was that statistics tools need to be run on your data before their results can be used in the visualization. I was able to find a helpful guide which details how to work the average degree and modularity of your nodes into the spacing and sizing of the visualization. Another important piece of information that I gleaned from the document was that, much like an incantation, layout settings need to be invoked in a specific order.


Circle Pack Layout without Additional Steps

For the visualization displayed below, the conjuring was MultiGravity ForceAtlas 2 (groups nodes according to timestamp information using approximate repulsion)> Circle Pack Layout (groups nodes according to characteristic, in this case, field)> Expansion (spreads out nodes and edges for better viewer comprehension, avoids the 'hairball' effect)> Label Adjust (very subtly randomizes the arrangement of nodes so that displayed labels don't overlap). 


Circle Pack Layout with Additional Outlined Steps

For appearance, nodes were sized according to the weight of the reference number column, which tracked the number of times they were mentioned across all of the EBA volumes. The color of the nodes was determined by field using the Appearance>Partition>Modularity function. The edge colors were determined by their associated subfield via Appearance>Partition>Subfield, which contained 38 total categories. Considering palettes, I felt that bold distinct choices would be best for the nodes (Intense) and many variations on a single directed color would prevent ocular overload for the edge information (Ice Cube).


Detail of Subfield Color Palette 

Figuring out the time interval elements necessary for creating a dynamic version of the visualization proved exceptionally difficult. I learned that I needed to convert the time_start and time_end fields into a specific format that would turn 1893-3-1, 1894-4-1 into <[1893-3-1, 1894-4-1]>. Pretty simple Excel conversion, right? Little did I know that Excel doesn't understand any time periods before the twentieth century as dates.

 

Given that nearly a third of the entries were from the nineteenth century, I spent quite a while searching for formulas to simply combine these fields. After attempting a few brain-breaking examples, I wound up cobbling together the approach below. Using the text to column function, I separated the m-dd-yyyy integers and then recombined them into the <[m-dd-yyyy, m-dd-yyyy]> formula needed for Gephi dynamic visualizations. One last note: the interval column needed to be imported as a string rather than the more intuitive IntervalSet import function.

Pre-Twentieth Century Date Conversion Process
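The same split-and-recombine idea can be sketched outside of Excel. This is not the spreadsheet process pictured above; it is a minimal Python illustration that treats each date as plain text, so nothing ever has to be parsed as a calendar date. The function names are hypothetical, and the output follows the <[1893-3-1, 1894-4-1]> example given earlier.

```python
def split_date(text):
    """Split 'yyyy-m-d' text into integer parts without date parsing."""
    year, month, day = (int(part) for part in text.split("-"))
    return year, month, day


def to_interval(start_text, end_text):
    """Recombine two date strings into the interval form shown in the text."""
    start = "-".join(str(part) for part in split_date(start_text))
    end = "-".join(str(part) for part in split_date(end_text))
    return f"<[{start}, {end}]>"


print(to_interval("1893-3-1", "1894-4-1"))  # <[1893-3-1, 1894-4-1]>
```

Because the parts are handled as integers and text, a nineteenth-century year is treated no differently from any other number.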

Finally, with Vincent Wilson's assistance, I was able to successfully import the visualization using Sigma.js. One complication that I hadn't anticipated was that Sigma does not have a way to convey edge information. I needed to merge the important elements (field, subfield, attribute, volumes and introduction date) into the label column of the node sheet for the visualization.


Sigma.js Import of Network Visualization
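Merging the edge details into the node label might look something like the sketch below. The separator, the ordering and the sample values are all assumptions made for illustration; only the list of elements (field, subfield, attribute, volumes and introduction date) comes from the text.

```python
def merged_label(name, field, subfield, attribute, volumes, first_seen):
    """Fold the edge-sheet details into one node label string."""
    parts = [
        name,
        f"{field} > {subfield}",
        attribute,
        f"vol. {volumes}",
        f"first seen {first_seen}",
    ]
    return " | ".join(parts)


# Hypothetical figure; the field, subfield and attribute reuse the
# examples given for the edge sheet columns.
print(merged_label("Example Figure", "Academic", "Scholar",
                   "American Theologian", "4", "1896-1-12"))
```

Keeping the name first means the label still sorts and reads sensibly if the rest of the string is truncated in the hover display.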

Also crucial to the process was installing the Sigma.js Export plug-in for Gephi. This generates a little form where you can adjust the title and descriptors as well as the hovering function. While I need to adjust a few elements in the formatting and node sizing, overall, I am very happy with this network visualization. It sounds like the rest of the group is going to incorporate the fields/subfields information into the greater team network visualization and I am extremely thankful to have taken part in this unique collaborative effort!
