Wednesday, 16 November 2011
Peter Doorn (Data Archiving and Networked Services, the Netherlands), Computational history among e-science, digital humanities and research infrastructures: accomplishments and challenges
This presentation will focus on the following subjects: first I will briefly introduce DANS; after that I will place the developments in computational history in the context of the developments in e-Science and the digital humanities. Over the years we have seen a gradual increase in the scale of projects, partly brought about by computation itself and the specialization it requires. As a result, digital data and research infrastructures are receiving increased attention, both at the national and at the European level.
DANS is an institute of the Royal Netherlands Academy of Arts and Sciences (KNAW) and the Dutch Research Funding Organisation (NWO) and was founded in 2005 (www.dans.knaw.nl). It builds on the work of predecessors, the first of which dates back to 1964 (Steinmetz Foundation and Archive for the social sciences). The Netherlands Historical Data Archive (NHDA) was created in 1989, inspired by the needs of historians and the creation of numerous historical databases, which needed to be archived and kept accessible for later use. The central task of DANS is to provide permanent access to digital data in the humanities and social sciences, although we recently started to gradually expand our services to other domains as well.
DANS maintains a digital archive with substantial data collections in history, social sciences, and archaeology. We also carry out data projects in collaboration with research communities and partner organizations. Moreover, we give advice and support. For example, we developed a Data Seal of Approval (see: http://www.datasealofapproval.org/), aimed at quality control of data and repositories, and we maintain a Persistent Identifier Infrastructure based on the URN (see: http://www.persid.org/index.html).
In short, DANS promotes permanent access to digital research data. We encourage scientific researchers to archive and reuse data by means of our online archiving system EASY; we provide access, through www.narcis.nl, to thousands of scientific datasets, e-publications and other research information in the Netherlands; and we provide training and advice and perform research into the archiving of and access to digital information.
History and computing as e-Science:
It makes sense to place the developments of computational history in the past decade in the context of e-science, which was defined in 2001 as “Science increasingly done through distributed global collaborations enabled by the Internet, using very large data collections, tera-scale computing resources and high performance visualisation.” (UK Department of Trade and Industry; Research Council e-Science Core Programme). Jim Gray, Tony Hey and others spoke of a “fourth paradigm” in science, characterized by a high data intensiveness. Increasingly, scientific breakthroughs will be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets. The speed at which any given scientific discipline advances will depend on how well its researchers collaborate with one another, and with technologists, in areas of e-Science such as databases, workflow management, visualization, and cloud-computing technologies.
Although the scale of humanities research, including the work of historians, is much smaller than that in astronomy or particle physics, most specialists agree that the tendencies and needs of e-science and e-humanities are basically similar. Humanities computing was defined by Willard McCarty in 1999 as “an academic field concerned with the application of computing tools to arts and humanities data or to their use in the creation of these data.” Terms such as computational humanities, digital humanities and e-humanities are now also in use, and essentially denote similar things (with nuance I do not intend to get into).
Since the 1990s many people have come up with definitions for or descriptions of computing in historical research (this list can be easily expanded):
• Charles Harvey: historical computing must be concerned with the creation of models of the past or representations of past realities.
• Matthew Woollard: History and computing is not only about historical research, but also about historical resource creation.
• George Welling: Historical Informatics (computational history) is a new field of interdisciplinary specialization dealing with pragmatic and conceptual issues related to the use of information and communication technologies in the teaching, research and public communication of history.
• Lawrence McCrank (2002): Historical information science integrates equally the subject matter of a historical field of investigation, quantified social science and linguistic research methodologies, computer science and technology, and information science, which is focused on historical information sources, structures, and communications.
• Boonstra, Breure, Doorn (2004): Historical information science is the discipline that deals with specific information problems in historical research and in the sources that are used for historical research, and tries to solve these information problems in a generic way with the help of computing tools.
In a study on the “Past, Present and Future of Historical Information Science” I published together with Onno Boonstra and Leen Breure, we distinguished four categories of information problems in historical research, which we ordered on what we called the “life cycle of historical information”: information problems of historical sources (representation); of relationships between sources (harmonization, linkage); of historical analysis (qualitative and quantitative); of the presentation of sources or analysis (visualization, edition). The PDF of the book can be found here: http://www.dans.knaw.nl/content/categorieen/publicaties/past-present-and....
Back in 2004, we were a bit wary about the developments in history and computing over the preceding few years. It seemed as if the exciting and formative years of historical computing (roughly the period 1985-2000) were over. Many mainstream historians were just happy to be able to use the computer for text processing, web browsing and emailing.
Probably a degree of specialisation did occur: you simply could not expect every historian to be a programmer, as Le Roy Ladurie once said. The scale of historical research had to go up to get beyond the basic level of computing techniques. Collaboration with professional IT specialists was necessary, and I think we are gradually moving in that direction.
The increase of the scale of digital history projects:
In my presentation I will mention a few examples of big projects we were involved in, in which computing scientists and historians worked together: the digitization of the Dutch censuses and the project “Life Courses in Context” (the first project in the humanities in the Netherlands to receive an investment grant of a few million euros; see www.volkstellingen.nl); the project “Climate of the World Oceans”, in which historians, computing scientists and climatologists worked together to retrieve weather observations from historical ships’ logs (www.knmi.nl/cliwoc/); the collaboratory on institutions for collective action (http://www.collective-action.info/); the collaboratory “Clio Infrastructure”, which builds and connects global data hubs on world inequality, that is, the increasing divergence between rich and poor countries (www.clio-infra.eu); the projects “Telling witnesses” and “Veteran tapes”, in which many hundreds of qualitative interviews were collected and analysed as “oral histories” of the Second World War and other conflicts (http://getuigenverhalen.nl/ and http://www.watveteranenvertellen.nl); and the project Medieval Memoria Online (MeMO), which aims to help scholars carry out research into memoria during the period up to the Reformation (c. 1580) in the area of the present-day Netherlands (http://memo.hum.uu.nl/). In all these projects, historical researchers and computing experts (and often specialists from other disciplines as well) from several institutes worked or are working together.
The need for research infrastructures:
It is vital that these projects rest on a solid foundation, not only during the course of the project, but also afterwards. If no infrastructure exists that can guarantee sustainability after a project is finished, the results are in danger of disappearing soon after the project's end, and the investment and effort will be lost. This is exactly why digital infrastructures are necessary: to support and maintain the collaborative efforts. The services developed in the projects need to be sustainable, and they can only be maintained efficiently if they are generic and re-usable. This is why a few years ago, not just in the natural and life sciences, but also in the humanities and social sciences, initiatives were taken to set up infrastructures to support and sustain the investments made in large (and small) projects. The European Strategy Forum on Research Infrastructures (ESFRI) formulated a first “Roadmap” for the creation of such infrastructures (http://ec.europa.eu/research/infrastructures/index_en.cfm?pg=esfri). DARIAH, the emerging Digital Research Infrastructure for the Arts and Humanities, is one of the two infrastructures proposed on the ESFRI Roadmap for the humanities, including history (www.dariah.eu). DARIAH aims to “link and provide access to distributed digital source materials of many kinds”. In the field of linguistics, CLARIN has been set up: Common Language Resources and Technology Infrastructure (www.clarin.eu), and there are also examples in the social sciences. In several countries, among them the Netherlands, it is proposed that CLARIN and DARIAH work closely together or even merge.
The digitization of cultural heritage material, including archival sources, is of great importance for historians and other humanities researchers. In this field, too, we see the creation of large-scale infrastructures. Europeana enables people to explore the digital resources of Europe's museums, libraries, archives and audio-visual collections (www.europeana.eu). It promotes discovery and networking opportunities in a multilingual space where users can engage, share in and be inspired by the rich diversity of Europe's cultural and scientific heritage. The breadth of the endeavor is at the same time its limitation for researchers: although millions of heritage objects can be “explored”, the content and descriptions are oriented towards consumption by a general audience, not towards analytical use by specialists. The European Holocaust Research Infrastructure, which is supported by DARIAH in solving the technological challenges of bringing together virtual resources from dispersed archives, is a good example of an infrastructure on the interface of heritage and historical research.
The intention of the organisers of the Lisbon workshop on Digital Methods and Tools for Historical Research is to discuss the implications of using digital technologies in the production and dissemination of knowledge in History.
Two of the implications I have highlighted are the increase in scale of digital history projects and the need for research infrastructures to sustain the results of digital projects. Multidisciplinary and international collaboration is inevitable for professional results. Computational history is in this sense comparable to (or simply part of) data-driven e-Science.
This conclusion is independent of the type of methodology we look at: whether it is relational databases, geographic information systems, (text) encoding or digitization and preservation of digital memory. Such methodologies rarely stand alone in a digital project, and are rather phases in the cycle that many digital projects go through: after digitization comes the encoding (of textual sources) or the structuring in databases. Analysis is the next phase, for which GISes are very useful when the data have a geospatial component, which can be visualised. At the end of the cycle, proper measures need to be taken in order to keep the results accessible for the future.