Friday, 25 September 2009

Transcribing Eighteenth-Century Texts

A colleague at Monash asked me if I knew of a commercial transcription service that could transcribe a 450-page early eighteenth-century English text into a Word document. The transcription would serve as the basis for an edited version of the text, and would be compared, word for word, punctuation mark for punctuation mark, a number of times over.
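For anyone who ends up with two digital versions of the same text (say, an OCR pass and a typed transcription), some of this word-for-word, punctuation-mark-for-punctuation-mark comparison can be handed to a machine. The following is only a rough sketch using Python's standard difflib module; the function names and the sample sentences are my own inventions for illustration, and it assumes both versions are already plain text. It is no substitute for collating against the printed original.

```python
import difflib
import re


def tokenize(text):
    # Split into words and individual punctuation marks, discarding whitespace.
    return re.findall(r"\w+|[^\w\s]", text)


def collate(version_a, version_b):
    """Return the points at which two transcriptions disagree.

    Each item is (kind, tokens_in_a, tokens_in_b), where kind is
    'replace', 'delete' or 'insert'.
    """
    a, b = tokenize(version_a), tokenize(version_b)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (kind, a[i1:i2], b[j1:j2])
        for kind, i1, i2, j1, j2 in matcher.get_opcodes()
        if kind != "equal"
    ]


# Hypothetical sample: the second version drops a comma and changes ';' to ':'.
for difference in collate("The Text, which I saw;", "The Text which I saw:"):
    print(difference)
```

Treating each punctuation mark as its own token means that a dropped comma shows up as a difference in its own right, rather than being buried inside a changed word.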

This sort of textual editing is tedious, but it would be even more tedious to type the whole thing first, and still have this job ahead of you. Transcription is also very, very time-consuming for all but the fastest and most accurate typists. Which is why she wanted to pay someone else to do it. Unfortunately, the conditions of the grant she has received rule out paying a research assistant to do the job; but she can pay a business to do it.

I should explain that the text had never been reprinted, and no version of the text was available in digital form. So, not on Google Books, not on ECCO, no substantial excerpts in recent editions. Zip. She was going to have to create the text ex nihilo.

When she first discussed this problem with me I suggested that, if the text were clear, she could scan the text and run OCR software over it herself. She would have a lot of f/s substitutions to make, but you can get a fairly usable text this way. Nowhere near as good as a transcription service, but usable. But she would have to spend quite a while getting nice clean copies and then either scanning them all herself or sending them to a bureau to be scanned and OCR'ed.
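The f/s substitutions can be partly automated, since OCR software tends to read the long s as an f. What follows is a minimal sketch, not a tested workflow: it assumes you have a list of known words to check against (the tiny known_words set here is made up for the example, as are the function names), and it only changes a word when exactly one f-to-s reading turns an unknown word into a known one. Anything ambiguous is left alone for the copy-editor.

```python
from itertools import product


def long_s_candidates(token):
    """Yield every spelling obtained by reading some of the f's as s's."""
    positions = [i for i, ch in enumerate(token) if ch == "f"]
    for choice in product("fs", repeat=len(positions)):
        chars = list(token)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        yield "".join(chars)


def fix_token(token, known_words):
    """Correct a probable long-s misreading; return the token unchanged if unsure."""
    if token.lower() in known_words:
        return token  # already a known word, e.g. 'of'
    matches = [c for c in long_s_candidates(token) if c.lower() in known_words]
    # Only substitute when there is exactly one plausible reading.
    return matches[0] if len(matches) == 1 else token


# Hypothetical word list; a real one would need to cover period spellings.
known_words = {"so", "of", "first", "misfortune", "possession"}

for word in ["fo", "of", "firft", "miffortune", "poffeffion"]:
    print(word, "->", fix_token(word, known_words))
```

A modern dictionary will miss many eighteenth-century spellings, so a list built from period texts would probably do better; either way the output still needs reading against the original.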

I also suggested voice-recognition software, which I use all the time for transcribing chunks of eighteenth-century texts. You get better results than with OCR, but still, a lot of copy-editing is required.

We also discussed just sending it to a secretarial service for typing. You might end up with a lot of f/s substitutions to make, but, as I said, that really isn't too difficult to do. But once you are sending it out to be done you start to think of commercial transcription services that have handled eighteenth-century texts before.

I had heard of a number of large projects that have sent material to be transcribed in India or elsewhere in Asia: I am pretty sure that the British Library Catalogue was transcribed in Asia (with predictable results). And I figured other people must have wanted a commercial transcription service before, for the same reason. But, looking online, the only transcription services my colleague could find were for legal and medical records. What she wanted was someone with experience handling eighteenth-century texts.

So I sent a query to the 18C-List. The answers I got, on- and off-list, were (1) offers by individuals to undertake the transcription, (2) suggestions that she didn't need a transcription (that the text might already be available in some form), (3) suggestions that her publisher might do the transcription for her and (4) details of a transcription service. (There was also a reply that contained an attack on Obama's health insurance legislation (!?!), which either implied the question was off-topic, or warned that the answer was. I am not sure which.)

I gathered from the answers I got that most editors transcribe their own texts, even very lengthy ones. It is seen as part of the job. The textual editor might take over responsibility (in a larger project) but the volume editor, the editor of an individual text in a series or multi-volume set, actually does the typing. It is called keystroking. Which suggests something much more pleasurable than the RSI-inducing activity that is transcribing.

The next most common arrangement (no. 3 above) is that the press does it in-house. I know Pickering & Chatto do this: they are transcribing the texts of 2000 bawdy songbooks right now for a collection I am editing with Paul Watt. But apparently it is not just Pickering & Chatto. It seems some of the larger presses, like Cambridge, farm out this transcription, but I couldn't find out who they were sending their texts to.

This brings us to no. 4 above: one scholar with experience in this area told me that,

We outsource practically all our transcription to India. The only transcription we tend to do ourselves is unique manuscript material. The companies we use are Planman Technologies (good, but slightly uneven) and Acogent.

How this works is that we sign a contract with the companies for a certain volume of transcription work (this gives us a bulk discount), and then we fill this using our own internal budget to cover the costs. We either ship books physically via courier to India, or, if they’re fragile or rare, scan them here and send the page scans to India via FTP server.

What we do is somewhat different from what your colleague wants, in that we do bulk digitization and we get our transcription encoded in TEI XML rather than as a Word document.


A New Zealand scholar also suggested that it would be worth getting in touch with digital humanities centres that deal with early modern and eighteenth-century works (such as The Centre for Computing in the Humanities at King's College London and the Electronic Textual Cultures Lab at the University of Victoria) to see who they outsource their transcription to.

Both of these answers suggested what is now obvious.

Thinking of the texts as special (they are almost three hundred years old!) is misleading. Transcribing eighteenth-century texts is just data-entry and data management: "Acogent provides data entry and data management services …"

Thinking of single-author, single-volume, painstakingly edited texts is also misleading (i.e., it leads you in the wrong direction). The digital world is full of text collections created in digital humanities centres, all of which outsource transcription. (The New Zealand Electronic Text Centre, to take an example that never occurred to me, contains the text of The Travels of Hildebrand Bowman, Esquire (1778). Texts like this are all over the net.) So, asking a bunch of scholars who prepare one-off, minutely collated texts is really unlikely to elicit a useful answer. I am just lucky, and so is my colleague, that 'online, everyone hears you scream' (for help).

3 comments:

Lilith said...

Hi Patrick,
This is not really a comment, but I can't find a way to contact you by email so I will write my question here.
I am starting to work on a research project on an English translation of a Spanish book, published in 1705. I did the transcription myself; fortunately it runs to only 144 pages, which came down to 35 typed pages. I have been searching for guidelines on editing 18c texts, covering such things as italics, capital letters, etc. I am the only one on the project working on the English translations. I would appreciate it very much if you could pass me some references about this topic. I was working on 16c Spanish literature before, and this is something new for me. Thank you so much.

Lilith

Patrick Spedding said...

Lilith, You can contact me via the email address I provide here

http://patrickspedding.blogspot.com/2009/06/about-dr-patrick-spedding.html

As for your question: I favour minimal change, but how you edit depends on your audience.

The minimum requirement is transcribing all long-esses into the modern short form and changing running quotation marks into modern forms.

Beyond that, you can regularise Proper Nouns (or all words); or modernise Proper Nouns (or all words).

For a student edition you might be able to justify modernising all punctuation, forms of emphasis (i.e., changing all italic, small caps, bold), etc.

Almost every Oxford or Penguin paperback of an 18C text will tell you that this is what they have done (or that this is all that they have done).

A good introduction is Erick Kelemen, Textual Editing and Criticism: An Introduction (2008), but it really isn't rocket science and I am not sure you'll need this unless you are planning a full critical/diplomatic edition.

Good luck!

PS

Lilith said...

Thanks a lot, Patrick. That is very helpful information for me. I like your blog very much, I will be checking on it!
Lilith

p.s. I checked on the page that you gave me, but the email address didn't appear.