eScriptorium for Handwritten Text Recognition in Humanities Research – Transcript of the Discussion

During the webinar on January 26, Peter Stokes gave a demonstration of eScriptorium, which was followed by a lively discussion with participants from all over Europe. The discussion was hosted by Monika Barget (IEG Mainz). Questions were answered by Peter Stokes and his colleagues Daniel Stoekl Ben Ezra and Robin Tissot.

Monika: Thanks for your talk, Peter! You are all welcome to put your name into the chat box to ask questions about eScriptorium. The first question comes from Thomas.

Thomas: Are models trained for Transkribus and Kraken compatible? If so, that’s great! But if not, why not? Is the OCR community working collaboratively on transcriptions for training models?

Peter: Unfortunately, models cannot be exported from Transkribus at present, which is why they cannot be compared with or adapted to Kraken models. In terms of standards, there is some ongoing work for Hebrew. However, collaboration is very difficult because transcriptions are provided for various manuscripts and editions. When working with handwritten texts, there isn’t one single transcription that serves all purposes. It would be useful to do more work in this field, but there is a long way to go.

Thomas: Thank you very much for your answer. Great presentation, I am deeply impressed by the outcome and the openness of the programme!

Krzysztof: I’m testing eScriptorium as a Docker image and have two questions: 1. Do you consider it mature enough to be used as an annotation framework? 2. If a kraken-compatible model doesn’t produce good results, is it better to retrain it on one’s data or create a custom model from scratch?

Peter: Yes, it is mature enough to be used as an annotation framework to train the system. Because you can export it…

The second question is harder to answer….

Monika: One topic that colleagues in early modern and modern history might be especially interested in is whether eScriptorium can also be used for printed sources. What would be the challenge?

Peter: It can certainly be used for printed books, and at least in theory, it should be easier.

Daniel: The main challenge when working with print is quantity, that is, processing large numbers of images.

Katrien: How can you organize quality control? Do you have to do it manually?

Peter: The question for you to ask would be how well a particular model would work for your particular use case…. Have students check transcriptions… identify output that needs to be checked… This is something we are working on. The interface is not complete yet. One way to improve quality control would be to involve students or resort to crowdsourcing. Another option is to work through our API. I am not sure if Daniel is still here to respond, but he is one of the people who have been working on that and could say more about it.

Monika: Daniel, would you like to jump in and say something about quality control?

Daniel: Quality control is only implicit at the moment. I can only do it for a page for which I already have a transcription.
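Checking quality on a page that already has a transcription amounts to comparing the model output against that reference. One common way to quantify the comparison is the character error rate (CER): the edit distance between the two texts divided by the length of the reference. The following is a minimal sketch of that idea, not a description of how eScriptorium or Kraken compute their figures; the function names, the sample lines and the 5% threshold are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def pages_needing_review(pages, threshold=0.05):
    """Return the ids of pages whose CER exceeds the (assumed) threshold.

    `pages` maps a page id to a (reference, model_output) pair.
    """
    return sorted(page_id for page_id, (ref, hyp) in pages.items()
                  if character_error_rate(ref, hyp) > threshold)


# Invented sample data: page ids and texts are hypothetical.
sample_pages = {
    "page_001": ("in principio erat verbum", "in principio crat verbum"),
    "page_002": ("et verbum erat apud deum", "et ucrbun crat apud deun"),
}
flagged = pages_needing_review(sample_pages)
```

A list like `flagged` could then be handed to students or other reviewers, so that manual checking concentrates on the pages where the model disagrees most with the existing transcription.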

Alessandro: Apart from IIIF, is it possible to import files directly from online storage like Seafile or other cloud services? Or do I have to download the image files I want to transcribe first? I tried to give eScriptorium a Seafile link and couldn’t make it work.

Peter: The simple answer is that you would have to download the files first. With the API, you could create a connection to your particular database, as Daniel has pointed out in the chat. Or your database might allow standardized exports. Importing .jpg or .png images via Seafile links would not work, however, because the system's link import is designed for IIIF. That includes certain file credits. You would have to design a more complex system to handle the import from standard file-sharing systems.
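The difference between an IIIF link and a plain file-sharing link is that an IIIF manifest is a structured JSON document that lists every page image together with descriptive metadata, whereas a Seafile link points at a single file or folder. The sketch below shows how image URLs can be read out of a manifest that follows the IIIF Presentation API 2.x layout (sequences, canvases, images, resource). It is an illustration only and does not reproduce eScriptorium's importer; the manifest content and the example.org URLs are invented.

```python
import json

# Hypothetical, heavily shortened manifest in IIIF Presentation API 2.x layout.
SAMPLE_MANIFEST = """
{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "https://example.org/iiif/ms-1/manifest",
  "@type": "sc:Manifest",
  "label": "Example manuscript",
  "sequences": [
    {
      "@type": "sc:Sequence",
      "canvases": [
        {
          "@id": "https://example.org/iiif/ms-1/canvas/1",
          "label": "fol. 1r",
          "images": [
            {"resource": {"@id": "https://example.org/iiif/ms-1/f1r/full/full/0/default.jpg"}}
          ]
        },
        {
          "@id": "https://example.org/iiif/ms-1/canvas/2",
          "label": "fol. 1v",
          "images": [
            {"resource": {"@id": "https://example.org/iiif/ms-1/f1v/full/full/0/default.jpg"}}
          ]
        }
      ]
    }
  ]
}
"""


def image_urls(manifest):
    """Collect the image URLs of all canvases, in reading order."""
    urls = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                resource = image.get("resource", {})
                if "@id" in resource:
                    urls.append(resource["@id"])
    return urls


manifest = json.loads(SAMPLE_MANIFEST)
urls = image_urls(manifest)
```

A plain link to a .jpg on a file-sharing service carries none of this structure, which is why such a link cannot simply be substituted for a manifest URL.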

Daniel: If you know how to handle an API, almost anything can be managed automatically.

Markus: You said that the electricity cost for running eScriptorium on a server is very high, but what would the necessary hardware cost, e.g. for a small team of 3 or 4 users?

Peter: We should ask Daniel because he has experience with the Vietnamica research group.

Daniel: I am running it on my computer…. It costs 3000 euros to have a sufficient GPU. I alone have about 30,000 images on my machine. The electricity cost would, of course, be a lot lower if you are running it locally.

Markus: Thanks! And as Thomas says in the chat, you could also lease the equipment and would not need to invest in buying it.

Thomas: I am referring to the mission statement of RESILIENCE “to provide members access to eScriptorium on powerful computers” … which is very nice.

Peter: We have a couple of trial slots but cannot, of course, provide full services to everyone at the moment.

Sara: Is there already a bibliography on eScriptorium available?

Peter: We provide resources on eScriptorium that people may want to look at before it becomes part of the RESILIENCE suite of research services, for example on the “escripta” Blog: escripta.hypotheses.org (incl. the eScriptorium team)

For more information, see:

kraken.re (by Ben Kiessling)

gitlab.inria.fr/scripta/escriptorium

github.com/mittagessen/kraken

github.com/mittagessen/kraken-models and zenodo.org/communities/ocr_models/

Monika: You also wrote a nice blog post with short embedded videos: https://www.resilience-ri.eu/blog/resilience-tool-escriptorium/

Peter: Yes, and we have been preparing more video tutorials.

Krzysztof: One more question: is there currently a way to grant users with different roles in the project [e.g. admins, annotators or “mere” transcribers] custom permissions?

Peter: At the moment, it is all on a very basic level… we might expand this at a later stage and we certainly have this in mind. This ties in with quality control.

Robin: Permissions are very basic right now; only training and inviting are restricted.

Peter: Yes, we cannot have too many people train at the same time. Something we are concerned about… if someone ingests a huge file… everyone else is stuck. So we are carefully monitoring performance.

Monika: How can people proceed if they want to use eScriptorium themselves? How can they learn more?

Peter: eScriptorium will be an integral part of RESILIENCE and we aim to give assistance as far as we can. However, we are not able to provide access to all interested parties at the moment. If people are interested in helping with training models for particular languages etc., they are welcome to contact us.

Monika: Peter’s webinar and all the links mentioned here will also be made available on the RESILIENCE website. There are no more questions, so I will hand over to Thorsten for some final remarks.

Thorsten: I thank Peter for his inspiring talk and all of you for enriching the discussion with your questions. Have a good day.