<span class="field field--name-title field--type-string field--label-hidden">U of T researchers train AI to read difficult-to-decipher medieval texts</span> <div class="field field--name-field-featured-picture field--type-image field--label-hidden field__item"> <img loading="eager" srcset="/sites/default/files/styles/news_banner_370/public/iStock-1043878156-latin.jpg?h=afdc3185&amp;itok=EL-cemhD 370w, /sites/default/files/styles/news_banner_740/public/iStock-1043878156-latin.jpg?h=afdc3185&amp;itok=2swBubHD 740w, /sites/default/files/styles/news_banner_1110/public/iStock-1043878156-latin.jpg?h=afdc3185&amp;itok=gGQuLWex 1110w" sizes="(min-width:1200px) 1110px, (max-width: 1199px) 80vw, (max-width: 767px) 90vw, (max-width: 575px) 95vw" width="740" height="494" src="/sites/default/files/styles/news_banner_370/public/iStock-1043878156-latin.jpg?h=afdc3185&amp;itok=EL-cemhD" alt="&quot;&quot;"> </div> <span class="field field--name-uid field--type-entity-reference field--label-hidden"><span>Christopher.Sorensen</span></span> <span class="field field--name-created field--type-created field--label-hidden"><time datetime="2021-02-24T13:35:57-05:00" title="Wednesday, February 24, 2021 - 13:35" class="datetime">Wed, 02/24/2021 - 13:35</time> </span> <div class="clearfix text-formatted field field--name-field-cutline-long field--type-text-long field--label-above"> <div class="field__label">Cutline</div> <div class="field__item">Researchers at U of T and University College London are training software called Transkribus to read and transcribe handwritten Latin, which is often full of strange spellings, hyphenations and abbreviations (photo by fotographo/iStockPhoto)</div> </div> <div class="field field--name-field-author-reporters field--type-entity-reference field--label-hidden field__items"> <div 
class="field__item"><a href="/news/authors-reporters/rachel-lott" hreflang="en">Rachel Lott</a></div> </div> <div class="field field--name-field-topic field--type-entity-reference field--label-above"> <div class="field__label">Topic</div> <div class="field__item"><a href="/news/topics/our-community" hreflang="en">Our Community</a></div> </div> <div class="field field--name-field-story-tags field--type-entity-reference field--label-hidden field__items"> <div class="field__item"><a href="/news/tags/alumni" hreflang="en">Alumni</a></div> <div class="field__item"><a href="/news/tags/artificial-intelligence" hreflang="en">Artificial Intelligence</a></div> <div class="field__item"><a href="/news/tags/centre-medieval-studies" hreflang="en">Centre for Medieval Studies</a></div> <div class="field__item"><a href="/news/tags/computer-science" hreflang="en">Computer Science</a></div> <div class="field__item"><a href="/news/tags/faculty-arts-science" hreflang="en">Faculty of Arts &amp; Science</a></div> <div class="field__item"><a href="/news/tags/research-innovation" hreflang="en">Research &amp; Innovation</a></div> <div class="field__item"><a href="/news/tags/u-t-scarborough" hreflang="en">U of T Scarborough</a></div> </div> <div class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>In a move that could transform manuscript studies, University of Toronto researchers have partnered with a team in the United Kingdom to develop a program that can read and transcribe the handwritten Latin found in 13th-century legal manuscripts.</p> <p>While scholars have been making digital images of these manuscripts for years, transcribing and comparing these texts is painstaking and tedious work that can take years or even decades to complete. 
That's because medieval handwriting can often look crabbed and unintelligible, with non-standardized spellings, hyphenations, abbreviations, calligraphic flourishes and any number of distinct “hands.”</p> <p>But machine-reading software called Transkribus promises to change the field. Using artificial intelligence (AI), the software can theoretically be trained to read any type of handwriting, in any language&nbsp;– and&nbsp;<strong>Michael Gervers</strong>, a professor of medieval social and economic history at U of T Scarborough, says&nbsp;it could&nbsp;eventually be applied across medieval studies.</p> <p>“When – rather than if – the process is successful, it will make an enormous difference to the way medievalists approach their subject,” says Gervers,&nbsp;who is also cross-appointed to the&nbsp;Centre for Medieval Studies&nbsp;in the Faculty of Arts &amp; Science.</p> <p>Developed by&nbsp;READ COOP SCE, an international consortium of scholars, scientists and archivists, Transkribus not only digitizes manuscripts and transcribes their contents but “recognizes” idiosyncratic features across multiple manuscripts, thus enabling comparison. The software’s recent successes include the transcription of manuscripts from colonial Mexico, the Hanseatic League&nbsp;and early 20th-century Ireland.</p> <p>The software first came to Gervers’s attention in 2016, when it was still getting off the ground. Gervers, who has worked with Latin manuscripts since the 1970s, put together a U of T team including&nbsp;<strong>Graeme Hirst</strong>, a&nbsp;professor in the department of computer science&nbsp;who works on natural language processing, and alumna&nbsp;<strong>Hannah Lloyd</strong>, now a PhD student in history at Yale University.</p> <p>They also joined forces with another team already working with Transkribus at University College London (UCL). 
Scholars in UCL’s&nbsp;Bentham Project&nbsp;were teaching the software to read 18th-century philosopher Jeremy Bentham’s handwritten papers. By sharing resources for software development, the two teams trained Transkribus more quickly and efficiently.</p> <p>The teaching process wasn’t easy. Transkribus learns by “looking” at a sample page and comparing it line-by-line with a pre-prepared transcription. Lloyd spent hours selecting text to feed the software.</p> <p>The team ran into two major problems: hyphens and abbreviations. Medieval scribes often saved valuable parchment by abbreviating words&nbsp;– sometimes dramatically. They would also write up to the very border of the script area before arbitrarily hyphenating whatever word they were on when they ran out of space. Since Transkribus “reads” whole words rather than individual letters, it had to learn to recognize words even when abbreviated or hyphenated.</p> <p>Clearing those&nbsp;hurdles is now paying off. The new Latin-reading Transkribus is capable of precisely transcribing the peculiar handwriting found in 13th-century Latin legal documents.</p> <p>Though the program is currently trained for Latin legal texts, it’s only a matter of time before it can be&nbsp;adapted to literary texts and more.</p> <p>Gervers notes that Transkribus would be an ideal program for Ge’ez, an Ethiopic script he has worked with alongside Latin since the 1990s. Largely unchanged over its 2,000-year history, the Ge’ez script was used in one of the earliest known complete Gospel manuscripts and is still used in Ethiopia today.</p> <p>Gervers says the script is “perfect for machine transcription.” Why? 
Ge’ez has no abbreviations and conveniently puts colons at the ends of words and sentences.</p> </div>