Transcribing a Russian passport from 1864 with an AI

In a folder of family papers my father owns, there is a folded sheet of heavy paper with a Russian imperial eagle at the top, a pressed seal at the bottom, and writing I cannot read. The print is in pre-1918 Cyrillic. The handwritten entries are in chancery script. The whole thing is a long courtly announcement from Tsar Alexander II that he grants safe passage to a thirty-one-year-old subject of Prussia named Pinkus Julius Mieses.

Russian Passport of Julius Pinkus Mieses from 1864

Pinkus is my great-great-great-grandfather. He is also the father of the chess grandmaster Jacques Mieses, the subject of the biography I have been writing. The passport is dated 31 August 1864. Jacques was born in Leipzig the following February. The family had the document on the shelf for years and never managed to do anything with it, because every time I looked I bounced off the same wall. I cannot read a word.

In an earlier post I wrote about why I am writing this biography at all, where the family papers came from, and how a half-forgotten website from the 1990s grew into a book project. This is the practical follow-up. The papers exist. I cannot read most of them. A few months ago I started trying again, this time with help from a language model.

A naive first attempt

I started small. I uploaded the photograph to Claude Sonnet 4.5 with what amounted to “please transcribe this document”. I did not expect much. The combination is rough on paper. Chancery Cyrillic of the 1860s, a printed bureaucratic template with handwritten entries inside it, a second sheet of German Fraktur and old-style script translation, a wax seal.

The result was already surprising. The model gave me a clean transcription of the printed Russian text, identified the document as a foreign-travel passport, named Tsar Alexander II as the issuing sovereign, and correctly read the name Mieses across both Cyrillic and Latin scripts. There were rough patches, but the essence was there. For the first time in years I knew, at least roughly, what was on the page.

That alone would have been worth a blog post. But what really changed things was the next step.

The prompt did more than the model

I rewrote the prompt. The new version is long and structured, modelled on German academic editing conventions, specifically the DFG recommendations for editions of early modern and modern texts. It asks for two layers of transcription, not one. The first is a strict diplomatic transcription that preserves original orthography, line breaks, abbreviations, and damage. The second is a normalised reading version with modern spelling and resolved abbreviations. It demands that any uncertain reading be marked, that proper names be flagged with brackets when not certain, that numerals be read with extra caution because of the well-known confusion between Kurrent digits, that stamps and seals be described separately from the running text, and that the final output include a source-critical commentary and a list of open questions.

I ran the same document, with this prompt, through Claude Opus 4.7. The output was not just better. It was qualitatively different. The earlier attempt had answered confidently. This one questioned itself.

The prompt is in German, because the family archive, the secondary literature, and the academic conventions it draws on are all German. I have left it as the actual working tool, not a translated version made for this post. For readers who do not read German, the short summary is this: it asks the model to produce two parallel layers of transcription, a strict diplomatic one that preserves every quirk of spelling, line breaks, abbreviations, and damage, and a normalised reading version with modern conventions. It demands that uncertain readings be flagged with brackets, that proper names get extra caution, that numerals be read carefully because of the ambiguity of chancery digits, that stamps and seals be described separately, and that the final output include a source-critical commentary and a list of open questions. The non-negotiable principle is at the end: never invent text to fill gaps. Mark them as gaps.

Show the full prompt (German, ca. 1800 words)

# Aufgabe: Transkription historischer Dokumente aus dem Familienarchiv
 
Transkribiere das in der nächsten Nachricht als Bild hochgeladene Dokument aus dem Familienarchiv Geppert/Mieses (siehe Inventar `inventar_familienarchiv.md` im Projektwissen). Liefere eine wissenschaftlich saubere, archivisch korrekte Transkription, die als zitierfähige Quelle für Archive Einträge und Buchbiografie dienen kann.
 
## Vorbereitung
 
1. Identifiziere zunächst das Dokument anhand der FA-ID, des Datums und des Inhalts. Gleiche mit dem Inventar im Projektwissen ab und übernimm die dort vermerkten Klärungsfragen als Prüfraster.
2. Bestimme: Schriftart (Kurrent, Sütterlin, Druckfraktur, lateinische Schreibschrift, Kyrillisch, Hebräisch), Sprache(n), Erhaltungszustand, Beschreibstoff (Papier, Pergament, Karton) und Schreibmittel (Tinte, Bleistift, Druck mit handschriftlichen Einträgen).
3. Wenn ein Dokument zwei oder mehr Sprachen oder Schriftarten kombiniert, etwa russischer Reisepass mit deutscher Übersetzung, behandle jeden Abschnitt separat in der Transkription und kennzeichne den Wechsel.
 
## Methode (nach DFG-Empfehlungen zur Edition neuzeitlicher Texte)
 
### Diplomatische Transkription
 
Die Grundlage jeder Bearbeitung ist die buchstabengetreue Wiedergabe:
 
- Originale Schreibweise und Orthographie beibehalten ("Theil" nicht "Teil", "ß" oder "ss" wie im Original geschrieben, "tz" oder "z" wie vorgefunden)
- Groß- und Kleinschreibung des Originals übernehmen
- Originale Zeichensetzung beibehalten
- Originale Zeilenstruktur beibehalten (Zeilenumbruch wie im Original, durch Slash `/` markiert oder durch tatsächlichen Umbruch)
- Seitenwechsel mit `[fol. 1v]`, `[S. 2]` oder `[Rückseite]` markieren
- Streichungen mit Markdown-Durchstreichformatierung: `~~gestrichener Text~~`
- Einfügungen über der Zeile mit umgekehrten Schrägstrichen: `\eingefügt/`
- Unsichere Lesungen mit `[?]` direkt nach dem Wort
- Alternative Lesungen mit `[wort1|wort2]`
- Unleserliche Stellen: `[unleserlich]` oder `[...]`, mit Längenangabe bei längeren Passagen: `[unleserlich, ca. 3 Wörter]`
- Abkürzungen unaufgelöst lassen, Auflösung in eckigen Klammern dahinter: `u.s.w.[und so weiter]` oder `S[einer] M[ajestät]`
- Beschädigungen am Rand: `[Riss]`, `[Loch]`, `[Wasserschaden]`
- Stempel und Siegel separat beschreiben unter eigenem Abschnitt, nicht als Fließtext
 
### Lesefassung
 
Zusätzlich zur diplomatischen Transkription wird eine modernisierte Lesefassung erstellt. Sie soll lesbar sein, ohne Sinn oder Aussagestruktur des Originals zu verändern:
 
- Heutige Rechtschreibung
- Heutige Zeichensetzung
- Abkürzungen still aufgelöst
- Zeilenumbrüche normalisiert (Absätze nach Sinn)
- Erläuternde Zusätze in eckigen Klammern: `[gemeint ist...]`
- Behutsam vorgehen: nicht Sinn unterlegen, nur Lesbarkeit herstellen
 
### Bei fremdsprachigen Dokumenten
 
- Originalsprache transkribieren (Russisch in Kyrillisch, Hebräisch in hebräischer Schrift)
- Wissenschaftliche Transliteration nach DIN/ISO-Norm (Russisch: ISO 9, Hebräisch: DIN 31636 / ISO 259)
- Deutsche Übersetzung darunter, möglichst wörtlich, nicht frei
 
## Output-Format
 
Liefere die Transkription als Markdown-Datei mit folgendem festen Aufbau:
 
[... Output-Schema mit Quellenmetadaten, diplomatischer Transkription, Lesefassung,
     Übersetzung, quellenkritischem Kommentar, Forschungsrelevanz und offenen Fragen ...]
 
## Quellenkritische Grundprinzipien
 
Diese Prinzipien sind nicht verhandelbar:
 
**Niemals Lücken füllen mit erfundenem Text.** Bei unleserlichen Stellen ehrlich `[unleserlich]` setzen. Lieber eine unvollständige Transkription als eine erfundene Vollständigkeit.
 
**Niemals Daten oder Namen ergänzen, die nicht im Original lesbar sind.** Wenn ein Datum nur "den 14ten dies" sagt, nicht aus anderem Wissen den Monat ergänzen, sondern den Wortlaut belassen und im Kommentar erklären.
 
**Bei zweifelhafter Lesung lieber zwei Varianten anbieten** als eine unsichere als sicher darstellen. Eckige Klammern mit Pipe sind dafür das richtige Werkzeug.
 
**Eigennamen besonders sorgfältig prüfen.** Mieses, Händler, Coelestine, Brody, Pinkus, Selma, Geppert, Redler sind im Familienkreis bekannt; bei unbekannten Personennamen lieber `[?]` setzen.
 
**Im Zweifel Vorsicht bei Zahlen.** Ziffern in Kurrent- oder Kanzleischrift können leicht verwechselt werden (1 mit 7, 0 mit 6, 3 mit 5). Bei zweifelhaften Datierungen entweder beide Lesungen anbieten oder die Unsicherheit im Kommentar markieren.
 
## Hinweise zu spezifischen Sprachen und Schriften
 
[... ausführliche Hinweise zu Kurrentschrift, lateinischer Schreibschrift,
     Russisch in vor-revolutionärer Orthographie und Hebräisch ...]

Four moments stood out.

Four things that surprised me

The model spotted a typo in the imperial title. The printed Russian text gives Alexander II as “САМОДЕРЖЦА ВЕРОССІЙСКАГО”. The correct form would be “ВСЕРОССІЙСКАГО”, “of all Russia”. The printer in 1864 had dropped a letter. The model noticed, flagged it with a “sic; recte” marker the way an editor of historical texts would. Neither the original nor the contemporary German translation comment on it. It is the kind of observation easy to skim past, even for someone who reads Russian.

Update, 2026-05-24: A native Russian speaker, pointed out in the LinkedIn comments that the missing letter in “ВЕРОССІЙСКАГО” is most likely a photographic artefact, not a printer’s typo. A fold in the paper runs exactly through the spot where the С should be. So the model did not catch a real editorial issue, it confidently flagged a non-issue. Which is, on reflection, the more interesting finding: the system was wrong, and I was wrong to be impressed.

It identified the issuing official from his order list. The signature on the document is illegible. The printed Cyrillic small text below it contains the full title and orders of the issuer. From this, the model identified him as Mikhail Alexandrovich Ofrosimov (1797 to 1868), General of Infantry, Moscow Military Governor-General, in office 1864 to 1865. It noted that the foreign orders listed in the title, the Prussian Red Eagle 2nd class with star and with brilliant star, fit his documented diplomatic career. The silver and bronze campaign medals match his service in the Russo-Turkish War of 1828/29 and the Crimean War of 1853 to 1856. That is more biographical synthesis than any text-recognition system has any business doing.

It read the document into its historical setting. The passport is dated 31 August 1864 in the Julian calendar, 12 September in the Gregorian. The model converted this correctly with the standard twelve-day offset of the nineteenth century. Then it did something more interesting. It noted that 31 August falls at the close of the Nizhny Novgorod fair, the great fur and wool market of the Russian empire, traditionally held from mid-July through late August. The family chronicle from 1909, written by Pinkus himself in old age, mentions that he travelled to Nizhny Novgorod that year to buy furs and horsehair. The model put two and two together and reframed the document. This is not a migration passport. It is a return-trip passport from a trade journey. The inventory entry in my own archive, which described it as “a letter of safe conduct for the migration to Germany”, was wrong. The model pointed this out and proposed a corrected entry.

It marked its own uncertainty. The handwritten passport number could be either 128 or 628. The model said so. The German entries for forehead, mouth and chin are partly illegible. The model said so. The cursive descriptor for the eyebrows might be “braun” or a misspelling for “barun”. The model said so. The signature at the bottom of the Russian sheet is illegible, the model said so but proposed Ofrosimov as a plausible reading and explained why. It never invented text. Where I expected hallucinations, I found a list of explicit doubts.

This is the part I had not predicted. The earlier, more naive attempt with the simpler prompt had answered confidently. With the structured prompt and the larger model, the system started behaving like a careful colleague who would rather admit “I cannot read this” than make something up.

What the output looks like

To make this concrete, here is one passage from the passport, the central legal sentence that names Pinkus and describes his journey, in the three layers the prompt asks for. First the diplomatic transcription in the original pre-1918 Cyrillic, then a scientific transliteration following ISO 9, then a German translation that stays close to the wording rather than smoothing it. The model produced all three from the photograph alone.

# Diplomatic transcription (pre-revolutionary Cyrillic)

Объявляется чрезъ сіе всѣмъ и каждому, кому о томъ вѣдать
надлежитъ, что показатель сего Прусскій подданный купецъ
Пинкусъ Юлій Мизесъ отправляется чрезъ С. Петербургъ
за границу.

# Reading version (ISO 9 transliteration)

Ob"javljaetsja čerez" sie vsěm" i každomu, komu o tom" vědat'
nadležit", čto pokazatel' sego Prusskij poddannyj kupec"
Pinkus" Julij Mizes" otpravljaetsja čerez" S. Peterburg"
za granicu.

# German translation

Es wird hiermit allen und jedem, dem davon Kenntnis zu geben ist,
bekanntgegeben, dass der Vorzeiger dieses, der preußische
Untertan, der Kaufmann Pinkus Julij Mieses, sich über
St. Petersburg ins Ausland begibt.

In English: this is the passport’s core declaration. The model reads the chancery Cyrillic, preserves the old orthography with its hard signs at the end of every word, produces a clean transliteration, and then translates the legal formula into German that still sounds like a 19th-century document rather than a modern paraphrase. Four biographical facts come out of this single sentence: the holder’s name, his status as a Prussian subject, his occupation as a merchant, and his planned route via St. Petersburg.

The full output runs to roughly nine pages and includes source metadata, line-by-line diplomatic transcription of both the Russian sheet and the German translation, the modernised reading version, source-critical commentary, an assessment of biographical relevance, and a list of open research questions. The sample above is one paragraph out of that whole. The rest is, if anything, denser in detail.

Where it still falls short

The German translation, written in mid-nineteenth-century handwriting on a thin paper that has aged badly, defeated the model in places. Three of the physical-description fields are unreadable. So is part of the issuing officer’s title in the running text. The two pass numbers, one Russian, one a translation registry number 2949, remain ambiguous. The route of the holder across the Prussian-Russian border, the kind of detail that might be stamped on the back of the document, is not visible in the photograph I have. The model raised all of this as open questions. None of it is solvable without going back to the physical original under better light, or back to the archive of the Moscow Governor-General’s office, which the model also correctly identified by name.

This is the part where the AI is not yet useful. It tells you where the gap is, but the gap remains a gap.

One concrete favour

The transcription above has not been peer-reviewed by anyone who reads pre-revolutionary Russian. I want it to be. If you read pre-revolutionary Cyrillic, Kurrent script, or mid-nineteenth-century German chancery hand, and you would like to look over what came out, I would be genuinely grateful. The points the model flagged as uncertain are the obvious places where a second pair of eyes would help, but anything else you notice is welcome too. You can find me through the contact page.

The book is better when more hands have touched it.

What it changes

I am not a historian by training. I am a software engineer who happens to be writing a biography. Before I started using these tools, my own access to documents like this one was bounded by what I could read with effort, which is to say printed German, basic Latin, and fragments of French. Anything in Kurrent, Cyrillic, or any nineteenth-century chancery hand was effectively closed material. I either skipped it, paid someone, or wrote around it.

That is no longer the case. With one well-written prompt and one capable model, an evening’s work now produces something that earlier would have required a paid specialist and a six-week turnaround. The model does not replace the specialist. The transcription above still needs validation, as I just said. But the model gets me to a point where I can ask a specialist the right questions, and where the cost of asking has dropped from feeling like the next major decision in the project to feeling like a quick check.

For a biography project that touches Galician birth records, Leipzig trade registers, Hampstead correspondence, Argentine letters, and other old and handwritten documents, that change is hard to overstate.

Julius Pinkus Mieses came home from Russia in the autumn of 1864. His son Jacques was born in February 1865. The chess world has known about Jacques for over a century. About his father, almost nothing has been written. Documents like this passport are part of what slowly changes that.

How AI Read My Ancestor’s 1864 Russian Passport

Tagged on: biography ClaudeAI mieses

Johannes Geppert

How AI Read My Ancestor’s 1864 Russian Passport

A naive first attempt

The prompt did more than the model

Four things that surprised me

What the output looks like

Where it still falls short

One concrete favour

What it changes

Like this:

Leave a ReplyCancel reply

A naive first attempt

The prompt did more than the model

Four things that surprised me

What the output looks like

Where it still falls short

One concrete favour

What it changes

Share:

Like this:

Leave a ReplyCancel reply