This is the story of trying to fix one problem, in one headword entry
in the thesaurus corpus.
Having gotten my 1911 Roget’s out of storage, for use as a backstop in
making determinations about errors reported by rlint, I decided it
was time to start work. The first entry flagged was:
Copy : tests ✘ : Invalid char '-' in sense 3 (N sense 3) term "transcript",
attr "copy into a non-visual form"
The message makes it clear what the technical issue here is:
currently, attributes/notations on terms are not allowed to contain
hyphens. That might need to change, but for now I’m being very
conservative in what characters are allowed to appear where. At this
early stage I’d rather generate false positives than miss problems.
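To make the "conservative allowlist" idea concrete, here is a minimal sketch of that kind of check. It is not rlint's actual code: the function names, the exact allowlist (ASCII letters and spaces only), and the shortened message format are all assumptions made for illustration.

```python
import string

# Assumption: a deliberately narrow allowlist of ASCII letters and spaces.
# The real set of permitted characters in rlint may differ.
ALLOWED_ATTR_CHARS = set(string.ascii_letters + " ")


def invalid_attr_chars(attr):
    """Return the disallowed characters in attr, in order of first appearance."""
    seen = []
    for ch in attr:
        if ch not in ALLOWED_ATTR_CHARS and ch not in seen:
            seen.append(ch)
    return seen


def check_attr(term, attr):
    """Return one message per disallowed character found in the attribute."""
    return [
        f'Invalid char {ch!r} in term "{term}", attr "{attr}"'
        for ch in invalid_attr_chars(attr)
    ]


for message in check_attr("transcript", "copy into a non-visual form"):
    print(message)
# prints: Invalid char '-' in term "transcript", attr "copy into a non-visual form"
```

Starting from an allowlist rather than a blocklist is what produces the false positives: anything not explicitly permitted gets flagged, and the list is widened only after a human has looked at the case.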
But something else jumps out at me: “copy into a non-visual form”
doesn’t sound like wording from the 1911 Roget’s. Another moment’s
thought leads me to realize, in quick succession, that:
- “Transcript” is a noun, not a verb, therefore
- Even if this is valid, it should be attached to “transcribe”
- Except: to “copy into a non-visual form” would be the opposite of transcription
- The Latin roots of transcribe mean “across write”
- In English the word’s definition is “to make a copy in writing”
- If anything deserves this notation, it’s something like “narration” or “audiobook”
Checking the 1911 shows that this notation does not exist there. Just
to be sure, I also check my 1941 edition – which I do not use as a
source, because it is still in copyright. I use it as a “second
opinion” because it’s far more similar to the 1911 than modern
editions are. The notation isn’t there either; it is a Gutenberg addition.
And just like that, the entire entry is now suspect. I’m going to
present to you the 1911 and PG entries, in their entirety. First, the 1911:
21. [Result of imitation.] Copy.
N. copy, facsimile, counterpart, effigies, effigy, form, likeness, similitude,
semblance, cast, tracing, ectype; imitation &c. 19; model, representation,
adumbration, study; portrait &c. (representment) 554; resemblance.
duplicate; transcript, transcription; reflex, reflexion; shadow, echo;
chip of the old block; reprint, reproduction; second edition &c. (repetition) 104;
rechauffe, apograph, fair copy, revise.
parody, caricature, burlesque, travesty, travestie, paraphrase.
servile, servile copy, servile imitation; counterfeit &c. (deception) 545;
Adj. faithful; lifelike &c. (similar) 17; close, conscientious.
Here’s the PG:
DESC::Result of imitation
N. copy, facsimile, counterpart, effigies, effigy, form, likeness.
image, picture, photo, xerox, similitude, semblance, ectype^, photo offset,
electrotype; imitation &c 19; model, representation, adumbration, study;
portrait &c (representation) 554; resemblance.
duplicate, reproduction; cast, tracing; reflex, reflexion [Brit.], reflection;
transcript [copy into a non-visual form], transcription; recording, scan.
chip off the old block; reprint, new printing; rechauffe [Fr.]; apograph^,
parody, caricature, burlesque, travesty, travestie^, paraphrase.
[copy with some differences] derivative, derivation, modification, expansion,
extension, revision; second edition &c (repetition) 104.
servile copy, servile imitation; plagiarism, counterfeit, fake &c (deception) 545;
Adj. faithful; lifelike &c (similar) 17; close, conscientious.
unoriginal, imitative, derivative.
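Differences like these can be surfaced mechanically before reading closely. The following is a rough sketch, not part of rlint: the `terms` and `diff_terms` helpers and the regular expressions are hypothetical, and the parsing rules (strip bracketed notations, `^` markers, part-of-speech labels, and `&c` cross-references, then split on punctuation) are assumptions based only on the two entries shown here. It compares just the first paragraph of each entry.

```python
import re

# Assumed patterns, inferred from the two entries quoted above.
NOTATION = re.compile(r"\[[^\]]*\]")                      # e.g. [Brit.]
CROSS_REF = re.compile(r"&c\.?\s*(?:\([^)]*\))?\s*\d+")   # e.g. &c. (x) 554
POS_LABEL = re.compile(r"^(?:N|V|Adj|Adv)\.\s+")          # e.g. "N. "


def terms(paragraph):
    """Extract a set of normalized terms from one paragraph of an entry."""
    text = POS_LABEL.sub("", paragraph.strip())
    text = NOTATION.sub("", text)
    text = CROSS_REF.sub("", text)
    text = text.replace("^", "")
    pieces = re.split(r"[,;.]", text)
    # Collapse internal whitespace (including line breaks) and lowercase.
    return {" ".join(p.split()).lower() for p in pieces if p.strip()}


def diff_terms(old, new):
    """Return (added, removed) as sorted lists of terms."""
    old_terms, new_terms = terms(old), terms(new)
    return sorted(new_terms - old_terms), sorted(old_terms - new_terms)


ROGET_1911 = """N. copy, facsimile, counterpart, effigies, effigy, form, likeness, similitude,
semblance, cast, tracing, ectype; imitation &c. 19; model, representation,
adumbration, study; portrait &c. (representment) 554; resemblance."""

PG = """N. copy, facsimile, counterpart, effigies, effigy, form, likeness.
image, picture, photo, xerox, similitude, semblance, ectype^, photo offset,
electrotype; imitation &c 19; model, representation, adumbration, study;
portrait &c (representation) 554; resemblance."""

added, removed = diff_terms(ROGET_1911, PG)
print(added)    # ['electrotype', 'image', 'photo', 'photo offset', 'picture', 'xerox']
print(removed)  # ['cast', 'tracing']
```

Because only one paragraph is compared, "cast" and "tracing" show up as removed even though PG merely moved them to a later line. A set difference also says nothing about notations or sense boundaries, so it narrows the search rather than replacing the side-by-side reading.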
In addition to the issue I talked about earlier, there are also these:
- Haphazard additions of terms in PG; some worthwhile, some not
- A general breaking-up of the entry in PG, turning what had been
subsenses into full-fledged senses
- “Effigies” is not an English plural here; in the 1911 it is
italicized, marking it as a (probably French) word, presumably
- Bizarrely, PG did flag rechauffe as French – and this is a
delightful usage, with a denotation of “reheated leftovers”: a copy
in the sense of the English idioms “warmed over” or “rehashed”
- Also, it should be réchauffé, but adding Unicode everywhere it’s
needed is a whole other nightmare
- “Pasticcio” isn’t flagged as archaic in the original; it is also
italicized, and it is an opera term meaning “pastiche”, with an
added overtone of plagiarism
- It should be flagged as Italian, and “pastiche” should be added
- “Travestie” is French, not archaic as PG flags it, and I think it
should simply be removed. Roget was fond of French cognates, in case
you hadn’t noticed.
- “Apograph” isn’t marked as archaic anywhere, but it definitely should be
- Also, it means “transcription”; Roget has it in that sense. PG
has split the sense at “chip of the old block” but not moved
apograph, orphaning it from the English word that shares its meaning
- If, like me, you were wondering what “revise” is doing there, it
turns out to have a noun sense via printing jargon: a proof which
includes corrections from an earlier proof
- That should have a notation/attribute set on it
- Why aren’t there verb senses listed for either?
And I’m sure there are others that I didn’t notice. What did you find?
This is a great illustration of why this work feels absolutely
overwhelming at times. There’s just so much when you’re trying to be
careful and do the right thing.