Xerox Scanners / Photocopiers Randomly Alter Numbers

Fonts in Use, News, Oops! | Yves Peters | August 5, 2013

Thanks to a tip from Jörg Haubrichs, software developer at FontShop International, I read an insane article this weekend that frankly blew my typographic mind. Last Wednesday German computer scientist David Kriesel made a bizarre discovery. After scanning a construction plan on a Xerox WorkCentre and printing it, he noticed that the seemingly perfect reproduction contained incorrect numbers. He only caught the mistake because the copy labeled one room as about 22 square meters, whereas the adjacent – visibly larger – room was labeled only 14 square meters. The Xerox WorkCentre had somehow changed the numbers while scanning. At first glance the scanned images looked perfectly fine; only on closer inspection did some of the numbers in them turn out to be incorrect. The implications of these baffling substitutions are far more serious and far-reaching than one would suspect, because the issue defies the user's most basic expectation – copies that are supposed to be identical are far from it.

On his website David Kriesel explains at length and in detail how he stumbled upon the anomalies in Xerox WorkCentre 7535 and 7556 machines, and analyses what causes the problem – a compression algorithm replaces patches of pixel data in an almost unnoticeable way. Apparently Xerox machines use JBIG2 for compression, an algorithm that creates a dictionary of image patches it considers similar. As long as the error introduced by reusing a patch is not too high, the machine reuses that patch instead of storing the original image data. This would also explain why the error occurs when letters or numbers are scanned at a moderate resolution at which they are still readable to the human eye. When the letter size is close to the patch size of JBIG2, complete letters and even blocks of letters that look similar to the machine are replaced.
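To make the "reuse a similar patch" idea more tangible, here is a small Python sketch. It is not the JBIG2 codec and not Xerox's firmware, just a toy illustration of the mechanism described above. The 5×5 glyph bitmaps, the function names and the `max_mismatch` threshold are all made up for the example.

```python
from typing import List, Tuple

# A patch is a tiny bitmap: rows of '#' (ink) and '.' (paper).
Patch = Tuple[str, ...]


def mismatch(a: Patch, b: Patch) -> int:
    """Count the pixels that differ between two equally sized patches."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def compress(patches: List[Patch], max_mismatch: int) -> Tuple[List[Patch], List[int]]:
    """Store each patch as an index into a dictionary of patches seen so far.

    A patch is only added to the dictionary if no stored patch is within
    max_mismatch pixels of it; otherwise the stored patch is reused.
    """
    dictionary: List[Patch] = []
    refs: List[int] = []
    for patch in patches:
        for idx, known in enumerate(dictionary):
            if mismatch(patch, known) <= max_mismatch:
                refs.append(idx)  # "close enough": reuse the stored patch
                break
        else:
            dictionary.append(patch)
            refs.append(len(dictionary) - 1)
    return dictionary, refs


def decompress(dictionary: List[Patch], refs: List[int]) -> List[Patch]:
    """Rebuild the page from the dictionary and the list of references."""
    return [dictionary[i] for i in refs]


# Hypothetical low-resolution glyphs; they differ in only two pixels.
EIGHT: Patch = (".###.", "#...#", ".###.", "#...#", ".###.")
SIX: Patch = (".###.", "#....", "####.", "#...#", ".###.")

page = [EIGHT, SIX, EIGHT]

# With a tolerance of 2 pixels the 6 is silently stored as a reference to the 8.
lossy = decompress(*compress(page, max_mismatch=2))
print(lossy == [EIGHT, EIGHT, EIGHT])

# With a tolerance of 0 every distinct patch gets its own dictionary entry.
exact = decompress(*compress(page, max_mismatch=0))
print(exact == page)
```

The point of the sketch is that nothing in the output looks "broken": the substituted patch is a clean, sharp 8, which is exactly why such errors are so hard to spot.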


A cost overview scanned on the WorkCentre 7535. At first glance the copy looks correct, but when you realise values in such tables are usually sorted in ascending order you notice the wrong numbers. This is not a simple pixel error either, because you can clearly see the characteristic dent on the left side of the 8 in contrast with the smooth curve on the 6.
Image courtesy of David Kriesel’s website

Why is this issue so crucial? First of all, these are widespread machines, commonly used in service centres and copy shops, and Xerox seemed to be unaware of the issue until David Kriesel notified them last Wednesday. Second, the error was already present in a very old version of the software installed on the machine, and had still not been fixed in the most recent software update. If you relied on Xerox WorkCentres for your copies you can only wonder how many incorrect documents – even though they looked correct – you produced these past few years. Did you pass them on to others? What dangers do errors in the numbers on those documents represent? Can you be sued for such errors? Indeed, you have to appreciate that the issue goes well beyond the merely financial problems created by swapped numbers on invoices, accounting spreadsheets and other financial documents. Numbers on documents can also be a matter of life and death, which should not be underestimated. Imagine copies of construction plans for a building or a bridge where the numbers have been altered. In a worst-case scenario architectural structures like these could collapse and claim victims. The same goes for medication, where incorrect doses could have serious consequences for the patient’s health, or even cause their death.


As a test David Kriesel printed a series of numbers in 7pt Arial, scanned them, OCRed them and compared them to the original ones. Observe how the sixes around the false eights look correct. Here too the false eights contain the characteristic dent, meaning complete image patches were replaced again.
Image courtesy of David Kriesel’s website
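
The comparison step of such a test is easy to automate. The following Python sketch is not David Kriesel's actual script, only an illustration of the principle; the function name and both digit strings are invented for the example.

```python
from collections import Counter
from typing import List, Tuple


def find_substitutions(original: str, scanned: str) -> List[Tuple[int, str, str]]:
    """Return (position, expected, found) for every character that differs."""
    if len(original) != len(scanned):
        raise ValueError("texts must have the same length to compare position by position")
    return [
        (pos, expected, found)
        for pos, (expected, found) in enumerate(zip(original, scanned))
        if expected != found
    ]


# Made-up sample data: what was printed versus what the OCR read back.
original = "6566 8686 6065"
scanned = "6586 8688 6065"

errors = find_substitutions(original, scanned)
for pos, expected, found in errors:
    print(f"position {pos}: expected {expected!r}, found {found!r}")

# Tally which digit was swapped for which, to spot systematic confusions.
tally = Counter((expected, found) for _, expected, found in errors)
print(tally)
```

A tally like this shows at a glance whether the swaps are scattered or concentrated on particular pairs of digits.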

Part of the problem also lies in the widespread use of Helvetica and its clone Arial as default typefaces in digital documents. This ties in with the ongoing criticism they receive, most recently for the persistent use of Helvetica as the system font in iOS. Even though the resolution on David’s construction plan is not very fine, the numbers are perfectly distinguishable. That is, for the human eye, because we can interpret character forms and detect subtle differences in them far better than machines – that’s why captchas exist. The Xerox WorkCentre however has more difficulties reading the closed forms of the neo-grotesque numbers, which causes it to mix up 6, 8 and 9 for example. This once again suggests that – contrary to the misguided belief of hordes of turtle-necked goatee-stroking hipsters – Helvetica is a pretty crappy typeface, especially for text, and so is Arial by association. And sorry, my dear architect friends – Eurostile is even worse.

To figure out (pardon the pun) what clear number shapes should look like, all you need to do is look at typefaces specifically designed to be used in applications where the numbers are of utmost importance. One of the seminal designs in that respect is Chauncey H. Griffith’s Bell Gothic, first used in Manhattan’s Fall 1937 phone directory. Notice how different those same three numbers are, and how they can be easily told apart by both man and machine.

And now that we’ve broached the subject of numbers that need to be read correctly by machines, the yardstick here is Adrian Frutiger’s iconic OCR-B. Here again you can see how the 6, 8 and 9 are completely different from each other.

When numbers have to be distinguished while traveling at high speed, instant recognition is equally imperative. This is why Interstate, for example, is also a good reference for well-shaped numerals.

So next time you need to specify type for text, look beyond the letters and also examine the shapes of the numerals. As for me, I will follow the development of this fascinating story with interest.

For more information on numerals read my two reference posts on The FontFeed:

Header image: Xerox WorkCentre 7545, courtesy of Xerox


16 Comments:

  1. A fascinating and horrifying story. However, the part about figure legibility is bogus here. The machines don’t mix up 6, 8 and 9 because they are so similar in neo-grotesques. They replace whole blocks of imagery because their pattern/color is similar. In one of Kriesel’s examples, values of ‘17,42’ and ‘21,11’ have both been replaced by ‘14,13’ – the issue is more severe than just closed forms. Still, it certainly is never a bad idea to use something else than Arial & Co, something nice with discernible figures that please both machines and the human eye.

    Posted by Florian on Aug. 5, 2013
  2. Yes, it is amazing. But a great excuse for FontShop, MyFonts et al to promote non-system fonts, old-style figures, etc. :)

    Posted by Laurence Penney on Aug. 5, 2013
  3. If you ran the same test with Frutiger or Vectora, the reading result would improve and the machine would not replace 6 by 8 or 8 by 6. Ask designers how often they make mistakes when reading the Pantone Guide numbers. These numbers are difficult to read in bad lighting conditions, and difficult for older designers and printers. Pantone uses Helvetica. These particular Xerox copiers behave like old designers reading Helvetica (or Arial) in small sizes! Very dangerous. Xerox has to fix this problem because Helvetica and Arial will not and cannot be replaced in an engineering environment. I think.

    Posted by Henk W. Gianotten on Aug. 5, 2013
  4. I have to wonder why copiers are using this kind of extreme data compression at all. Is it to reduce RAM costs?

    Posted by Jeff Flanagan on Aug. 5, 2013
  5. RAM – yes.
    I would imagine that pro photo copiers like that can scan, then reproduce hundreds of pages (say, you scan a thick manual to make many, sorted copies). Then you really need to first store it all in memory (if you don’t have cheaper persistent storage).

    Posted by Jörg Haubrichs on Aug. 5, 2013
  6. This could cause a problem when it comes to critical numerical data but I seriously doubt whether most reports are actually read word for word.

    Posted by Melvin on Aug. 6, 2013
  7. “hordes of turtle-necked goatee-stroking hipsters”

    Ha! The author’s definition of a hipster is culled directly from Maynard G. Krebs.

    Posted by Ben Frazier on Aug. 6, 2013
  8. These images, especially the second, illustrate how the images get fatter and fatter when they are copied at 100%. When pages like this have been copied over and over, it gets to the point where even some of the letters are unreadable. If you’ve ever been to a government office like Food Stamps, or maybe the DMV, you’ll know what I mean.

    And this has been a problem for quite some time, like about 30 years. I learned long ago that if a page is copied at 97%, the copy is much clearer, because it is not as fat.

    Also, on the last set of numbers, the 3 looks an awful lot like the 8.

    And Melvin, maybe reports are not read word for word, but: school children need to read word by word; and people that are trying to read directions to put things together that they have purchased also need to be able to read word for word.

    Posted by Margie Kierstead on Aug. 6, 2013
  9. “[...] Pantone Guide numbers. These numbers are difficult to read in a bad lighting condition [...]”: Never judge colors in bad lighting; “[...] or difficult for older designers and printers [...]”: Professionals always have a loupe on hand ;-)

    Posted by Martijn Oostra on Aug. 6, 2013
  10. I guess this is a documented limitation. Search for the string “character substitution” in this manual: “some quality degradation and character substitution errors may occur” http://www.cs.unc.edu/cms/help/help-articles/files/xerox-copier-user-guide.pdf

    Posted by Erik Mats on Aug. 6, 2013
  11. “Ha! The author’s definition of a hipster is culled directly from Maynard G. Krebs.”
    I wasn’t aware of that; didn’t even know the guy. I guess it’s the standard 60s image that stuck with me. : )

    Posted by Yves Peters on Aug. 7, 2013
  12. Florian, I disagree with your opinion that my assessment of figure legibility is bogus. Yes, you are correct – in the original construction plan other seemingly random numerals get swapped, but that is mainly because the resolution is so poor and the numbers so small.
    However in the tests with 7pt Arial conducted by David Kriesel (the ones I show in this post) the resolution and size are far better, and even then the 6s get swapped for 8s. To me this clearly shows that their design is conducive to misinterpretation, both by machine and man, making them crappy typefaces when it comes to general legibility and character recognition. So I stand my ground. : )

    Posted by Yves Peters on Aug. 7, 2013
  13. Long live Tekton!

    Posted by Gregory on Aug. 7, 2013
  14. Okay, let me phrase it differently: Using highly legible figures won’t guarantee you a faultless result with this bug. They can help, but size/resolution is more important. Agree with that? : )

    Posted by Florian on Aug. 8, 2013
  15. There is another bizarre twist to this story. The group of conspiracy theorists known as the “Birthers” claimed that the exact replication of some characters on President Obama’s long form birth certificate was evidence that the document was a forgery and manufactured on a computer.

    A group of us debunkers researched this and found that the replication was in fact an artifact of JBIG2 compression. Not only that, we found that the PDF of the birth certificate posted on the White House web site was certainly produced by scanning the document on a Xerox WorkCentre.

    I have written a number of articles on my blog detailing how this was determined by examining the files and running tests on a Xerox WorkCentre.

    Our findings have not been well received by the Birthers, esp. Sheriff Joe Arpaio’s Cold Case Posse in Arizona where Commander Mike Zullo has been conducting a faux criminal investigation into the so-called “forgery”.

    Posted by Reality Check on Feb. 27, 2014
  16. Howdy! Do you know if they make any plugins to safeguard against hackers?
    I’m kinda paranoid about loing everything I’ve worke hard on. Any
    suggestions?

    Posted by iphone 6 screen protector on Sep. 21, 2014
