PDF Translation Complications

PDF files are so prevalent because they preserve content in its intended format, irrespective of the operating system or software displaying the file. They function something like a digital photocopy. This matters because different browsers, applications and operating systems render fonts, colors and layouts slightly differently. Unfortunately, the same encoding and rendering attributes that provide this consistency and control also mean that the tools we use for translation projects can struggle (or fail) to parse text correctly, especially in design-heavy or graphically complex documents. The result can be quality issues, added cost or unpleasant surprises for the client.

The first thing to know is that there are really two kinds of PDFs: digitally created PDFs, in which the text is encoded in its own layer so that it can be parsed and searched relatively easily, and image PDFs, which are usually based on a photograph or a scanned image.
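As a rough illustration of that distinction, the sketch below sorts a PDF into "text", "image" or "mixed" based on how much text could be extracted from each page. It assumes the per-page text has already been pulled out by some PDF library (that step is not shown), and the function name and the 20-character threshold are hypothetical choices for this example, not part of any real tool.

```python
def classify_pdf(page_texts, min_chars=20):
    """Classify a PDF from the text extracted from each of its pages.

    page_texts: one string per page, as returned by whatever PDF
        library is in use (hypothetical input; extraction not shown).
    min_chars: assumed threshold below which a page is treated as
        having no usable text layer.
    """
    if not page_texts:
        raise ValueError("expected at least one page")

    # A page "has text" if it yields a meaningful amount of characters.
    has_text = [len(text.strip()) >= min_chars for text in page_texts]

    if all(has_text):
        return "text"    # digitally created, text layer present
    if not any(has_text):
        return "image"   # scanned or photographed, needs OCR or re-typing
    return "mixed"       # some pages have a text layer, some do not


print(classify_pdf(["Chapter 1. Getting started with your device."]))
print(classify_pdf(["", "   "]))
```

A real check would also have to deal with scanned PDFs that carry a hidden, low-quality OCR layer, which this simple threshold would label as "text".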

Both kinds present problems for translation projects, but in different ways:

  • Image PDFs are often not parseable at all, and must be manually re-created or analyzed through optical character recognition (OCR) tools, which can add cost or introduce errors
  • Text PDFs can be parsed through Acrobat or OCR, but images, typography, and intricate designs must be manually re-created by our staff, which duplicates careful work done by the original design team but with less control

Of the two, text PDFs that were composed digitally represent both the lion’s share of the PDFs that we see and the most easily avoided problem, because they were almost always composed in-house using software that we know how to use, like Adobe InDesign and Illustrator or Microsoft Office. Lots of folks send PDFs because they are portable, easily packaged, or just out of habit, but in this case, we really need the original source file. If we have to replicate the original design specs based on the PDF, we usually have to charge a Desktop Publishing Services fee, which is billed at $65/hr. We are willing to bet your designer would prefer that we get the original too, rather than have us go back and try to match fonts by eye, manually lay out images with text and put all the little details in place by working backwards.

Text and layouts exported from a PDF can suffer from a variety of issues, including:

  • Poor machine parsing of text – confusion regarding word breaks, leading to poor machine readability of segments
  • Mismatched fonts
  • Mismatched text spacing
  • Color changes, both shifts in hue and inconsistencies within solid blocks of color

Of these issues, the most time-consuming is the machine readability of text, because it interferes with word counts and with identifying repetitions (to learn more about translation memories and repetitions, please see our Localization 101 page). These errors require our staff to go over every piece of text in the document before translation to ensure that word counts are correct and that segments are properly identified, which is a crucial component in establishing terminological consistency.
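To see why stray word breaks matter, here is a minimal Python sketch that compares a clean source text with the same text as a PDF export might deliver it. The sample sentences and helper names are invented for illustration, and the counting is deliberately naive; it does not reflect how memoQ or any other translation tool actually counts words or matches segments.

```python
from collections import Counter


def word_count(segments):
    """Count whitespace-separated words across all segments."""
    return sum(len(segment.split()) for segment in segments)


def repeated_segments(segments):
    """Count segments that exactly repeat an earlier segment."""
    counts = Counter(segment.strip().lower() for segment in segments)
    return sum(n - 1 for n in counts.values() if n > 1)


# Text as it exists in the original source file.
clean = [
    "Press the power button.",
    "Wait for the light to turn green.",
    "Press the power button.",
]

# The same text with spurious breaks inside words ("pow er", "gre en"),
# as a badly parsed PDF might deliver it.
extracted = [
    "Press the pow er button.",
    "Wait for the light to turn gre en.",
    "Press the power button.",
]

print(word_count(clean), repeated_segments(clean))
print(word_count(extracted), repeated_segments(extracted))
```

In this toy case the clean text comes to 15 words with one repeated segment, while the broken extraction comes to 17 words with no repetitions, so the job looks both larger and less repetitive than it really is.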

The visual issues in the export also require fixing, and while our support staff have serious chops, making them fix things that could have remained unbroken from the start just doesn’t make a lot of sense.

For image PDFs, clients should expect to see a small fee on a quote for document recreation, usually around $65-$130 but potentially more, depending on size and complexity. This covers one of our staff members either running OCR, where possible, and manually checking every word, or simply re-typing the entire source document so that we can import the strings into memoQ.
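For a ballpark, the arithmetic can be sketched as below. This assumes document recreation is billed by the hour at the same $65/hr Desktop Publishing rate mentioned earlier, which would make the typical $65-$130 range roughly one to two hours of staff time. That link is an assumption made for illustration, and the function name is hypothetical; an actual quote may be calculated differently.

```python
HOURLY_RATE_USD = 65  # Desktop Publishing rate quoted in the article


def recreation_fee(hours, rate=HOURLY_RATE_USD):
    """Estimate a recreation fee in USD, assuming simple hourly billing."""
    if hours < 0:
        raise ValueError("hours must not be negative")
    return hours * rate


# Assumed range of one to two hours of staff time.
print(recreation_fee(1), recreation_fee(2))
```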

So, in short, we can actually translate that PDF – but not without some hassle and/or added cost, which is almost always avoidable. If you have that source file, save yourself some time and your designer some stress and send it over, and everyone will be happier.

Want to get going on a specific project? Hit our quote page to upload files, enter parameters, and start talking specifics.

Want to learn more? Our pricing page has enough detail for you to make your own ballpark localization budget, and our Localization 101 page gives you the rundown on how projects work and what you might expect.
 
If you have more questions and want to chat, enter your info in the form below, or shoot a line over to sales@glyphservices.com.


