Frequently Asked Questions

Getting Started

What does your rate cover?

Our quote covers our entire process, from loading metadata through hosting, file formats, OCR, and persistent identifiers to digital preservation. It is not just the cost of capturing images of your items.

Which formats can you digitize?

We can digitize bound and unbound physical texts, folios, microfilm, newspapers, and more.

Do you digitize loose pages, scrapbooks, pamphlets, or folders of ephemera?

Yes, though these materials usually fall under special pricing. Please contact your regional manager for more information about digitizing them.

How do we get our items to an Internet Archive scanning center to be digitized?

When you get in touch, the manager of your nearest regional digitization center will contact you about next steps, including how to arrange shipping to and from the center. You choose the shipping method and cover its costs. In some cases, we may be able to bring our service to you through satellite digitization.

How long will it take to digitize our materials?

Ideally, we return material within 30 days of receiving it. For large shipments, both parties agree on the turnaround time in advance.

We have books in copyright. Does this mean we can’t have them digitized?

Working with in-copyright materials involves different risks and considerations than working with public domain material. While we are often able to digitize such materials, we cannot make them available on the same terms as public domain material. The library partner or organization is responsible for determining the copyright status of its collection as part of the digitization planning process.

What options are there for larger projects?

We provide in-house digitization services (also known as Satellite Digitization Services) to several libraries and universities, and we may be able to establish a satellite in your library for projects that exceed 750,000 pages. For large-scale deselection projects, please complete our Donation Form.

What forms of shipping do you accept?

We accept shipments by any method a partner is comfortable with: hand delivery, USPS, FedEx, UPS, private courier, etc. The partner is responsible for providing a waybill and for all shipping charges to and from the IA digitization center. Please include return shipping labels with the initial shipment.

What metadata should we provide for our collection items?

You can create metadata by filling in our template and emailing it to us as a CSV file. Many fields are already defined in the template (title, author, publisher, etc.), and you can add any custom field. Notes on each item's physical condition can also be included in the CSV without being published in the public item record. For more details and a copy of our template, please contact us.
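If you prepare the CSV with a script rather than by hand, the sketch below shows the general shape: one row per collection item, a header row, empty cells for missing values. The column names are hypothetical. "title", "author" and "publisher" are the fields mentioned above, "call_number" stands in for a custom field, and "condition_notes" for internal notes; the real template may name or order its columns differently.

```python
import csv
import io

# Hypothetical column names; check them against the template we send you.
FIELDS = ["title", "author", "publisher", "date", "call_number", "condition_notes"]


def metadata_csv(rows):
    """Serialize one dict per collection item into CSV text with a header row."""
    buffer = io.StringIO()
    # Missing keys become empty cells; keys not in FIELDS raise ValueError.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


items = [
    {"title": "A History of Maps", "author": "Doe, Jane", "publisher": "Example Press",
     "date": "1901", "call_number": "G1019 .D6", "condition_notes": "Spine detached"},
    {"title": "Harbour Charts", "author": "Roe, Richard"},
]
print(metadata_csv(items))
```

The csv module quotes values that contain commas (such as "Doe, Jane"), so inverted author names survive the round trip.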

Digitization Process

What size of books can you scan?

Our full-frame scribes accept books and similar items up to 10.5″ wide by 16″ high. Folios and foldouts can be up to 45″ x 46.2″, though they are captured at a much lower resolution (around 172 PPI).
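To sort a collection list against these limits before shipping, a simple size check is enough. This is a hypothetical triage sketch, not our intake procedure: the function names are illustrative, and because the folio limit is not stated as width by height, the code assumes a folio may be turned either way.

```python
# Limits quoted in this FAQ, in inches.
FULL_FRAME_MAX = (10.5, 16.0)   # (width, height) on full-frame scribes
FOLIO_MAX = (45.0, 46.2)        # folio/foldout maximum; orientation not stated


def fits(width_in, height_in, limit):
    """True if an item of width x height inches fits inside limit = (max_w, max_h)."""
    max_w, max_h = limit
    return width_in <= max_w and height_in <= max_h


def capture_route(width_in, height_in):
    """Hypothetical triage: full-frame if possible, else folio/foldout, else ask."""
    if fits(width_in, height_in, FULL_FRAME_MAX):
        return "full-frame"
    # Assumption: a folio may be rotated to fit the 45" x 46.2" area.
    if fits(width_in, height_in, FOLIO_MAX) or fits(height_in, width_in, FOLIO_MAX):
        return "folio/foldout"
    return "contact your regional manager"


print(capture_route(6, 9))     # full-frame
print(capture_route(24, 36))   # folio/foldout
```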

Will my digitized items be word searchable?

Yes! Each page image is processed with Optical Character Recognition (OCR) software. This fully automated process produces two outputs: an unedited plain-text file (without formatting or images) and an XML file that records where the text appears on each page. Both the full text and the metadata are indexed on archive.org. We use the open-source Tesseract engine, which can process texts in many languages.
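The location-based XML is what lets a viewer highlight a search hit on the page image, while the plain-text file is the same words without their positions. The sketch below illustrates both ideas on a simplified, hypothetical page layout in which each WORD element carries a "left,top,right,bottom" pixel box; the real OCR output files use their own schema, so inspect an actual file before adapting this.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical OCR layout: coords are left,top,right,bottom in pixels.
SAMPLE_PAGE = """<PAGE number="1">
  <LINE>
    <WORD coords="100,200,180,230">Internet</WORD>
    <WORD coords="190,200,260,230">Archive</WORD>
  </LINE>
  <LINE>
    <WORD coords="100,240,170,270">scanning</WORD>
  </LINE>
</PAGE>"""


def find_word(xml_text, query):
    """Return (word, bounding_box) pairs whose text matches query, ignoring case."""
    root = ET.fromstring(xml_text)
    hits = []
    for word in root.iter("WORD"):
        text = (word.text or "").strip()
        if text.lower() == query.lower():
            box = tuple(int(v) for v in word.get("coords").split(","))
            hits.append((text, box))
    return hits


def page_text(xml_text):
    """Drop the coordinates and keep only the words, one output line per LINE."""
    root = ET.fromstring(xml_text)
    return "\n".join(
        " ".join((w.text or "").strip() for w in line.iter("WORD"))
        for line in root.iter("LINE")
    )


print(find_word(SAMPLE_PAGE, "archive"))
print(page_text(SAMPLE_PAGE))
```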

What languages can you OCR?

We can OCR over 100 languages! Get in touch to check whether the language you need is on our list.

How are MARC records and metadata added to the items?

For larger, longer-term projects, we can connect to your catalog over Z39.50. For all other projects, we use a CSV spreadsheet that you create, which also lets you add custom fields that are not in your catalog records.

What PPI can I expect on the final images?

We capture images at a minimum of 350 PPI; depending on the size of the material, we usually capture at around 450 to 500 PPI.
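PPI translates directly into image size: pixel width is the page width in inches multiplied by the PPI, and likewise for height. The short sketch below applies that arithmetic to a 6″ x 9″ page and to the largest folio; the page sizes are examples, and real captures include some margin around the item, so actual files will be somewhat larger.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Approximate pixel size of a capture: inches x pixels-per-inch, rounded."""
    return round(width_in * ppi), round(height_in * ppi)


# A 6" x 9" page at the 350 PPI minimum and at a typical 450 PPI:
print(pixel_dimensions(6, 9, 350))      # (2100, 3150)
print(pixel_dimensions(6, 9, 450))      # (2700, 4050)
# The largest folio (45" x 46.2") at about 172 PPI:
print(pixel_dimensions(45, 46.2, 172))  # (7740, 7946)
```

This also shows why very large folios are captured at a lower PPI: even at about 172 PPI, a full 45″ x 46.2″ sheet is already close to 8,000 pixels on each side.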

Post Digitization

Can our items be ingested and displayed on our own website?

Yes, we can work with you to put the images from Archive.org onto your website.
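One common approach is embedding the archive.org viewer in an iframe. The sketch below assumes the https://archive.org/embed/<identifier> URL pattern used by the embed code on item pages; confirm it against the embed code shown for your own items. The identifier and the default width and height are placeholders.

```python
from html import escape
from urllib.parse import quote


def embed_iframe(identifier, width=560, height=384):
    """Return an HTML <iframe> that embeds an archive.org item on another site."""
    # Percent-encode the identifier so it is safe inside a URL path.
    src = "https://archive.org/embed/" + quote(identifier, safe="")
    return (f'<iframe src="{escape(src)}" width="{width}" height="{height}" '
            'frameborder="0" allowfullscreen></iframe>')


print(embed_iframe("examplebook"))  # "examplebook" is a placeholder identifier
```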

Can our items be ingested into Digital Public Library of America (DPLA)?

Yes, though we do not provide this as a service. Please contact DPLA for details.

What digital formats will be produced?

The file formats produced are DJVU, Full Text, PDF, JPG, and JSON. More information about the file types Internet Archive produces is available here. EPUB and DAISY files can be generated on the fly for people with print disabilities who hold a valid key from the Library of Congress. Lossless TIFF files can also be generated, albeit at a higher cost. The optional lossless workflow and its pricing must be selected and agreed upon before digitization begins.
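To see which files an item actually has, you can read its record from the archive.org Metadata API (https://archive.org/metadata/<identifier>), which returns JSON with a "files" list whose entries include a "name" and a "format" label. The sketch groups file names by format. The sample record is hypothetical and heavily trimmed, and its format labels are illustrative; only fetch_metadata needs network access.

```python
import json
from collections import defaultdict
from urllib.parse import quote
from urllib.request import urlopen


def fetch_metadata(identifier):
    """Fetch an item's JSON record from the archive.org Metadata API (needs network)."""
    with urlopen("https://archive.org/metadata/" + quote(identifier, safe="")) as resp:
        return json.load(resp)


def files_by_format(metadata):
    """Group the item's file names by their "format" label."""
    grouped = defaultdict(list)
    for entry in metadata.get("files", []):
        grouped[entry.get("format", "unknown")].append(entry["name"])
    return dict(grouped)


# Hypothetical, trimmed-down record; a real one has many more fields and files.
sample = json.loads("""{
  "files": [
    {"name": "examplebook.pdf", "format": "Text PDF"},
    {"name": "examplebook_djvu.txt", "format": "DjVuTXT"},
    {"name": "examplebook_djvu.xml", "format": "Djvu XML"}
  ]
}""")
for fmt, names in files_by_format(sample).items():
    print(f"{fmt}: {', '.join(names)}")
```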

Can you digitally retouch or redact some information in my items?

We do not digitally retouch items or edit them in Photoshop. Our goal is for the digital versions of items to match their physical counterparts as closely as possible. However, if a partner supplies retouched or edited images, we can use them to replace pages within an item.

What metadata is displayed on each digital item?

In addition to basic bibliographic information (Title, Author/Creator, Publisher, Date), each digital item’s page will display selected metadata from MARC records when available. We use this crosswalk to convert MARC metadata into Dublin Core format.
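A crosswalk is essentially a lookup table from MARC tags and subfields to Dublin Core elements. The sketch below uses a small, illustrative subset of the widely used mappings (245$a to title, 100$a to creator, 260$b to publisher, 260$c to date, 650$a to subject); it is not the crosswalk we use, and the simple record structure (a list of tag and subfield-dict pairs) is a hypothetical stand-in for a real MARC parser.

```python
# Illustrative subset of a MARC-to-Dublin-Core mapping, keyed by (tag, subfield).
CROSSWALK = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("650", "a"): "subject",
}


def marc_to_dc(fields):
    """fields: list of (tag, {subfield_code: value}) pairs from one MARC record."""
    dc = {}
    for tag, subfields in fields:
        for code, value in subfields.items():
            element = CROSSWALK.get((tag, code))
            if element:
                # Trim trailing ISBD punctuation such as " /", " :", "," and ".".
                dc.setdefault(element, []).append(value.rstrip(" /:;,."))
    return dc


record = [
    ("245", {"a": "A history of maps /", "c": "by Jane Doe."}),
    ("100", {"a": "Doe, Jane."}),
    ("260", {"a": "London :", "b": "Example Press,", "c": "1901."}),
    ("650", {"a": "Cartography."}),
    ("650", {"a": "Maps."}),
]
print(marc_to_dc(record))
```

Repeatable fields such as 650 collect into a list, which matches how Dublin Core elements may repeat.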

How does invoicing work?

We charge a setup fee for each digital item (book, archival fonds, newspaper, etc.) along with a per-page fee for each image within the item, based on format, size, and handling considerations. We capture every page from cover to cover, including front matter and foldout images when present. Digitization partners receive a monthly invoice covering the items and pages uploaded that month, until the project is complete. Invoices are sent by email. Payment can be made by bank transfer; in the UK, payment can be made by SWIFT bank transfer or PayPal.
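A monthly invoice is therefore one setup fee per item uploaded that month plus the per-page fee times the total pages captured. The rates in this sketch are placeholders for illustration only, not our pricing; real rates vary by format, size, and handling and are set in your quote. Decimal is used so currency amounts do not pick up floating-point rounding errors.

```python
from decimal import Decimal

# Placeholder rates for illustration only; your quote sets the real ones.
SETUP_FEE_PER_ITEM = Decimal("5.00")
FEE_PER_PAGE = Decimal("0.10")


def monthly_invoice_total(page_counts, setup_fee=SETUP_FEE_PER_ITEM, page_fee=FEE_PER_PAGE):
    """page_counts: pages captured (cover to cover) for each item uploaded this month."""
    items = len(page_counts)
    pages = sum(page_counts)
    return items * setup_fee + pages * page_fee


# Three items uploaded in one month: 3 setup fees plus 660 pages.
print(monthly_invoice_total([250, 312, 98]))  # 81.00
```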

Everything Else

We’ve already digitized our materials but we’d like them to be on Internet Archive. Can we upload these?

Yes! You can upload items through the upload page on the archive.org website or, for larger uploads and downloads, use our Python-based command-line interface (CLI) tool. Instructions for the CLI are at https://archive.org/developers/internetarchive/cli.html. Note: You will need an Internet Archive account to upload items. Sign up here.
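The same internetarchive package that provides the CLI also exposes a Python function, internetarchive.upload(identifier, files, metadata=...). The sketch below maps a row of a hypothetical item CSV to archive.org metadata and hands it to that function. The column names and helper functions are illustrative, and upload_item needs the package installed, credentials configured with `ia configure`, and network access, so the package is imported only inside that function.

```python
def build_metadata(row):
    """Map one row of a (hypothetical) item CSV to archive.org metadata fields."""
    metadata = {"mediatype": "texts"}
    # CSV column -> archive.org metadata key; column names are illustrative.
    for column, key in (("title", "title"), ("author", "creator"), ("date", "date")):
        if row.get(column):
            metadata[key] = row[column]
    return metadata


def upload_item(identifier, files, row):
    """Upload files to an item using the internetarchive package's upload()."""
    from internetarchive import upload  # lazy import: build_metadata works without it
    return upload(identifier, files=files, metadata=build_metadata(row))


print(build_metadata({"title": "A History of Maps", "author": "Doe, Jane"}))
```

From the shell, a roughly equivalent call is `ia upload examplebook book.pdf --metadata="mediatype:texts" --metadata="title:A History of Maps"`, where "examplebook" and "book.pdf" are placeholders.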

We have books that we want to donate. Do you accept donations?

Yes! Please visit our help page to find out more.

How do I use Internet Archive’s IIIF server to present digitized items?

Read more about our IIIF server here: Making IIIF Official at the Internet Archive
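IIIF image servers share a standard request syntax, {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}, defined by the IIIF Image API. The sketch below builds such URLs using version 3.0 defaults ("full" region, "max" size, "default" quality). The base URL and identifier are placeholders, not our server's actual paths; consult the blog post above for those. Per the specification, a "/" inside the identifier must be percent-encoded.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max", rotation="0",
                   quality="default", fmt="jpg"):
    """Build an IIIF Image API 3.0 request:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}
    """
    return (f"{base.rstrip('/')}/{quote(identifier, safe='')}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Placeholder server and identifier; "800," requests an image 800 pixels wide.
print(iiif_image_url("https://iiif.example.org/iiif/3", "examplebook/page1", size="800,"))
```

Version 2.x servers use "full" rather than "max" for the full-size image, so adjust the size default if the server you target speaks the older API.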

Still Have Questions? We’re Here to Help!

Whether you need more details about our process, technical specifications, or partnership opportunities, our team is ready to assist you.