CERN’s 60th anniversary year is the perfect opportunity to look back at the history of our organization, and to do a bit of “spring cleaning” in our database of photos. With this in mind, the Scientific Information Service (GS-SIS) has decided to undertake a huge digitization project, with the aim of uploading CERN’s entire archive of pictures to the CERN Document Server (CDS).
There are approximately a quarter of a million pictures in the archive. At the time of writing, 15,733 pictures are available on CDS. The vast majority of these are relatively recent, dating from the last 10 to 15 years, when digital cameras became the norm and uploading a picture became a matter of a few clicks.
Below the main staircase in building 500 lurks a room full of dusty filing cabinets. Here, older pictures exist as hard copies, in a range of formats. In most cases we have the original negatives; others are in the form of slides.
To get these pictures onto CDS, they must first be scanned. But this raises questions. How high should the scanning quality (and therefore the file size and scanning cost) be? We know that most pictures will never be used. Those that are used will mostly appear in projected presentations, on websites or in print at up to A4 size. However, a few may be needed for posters, exhibitions, or something we haven’t thought of yet, and it’s impossible to guess in advance which these might be. The original negatives will also tend to degrade even if they are stored in optimal conditions, so a later rescan may not recover the same detail. We need to strike a balance between capturing as much information as possible (by maximising the scanning resolution) and keeping the associated costs (both financial and computing) under control.
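To give a feel for this trade-off, here is a rough back-of-the-envelope sketch in Python. It is only an illustration, not the specification we will use: the 36 x 24 mm frame is the standard 35 mm negative size, but the archive holds a range of formats, and the scan resolutions, the 16-bit greyscale depth, the 300 dpi print resolution and the function name `scan_estimate` are all assumptions made for the example. File sizes are for uncompressed images.

```python
MM_PER_INCH = 25.4
CM_PER_INCH = 2.54


def scan_estimate(width_mm, height_mm, scan_dpi, bits_per_pixel=16, print_dpi=300):
    """Estimate pixel dimensions, uncompressed file size and printable size.

    All defaults are illustrative assumptions (16-bit greyscale, 300 dpi print).
    """
    # Pixels along each side: physical size in inches times scanner resolution.
    width_px = round(width_mm / MM_PER_INCH * scan_dpi)
    height_px = round(height_mm / MM_PER_INCH * scan_dpi)
    # Uncompressed size in megabytes (1 MB = 10^6 bytes).
    size_mb = width_px * height_px * bits_per_pixel / 8 / 1e6
    # Largest print, in centimetres, that keeps the assumed print resolution.
    print_w_cm = width_px / print_dpi * CM_PER_INCH
    print_h_cm = height_px / print_dpi * CM_PER_INCH
    return {
        "width_px": width_px,
        "height_px": height_px,
        "size_mb": size_mb,
        "print_w_cm": print_w_cm,
        "print_h_cm": print_h_cm,
    }


# Compare a few hypothetical scan resolutions for one 35 mm frame.
for dpi in (1200, 2400, 4800):
    est = scan_estimate(36, 24, dpi)
    print(
        f"{dpi} dpi: {est['width_px']} x {est['height_px']} px, "
        f"{est['size_mb']:.1f} MB, prints at "
        f"{est['print_w_cm']:.1f} x {est['print_h_cm']:.1f} cm"
    )
```

Because the pixel count grows with the square of the resolution, doubling the scan resolution quadruples the uncompressed file size. Multiplied over many thousands of negatives, that is why the choice matters.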
CERN’s in-house scanning resources are limited, so we will soon launch a call for tender for a company to scan the pictures. We will have to ship the negatives to the company we select, so one of my main tasks at the moment is re-packing. I am moving the pictures from their current storage in open-top wooden trays, which are heavy and offer no protection, to lighter, closed-lid cardboard boxes. This is no small task. Many negatives are out of order, numbered incorrectly or numbered according to different systems, and some are quite simply missing. (If you happen to have borrowed any at some point, please get in touch with me and return them!)
The first batch to be shipped for scanning will focus on some 150,000 of the older black-and-white negatives, dating from the early 1950s to the mid-1980s. This will increase the number of pictures on CDS roughly tenfold, which raises usability issues for the database.
Metadata for pictures on CDS can be of poor quality (if it exists at all) and there is currently no systematic way of tagging records with keywords. We will need to fix this problem as we expand the contents of CDS so that people can find the pictures they need.
Existing records for the unscanned pictures also need improvement. Most are only in French and use many largely unguessable abbreviations. Furthermore, a record often refers to a whole set of pictures. For example, just yesterday I received an urgent request for a picture of François de Rose, who passed away recently. The top result of a CDS search for “de Rose” is a record that refers to pictures 165-241 from December 1963. That’s 77 pictures to sort through in the hope of finding the right one, which then has to be scanned and processed for publication. Unfortunately, I wasn’t able to fulfil the request at such short notice. This is a typical case that the digitization project aims to resolve.
There are some true gems in the archive. A couple of weeks ago I was rummaging through a cupboard when I found an envelope marked “1-1-54”. This is the lowest possible value in the numbering system on CERN photos, so I was immediately excited. Could this be the very first photo of CERN? As I carefully removed it, I was not disappointed. It shows the route de Meyrin, the border crossing, the Jura in the distance… and fields. Not a building in sight! I showed it to CERN photographer Max Brice, who was equally enthusiastic and took it straight to his scanner. A few days later it showed up on the CERN homepage.
If this is the kind of thing I can find through serendipity, just imagine what else will turn up when we sort through the collection methodically! The real excitement will come towards the end of the summer, once we start getting the digital files back. We are considering how to involve the CERN community and retirees in helping to identify the content of the pictures to further improve the metadata and searchability. Watch this space.