British Library opens retirement home for the UK web



The British Library and IBM have launched a new project to archive UK domain websites, expanding a remit that started in 2004. The online portal, the UK Web Archive, not only captures and archives websites but, through IBM’s BigSheets analytical software, also lets users search the mass of unstructured data and identify key trends of the time.
 
According to current statistics, there are 8 million websites in the UK’s web domain, with estimates putting their average life expectancy at between 44 and 75 days. In addition to the race against time, a second issue plagues the library’s endeavour. Launching the project, the library pointed out that freely available material on the web remains subject to copyright and cannot be archived without permission. Because obtaining permission is both time-consuming and expensive, many sites will be lost before they can be archived.
 
“Since 2004 the British Library has led the UK Web Archive in its mission to archive a record of the major cultural and social issues being discussed online. Throughout the project the Library has worked directly with copyright holders to capture and preserve over 6,000 carefully selected websites, helping to avoid the creation of a ‘digital black hole’ in the nation’s memory,” explained Dame Lynne Brindley, CEO at the British Library.
 
“Limited by the existing legal position, at the current rate it will be feasible to collect just 1% of all free UK websites by 2011.  We hope the current DCMS consultation will enact the 2003 Legal Deposit Libraries Act and extend the provision of legal deposit through regulation to cover freely available UK websites, providing regular snapshots of the free UK web domain for the benefit of future research.”
 
Speaking to PublicTechnology.net this week, David Boloker, CTO for Emerging Internet Technology at IBM, explained that the archive would also be used to store smaller opinion websites. “As important things happen around the world, whether it’s news on the economic downturn or global events like H1N1, global warming, energy or politics – people voice their views, and the interesting thing is that key websites will be around a very long time. After a while they might archive their information, but the average user who puts up ‘Here’s my opinion,’ that information goes very, very quickly,” Boloker said.
 
“What the British Library is trying to do is keep that information available for future generations, so that they will use the archive and research the trends of the day. Think of what led to the results of the elections in 2005 or 2010. How do the trends we see in the UK compare to trends around the world?”
 
The UK Web Archive has launched with archived websites organised into several key events, such as the Credit Crunch (including pages from Woolworths and Zavvi), Antony Gormley’s “One & Other” project in Trafalgar Square (its website is due to close in March), and the upcoming 2010 General Election.
 
“The British Library’s UK Web Archive is a fascinating snapshot of the way this country uses the internet.  In the years since the internet began to be a part of our lives, the amount of information we have been able to access is simply unparalleled. The range of subjects is enormous, and reflects the diverse society we live in today,” applauded the minister for Culture and Tourism, Margaret Hodge MP. 
 
“From my perspective, I have been working with our museums, galleries and arts organisations to see how we can support really innovative use of the internet to bring our cultural riches to an even wider audience. I congratulate the British Library on the launch of their web archive.”
 
The British Library’s UK Web Archive project is the first to implement IBM’s BigSheets technology. Boloker confirmed that IBM is working with other clients, including clients in the financial and advertising sectors, on implementing BigSheets in new projects. “If you think about it, [the technology could be applied to] anyone who has a lot of information and needs to gain the insight on the data they have.”
 
BigSheets’ wider application is something that should be developed, according to market analysts at Ovum. “[IBM] should consider developing the technology to recognise audio and images, as this ability could provide a tremendous commercial opportunity,” suggests Ovum’s Sue Clarke. “There are vast and growing archives of film and video owned by a wide range of organizations, from news channels to film studios, which need to be easily searchable.”
 
Away from its application at the British Library, Clarke speculated that: “Employees within pharmaceutical companies would be able to locate drug trial information easily, or cases of drug side effects going back many years. In universities, the technology could provide students with access to old articles and papers.”