Monday, March 28, 2011
Google Who? Despite legal roadblock in Google Books project, libraries might soon benefit from other digital search-and-retrieval services
The Google Books project has been put on ice, delaying what some academic librarians had hoped would be a watershed moment in the accessibility and searchability of digital texts. But a pair of library services scheduled to be announced today show that even as the world’s most high-profile digital search-and-retrieval effort has been set back, smaller, academically oriented projects are hoping to continue making electronic texts more discoverable.
The first comes from a partnership between the HathiTrust Digital Library, a cooperative based at the University of Michigan that owes much of its 8.2-million-work collection to duplicate copies of books scanned by Google, and ProQuest, the popular journal and newspaper aggregator. The two are teaming up to let students and scholars conduct searches that query the full texts of every item in the HathiTrust archive. The second is from the Copyright Clearance Center, which is offering a digital retrieval service that it says will cut the lag time in delivering individual journal articles from five days to five minutes.
Officials at ProQuest and HathiTrust think their service could vastly improve the ability of students to find obscure but relevant book content using a search tool that is as simple to use as Google’s. Most library search databases currently query only titles, authors, and “metatags” (keywords referring to certain themes in the work), says John Law, vice president of discovery services at Serials Solutions, the ProQuest division that developed the search tool, which is called Summon. That means books with relevant chapters or passages that those basic identifiers do not capture are left out of search results.
But the new tool will use advanced algorithms, à la Google, to trawl every word of every book, monograph, journal, and magazine held in the HathiTrust Digital Library that is also either in the library’s print collection or part of the “public domain,” a body of non-copyrighted works that makes up at least 20 percent of HathiTrust’s digital holdings. If the work is under copyright, Summon directs the user to where it can be found in the stacks. If it is in the public domain, Summon links to the full electronic text.
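The inclusion and routing rule described above can be sketched in a few lines of Python. This is only an illustration of the logic as reported here, not Summon’s actual implementation; the `Work` class, its field names, and the example URL and call number are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """A digitized work in the shared archive (all fields are hypothetical)."""
    title: str
    public_domain: bool       # True if the work is not under copyright
    held_in_print: bool       # True if the searching library owns a print copy
    full_text_url: str = ""   # assumed link to the digitized text
    call_number: str = ""     # assumed shelf location in the library's stacks


def is_searchable(work: Work) -> bool:
    """A work is searched if it is public domain or held in print locally."""
    return work.public_domain or work.held_in_print


def result_target(work: Work) -> str:
    """Decide where a search hit sends the user."""
    if not is_searchable(work):
        raise ValueError("work is outside this library's searchable set")
    if work.public_domain:
        # Public-domain works link straight to the full electronic text.
        return f"full text: {work.full_text_url}"
    # Copyrighted works point the user to the print copy in the stacks.
    return f"stacks: {work.call_number}"
```

For instance, `result_target(Work("Example Title", public_domain=False, held_in_print=True, call_number="PS 123"))` would send the user to the stacks rather than to an online copy.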
The idea is to make library catalog searches simpler and more like Google. Recent studies suggest that students tend to rely on the company’s popular search engine as a starting point for research. Andrew Asher, an anthropologist at the Ethnographic Research in Illinois Academic Libraries (ERIAL) Project, has done research indicating that students, their expectations primed by Google’s simple search function and the faith it inspires, tend to favor similarly straightforward tools, even when doing academic research.
This finding has prompted different reactions within academe, with some saying librarians and professors need to do a better job steering students toward more discriminating scholarly research tools, and others saying that the methods popularized by Google are here to stay and libraries would do well to imitate the simple search in order to appeal to students.
Law, the Summon developer, falls in the latter camp. Princeton University, he says, presents students starting research projects with hundreds of possible starting points. While it is great that Princeton makes so many resources available to students, students can be paralyzed by choice, Law argues. “Libraries need to be as simple, easy and fast to access and use as commercial alternatives like Google,” he says. “Having a search box for the library that is easy for users is important.”
Imitating Google-type searches of libraries’ print holdings has been difficult. Aside from the obvious challenges of duplicating the effectiveness of the company’s closely guarded search formulas, many libraries simply do not own full digital texts of much of their print collections, and therefore have no choice but to rely on searches that trawl through titles, abstracts, and metatags, rather than full texts. But by aggregating the digital copies from many different libraries in one searchable archive, HathiTrust, which was founded in 2008 and has quickly grown to include contributions from 52 research libraries, offers an unprecedented opportunity for libraries to search the full texts of works they own in print but have not digitized.
For example, Library A might not have a digital copy of Alexis de Tocqueville’s Democracy in America, but Library B does and has contributed it to HathiTrust. A student at Library A using the Summon tool for a research project on early American poetry might then discover Tocqueville’s brief but insightful musings on “the sources of poetic inspiration in the democratic age” hidden deep in HathiTrust’s digital copy from Library B. A search of titles and abstracts alone would probably not have pointed her toward the French political thinker.
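The difference between a metadata-only search and a full-text search in the Tocqueville example can be illustrated with a toy sketch. It assumes a simple all-words-must-match rule and is not how Summon or HathiTrust actually rank results; the record layout, the keyword list, and the function names are hypothetical, and the `text` field is a one-phrase stand-in for the full digitized book.

```python
import re

# Hypothetical catalog record; "text" stands in for the full scanned book.
RECORDS = [
    {
        "title": "Democracy in America",
        "author": "Alexis de Tocqueville",
        "keywords": ["politics", "United States"],
        "text": "the sources of poetic inspiration in the democratic age",
    },
]


def tokens(text: str) -> set:
    """Lowercase a string and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def metadata_search(query: str, records: list) -> list:
    """Match the query only against title, author, and keywords."""
    wanted = tokens(query)
    hits = []
    for rec in records:
        meta = " ".join([rec["title"], rec["author"], *rec["keywords"]])
        if wanted <= tokens(meta):  # every query word must appear
            hits.append(rec["title"])
    return hits


def full_text_search(query: str, records: list) -> list:
    """Match the query against the metadata plus the full digitized text."""
    wanted = tokens(query)
    hits = []
    for rec in records:
        everything = " ".join(
            [rec["title"], rec["author"], *rec["keywords"], rec["text"]]
        )
        if wanted <= tokens(everything):
            hits.append(rec["title"])
    return hits
```

With these records, `metadata_search("poetic inspiration", RECORDS)` finds nothing, while `full_text_search("poetic inspiration", RECORDS)` returns the Tocqueville title.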
A recent study by the Online Computer Library Center (OCLC) predicts that by 2014 HathiTrust’s digital archive will mirror 60 percent of works currently held in print by the major U.S. research libraries.
The reach of the Summon service therefore stands to be significant, says Law. “It really is unlocking the hidden content in the library content,” he says. “It’s really going to have a massive, massive impact on libraries’ collections.”
Another offering for libraries scheduled to be announced today is a service from the Copyright Clearance Center, a Massachusetts-based nonprofit, called Get It Now. Designed to eliminate inefficiencies in inter-library lending of journal articles, Get It Now allows students who want to read articles from journals to which their libraries do not subscribe to get a digital copy of the article e-mailed to them in minutes, rather than having a librarian send away for a photocopied version from another library.
The old way tended to take 5 to 10 days, says Gerry Hanley, senior director for academic technology services at the California State University chancellor’s office, which has been piloting the service for a year. The new way takes 5 to 10 minutes.
Get It Now essentially allows college libraries to purchase individual articles for students for less than it would cost, on average, to get a copy made and sent from another library. “The service has been a boon to graduate students and faculty who have had access to a greater scope of digital content than what was previously available through licensed content agreements,” Hanley wrote in an e-mail.
Some journals do allow colleges to purchase single articles on demand, but by using the Copyright Clearance Center as an intermediary, colleges avoid the hassle of negotiating those discrete exchanges with different publishers, says Tim Bowen, a product manager at the center. And rather than paying publishers for each exchange with a credit card, the libraries would pay the Copyright Clearance Center for the articles students order each month.
During the California State pilot, the center has charged about $24 per article, according to Hanley. The cost to the university of ordering a copy through inter-library loan is often higher. The biggest part of that cost is royalties. As of 2005, the average cost of royalties for an article acquired through inter-library loan was about $29, says Hanley. In some cases it can run higher. And then there are the postage and labor costs.
“When a library adds up the various unit costs: rush fees and other marginal costs of an inter-library loan transaction, it is not uncommon to find that filling a request through inter-library loan can make this content some of the most expensive, per-use content, that a library purchases in the course of a year,” Hanley says.
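The per-article comparison Hanley describes comes down to simple arithmetic, sketched below. The $24 and $29 figures are the ones cited above; the postage and labor amounts are not given in this article, so they are left as parameters, and the function names are hypothetical.

```python
# Figures cited in the article.
GET_IT_NOW_PRICE = 24.0   # approximate per-article charge during the pilot
ILL_ROYALTY_2005 = 29.0   # average inter-library loan royalty as of 2005


def ill_cost(royalty: float, postage: float = 0.0, labor: float = 0.0) -> float:
    """Total cost of one inter-library loan request.

    Postage and labor are assumptions supplied by the caller; the article
    does not report figures for them.
    """
    return royalty + postage + labor


def saving_per_article(postage: float = 0.0, labor: float = 0.0) -> float:
    """Per-article saving from Get It Now versus inter-library loan."""
    return ill_cost(ILL_ROYALTY_2005, postage, labor) - GET_IT_NOW_PRICE
```

On royalties alone, `saving_per_article()` gives 5.0, or about $5 per article. The sketch compares only per-article costs; it does not model implementation costs or any change in how many articles patrons order.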
However, Get It Now is not necessarily a super-saver, says Hanley. There are upfront costs to implementing the service, he says. And of course, when you make ordering articles quicker and easier (users need only click on the “Get It Now” button in their library’s discovery engine to place an order), patrons might be more apt to do so. The delivery mechanism is more efficient, but Get It Now expands access more than it trims costs, Hanley says, noting that some California State libraries might have to charge user fees to help subsidize the expense.
“Since this service does carry a new cost for libraries, libraries have had to explore where they might find the cost savings in their budgets to cover the expense of offering the new patron-driven services,” Hanley says. “Publishers, too, have had to be flexible adopting a business model that supports selling content by the article. This is not an easy or comfortable adjustment for publishers or libraries to make, but both are necessary for new patron-driven services to flourish.”
For the latest technology news and opinion from Inside Higher Ed, follow @IHEtech on Twitter.
— Steve Kolowich