Automatically gathering a bibliography with existing bibliographic search engines
Apart from resting for a few weeks and spending time learning Python and Django, this summer I developed an application that retrieves a large set of research papers from a small set of titles that act as seeds.
Although it’s just a proof of concept, the application (codenamed Librarian) works really well and saves a lot of time when searching for the bibliography related to a small set of articles, which is the most common use case when studying the state of the art for a new line of research.
Simplifying a lot, Librarian works as follows:
The user provides the titles of some articles they have already read, and a lower limit for the number of documents that should be retrieved.
Because the search is computationally expensive, the application gives the user the URL of a feed that they can use to track the progress of the request. At the same time, it launches a batch process that actually performs the search.
In each step, the batch process takes the most relevant article from the queue of articles to be processed (initially the seeds) and scrapes CiteSeerX for the articles it cites, and Google Scholar for the articles that cite it (inverse citations) and for related articles. These references are in turn added to the queue of to-be-processed articles, and the process continues until the number of explored articles exceeds the given limit.
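The loop described above is essentially a best-first exploration driven by a priority queue. The following is only a minimal sketch of that idea, not Librarian's actual code: the names `expand`, `fetch_related` and `score` are hypothetical, `fetch_related` stands in for the scraping step (here it just reads a toy dictionary), and the sketch stops as soon as the limit is reached.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Iterable, List, Tuple


def expand(seeds: Iterable[str],
           fetch_related: Callable[[str], Iterable[str]],
           score: Callable[[str, int], float],
           limit: int) -> List[str]:
    """Explore titles best-first until `limit` articles have been explored.

    fetch_related(title) -> titles linked to `title` (hypothetical stand-in
    for the scraping step); score(title, distance) -> relevance, higher first.
    """
    tie = count()  # tie-breaker so the heap never has to compare titles
    distance: Dict[str, int] = {}  # jumps from a seed when first discovered
    queue: List[Tuple[float, int, str]] = []

    for seed in seeds:
        if seed not in distance:
            distance[seed] = 0
            # heapq is a min-heap, so push the negated score
            heapq.heappush(queue, (-score(seed, 0), next(tie), seed))

    explored: List[str] = []
    while queue and len(explored) < limit:
        _, _, title = heapq.heappop(queue)  # most relevant queued article
        explored.append(title)
        for ref in fetch_related(title):
            if ref not in distance:  # skip articles already queued or explored
                distance[ref] = distance[title] + 1
                heapq.heappush(queue, (-score(ref, distance[ref]), next(tie), ref))
    return explored


# Toy data (made up for illustration): who links to whom, and citation counts.
graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
cites = {"A": 1, "B": 5, "C": 10, "D": 100}

result = expand(["A"],
                lambda t: graph.get(t, []),
                lambda t, d: cites[t] * 0.5 ** d,
                limit=3)
print(result)  # ['A', 'C', 'B']
```

Note that the sketch records the distance at which an article is first discovered, which is a simplification: a shorter path found later would not update it.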
The relevance of each article is used to select the next candidate to explore. It is calculated from the article's number of citations, attenuated by an exponential decay in the distance between the article and the seeds, where the distance is the number of jumps needed to get from a seed to the article through intermediate references.
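To make the relevance idea concrete, here is one possible way to write it down. The exact formula and the decay constant are assumptions of mine for illustration (the post only says that citations are combined with an exponential decay over distance), and the function name `relevance` is hypothetical.

```python
import math


def relevance(citations: int, distance: int, decay: float = 1.0) -> float:
    """Citation count attenuated exponentially with distance from the seeds.

    Assumed form: citations * exp(-decay * distance). `decay` is a made-up
    tuning knob: larger values keep the search closer to the seeds.
    """
    return citations * math.exp(-decay * distance)


# A seed (distance 0) keeps its full citation count.
print(relevance(100, 0))
# A heavily cited article two jumps away can still rank below a modest seed.
print(relevance(100, 2) < relevance(20, 0))
```

With this form, a highly cited article far from the seeds is penalized, so the search favors well-known work that stays close to the original topic.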
Finally, the outcome of the process is a set of articles that are directly or indirectly related to the seeds, and that therefore contains a fairly representative sample of the state of the art, delimited by the articles initially provided.
Although the results yielded by the tool are not as good as those obtained manually, with minimal effort and a few iterations you can get excellent results that save a lot of work, reducing the time spent collecting bibliography from days or even weeks to hours.
–The article has been proofread with the help of David Correa