Citation extraction project page

Citation extraction works by transforming various document formats (PDF, PostScript, DOC) into an internal representation, and then extracting the citations from the document text.
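
The two stages described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the `Document` class, the `to_internal` and `extract_citations` functions, and the assumption that references appear under a "References" or "Bibliography" heading as entries numbered `[1]`, `[2]`, ... are all hypothetical. The sketch also assumes the text has already been pulled out of the original PDF, PostScript or DOC file.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    """Hypothetical internal representation: source format plus plain text."""
    source_format: str
    text: str


def to_internal(raw_text: str, source_format: str) -> Document:
    """Stage 1: normalize already-extracted text into the internal form.

    A real converter for PDF, PostScript or DOC would run before this step.
    """
    unified = raw_text.replace("\r\n", "\n")
    collapsed = re.sub(r"[ \t]+", " ", unified).strip()
    return Document(source_format=source_format, text=collapsed)


# Assumed layout: a heading line, then entries that start with "[n]".
REFERENCE_HEADING = re.compile(
    r"^\s*(?:references|bibliography)\s*$", re.IGNORECASE | re.MULTILINE
)
ENTRY_START = re.compile(r"^\s*\[(\d+)\]\s*", re.MULTILINE)


def extract_citations(doc: Document) -> List[str]:
    """Stage 2: return the reference entries found after the heading."""
    heading = REFERENCE_HEADING.search(doc.text)
    if heading is None:
        return []
    section = doc.text[heading.end():]
    # With one capture group, split() yields
    # [prefix, number, body, number, body, ...]; keep only the bodies.
    parts = ENTRY_START.split(section)
    bodies = parts[2::2]
    # Join entries that wrap over several lines into a single line each.
    return [" ".join(body.split()) for body in bodies]


if __name__ == "__main__":
    sample = (
        "Intro text citing [1].\n\n"
        "References\n"
        "[1] A. Author. First title.\n"
        "    Journal, 2001.\n"
        "[2] B. Writer. Second title. 2002.\n"
    )
    for entry in extract_citations(to_internal(sample, "PDF")):
        print(entry)
```

Keeping the conversion and the extraction in separate functions means the extraction step only ever sees the internal representation, whatever the original format was.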

Extracting text from PDF/PS.

Here's a repository to gather statistics about documents collected from the Internet. Data from these documents will serve to improve the algorithms and to test their effectiveness.
