Published June 3, 2014 | Version v1
Conference paper

GOOFRE version 2

Description

The amount of data contained within Google Books has doubled over the last two years and now exceeds 500 billion words. A new treatment of the data has included a re-examination of scanned images, offering a more accurate recognition of the text. In addition, for the first time, included texts have been subjected to deambigation and lemmatisation. Finally, the website Culturomics has made tools available that facilitate its accessibility. It seemed interesting, therefore, to develop a new expertise and to create a new database, complete with all the necessary statistical tools, available online or locally, for exploiting such large corpora.

Abstract

International audience

Additional details

Created:
March 26, 2023
Modified:
December 1, 2023