Project Gutenberg

Established: 1 December 1971 (first document posted)^[1]
Collection size: Over 40,000 documents
Website: gutenberg.org

Project Gutenberg (PG) is a volunteer effort to digitize and archive cultural works, to "encourage the creation and distribution of eBooks".^[2] It was founded in 1971 by Michael S. Hart and is the oldest digital library.^[3] Most of the items in its collection are the full texts of public domain books. The project tries to make these as free as possible, in long-lasting, open formats that can be used on almost any computer. As of July 2012, Project Gutenberg claimed over 40,000 items in its collection.

Wherever possible, the releases are available in plain text, but other formats are included, such as HTML, PDF, EPUB, MOBI, and Plucker. Most releases are in the English language, but many non-English works are also available. Multiple affiliated projects provide additional content, including regional and language-specific works. Project Gutenberg is also closely affiliated with Distributed Proofreaders, an Internet-based community for proofreading scanned texts.

History

Michael Hart (left) and Gregory Newby (right) of Project Gutenberg, 2006

Project Gutenberg was started by Michael Hart in 1971 with the digitization of the United States Declaration of Independence.^[4] Hart, a student at the University of Illinois, obtained access to a Xerox Sigma V mainframe computer in the university's Materials Research Lab. Through friendly operators, he received an account with a virtually unlimited amount of computer time; its value at that time has since been variously estimated at $100,000 or $100,000,000.^[4] Hart has said he wanted to "give back" this gift by doing something that could be considered to be of great value. His initial goal was to make the 10,000 most consulted books available to the public at little or no charge, and to do so by the end of the 20th century.^[5]

This particular computer was one of the 15 nodes on ARPANET, the computer network that would become the Internet. Hart believed that computers would one day be accessible to the general public and decided to make works of literature available in electronic form for free. He had a copy of the United States Declaration of Independence in his backpack, and this became the first Project Gutenberg e-text. He named the project after Johannes Gutenberg, the fifteenth-century German printer who propelled the movable type printing press revolution.

By the mid-1990s, Hart was running Project Gutenberg from Illinois Benedictine College. More volunteers had joined the effort. All of the text was entered manually until 1989 when image scanners and optical character recognition software improved and became more widely available, which made book scanning more feasible.^[6] Hart later came to an arrangement with Carnegie Mellon University, which agreed to administer Project Gutenberg's finances. As the volume of e-texts increased, volunteers began to take over the project's day-to-day operations that Hart had run.

Starting in 2004, an improved online catalog made Project Gutenberg content easier to browse, access and hyperlink. Project Gutenberg is now hosted by ibiblio at the University of North Carolina at Chapel Hill.

Pietro Di Miceli, an Italian volunteer, developed and administered the first Project Gutenberg website and started the development of the project's online catalog. In his ten years in this role (1994–2004), the project's web pages won a number of awards and were often featured in "best of the Web" listings, contributing to the project's popularity.^[7]

Project Gutenberg's founder, Michael Hart, died on September 6, 2011, at his home in Urbana, Illinois, at the age of 64.^[8]

Affiliated organizations

In 2000, a non-profit corporation, the Project Gutenberg Literary Archive Foundation, Inc. was chartered in Mississippi to handle the project's legal needs. Donations to it are tax-deductible. Long-time Project Gutenberg volunteer Gregory Newby became the foundation's first CEO.^[9]

Also in 2000, Charles Franks founded Distributed Proofreaders (DP), which allowed the proofreading of scanned texts to be distributed among many volunteers over the Internet. This effort greatly increased the number and variety of texts being added to Project Gutenberg, and made it easier for new volunteers to start contributing. DP became officially affiliated with Project Gutenberg in 2002.^[10] As of 2007, the 10,000+ DP-contributed books comprised almost a third of the nearly 40,000 books in Project Gutenberg.

CD and DVD Project

In August 2003, Project Gutenberg created a CD containing approximately 600 of the "best" e-books from the collection. The CD is available for download as an ISO image. When users are unable to download the CD, they can request to have a copy sent to them, free of charge.

In December 2003, a DVD was created containing nearly 10,000 items. At the time, this represented almost the entire collection. In early 2004, the DVD also became available by mail.

In July 2007, a new edition of the DVD was released containing over 17,000 books, and in April 2010, a dual-layer DVD was released, containing nearly 30,000 items.

The majority of the DVDs and all of the CDs mailed by the project were recorded on recordable media by volunteers. However, the new dual-layer DVDs were manufactured, as this proved more economical than having volunteers burn them. As of October 2010, the project had mailed approximately 40,000 discs.^[11]

Scope of collection

Growth of Project Gutenberg publications from 1994 until 2008.

As of November 2011, Project Gutenberg claimed over 40,000 items in its collection, with an average of over fifty new e-books being added each week.^[12] These are primarily works of literature from the Western cultural tradition. In addition to literature such as novels, poetry, short stories and drama, Project Gutenberg also has cookbooks, reference works and issues of periodicals.^[13] The Project Gutenberg collection also has a few non-text items such as audio files and music notation files.

Most releases are in English, but there are also significant numbers in many other languages. As of November 2010, the non-English languages most represented were French, German, Finnish, Dutch, Portuguese, and Chinese.^[3]

Whenever possible, Gutenberg releases are available in plain text, mainly using US-ASCII character encoding but frequently extended to ISO-8859-1 (needed to represent accented characters in French and the Eszett, or scharfes S, in German, for example). Besides the requirement that a work be copyright-free, the availability of a Latin character set text version of the release was a criterion of Michael Hart's from the founding of Project Gutenberg, as he believed this was the format most likely to be readable in the extended future.^[14] Out of necessity, this criterion has had to be extended for the sizable collection of texts in East Asian languages such as Chinese and Japanese now in the collection, where UTF-8 is used instead.
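
The progression from US-ASCII to ISO-8859-1 to UTF-8 can be illustrated with a short Python sketch. It is not Project Gutenberg's own tooling; the function name `narrowest_encoding` and the sample strings are hypothetical, and the sketch simply assumes that a text should be stored in the narrowest of the three encodings that can represent every character in it.

```python
def narrowest_encoding(text: str) -> str:
    """Return the first of ASCII, ISO-8859-1, UTF-8 that can encode `text`."""
    # Candidates are ordered from the most restrictive to the least restrictive.
    for encoding in ("ascii", "iso-8859-1", "utf-8"):
        try:
            text.encode(encoding)
        except UnicodeEncodeError:
            continue  # some character does not fit; try the next, wider encoding
        return encoding
    # Reached only for strings that even UTF-8 rejects (e.g. lone surrogates).
    raise ValueError("text cannot be encoded in any candidate encoding")


if __name__ == "__main__":
    # Hypothetical samples: plain English, German/French accents, Chinese.
    for sample in ("Call me Ishmael.", "Straße, café", "道可道"):
        print(sample, "->", narrowest_encoding(sample))
```

Trying the encodings in a fixed order keeps the widest audience of plain-text readers for each file while still allowing texts that need a larger character set.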

Other formats may be released as well when submitted by volunteers. The most common non-ASCII format is HTML, which allows markup and illustrations to be included. Some project members and users have requested more advanced formats, believing them to be much easier to read. However, some formats that are not easily editable, such as PDF, are generally not considered to fit in with the goals of Project Gutenberg, although many works are being added to the collection in PDF format so that illustrations can be included in downloadable documents. For years, there has been discussion of using some type of XML, although progress on that has been slow.^[citation needed]

In 2009, the Project Gutenberg catalog began offering auto-generated alternate file formats, including HTML, EPUB and Plucker.^[15]

Ideals

Michael Hart said in 2004, "The mission of Project Gutenberg is simple: 'To encourage the creation and distribution of ebooks'".^[2] His goal was, "to provide as many e-books in as many formats as possible for the entire world to read in as many languages as possible".^[3] Likewise, a project slogan is to "break down the bars of ignorance and illiteracy",^[16] because its volunteers aim to continue spreading public literacy and appreciation for the literary heritage just as public libraries began to do in the late 19th century.^[17]^[18]

Project Gutenberg is intentionally decentralized. For example, there is no selection policy dictating what texts to add. Instead, individual volunteers work on what they are interested in, or have available. The Project Gutenberg collection is intended to preserve items for the long term, so they cannot be lost by any one localized accident. In an effort to ensure this, the entire collection is backed up regularly and mirrored on servers in many different locations.^[citation needed]

Copyright

Project Gutenberg is careful to verify the status of its ebooks according to U.S. copyright law. Material is added to the Project Gutenberg archive only after it has received a copyright clearance, and records of these clearances are saved for future reference. Project Gutenberg does not claim new copyright on titles it publishes. Instead, it encourages their free reproduction and distribution.^[3]

Most books in the Project Gutenberg collection are distributed as public domain under U.S. copyright law. The licensing included with each ebook puts some restrictions on what can be done with the texts (such as distributing them in modified form, or for commercial purposes) as long as the Project Gutenberg trademark is used. If the header is stripped and the trademark not used, then the public domain texts can be reused without any restrictions.^[citation needed]
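
As a rough illustration of stripping the header, the Python sketch below keeps only the body of a plain-text release. It rests on an assumption not stated in this article: that the project's header and trailing license are delimited by marker lines beginning with `*** START OF` and `*** END OF`. The exact marker wording varies between files, so the function name `strip_project_header` and both prefixes should be treated as hypothetical and checked against the actual file.

```python
def strip_project_header(text: str) -> str:
    """Return the text between the assumed START and END marker lines."""
    lines = text.splitlines()
    start, end = 0, len(lines)  # fall back to the whole file if no markers exist
    for i, line in enumerate(lines):
        if line.startswith("*** START OF"):   # assumed header terminator
            start = i + 1
        elif line.startswith("*** END OF"):   # assumed start of the trailing license
            end = i
            break
    return "\n".join(lines[start:end]).strip() + "\n"


if __name__ == "__main__":
    # Hypothetical miniature file used only for demonstration.
    sample = (
        "Header and license notice\n"
        "*** START OF EXAMPLE EBOOK ***\n"
        "\n"
        "When in the Course of human events...\n"
        "\n"
        "*** END OF EXAMPLE EBOOK ***\n"
        "Trailing license text\n"
    )
    print(strip_project_header(sample))
```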

There are also a few copyrighted texts that Project Gutenberg distributes with permission. These are subject to further restrictions as specified by the copyright holder.^[citation needed]

Criticism

The text is wrapped at 65-70 characters and paragraphs are separated by a double-line break. Although this makes the release available to anybody with a text-reader, a drawback of this format is the lack of markup and the resulting relatively bland appearance.^[19]
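
The plain-text layout described above can be approximated with Python's standard `textwrap` module. This is a minimal sketch rather than the project's actual formatting tool: the function name `format_plain_text` is hypothetical, the width of 70 is taken from the upper end of the range quoted above, and a "double-line break" is assumed to mean one blank line between paragraphs.

```python
import textwrap


def format_plain_text(paragraphs, width=70):
    """Wrap each paragraph at `width` columns and join them with blank lines."""
    wrapped = []
    for paragraph in paragraphs:
        # Collapse existing line breaks and runs of spaces before re-wrapping.
        normalized = " ".join(paragraph.split())
        wrapped.append(textwrap.fill(normalized, width=width))
    return "\n\n".join(wrapped) + "\n"


if __name__ == "__main__":
    # Hypothetical input paragraphs.
    text = format_plain_text([
        "When in the Course of human events, it becomes necessary for one "
        "people to dissolve the political bands which have connected them "
        "with another.",
        "We hold these truths to be self-evident.",
    ])
    print(text)
```

Because the output contains no markup at all, any text reader can display it, which is the trade-off the paragraph above describes.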

While the works in Project Gutenberg represent a valuable sample of publications that span several centuries, there are some issues of concern for linguistic analysis. Some content may have been modified by the transcriber through editorial changes or corrections (such as fixing obvious typesetting or printing errors). The spelling may also have been modified to conform with current practices, although the intent of Project Gutenberg^[20] and of Distributed Proofreaders^[1] is to preserve the original text and, where possible, the formatting. As a result, the works may be problematic when searching for older grammatical usage. Finally, the collected works can be weighted heavily towards certain authors (such as Charles Dickens), while others are barely represented.^[21]

In March 2004, a new initiative was begun by Michael Hart and John S. Guagliardo^[22] to provide low-cost intellectual properties. The initial name for this project was Project Gutenberg 2 (PG II), which created controversy among PG volunteers because of the re-use of the project's trademarked name for a commercial venture.^[9]

Affiliated projects

All affiliated projects are independent organizations that share the same ideals and have been given permission to use the Project Gutenberg trademark. They often have a particular national or linguistic focus.^[23]

List of affiliated projects

PG-EU is a sister project which operates under the copyright law of the European Union. One of its aims is to include as many languages as possible into Project Gutenberg. It operates in Unicode to ensure that all alphabets can be represented easily and correctly.^[24]
Project Gutenberg Australia hosts many texts which are public domain according to Australian copyright law, but still under copyright (or of uncertain status) in the United States, with a focus on Australian writers and books about Australia.^[25]
Project Gutenberg Canada.^[26]
Project Gutenberg Consortia Center is an affiliate specializing in collections of collections. These do not have the editorial oversight or consistent formatting of the main Project Gutenberg. Thematic collections, as well as numerous languages, are featured.^[27]
Projekt Gutenberg-DE claims copyright for its product and limits access to browsable web-versions of its texts.^[28]
Project Gutenberg Europe is a project run by Project Rastko in Serbia. It aims at being a Project Gutenberg for all of Europe, and started to post its first projects in 2005. It uses the Distributed Proofreaders software to quickly produce etexts.^[29]
Project Gutenberg Luxembourg publishes mostly, but not exclusively, books that are written in Luxembourgish.^[30]
Projekti Lönnrot, a project started by Finnish Project Gutenberg volunteers, derives its name from the Finnish philologist Elias Lönnrot (1802–1884).^[31]
Project Gutenberg of the Philippines aims to "make as many books available to as many people as possible, with a special focus on the Philippines and Philippine languages".^[32]
Project Gutenberg Russia is a project that aims to collect public domain books in Slavic languages, Russian in particular. The discussion of the project and its legal side began in April 2012. The name Rutenberg is a combination of the words "RUssia" and "Gutenberg".^[33]
Project Gutenberg Self Publishing, unlike Project Gutenberg itself, allows submission of texts never published before, including self-published ebooks.^[34]
Project Gutenberg of Taiwan seeks to archive copyright-free books with a special focus on Taiwan in English, Mandarin and Taiwan-based languages. It is a special project of Forumosa.com.^[35]
