Concordancers
aConcorde - aConcorde is a multilingual concordance tool. Originally developed for native Arabic concordance, it possesses basic concordance functionality, as well as English and Arabic interfaces.
ConcApp - by Chris Greaves
TextSTAT - TextSTAT is a simple programme for the analysis of texts. It reads ASCII/ANSI texts (in different encodings) and HTML files (directly from the internet), and it produces word frequency lists and concordances from these files.
Conferences
Corpora
DCPSE (The Diachronic Corpus of Present-Day Spoken English) - DCPSE is a new parsed corpus of spoken English available on CD-ROM. It contains 400,000 words from ICE-GB (collected in the early 1990s) and 400,000 words from the London-Lund Corpus (late 1960s-early 1980s).
ICE (International Corpus of English)
Lancaster University (Corpora held by)
Leipzig Corpora Collection - Sentence collections in a MySQL database for 17 mainly European languages.
Reuters - Reuters corpora are now distributed by NIST.
The English Norwegian Parallel Corpus
TalkBank - The TalkBank database contains transcript and media data collected from conversations with adults and older children.
The Alex Catalogue of Electronic texts - The Alex Catalogue of Electronic Texts is a collection of about 14,000 “classic” public domain documents from American and English literature as well as Western philosophy.
Translational English Corpus (TEC)
Corpora in Spanish
Corpora on EU
OPUS (An Open Source Parallel Corpus) - OPUS is an attempt to collect translated texts from the web, to convert and align the entire collection, to add linguistic data, and to provide the community with a publicly available parallel corpus. OPUS is based on open source products and is also delivered as an open source package. Several tools were used to compile the current corpus; no manual corrections have been made. Includes the Europarl Search Engine.
The JRC-Acquis Multilingual Parallel Corpus - The Acquis Communautaire (AC) is the total body of European Union (EU) law applicable in the EU Member States. This collection of legislative text changes continuously and currently comprises selected texts written between the 1950s and now.
Courses on CL&CTS
A Crash Course in Corpus Linguistics
Computational Approach to Collocations
Corpus Linguistics. A Practical Web-based Course
Corpus Linguistics (Ben Bergen)
Course in Corpus Linguistics (University of Essex)
Information about corpus building and investigation
Dictionaries
Documents about Corpora
ECI (European Corpus Initiative) - The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus (ECI/MCI) to be made available in digital form for scientific research at as low a cost as possible. The corpus has been available on CD-ROM since 1994 and is distributed by ELSNET.
LDC (Linguistic Data Consortium)
Journals
Journal of Quantitative Linguistics
New Voices in Translation Studies
Other
CETH (Center for Electronic Texts in the Humanities)
Parliaments and related information
Websites of National Parliaments
IPEX (Interparliamentary EU Information Exchange)
ECPRD (European Centre for Parliamentary Research and Documentation)
ECPR (European Consortium for Political Research)
The Euro-Mediterranean Parliamentary Assembly (EMPA)
COSAC (Conference of Community and European Affairs Committees of Parliaments of the European Union)
Research Centres
Centre for English Corpus Linguistics, Louvain
Centre for Corpus Research, University of Birmingham
Researchers
Software and Tools
Teaching and Learning a language with Corpora
XML