Session 7: Adapting and integrating existing open-source projects
Scope and purpose
- Guiding question: Which of the available open-source projects may meet project needs?
- Considerations: Open-ONI, OpeNER, Apache, NERC-Fr, Palladio, Voyant Tools, D3
- Goal: Shortlist of open-source tools to adapt
- Discussants: Ludovic Moncla (lead), Mary Elings & Elena Azadbakht
Documentation
- Listen:
  - Session 7 audio recording, Part 1
  - Session 7 audio recording, Part 2
- View: Session presentation slide deck
- Read: Session notes
- Briefing Documents:
  - Sampsel, Laurie J. 2018. “Voyant Tools.” Music Reference Services Quarterly 21 (3): 153-157. DOI: 10.1080/10588167.2018.1496754
  - Agerri, Rodrigo, Montse Cuadros, Sean Gaines, and Germán Rigau. 2013. “OpeNER: Open Polarity Enhanced Named Entity Recognition.” Sociedad Española para el Procesamiento del Lenguaje Natural. http://dialnet.unirioja.es/servlet/oaiart?codigo=4452510.
Discussion summary
During this session, we reviewed several open-source projects, paying particular attention to the research activities and experience of our grant participants. In discussing the available tools, we attended to four main project needs: 1) browsing and sharing the document collection; 2) annotating the corpus; 3) processing the corpus using machine learning, geoparsing, and text mining techniques; and 4) visualizing and exploring the corpus.
Decisions
We resolved to adapt tools for the following purposes in building our project:
- Browsing Corpus
  - OpenONI, The Online Newspaper Initiative: provides a set of functions for loading, modeling, and indexing data
- Annotating Corpus
  - BRAT: a server-based tool used to annotate the training and verification data for natural language processing
  - Pelagios and Recogito: a semantic annotation tool for texts and images that can identify and map places
  - PERDIDO Geoparser: a flexible geoparser that could be adapted to use a manually annotated gazetteer of historic place names
- Natural Language Processing
  - spaCy: a flexible Python library for natural language processing, capable of performing most of the needed project tasks
- Visualization
  - D3, Data-Driven Documents: a JavaScript library that will enable us to make custom, web-based visualizations that are interoperable across browsers
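To make the gazetteer idea behind the geoparsing decision concrete, the following is a minimal sketch of matching a hand-built list of historic place names against a text. It is an illustration only and does not use the API of PERDIDO, Recogito, or spaCy. The `GAZETTEER` entries, their approximate coordinates, the `find_places` helper, and the sample sentence are all hypothetical.

```python
import re

# Hypothetical gazetteer: historic place name -> (latitude, longitude).
# Entries and coordinates are approximate and illustrative, not project data.
GAZETTEER = {
    "Lutetia": (48.8566, 2.3522),
    "Constantinople": (41.0082, 28.9784),
    "New Amsterdam": (40.7128, -74.0060),
}


def find_places(text, gazetteer):
    """Return (name, start, end, coords) for each gazetteer name found in text."""
    # Try longer names first so multi-word names win over shorter overlaps.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [
        (m.group(1), m.start(), m.end(), gazetteer[m.group(1)])
        for m in pattern.finditer(text)
    ]


sample = "Ships sailed from Constantinople to New Amsterdam."
matches = find_places(sample, GAZETTEER)
for name, start, end, coords in matches:
    print(name, start, end, coords)
```

Exact string matching like this would miss spelling variants and ambiguous names, which is one reason a full geoparser and manual annotation remain part of the plan.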