PanLex development

Standardizing digital text

Most resources that PanLex has acquired so far are files of digital text. A resource in any other form must first be converted to digital text with the same content. Once a resource is in digital text form, the task of standardizing it begins.

The following pages cover different aspects of standardizing digital text:

  • Human text standardization
  • Simplifying complex text formats
  • Applying basic PanLex text standards
  • Applying aspirational PanLex text standards
