edu.northwestern.at.morphadorner.corpuslinguistics.lexicon

Interface Lexicon

    • Method Summary

      Methods 
      Modifier and Type Method and Description
      boolean containsEntry(java.lang.String entry)
      Checks if lexicon contains an entry.
      java.lang.String[] getCategories()
      Get the categories, sorted in ascending order.
      java.util.Set<java.lang.String> getCategoriesForEntry(java.util.List<java.lang.String> sentence, int entryIndex)
      Get categories for an entry in a sentence.
      java.util.Set<java.lang.String> getCategoriesForEntry(java.lang.String entry)
      Get categories for an entry in the lexicon.
      java.util.Set<java.lang.String> getCategoriesForEntry(java.lang.String entry, boolean isFirstEntry)
      Get categories for an entry.
      int getCategoryCount(java.lang.String category)
      Get category count.
      int getCategoryCount(java.lang.String entry, java.lang.String category)
      Get count for an entry in a specific category.
      java.util.Map<java.lang.String,MutableInteger> getCategoryCounts()
      Get category counts.
      java.util.Map<java.lang.String,MutableInteger> getCategoryCountsForEntry(java.lang.String entry)
      Get category counts for an entry.
      java.lang.String[] getEntries()
      Get the entries, sorted in ascending order.
      int getEntryCount(java.lang.String entry)
      Get total count for an entry.
      java.lang.String getLargestCategory(java.lang.String entry)
      Get category with largest count for an entry.
      java.lang.String getLemma(java.lang.String entry)
      Get lemma for an entry.
      java.lang.String getLemma(java.lang.String entry, java.lang.String category)
      Get lemma for an entry in a specific category.
      java.lang.String[] getLemmata(java.lang.String entry)
      Get all lemmata for an entry.
      LexiconEntry getLexiconEntry(java.lang.String entry)
      Get a lexicon entry.
      int getLexiconSize()
      Get number of entries in Lexicon.
      int getLongestEntryLength()
      Get the longest entry length in the lexicon.
      int getNumberOfCategories()
      Get number of categories.
      int getNumberOfCategoriesForEntry(java.lang.String entry)
      Get number of categories for an entry.
      PartOfSpeechTags getPartOfSpeechTags()
      Get the part of speech tags list used by the lexicon.
      int getShortestEntryLength()
      Get the shortest entry length in the lexicon.
      void loadLexicon(java.net.URL lexiconURL, boolean compressed, java.lang.String encoding)
      Load entries into a lexicon.
      void loadLexicon(java.net.URL lexiconURL, java.lang.String encoding)
      Load entries into a lexicon.
      void removeEntry(java.lang.String entry)
      Remove entry.
      void removeEntryCategory(java.lang.String entry, java.lang.String category)
      Remove given category for an entry.
      void saveLexiconToTextFile(java.lang.String lexiconFileName, java.lang.String encoding)
      Save lexicon to a file.
      LexiconEntry setLexiconEntry(java.lang.String entry, LexiconEntry entryData)
      Set a lexicon entry.
      boolean setPartOfSpeechTags(PartOfSpeechTags partOfSpeechTags)
      Set the part of speech tags list used by the lexicon.
      void updateEntryCount(java.lang.String entry, java.lang.String category, java.lang.String lemma, int entryCount)
      Update entry count in lexicon for a given category.
    • Method Detail

      • loadLexicon

        void loadLexicon(java.net.URL lexiconURL,
                       boolean compressed,
                       java.lang.String encoding)
                         throws java.io.IOException
        Load entries into a lexicon.
        Parameters:
        lexiconURL - URL for the file containing the lexicon.
        compressed - true if lexicon is gzip compressed.
        encoding - Character encoding of lexicon file text.
        Throws:
        java.io.IOException
      • loadLexicon

        void loadLexicon(java.net.URL lexiconURL,
                       java.lang.String encoding)
                         throws java.io.IOException
        Load entries into a lexicon.
        Parameters:
        lexiconURL - URL for the file containing the lexicon.
        encoding - Character encoding of lexicon file text.
        Throws:
        java.io.IOException
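        The following is a minimal sketch of what the compressed and encoding parameters imply for the input stream, assuming a line-oriented text file (the lexicon file format is not specified here). The class LexiconStreamSketch and its methods openLexiconReader and readLines are hypothetical and not part of MorphAdorner; only standard java.net, java.io, and java.util.zip calls are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper; not part of MorphAdorner. */
class LexiconStreamSketch {
    /** Opens a reader over lexiconURL, unwrapping gzip when compressed is true. */
    static BufferedReader openLexiconReader(URL lexiconURL, boolean compressed, String encoding)
            throws IOException {
        InputStream in = lexiconURL.openStream();
        try {
            if (compressed) {
                in = new GZIPInputStream(in);
            }
            // Decode bytes to characters with the caller-supplied encoding.
            return new BufferedReader(new InputStreamReader(in, encoding));
        } catch (IOException e) {
            in.close(); // do not leak the stream if wrapping fails
            throw e;
        }
    }

    /** Reads all lines; assumes a line-oriented lexicon file (an assumption of this sketch). */
    static List<String> readLines(URL lexiconURL, boolean compressed, String encoding)
            throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = openLexiconReader(lexiconURL, compressed, encoding)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

        A local file can be passed as a URL via new java.io.File(name).toURI().toURL().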
      • updateEntryCount

        void updateEntryCount(java.lang.String entry,
                            java.lang.String category,
                            java.lang.String lemma,
                            int entryCount)
        Update entry count in lexicon for a given category.
        Parameters:
        entry - The entry.
        category - The category.
        lemma - The lemma.
        entryCount - The entry count to add to the current count. Must be positive.
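        Because entryCount is added to the current count rather than replacing it, repeated calls accumulate. The sketch below illustrates that behavior with a hypothetical in-memory stand-in, CountingLexiconSketch, which is not MorphAdorner's implementation. It stores plain Integer values instead of MutableInteger. Rejecting non-positive counts with an exception, returning 0 for unknown entries, and treating the total entry count as the sum over categories are all assumptions of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in; not MorphAdorner's implementation. */
class CountingLexiconSketch {
    // entry -> (category -> count); plain Integer replaces MutableInteger here.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // entry -> (category -> lemma)
    private final Map<String, Map<String, String>> lemmata = new HashMap<>();

    /** Adds entryCount to the current count for (entry, category). */
    void updateEntryCount(String entry, String category, String lemma, int entryCount) {
        if (entryCount <= 0) {
            // Assumption: the real interface does not say how this is handled.
            throw new IllegalArgumentException("entryCount must be positive");
        }
        counts.computeIfAbsent(entry, k -> new HashMap<>())
              .merge(category, entryCount, Integer::sum);
        lemmata.computeIfAbsent(entry, k -> new HashMap<>()).put(category, lemma);
    }

    /** Count for entry in category; 0 when unknown (an assumption of this sketch). */
    int getCategoryCount(String entry, String category) {
        Map<String, Integer> byCategory = counts.get(entry);
        return byCategory == null ? 0 : byCategory.getOrDefault(category, 0);
    }

    /** Total count for entry, taken here as the sum over its categories. */
    int getEntryCount(String entry) {
        Map<String, Integer> byCategory = counts.get(entry);
        if (byCategory == null) {
            return 0;
        }
        int total = 0;
        for (int count : byCategory.values()) {
            total += count;
        }
        return total;
    }

    /** Lemma for entry in category; null when unknown (an assumption of this sketch). */
    String getLemma(String entry, String category) {
        Map<String, String> byCategory = lemmata.get(entry);
        return byCategory == null ? null : byCategory.get(category);
    }
}
```

        With this stand-in, calling updateEntryCount("walks", "vvz", "walk", 3) and then updateEntryCount("walks", "vvz", "walk", 2) leaves a count of 5 for that entry and category. The tag "vvz" is only an illustrative category name.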
      • removeEntryCategory

        void removeEntryCategory(java.lang.String entry,
                               java.lang.String category)
        Remove given category for an entry.
        Parameters:
        entry - The entry.
        category - The category to remove.
      • removeEntry

        void removeEntry(java.lang.String entry)
        Remove entry.
        Parameters:
        entry - The entry to remove.
      • getLexiconEntry

        LexiconEntry getLexiconEntry(java.lang.String entry)
        Get a lexicon entry.
        Parameters:
        entry - Entry for which to get lexicon information.
        Returns:
        LexiconEntry for entry, or null if not found.

        Note: this does NOT call the part of speech guesser.

      • setLexiconEntry

        LexiconEntry setLexiconEntry(java.lang.String entry,
                                   LexiconEntry entryData)
        Set a lexicon entry.
        Parameters:
        entry - Entry for which to set lexicon information.
        entryData - The lexicon entry data.
        Returns:
        Previous lexicon data for entry, if any.
      • getLexiconSize

        int getLexiconSize()
        Get number of entries in Lexicon.
        Returns:
        Number of entries in Lexicon.
      • getEntries

        java.lang.String[] getEntries()
        Get the entries, sorted in ascending order.
        Returns:
        The sorted entry strings as an array of strings.
      • getCategories

        java.lang.String[] getCategories()
        Get the categories, sorted in ascending order.
        Returns:
        The sorted category strings as an array of strings.
      • containsEntry

        boolean containsEntry(java.lang.String entry)
        Checks if lexicon contains an entry.
        Parameters:
        entry - Entry to look up.
        Returns:
        true if lexicon contains entry. Only an exact match is considered.
      • getCategoriesForEntry

        java.util.Set<java.lang.String> getCategoriesForEntry(java.lang.String entry)
        Get categories for an entry in the lexicon.
        Parameters:
        entry - Entry to look up.
        Returns:
        Set of categories. Null if entry not found in lexicon.
      • getCategoriesForEntry

        java.util.Set<java.lang.String> getCategoriesForEntry(java.lang.String entry,
                                                            boolean isFirstEntry)
        Get categories for an entry.
        Parameters:
        entry - Entry to look up.
        isFirstEntry - True if entry is first in sentence.
        Returns:
        Set of categories. Null if entry not found in lexicon.
      • getCategoriesForEntry

        java.util.Set<java.lang.String> getCategoriesForEntry(java.util.List<java.lang.String> sentence,
                                                            int entryIndex)
        Get categories for an entry in a sentence.
        Parameters:
        sentence - List of entries in sentence.
        entryIndex - Index within sentence (0-based) of entry.
        Returns:
        Set of categories. Null if entry not found in lexicon.
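        All three getCategoriesForEntry overloads return null for an entry that is not in the lexicon, so callers need a null check before iterating. A minimal caller-side sketch follows. CategoryLookupSketch and categoriesOrEmpty are hypothetical names, not part of MorphAdorner, and the lookup is passed in as a java.util.function.Function so that the example stays self-contained.

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical caller-side helper; not part of MorphAdorner. */
class CategoryLookupSketch {
    /**
     * Applies a lookup such as entry -> lexicon.getCategoriesForEntry(entry)
     * and maps a null result (entry not in lexicon) to an empty set.
     */
    static Set<String> categoriesOrEmpty(Function<String, Set<String>> lookup, String entry) {
        Set<String> categories = lookup.apply(entry);
        return categories == null ? Collections.<String>emptySet() : categories;
    }
}
```

        Given a Lexicon instance named lexicon, the call would look like categoriesOrEmpty(entry -> lexicon.getCategoriesForEntry(entry), "walks").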
      • getNumberOfCategoriesForEntry

        int getNumberOfCategoriesForEntry(java.lang.String entry)
        Get number of categories for an entry.
        Parameters:
        entry - Entry for which to find number of categories.
        Returns:
        Number of categories for entry.
      • getCategoryCountsForEntry

        java.util.Map<java.lang.String,MutableInteger> getCategoryCountsForEntry(java.lang.String entry)
        Get category counts for an entry.
        Parameters:
        entry - Entry to look up.
        Returns:
        Map of counts for each category. String keys are category tags; values are MutableInteger counts. Null if entry not found in lexicon.
      • getLargestCategory

        java.lang.String getLargestCategory(java.lang.String entry)
        Get category with largest count for an entry.
        Parameters:
        entry - Entry to look up.
        Returns:
        Category with largest count. Null if entry not found in lexicon.
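        As an illustration of how the largest category relates to the per-category counts, the sketch below picks the category with the highest count from a counts map. LargestCategorySketch and largestCategory are hypothetical names, not part of MorphAdorner. The map uses Integer values rather than MutableInteger, and breaking ties by choosing the alphabetically first category is an assumption, since tie handling is not documented here.

```java
import java.util.Map;

/** Hypothetical helper; not part of MorphAdorner. */
class LargestCategorySketch {
    /**
     * Returns the category with the largest count, or null when the map is
     * null or empty. Ties go to the alphabetically first category
     * (an assumption of this sketch).
     */
    static String largestCategory(Map<String, Integer> categoryCounts) {
        if (categoryCounts == null) {
            return null;
        }
        String best = null;
        int bestCount = Integer.MIN_VALUE;
        for (Map.Entry<String, Integer> e : categoryCounts.entrySet()) {
            String category = e.getKey();
            int count = e.getValue();
            boolean better = best == null
                || count > bestCount
                || (count == bestCount && category.compareTo(best) < 0);
            if (better) {
                best = category;
                bestCount = count;
            }
        }
        return best;
    }
}
```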
      • getCategoryCount

        int getCategoryCount(java.lang.String entry,
                           java.lang.String category)
        Get count for an entry in a specific category.
        Parameters:
        entry - Entry to look up.
        category - Category for which to retrieve count.
        Returns:
        Number of occurrences of entry in category.
      • getLemma

        java.lang.String getLemma(java.lang.String entry)
        Get lemma for an entry.
        Parameters:
        entry - Entry to look up.
        Returns:
        Lemma form of entry.
      • getLemmata

        java.lang.String[] getLemmata(java.lang.String entry)
        Get all lemmata for an entry.
        Parameters:
        entry - Entry to look up.
        Returns:
        Lemma forms of entry.
      • getLemma

        java.lang.String getLemma(java.lang.String entry,
                                java.lang.String category)
        Get lemma for an entry in a specific category.
        Parameters:
        entry - Entry to look up.
        category - Category for which to retrieve lemma.
        Returns:
        Lemma form of entry in the given category.
      • getEntryCount

        int getEntryCount(java.lang.String entry)
        Get total count for an entry.
        Parameters:
        entry - Entry to look up.
        Returns:
        Count of occurrences of entry.
      • getCategoryCount

        int getCategoryCount(java.lang.String category)
        Get category count.
        Parameters:
        category - Category for which to get the count.
        Returns:
        Number of times category appears in lexicon.
      • getCategoryCounts

        java.util.Map<java.lang.String,MutableInteger> getCategoryCounts()
        Get category counts.
        Returns:
        Category counts map.
      • getNumberOfCategories

        int getNumberOfCategories()
        Get number of categories.
        Returns:
        Number of categories.
      • saveLexiconToTextFile

        void saveLexiconToTextFile(java.lang.String lexiconFileName,
                                 java.lang.String encoding)
                                   throws java.io.IOException
        Save lexicon to a file.
        Parameters:
        lexiconFileName - Name of the file to which the lexicon is saved.
        encoding - Character encoding of lexicon file text.
        Throws:
        java.io.IOException
      • getPartOfSpeechTags

        PartOfSpeechTags getPartOfSpeechTags()
        Get the part of speech tags list used by the lexicon.
        Returns:
        Part of speech tags list.
      • setPartOfSpeechTags

        boolean setPartOfSpeechTags(PartOfSpeechTags partOfSpeechTags)
        Set the part of speech tags list used by the lexicon.
        Parameters:
        partOfSpeechTags - Part of speech tags list.
      • getLongestEntryLength

        int getLongestEntryLength()
        Get the longest entry length in the lexicon.
        Returns:
        The longest entry length in the lexicon.
      • getShortestEntryLength

        int getShortestEntryLength()
        Get the shortest entry length in the lexicon.
        Returns:
        The shortest entry length in the lexicon.