##########################################
### Phramer 1.0.5 (April 9, 2006) ###
##########################################

Note: this is a pre-release version of 1.0.5.
It is mostly documented up to September 18, 2006.
Version 1.0.2 was never (officially) released.

##########################################
### Phramer 1.0.2 (September 18, 2006) ###
##########################################

General:
- Added "Getting started"

Decoder:
- Bugfixes
- Speed: 5-120% faster
- Now it can work without a configuration file (all parameters passed
  through the command line)
- Now it can take non-XML input. The parameter that sets the input type is
  -inputtype. Possible values: text, xml (default).
  Impact: no need to escape special characters (&, < and >) when non-XML
  input is provided
- Added -start-id parameter that specifies the starting identifier for the
  generated files (lattices, [re]score files, n-best lists). This feature
  enables simple mechanisms to distribute decoding tasks. Default: 0
- Added -x-score-mask / -sc-mask parameter that defines the format of simple
  score files: which probabilities/features are output, and in what order.
  Default: d,l,t,w. Vocabulary: d, l, t, w, x (custom/extended probability)
- Added compatibility level. Activated by the "-compatibility" option.
  For Phramer 1.0.2, the value can be 1.00.02.00, 1.0.2.0, 1.0.2 or 1000200.
  When run with a compatibility level, the decoder is expected to pass the
  regression tests.
- Added support for assumptions about the values of the features
  (probabilities) during decoding, through the -x-level-good-probabilities
  parameter:
    0 - no assumption can be made about the probabilities
    1 - pLM + pT + pW + pX <= 0 (in logarithm)
    2 - [1] and pLM + pX <= 0 (in logarithm)
    3 - [2] and pLM <= 0 and pT <= 0 and pW <= 0 and pX <= 0 (in logarithm)
  Default value: 0.
  Impact: decoding up to twice as fast for a standard configuration; other
  configurations may see more than a 100% speed improvement.
  Note that standard LMs and TTs imply level 2 or higher, so level 2 can be
  safely assumed unless the data structure generation is atypical (i.e.:
  probabilities greater than 1 in the translation table).
- Changed the default future cost calculator to PharaohDFutureCostCalculator.
  This class adds distortion to the cost. Its impact on the quality of the
  output is small, but (always) positive.
  Only for compatibility level >= 1.0.2
- Added remote binary translation table. Communication takes place through
  BinaryRemoteService. The server is started with:
    org.phramer.v1.server.StartBinaryRemoteTranslationTable
  The configuration file will contain a reference to the translation table
  in the following format:
    remoteb::/
  The filemask is required because the client must be aware of the
  vocabulary of the translation table. The communication takes place using
  word indices (integers).
- LMPreprocessorWord now works with the StringFilter interface instead of
  WordLmPrepare (StringFilter is now the standard interface for
  String-to-String conversion)
- LMPreprocessor.useSpecialFactorization() - should return true if the LM
  preprocessor requires special handling due to a special factorization
  (i.e.: P(wn | wn-1 tn-2), where w = word, t = [POS] tag) and must know
  which token is wn in order to properly preprocess the array of tokens.
- VocabularyFastHashBackOffLM - faster implementation of the LM class. It
  uses info.olteanu.utils.lang.HashMapLongToDouble instead of
  java.util.HashMap.
  Advantage: slightly faster decoding.
  Disadvantages: longer initialization time. More memory consumed.
  Recommended only in server configurations.
  Activated by the "fasthash:" prefix
- Distributed decoding: now it supports Phramer extensions (handles n-best
  list generation, etc.)
- Added distributed decoding through passwordless ssh/telnet:
    org.phramer.tools.distributed.DistributedDecodingSshOrTelnet
  Advantages: no need to set up servers to take commands for decoding.
  Higher security (if ssh is used)
  Disadvantages: passwordless login must be set up. Requires the scripts to
  be runnable through an ssh command (ssh )
- Extension interface was changed and expanded
  - custom probability function
    Parameter: -x-custom-probability-calculator (1) - defines the class that
    implements the custom probability. The argument (String[]) passed to the
    constructor of the implementation is defined through
    -x-custom-probability-calculator-param (1-n) (optional Phramer
    parameter).
    The custom probability class must implement
    org.phramer.v1.decoder.extensionifs.CustomProbability .
    The custom probability for a specific hypothesis state is a vector of
    features. The weights required to calculate the overall probability are
    passed through the -px / -weight-x (1-n) parameter.
    The implementation may define values for future cost evaluation.
    Defining them correctly (or approximating them as well as possible)
    helps decoding by guiding the search algorithm towards the final optimal
    state.
    To use more than one custom probability, we recommend
    org.phramer.v1.decoder.extensionifs.impl.CustomProbabilityDispatcher .
    The class aggregates the feature vectors from a set of custom
    probability calculators into a single vector.
    Implementation examples can be found in the package
    org.phramer.v1.decoder.extensionifs.impl
    (CustomProbabilityPharaohLanguageModels,
    CustomProbabilityPharaohDistortion,
    CustomProbabilityPharaohTranslationModels,
    CustomProbabilityPharaohWordPenality)
  - custom constraint processor
    The user of Phramer can define restrictions on hypothesis state
    generation: forbid the expansion of a certain state into a specific
    state based on user-defined criteria, or penalize certain states for
    violating specific constraints.
    The custom class must implement
    org.phramer.v1.decoder.extensionifs.ConstraintProcessor.
    The parameter to define the custom class: -x-constraint-processor (1)
    The parameter (String[]) to be passed to the constructor of the class is
    defined through the optional Phramer parameter
    -x-constraint-processor-param (1-n).
    Multiple constraint processors can be enabled using
    org.phramer.v1.decoder.extensionifs.impl.ConstraintProcessorDispatcher
    as the constraint processor defined with -x-constraint-processor.
    In order to use the penalties for the soft constraints (constraints that
    don't prevent hypothesis state generation, only penalize the
    violations), the custom probability class
    org.phramer.v1.decoder.extensionifs.impl.CustomProbabilitySoftConstraints
    is required.
  - alter translation variant table
    The user can alter the table that contains all the possible translations
    for each input phrase, by adding/removing entries or by altering
    probabilities. The translation variant table to be altered is generated
    using the translation table, the markup in the input sentence
    (pre-translations) or by deriving entries for unknown words.
    The class that alters/inspects the translation variant table must
    implement
    org.phramer.v1.decoder.extensionifs.PhraseTranslationVariantInspector .
    The class to be used is defined through the parameter
    -x-translation-variants-inspector (1). The parameter (String[]) to be
    passed to the constructor of the class is defined through the optional
    Phramer parameter -x-translation-variants-inspector-param (1-n).
  - phrases from unknown words processor
    Contiguous phrases made of unknown words can be assigned translated
    phrases through an extension that implements
    org.phramer.v1.decoder.extensionifs.OutOfTranslationTablePhraseGenerator
    Phramer parameters:
      -x-unk-phrases (1)
      -x-unk-phrases-param (1-n) (optional)

MERT:
- New normalization method for lambda: the median of the absolute values of
  the entries in the vector equals 1
- Selection of the normalization method for lambda: parameter
  normalizer.type
  - sum: the sum of the absolute values in the lambda vector should be 1
  - max: the values will be in the range -1 .. 1 (where at least one of them
    will be equal or very close to -1 or 1) (default)
  - median: described above
- The properties files are now loaded with PropertiesTools, allowing higher
  flexibility in defining configuration files
- The properties can be overridden on the command line, with the "--"
  prefix. Example:
    mert.sh properties.txt --file.dev.f fileFR.txt --file.dev.e fileEN.txt
- Added multi-reference MERT
  Activated through:
    file.dev.f = param_for_evaluator ; file.dev.f.reference1 ; file.dev.f.reference2 ; ...
  In case of BLEU, the parameter specifies the method of choosing the
  reference length between the multiple references per sentence.
  Possible values: min, max, closest, closest_down, closest_up, avg or
  median
  - min - reference length = minimum length across the references;
  - max - reference length = maximum length across the references;
  - closest - reference length = length closest to the hypothesis length
    across the references;
  - closest_down - reference length = closest length across the references,
    shorter than the hypothesis length;
  - closest_up - reference length = closest length across the references,
    longer than the hypothesis length;
  - avg - reference length = average length of the references;
  - median - reference length = median length of the references;
- Added variable-N BLEU.
  Not currently supported through org.phramer.v1.decoder.main.MERTMain
- Added option to choose the evaluator type through the evaluator.type
  parameter. Currently supported evaluators: bleu (bleuN4, default option),
  wer, per

General toolkit:
- PropertiesTools: loads properties files, allowing properties to be defined
  using other properties ( this.property = stuff ${other.property} other
  stuff ), include ( @@include baseProperties.ini ) and import
  ( @@import baseProperties.ini ).
  - include: expands properties (using ${...} syntax) after solving includes
  - import: expands properties before solving imports
  - @@include/import : the path to the file that is included/imported is
    relative to the current folder
  - @@@include/import : the path to the file that is included/imported is
    relative to the file that refers to it
- WatchDog - triggers an action after a specified amount of time
- Added BinaryRemoteService (to support binary remote translation table)
- Added application dispatcher - now a port can service more than one
  RemoteService application
- HashMapLongToDouble - to support fasthash language model

####################################
### Phramer 1.0.1 (May 14, 2006) ###
####################################

- Fixed bugs. Affected features: remote translation table, forced alignment.
  The output for decoding and MERT, using the standard configuration, should
  be the same as for Phramer 1.0.0
- Added 2 more methods for executing the decoder during MERT training. There
  are now three modes:
  - pharaoh mode, which works with Pharaoh or Phramer plus Carmel (the
    default mode)
  - phramer-external mode, which works only with Phramer, run as external
    code (different VM)
  - phramer-internal mode, which executes the decoder within the same
    virtual machine and also caches the language models and the translation
    table internally
- Greatly improved the loading speed of the binary language models
  (4-5 times)
- Added optimization for translation table loading, which works on sorted
  translation tables.
  The minimum requirement is that all entries that have the same foreign
  phrase f are grouped in the translation table file.
  Activated by the "sorted:" option in the configuration file
  (see ./data/phramer.fast.ini). Adds 5%-50% improvement (depending on the
  percentage of pruned entries, as a consequence of the configuration
  options)
- Added an additional mode of loading translation tables - binary
  translation table, pre-pruned and sorted, that contains the minimum amount
  of information (only the f and e phrases and the total probability,
  according to the configuration options) in single precision mode
  (4 bytes/number), serialized for space and deserialized at runtime.
  The application: low memory consumption, especially for interactive
  applications that cannot benefit from translation table pruning (based on
  the input sentences).
  Side effects: very short loading time compared with text-based translation
  tables, slight decrease in performance (estimated: 15-30% in regular
  applications).
  Requires the org.phramer.tools.ConvertLM2Binary tool to generate the
  binary translation tables.
  The binary translation tables are referred to through
  "nio:binary_tt_file_mask" (see ./data/phramer.fast.ini).
  The memory requirements for the translation table drop by about 40-50%.
- Speed improvements in decoding (less than 10%)
- Other speed improvements in config loading
- Added support for simple scoring/rescoring files (space-separated)
  (-simple-sc parameter)

######################################
### Phramer 1.0.0 (April 30, 2006) ###
######################################

- First version
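
Example (illustrative only, not part of Phramer): the "sorted:" option
introduced in 1.0.1 requires that all entries with the same foreign phrase f
are grouped in the translation table file. The sketch below shows one way
such a file could be checked before enabling the option. It assumes the
Pharaoh-style text format "f ||| e ||| scores"; the class GroupedTableCheck
and its method firstUngroupedLine are hypothetical names chosen for this
example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of the Phramer code base. */
final class GroupedTableCheck {

    /**
     * Returns the 1-based number of the first line whose foreign phrase
     * reappears after its group has already ended, or -1 if every foreign
     * phrase forms one contiguous group.
     * Assumes each entry looks like "f ||| e ||| scores" (assumption).
     */
    static int firstUngroupedLine(List<String> lines) {
        Set<String> closed = new HashSet<>(); // phrases whose group has ended
        String current = null;                // phrase of the group being read
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            int sep = line.indexOf("|||");
            if (sep < 0) {
                continue; // not an entry line; ignored in this sketch
            }
            String f = line.substring(0, sep).trim();
            if (f.equals(current)) {
                continue; // still inside the same group
            }
            if (closed.contains(f)) {
                return lineNo; // phrase seen earlier in a different group
            }
            if (current != null) {
                closed.add(current);
            }
            current = f;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> grouped = Arrays.asList(
                "la maison ||| the house ||| 0.8",
                "la maison ||| the home ||| 0.2",
                "le chat ||| the cat ||| 1.0");
        List<String> scattered = Arrays.asList(
                "la maison ||| the house ||| 0.8",
                "le chat ||| the cat ||| 1.0",
                "la maison ||| the home ||| 0.2");
        System.out.println(firstUngroupedLine(grouped));   // prints -1
        System.out.println(firstUngroupedLine(scattered)); // prints 3
    }
}
```

A full lexicographic sort satisfies the requirement, but the check only
looks at grouping, which is the stated minimum. The set of closed phrases
grows with the number of distinct foreign phrases, so for very large tables
a streaming approach over an already sorted file may be preferable.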