Intrinsic evaluation¶
Word analogy task¶
One of the de-facto standard intrinsic evaluations for word embeddings is the word analogy task. The dataset known as the Google test set long served as the standard benchmark for evaluating word embeddings, but it is not balanced and samples only 15 linguistic relations, with 19,544 questions in total. A newer dataset is BATS: it is considerably larger (98,000 questions) and balanced: it covers 40 different relations of 4 types (inflections, derivational morphology, lexicographic and encyclopedic semantics), with 50 unique pairs per relation.
Vecto comes with a script that tests 6 different methods of solving word analogies. You can run the script from the command line, passing the path to the config file as the only argument.
python3 -m vecto.benchmarks.analogy /path/to/config_analogy.yaml
The configuration file is structured as follows:
path_vectors: [
"/path/to/your/vsm1/",
"/path/to/your/vsm2/"
]
alpha: 0.6
# this is the exponent for Sigma values of SVD embeddings
normalize: true
# specifies if embeddings should be normalized
method: LRCos
# allowed values are 3CosAdd, 3CosAvg, 3CosMul, SimilarToB, SimilarToAny, PairDistance, LRCos and LRCosF
exclude: true
# specifies if question words should be excluded from possible answers
path_dataset: "/path/to/the/test/dataset"
# path to dataset. last segment of the path will be interpreted as dataset name
path_results: "/path/where/to/save/results"
# Subfolders for datasets and embeddings will be created automatically
Vecto also supports a direct call via the run(embeddings, options) function. The options dictionary takes the same parameters as the YAML file. The function returns a dict containing the word analogy results.
For example, the following lines can be used to get word analogy results:
import vecto
import vecto.benchmarks.analogy

path_model = "./test/data/embeddings/text/plain_no_file_header"
model = vecto.model.load_from_dir(path_model)  # load the embeddings from disk
options = {}
options["path_dataset"] = "./test/data/benchmarks/analogy/"  # last path segment is the dataset name
options["path_results"] = "/tmp/vecto/analogy"
options["name_method"] = "3CosAdd"  # any of the methods listed above
vecto.benchmarks.analogy.analogy.run(model, options)
Dataset¶
The BATS dataset can be downloaded here. The script expects the input dataset to be a tab-separated file formatted as follows:
cat cats
apple apples
In many cases there is more than one correct answer; they are separated with slashes:
father dad/daddy
flower blossom/bloom
harbor seaport/haven/harbour
There is a file with a word pairs list for each relation, and these files are grouped into folders by the type of the relation. You can also make your own test set to use in Vecto, formatted in the same way.
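For illustration, a relation file in this format could be read with a few lines of Python (a sketch, not part of Vecto's API; the function name `load_pairs` is made up for the example):

```python
def load_pairs(path):
    """Parse one relation file into (word, {acceptable answers}) tuples."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # source word and answers are tab-separated;
            # multiple correct answers are separated by slashes
            word, answers = line.split("\t")
            pairs.append((word, set(answers.split("/"))))
    return pairs
```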
Analogy solving methods¶
Consider the analogy \(a\):\(a'\) :: \(b\):\(b'\) (\(a\) is to \(a'\) as \(b\) is to \(b'\)). The script implements 6 analogy solving methods:
Pair-based methods:
**3CosAdd**: \(b'=argmax_{~b'\in{V}}(cos(b',b-a+a'))\), where \(cos(u, v) = \frac{u\cdot{}v}{||u||\cdot{}||v||}\)
**PairDistance**, aka PairDirection: \(b'=argmax_{~b'\in{V}}(cos(b'-b,a'-a))\)
**3CosMul**: \(b'=argmax_{~b'\in{V}} \frac{cos(b',b) cos(b',a')} {cos(b',a) + \varepsilon}\) (\(\varepsilon = 0.001\) is used to prevent division by zero)
**SimilarToB**: returns the vector most similar to \(b\).
**SimilarToAny**: returns the vector most similar to any of the \(a\), \(a'\) and \(b\) vectors.
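As an illustration, the pair-based scoring functions above can be sketched in plain NumPy (a toy sketch, not Vecto's implementation; the vocabulary matrix `V` and index-based exclusion of question words are assumptions for the example):

```python
import numpy as np

def _cos_all(V, u):
    """Cosine similarity of every row of V against vector u."""
    return (V @ u) / (np.linalg.norm(V, axis=1) * np.linalg.norm(u))

def cos_add(V, a, a_prime, b, exclude=()):
    """3CosAdd: pick the vocabulary word closest to b - a + a'."""
    scores = _cos_all(V, b - a + a_prime)
    scores[list(exclude)] = -np.inf  # drop question words from candidates
    return int(np.argmax(scores))

def cos_mul(V, a, a_prime, b, exclude=(), eps=1e-3):
    """3CosMul: multiplicative combination of the three similarities."""
    scores = _cos_all(V, b) * _cos_all(V, a_prime) / (_cos_all(V, a) + eps)
    scores[list(exclude)] = -np.inf
    return int(np.argmax(scores))
```

With `exclude=True` in the config, Vecto similarly removes \(a\), \(a'\) and \(b\) from the candidate set, since the nearest neighbour of \(b-a+a'\) is very often one of the question words themselves.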
Set-based methods: (current state-of-the-art)
**3CosAvg**: \(b'=argmax_{~b'\in{V}}(cos(b',b+\mathit{avg\_offset}))\) , where \(\mathit{avg\_offset}=\frac{\sum_{i=0}^m{a_i}}{m} - \frac{\sum_{i=0}^n{b_i}}{n}\)
**LRCos**: \(b'=argmax_{~b'\in{V}}(P_{~(b'\in{target\_class)}}*cos(b',b))\)
**LRCosF**: a version of LRCos that attempts to only take into account the relevant distributional features.
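Continuing the same toy setup, the set-based methods can be sketched as follows (an illustrative reading of the formulas above, not Vecto's implementation; the plain-NumPy logistic regression is a stand-in for whatever classifier the library actually uses):

```python
import numpy as np

def _cos_all(V, u):
    """Cosine similarity of every row of V against vector u."""
    return (V @ u) / (np.linalg.norm(V, axis=1) * np.linalg.norm(u))

def cos_avg(V, src_idx, tgt_idx, b, exclude=()):
    """3CosAvg: use the offset averaged over all known pairs of the relation."""
    avg_offset = V[list(tgt_idx)].mean(axis=0) - V[list(src_idx)].mean(axis=0)
    scores = _cos_all(V, b + avg_offset)
    scores[list(exclude)] = -np.inf
    return int(np.argmax(scores))

def _logreg(X, y, lr=0.5, steps=1000):
    """Tiny gradient-descent logistic regression (stand-in classifier)."""
    w, b0 = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b0)))
        w -= lr * X.T @ (p - y) / len(y)
        b0 -= lr * (p - y).mean()
    return w, b0

def lr_cos(V, pos_idx, neg_idx, b, exclude=()):
    """LRCos: rank candidates by P(word in target class) * cos(word, b)."""
    X = np.vstack([V[list(pos_idx)], V[list(neg_idx)]])
    y = np.array([1.0] * len(pos_idx) + [0.0] * len(neg_idx))
    w, b0 = _logreg(X, y)
    p = 1.0 / (1.0 + np.exp(-(V @ w + b0)))  # class membership probability
    scores = p * _cos_all(V, b)
    scores[list(exclude)] = -np.inf
    return int(np.argmax(scores))
```

The key difference from the pair-based methods is that both functions see all known pairs of the relation at once, rather than a single \(a\):\(a'\) example.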
Caveat: the analogy task has been shown to be widely misinterpreted as an evaluation task. First of all, all of the above methods are biased by distance in the distributional space: the closer the target vector is, the more likely it is to be retrieved. Therefore, high scores on the analogy task essentially indicate the extent to which the relations encoded by a given VSM match the relations in the dataset.
It would therefore be better not to report only an average score over the whole task, as is normally done, but to look at the scores for individual relations, as these may show what exactly the model is doing. Since everything cannot be close to everything, success on one type of relation can be expected to come at the expense of others.
Extrinsic evaluation¶
The following tasks will soon be available via Vecto:
- POS tagging
- Named entity recognition
- Chunking