Question: Should I use available “pre-trained” word vectors?
Answer: Almost always.
Question: Should I update (“fine-tune”) my own word vectors?
- If you only have a small training dataset, don’t train (fine-tune) the word vectors
- If you have a large dataset, it will probably work better to train = update = fine-tune the word vectors to the task
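A minimal sketch of the two choices, assuming PyTorch and a hypothetical matrix of pre-trained vectors: freezing the embedding layer for a small dataset versus letting gradients update it for a large one.

```python
import torch
import torch.nn as nn

# Hypothetical pre-trained word vectors: 5-word vocab, dimension 4.
pretrained = torch.randn(5, 4)

# Small dataset: freeze the word vectors (no gradient updates).
frozen_emb = nn.Embedding.from_pretrained(pretrained, freeze=True)

# Large dataset: fine-tune (update) the word vectors during training.
tuned_emb = nn.Embedding.from_pretrained(pretrained, freeze=False)

print(frozen_emb.weight.requires_grad)  # False
print(tuned_emb.weight.requires_grad)   # True
```

The frozen layer keeps the pre-trained vectors fixed; the fine-tuned one treats them as ordinary trainable parameters.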
Regularization prevents overfitting when we have a lot of features
Leaky ReLU/Parametric ReLU
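A small NumPy sketch of the Leaky ReLU: negative inputs get a small fixed slope alpha instead of being zeroed out, so units never have exactly zero gradient. Parametric ReLU (PReLU) has the same form, but alpha is learned rather than fixed.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Identity for positive inputs; small slope alpha for negative ones,
    # so the unit never "dies" with a zero gradient.
    return np.where(x > 0, x, alpha * x)

# PReLU would use the same expression, with alpha as a learned parameter.
x = np.array([-2.0, 0.0, 3.0])
print(leaky_relu(x))  # [-0.02  0.    3.  ]
```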
Xavier initialization has variance inversely proportional to fan-in n_in and fan-out n_out: Var(W) = 2 / (n_in + n_out)
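A NumPy sketch of Xavier initialization, sampling zero-mean Gaussian weights with the variance above (the function name and seed handling are illustrative choices):

```python
import numpy as np

def xavier_init(n_in, n_out, seed=0):
    # Xavier/Glorot initialization: zero-mean Gaussian weights with
    # Var(W) = 2 / (n_in + n_out), chosen to keep activation and
    # gradient variances roughly constant across layers.
    rng = np.random.default_rng(seed)
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_in, n_out))

W = xavier_init(400, 200)
print(W.std())  # close to sqrt(2 / 600) ≈ 0.058
```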
- You can just use a constant learning rate
- Better results can generally be obtained by allowing the learning rate to decrease as you train:
  - By hand: halve the learning rate every k epochs
  - By a formula: lr = lr0 * e^(-k t), for epoch t
  - Fancier methods like cyclic learning rates (q.v.)
- Fancier optimizers still use a learning rate, but it may be an initial rate that the optimizer shrinks, so you may be able to start high
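The two decay schedules above can be sketched directly; the function names and the decay constant k are illustrative choices.

```python
import math

def step_decay(lr0, epoch, k=10):
    # "By hand": halve the learning rate every k epochs.
    return lr0 * 0.5 ** (epoch // k)

def exp_decay(lr0, epoch, k=0.1):
    # By formula: lr = lr0 * e^(-k t) for epoch t.
    return lr0 * math.exp(-k * epoch)

print(step_decay(0.1, 20))           # 0.025
print(round(exp_decay(0.1, 20), 4))  # 0.0135
```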