This allows us to separate the representation of form and motion in the case
of natural image sequences, a desirable property that is frequently studied in natural movies (see Cadieu and Olshausen, 2012). Furthermore, it allows us to learn how these features should evolve over time to encode the structure of the movies well. Just as static filters learned in this way often resemble RFs in visual cortex, the temporal projections learned here could be compared to lateral connections and correlations between neurons in visual cortex.

Temporal Autoencoding: The idea behind many feature extraction methods such as the autoencoder (Vincent et al., 2010) and reconstruction ICA (Le et al., 2011) is to find an alternative encoding for a set of data that allows for a good reconstruction of the dataset. This is frequently combined with sparse priors on the encoder. We propose to use a similar framework for TRBMs based on filtering (see Crisan and Rozovskii, 2011) instead of reconstruction, through the use of a denoising autoencoder (dAE). The key difference between an AE and a dAE is that random noise is added to each training sample before it is presented
to the network, but the training procedure still requires the dAE to reproduce the original training data as it was before the noise was added, thereby denoising the training data. The addition of noise forces the model to learn reliable, larger-scale structure from the training data, because the local perturbations introduced by the noise change each time a sample is presented and are therefore unreliable. In the aTRBM, we leverage the concept of denoising by treating
previous samples of a sequential dataset as noisy versions of the current time point that we are trying to reproduce. The use of the term noise here is something of a misnomer, but it is used to keep in line with terminology from the dAE literature. In the aTRBM case, no noise is added to the training data; instead, the small changes that exist between consecutive frames of the dataset are conceptually considered to be noise, in the sense that we want to remove these changes from previous samples in order to correctly reproduce or predict the data at the current time point. We can therefore use a dAE approach to constrain the temporal weights. In this sense, we consider the activity of the time-lagged visible units as noisy observations of the system's state, and want to infer the current state of the system. To this end, we propose pre-training the hidden-to-hidden weights of the TRBM by minimizing the error in predicting the present data frame from the previous observations of the data. This is similar to the approximation suggested by Sutskever et al. (2008), where the distribution over the hidden states conditioned on the visible history is approximated by the filtering distribution. The training is done as follows.
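
As a rough illustration of the underlying objective only, and not the authors' exact procedure, the idea of treating the previous frame as a "noisy" input and the current frame as the clean target can be sketched with a small one-lag autoencoder in NumPy. Everything named here is an assumption made for the example: the function `temporal_dae_pretrain`, the parameters `W_enc`, `W_dec`, `b_hid`, `b_vis`, the single time lag, the squared-error loss, the plain gradient-descent update, and the synthetic sinusoidal data. A real aTRBM would use several lags and the RBM's own weights.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def temporal_dae_pretrain(frames, n_hidden=8, lr=0.1, epochs=500, seed=0):
    """Hypothetical one-lag 'temporal denoising' autoencoder.

    frames: array of shape (T, n_visible). Frame t-1 is used as the
    'noisy' input and frame t as the clean target to be predicted.
    Returns the learned parameters and the per-epoch mean squared error.
    """
    rng = np.random.default_rng(seed)
    n_visible = frames.shape[1]
    W_enc = 0.1 * rng.standard_normal((n_visible, n_hidden))
    W_dec = 0.1 * rng.standard_normal((n_hidden, n_visible))
    b_hid = np.zeros(n_hidden)
    b_vis = np.zeros(n_visible)

    x_prev, x_now = frames[:-1], frames[1:]   # (input, target) pairs
    n = x_prev.shape[0]
    losses = []
    for _ in range(epochs):
        # Forward pass: encode the previous frame, decode a prediction
        # of the current frame.
        h = sigmoid(x_prev @ W_enc + b_hid)
        x_hat = h @ W_dec + b_vis
        err = x_hat - x_now
        losses.append(float(np.mean(err ** 2)))

        # Backward pass for the objective 0.5 * sum(err**2) / n.
        g_out = err / n
        g_W_dec = h.T @ g_out
        g_b_vis = g_out.sum(axis=0)
        g_pre = (g_out @ W_dec.T) * h * (1.0 - h)
        g_W_enc = x_prev.T @ g_pre
        g_b_hid = g_pre.sum(axis=0)

        # Plain full-batch gradient descent step.
        W_enc -= lr * g_W_enc
        W_dec -= lr * g_W_dec
        b_hid -= lr * g_b_hid
        b_vis -= lr * g_b_vis
    return (W_enc, W_dec, b_hid, b_vis), losses

# Synthetic, slowly varying sequence: six phase-shifted sinusoids.
t = np.arange(200)
phases = np.linspace(0.0, np.pi, 6)
frames = np.sin(0.1 * t[:, None] + phases[None, :])

params, losses = temporal_dae_pretrain(frames)
print(f"prediction MSE: first epoch {losses[0]:.4f}, last epoch {losses[-1]:.4f}")
```

The only difference from a conventional dAE in this sketch is where the corrupted input comes from: instead of adding random noise to `x_now`, the network is fed `x_prev`, so the "corruption" is the frame-to-frame change that the model must learn to undo.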