Video generation from text using tree-structured decisions with GANs. The text annotation (sentence) is encoded by a language model into an embedding, which is then combined with a random noise vector to generate the relevant videos and images.
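A minimal sketch of the conditioning step described above: a sentence embedding is concatenated with a noise vector to form the generator input. This uses numpy and a toy vocabulary as stand-ins (the real setup uses the LM encoder and TensorFlow); all names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_text(tokens, vecs, dim=300):
    """Average per-token word vectors into one sentence embedding
    (a stand-in for the LM encoder; fastText vectors in the real setup)."""
    found = [vecs[t] for t in tokens if t in vecs]
    return np.mean(found, axis=0) if found else np.zeros(dim)

# Toy vocabulary standing in for pretrained embeddings (assumption).
vecs = {w: rng.standard_normal(300) for w in ["digit", "3", "bounces", "left"]}

text_emb = encode_text("digit 3 bounces left".split(), vecs)  # (300,)
z = rng.standard_normal(100)                                  # random noise vector
g_input = np.concatenate([text_emb, z])                       # (400,) generator input
print(g_input.shape)  # (400,)
```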
- VAE-GAN
- VAE-GAN with latent variable optimization
- VAE-GAN with anti-reconstruction loss
- VAE-GAN + anti-reconstruction loss + latent variable models
- Variants of the above models with different hyperparameters
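The VAE-GAN variants above combine a reconstruction term, a KL prior term, and an adversarial term. A toy numpy sketch of such a combined objective (scalar losses on random arrays; the exact form of the anti-reconstruction term in this repo is not specified here, so this shows only the standard VAE-GAN pieces):

```python
import numpy as np

rng = np.random.default_rng(1)

def kl_divergence(mu, logvar):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian encoder."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def vaegan_losses(x, x_rec, mu, logvar, d_real, d_fake):
    """Toy scalar losses for a VAE-GAN: pixel reconstruction,
    KL prior term, and non-saturating adversarial terms."""
    rec = np.mean((x - x_rec) ** 2)
    kl = kl_divergence(mu, logvar)
    adv = -np.mean(np.log(d_fake + 1e-8))  # generator-side adversarial loss
    d_loss = -np.mean(np.log(d_real + 1e-8) + np.log(1 - d_fake + 1e-8))
    return rec, kl, adv, d_loss

x, x_rec = rng.random((64, 64)), rng.random((64, 64))
mu, logvar = rng.standard_normal(32), rng.standard_normal(32)
d_real = rng.uniform(0.5, 1.0, 16)   # discriminator scores on real samples
d_fake = rng.uniform(0.0, 0.5, 16)   # discriminator scores on generated samples
rec, kl, adv, d_loss = vaegan_losses(x, x_rec, mu, logvar, d_real, d_fake)
```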
- LSTM-based model for next-frame generation
- Discriminator in the Wasserstein GAN setting
- Word-embedding-based language model
- Attention-based model for the classification structure
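The Wasserstein GAN setting listed above replaces the standard GAN loss with the critic score gap. A minimal numpy sketch of the WGAN objectives with the original weight-clipping constraint (illustrative only, not the repo's TF implementation):

```python
import numpy as np

rng = np.random.default_rng(2)

def wgan_losses(critic_real, critic_fake):
    """Wasserstein GAN objective: the critic maximizes the score gap
    between real and generated samples; the generator maximizes the
    critic's score on generated samples."""
    critic_loss = -(np.mean(critic_real) - np.mean(critic_fake))
    gen_loss = -np.mean(critic_fake)
    return critic_loss, gen_loss

def clip_weights(w, c=0.01):
    """Weight clipping from the original WGAN paper, keeping the
    critic approximately 1-Lipschitz."""
    return np.clip(w, -c, c)

c_loss, g_loss = wgan_losses(rng.standard_normal(16), rng.standard_normal(16))
w = clip_weights(rng.standard_normal(10))
```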
- The relevant models are implemented in TensorFlow >= 1.2
- Experimentation with the above-mentioned models
- Training is done on a self-generated Bouncing MNIST dataset with sentence-based annotations
- The gensim pretrained fastText Wikipedia word embeddings are used to embed tokens as vectors
- Non-attention-based models are used initially to generate the starting frames.
- The GAN tree trains to look for discriminative features (unverified)
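After the starting frames are generated, the LSTM model listed earlier predicts the following frames. A minimal numpy LSTM cell rolled out over flattened frames (an illustrative sketch with made-up sizes, not the actual TF model):

```python
import numpy as np

rng = np.random.default_rng(3)

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; the four gates are stacked along axis 0 of W/U/b."""
    gates = W @ x + U @ h + b
    i, f, o, g = np.split(gates, 4)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    c = sig(f) * c + sig(i) * np.tanh(g)
    h = sig(o) * np.tanh(c)
    return h, c

D, H = 64 * 64, 128                      # flattened 64x64 frame, hidden size
W = rng.standard_normal((4 * H, D)) * 0.01
U = rng.standard_normal((4 * H, H)) * 0.01
b = np.zeros(4 * H)
V = rng.standard_normal((D, H)) * 0.01   # projects hidden state back to a frame

h, c = np.zeros(H), np.zeros(H)
frame = rng.random(D)                    # a starting frame
for _ in range(3):                       # roll out three predicted frames
    h, c = lstm_step(frame, h, c, W, U, b)
    frame = V @ h                        # next-frame prediction
print(frame.shape)  # (4096,)
```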
- UCF101: 3-channel (RGB) frames
- Bouncing MNIST
- We use Sync-DRAW to build our datasets (https://github.com/syncdraw/Sync-DRAW)
- UCF101 is available from the University of Montreal
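A toy sketch of how a bouncing-digit video can be synthesized: a patch moves inside the frame and reflects off the walls. The real dataset is built with Sync-DRAW and actual MNIST digits; here a random square stands in for the digit, and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def bouncing_video(frames=20, size=64, patch=14):
    """Generate one video of a square 'digit' patch bouncing inside
    the frame (the patch is a placeholder for an MNIST digit crop)."""
    video = np.zeros((frames, size, size), dtype=np.float32)
    x, y = rng.integers(0, size - patch, size=2)
    vx, vy = rng.choice([-2, 2], size=2)
    digit = rng.random((patch, patch))   # stand-in for an MNIST digit
    for t in range(frames):
        video[t, y:y + patch, x:x + patch] = digit
        if not 0 <= x + vx <= size - patch:
            vx = -vx                     # bounce off left/right wall
        if not 0 <= y + vy <= size - patch:
            vy = -vy                     # bounce off top/bottom wall
        x, y = x + vx, y + vy
    return video

v = bouncing_video()
print(v.shape)  # (20, 64, 64)
```

A sentence annotation such as "the digit moves right and bounces" would then be attached per video from the sampled trajectory.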
- We use multi-GPU training (or a single K80 or Titan X)
- Cluster training is not supported for now