TLDR:
We can include more flexibility by considering latent variables.
We can use flexible flow transformations (bijective, Surjective, stochastic) to encode the covariates or other observations.
These could provide more expressive representations and possibly more scalability by reducing the dimension drastically.
We could use a library of pre-trained relevant covariate Embeddings for more expressive representations.
These transformations are fully compatible with state space models.