Hey! Thanks for your patience, I didn’t have much time to write a full comprehensive answer. I think I also confused some things in my answers, sorry for that! Also, again, I’m no time series expert, so I hope you bear with me. If you feel like we’re stuck, we can also try to ping some of the people who definitely know alot more about this than I do. :)

But lets, take a step back. your model is (let me drop the subscripts and take \mu as you have defined it above:

Y \sim MVN(\mu,\Sigma),

where \mu is a vector as specified in you post and \Sigma is variance-covariance matrix with an auto-regressive structure. Now – just to clarify – you indexed everything by N or n, do you have like one time series, or is it multiple units you observe over time?

Ok, I hope I get this right:

If the series of Y you observe is a single time series (coming from one unit), an auto-regressive AR(1) process for this would be

y_{n} \sim N(\mu_n + \rho y_{n-1}, \sigma),

so that is no longer a multivariate normal (MVN), but a univariate normal and the AR component enters into the mean vector – this is just the lagged dependent variable. That is basically, the model I described above.

It’s also possible to specify a moving average MA(1) process for the data, which is given by

\begin{align}
y_{n} &\sim N(\mu_n + \rho (y_{n-1} - \mu_{n-1}), \sigma) \\
\leftrightarrow y_{n} &\sim N(\mu_n + \rho \epsilon_{n-1}, \sigma),
\end{align}

where \epsilon_{n-1} is the lagged residual. This seems to be more like what you wanted to do right? And because,

\begin{align}
\begin{bmatrix}
y_{n-2} \\
y_{n-1} \\
y_{n}
\end{bmatrix} \sim MVN \left(
\begin{bmatrix}
\mu_{n-2} \\
\mu_{n-1} \\
\mu_{n}
\end{bmatrix},
\begin{bmatrix}
\sigma^2 & \rho & \rho^2 \\
\rho & \sigma^2 & \rho \\
\rho^2 & \rho & \sigma^2
\end{bmatrix}
\right)
\end{align}

implies

\begin{align}
y_{n-1} | y_{n-2} &\sim N \left(\mu_{n-1} + \rho(y_{n-2} - \mu_{n-2}), \sqrt{\sigma^2(1 - \rho^2)} \right) \\
y_{n} | y_{n-1} &\sim N \left(\mu_{n} + \rho(y_{n-1} - \mu_{n-1}), \sqrt{\sigma^2(1 - \rho^2)} \right) \\
\end{align}

and letting y_{n-1} - \mu_{n-1} = \epsilon_{n-1} as above and \omega = \sqrt{\sigma^2(1 - \rho^2)}, we can just write:

y_{n} \sim N(\mu_n + \rho \epsilon_{n-1}, \omega)

so it has the same structure as an MA(1) process. The huge benefit is, that you don’t have to use the multivariate normal distribution, but you can rather use the univariate normal. This should be much faster.

And if you take a look again at the other thread you can see, that a similar structure hase been discussed there, but for multiple units (multiple time series --> a panel), thats why I thought that this thread could have been helpful.

Regarding assumptions of the models… well, there is a ton of literature about AR(1) and MA(1) models, but I like to “look” at the model. For example if you come from the multivariate normal distribution it’s pretty clear that |\rho|<1 for example… I can not tell you if one of these models makes sense for your specific application, at least not without knowing a bit more about the context.

How to derive the marginal distribution of `y[1]`

and add it to the target density?

*—see edit below!*

I guess this is usually not done. If you check the users guide for the AR(1) and MA(1) models, it is not done there. If you have a look again at the StanCon tutorial that I have linked above you should find the the gist of it – but generally, you are probably safe if you don’t do that (at least if you have more than just a handful of data points).

**EDIT**: I got around doing this over in another thread, in case you want to check that out, @Gang.

I hope that was not that confusing…