How to predict the wind direction of tomorrow

Tokyo.R #71

Takuto Kotsubo

15 July 2018

Table of content

Table of content

  1. Introduction

  2. Circular Package and Circular statistics

  3. Autoregressive model (AR model)

  4. Apply the AR model on the Circular data

  5. Make the Autoregressive model based on the Circular

  6. Analysis for the wind direction data

Introduction

My profile

  • Name: Takuto Kotsubo

  • Twitter: airspace_nobo

  • Affiliation: Graduate School of Engineering

  • Major: Time series analysis

  • Hobby: Shogi, Bouldering

  • Favorite ggplot2 theme : theme_classic

Meeting circular statistics

Why did I come here ?

  • Thorough my research, I will introduce you that various things can be done in rstan !!!!

Circular Package and Circular statistics

Circular Statistics

  • Circular data refers to data recorded as points which directions are measured and arises in biology, geography, medicine, astronomy, and many other areas.

  • The usual summary statistics, such as the mean and standard deviation, cannot be used with angular values.

Circular plot

  • Use circular package
par(mfcol = c(1,2))
plot(x,main = "Circular plot")
rose.diag(x,main = "Rose diagrame")

Circular plot

  • Of course, ggplot can do it and more beautiful…
ggplot(x, aes(x =theta)) + 
  geom_histogram(bins = 50, fill = "steelblue") + 
  coord_polar(start = 0)  + scale_fill_brewer() + 
  scale_x_continuous("", limits = c(0, 2*pi), breaks = seq(0, 2*pi)) +
  theme_bw() + labs(title = "Circular histogram")

Circular mean direction

  • The average of the two radians \(\pi/3\) and \(\pi/5\) (black line)
  • Arithmetic average is red line, circular mean direction is blue line

mean(y) 
[1] 3.141593
mean.circular(y) 
Circular Data: 
Type = angles 
Units = radians 
Template = none 
Modulo = asis 
Zero = 0 
Rotation = counter 
[1] 0

Circular Standard Deviation

  • circular standard deviation is defined by \(\sqrt{-2 \log(R)}\), \(R\) implies “mean resultant length”

# same data
x <- runif(1000,0,2*pi) 
# normal standard deviation
sd(x) 
[1] 1.740852
# circular standard deviation
sd.circular(circular(x)) 
[1] 2.383737

Circular RMSE

  • RMSE is a measure of the differences between observed values (\(\theta_t\)) and predicted values (\(\hat{\theta_t}\)) by the model,

\[\textrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^n (\theta^{dif}_t)^2}, \]

where \(\theta^{dif}_t\) consider the circular rotations,

\[ \theta^{dif}_t = \begin{cases} \theta_t - \hat{\theta_t} + 2 \pi & (\theta_t - \hat{\theta_t}< - \pi) \\ \theta_t - \hat{\theta_t} & (- \pi \leq \theta_t - \hat{\theta_t}\leq \pi) \\ \theta_t - \hat{\theta_t}- 2 \pi & (\theta_t - \hat{\theta_t}> \pi) \end{cases}. \]

Circular RMSE with plot

  • Difference betweeen \(\frac{11 \pi}{6}\) and \(\frac{\pi}{3}\) is not \(\frac{3\pi}{2}\) but \(\frac{\pi}{2}\) !!

\[ \frac{11 \pi}{6} - \frac{\pi}{3} = \frac{3\pi}{2}??\]

Circular test

  • Rayleigh test : Whether the angle data is biased …

  • Kupier test : Whether angle data follow the von Mises distribution …

  • Mardia-Watoson-Wheeler test : Whether the angle data of the two groups follow the same distribution …

Autoregressive model (AR model)

Autoregressive process

  • Autoregressive process means that the current data is represented by past data, AR(\(p\)) model can be written as

\[y_t = c + \sum_{i=1}^p \phi_i y_{t-i} + \varepsilon_t,~~ \varepsilon_t \sim \mathcal{N}(0,~\sigma^2).\]

  • Joint probability distribution can be written as follows, using value before the time point \(t\)

\[f(y_1, y_2, ...|c, \phi_i, \varepsilon) = \prod_{t=1+p}^{n}f(y_t|y_{t-1}, ...,y_{t-p},c, \phi_i, \varepsilon). \]

Maximize log likelihood method

  • From the time point \(t-1\), the value at the time point \(t\) as follows,

\[ y_t|y_{t-1},...,y_{t-p} \sim \mathcal{N}(c + \sum_{i=1}^p \phi_i y_{t-i},~\sigma^2). \]

  • Consider the log likelihoods at each time point

\[ \begin{eqnarray*} \mathcal{L} &=& \log f(y_1, y_2, ...|c, \phi_i, \varepsilon) \\ &=& \log f(y_{1+p}|y_{p},...,y_{1}) + ... + \log f(y_{n}|y_{n-1},...,y_{n-p}). \end{eqnarray*} \]

Parameter estimation method

  • My rstan code (AR(\(p\)) model)
data{
  int N; // data size
  int P; // AR(P) 
  vector[N] theta; // data set
}
parameters{
  real alpha_0; // const parameter
  row_vector[P] alpha_1; // coef parameter
  real<lower=0> sigma; // Variance 
}
model{
  alpha_0 ~ normal(0,100); sigma ~ normal(0,100);
  for(i in 1:P){
    alpha_1[i] ~ normal(0,100); // N(0,100)
  }
  for(n in (1+P):N){
    vector[P] pre_theta; 
    for(k in 1:P){
      pre_theta[k] = theta[n-k];
    }
    target += 
      normal_lpdf(theta[n]|alpha_0 + ( alpha_1 * pre_theta),sigma);
  }
}
generated quantities{
  vector[N-P] log_likelihood;
  for(n in (1+P):N){
    vector[P] pre_theta;
    for(k in 1:P){
      pre_theta[k] = theta[n-k];
    }
    log_likelihood[n-P] = 
      normal_lpdf(theta[n]|alpha_0 + ( alpha_1 * pre_theta),sigma);
  } 
}

Sample data set

  • The luteinzing hormone in blood (lh data set)
  • 10 minutes interbals, 48 samples
data.frame(lh) %>% 
    mutate(index = row_number()) %>%
    ggplot(aes(x=index,y=lh)) +
    geom_line()

MCMC settings

  • \(1000\) warm up, \(1000\) iteration, \(4\) chains
rstan_options(auto_write=TRUE) # auto save
options(mc.cores=parallel::detectCores()) # multi core

ar2_model <- readRDS("model/circular_AR2.rds") # complie model
fit3 <- sampling(ar2_model,data = list(N=length(lh),P=3,theta=lh),
                 iter=2000, chains=4) # sampling by model

Select AR(\(3\)) model by WAIC

  • \(\hat{R} < 1.01\) indicate all parameters convergence


p WAIC RMSE
1 64.6 0.449
2 63.8 0.442
3 63.7 0.436
4 65.2 0.438
5 66.4 0.441
6 65.4 0.430
7 65.4 0.425

Autocorrelation of Markov Chains

Posterior Distributions

Apply the Autoregressive model to the Circular data

Wind Direction in Col de la Roa

  • In a place named “Col de la Roa” in the Italian Alps, It reported the wind direction records (wind data set)
  • Total of 310 measures

Preprocessing and MCMC settings

  • change the domain of \(\theta\) and make a data list for MCMC
data(wind) 
wind_data <- wind %>%  
  data.frame(t = seq(1,310,1), tmp=.) %>% # put label
  mutate(theta_real = dplyr::if_else(tmp>pi,tmp - 2*pi, tmp)) %>% 
  dplyr::select(-tmp) 
d.dat7 <- list(N=length(wind_data$theta_real), 
               P=7,theta=wind_data$theta_real) # data list
  • \(1000\) warm up, \(1000\) iteration, \(4\) chains
rstan_options(auto_write=TRUE) # auto save
options(mc.cores=parallel::detectCores()) # multi core

ar2_model <- readRDS("model/circular_AR2.rds") # complie model
fit_ar_7 <- sampling(ar_model,data = d.dat7, 
                     iter=2000, chains=4) # sampling by model

Autocorrelation of Markov Chains

Posterior Distributions

Select AR(\(7\)) model by WAIC

  • \(\hat{R} < 1.01\) indicate all parameters convergence
  • but fitting is wrong …


p WAIC RMSE
1 890.1 0.993
2 888.2 0.987
3 888.8 0.989
4 889.8 0.989
5 885.2 0.984
6 888.3 0.980
7 882.5 0.979

Make the Autoregressive model based on the Circular

State-space model

\[\mu_t = \left( \begin{array}{c} \cos \theta_t \\ \sin \theta_t \end{array} \right), ~~ \theta_t \sim \mathcal{PN}(\mu_t, \sum)\]

\(\mathcal{PN}\) distribution

  • If rstan, you can decsribe a complex distribution

Circular State-space model

\[ \left( \begin{array}{c} \cos \theta_t \\ \sin \theta_t \end{array} \right) = \left( \begin{array}{c} \alpha_{c,0} \\ \alpha_{s,0} \end{array} \right) + \left( \begin{array}{cc} \beta_{c,1,1} & \beta_{c,1,2} \\ \beta_{s,1,1} & \beta_{s,1,2} \end{array} \right) \left( \begin{array}{c} \cos \theta_{t-1} \\ \sin \theta_{t-1} \end{array} \right) \\ + \left( \begin{array}{cc} \beta_{c,2,1} & \beta_{c,2,2} \\ \beta_{s,2,1} & \beta_{s,2,2} \end{array} \right) \left( \begin{array}{c} \cos \theta_{t-2} \\ \sin \theta_{t-2} \end{array} \right) + \cdots + \\ \left( \begin{array}{cc} \beta_{c,p,1} & \beta_{c,p,2} \\ \beta_{s,p,1} & \beta_{s,p,2} \end{array} \right) \left( \begin{array}{c} \cos \theta_{t-p} \\ \sin \theta_{t-p} \end{array} \right) + \left( \begin{array}{c} \varepsilon_{c,t} \\ \varepsilon_{s,t} \end{array} \right) \]

where \(\boldsymbol{\alpha}, \boldsymbol{\beta}\) is coefficient parameters and \(\boldsymbol{\varepsilon}\) is variance parameter.

\[ \left( \begin{array}{c} \varepsilon_{c,t} \\ \varepsilon_{s,t} \end{array} \right) \sim N \left( \left( \begin{array}{c} 0 \\ 0 \end{array} \right), \left( \begin{array}{cc} \sigma_c^2 & \rho \sigma_c \sigma_s \\ \rho \sigma_c \sigma_s & \sigma_s^2 \end{array} \right) \right) \]

Circular State-space model

\[ \left( \begin{array}{c} \cos \theta_t \\ \sin \theta_t \end{array} \right) = \left( \begin{array}{c} \alpha_{c,0} \\ \alpha_{s,0} \end{array} \right) + \left( \begin{array}{cc} \beta_{c,1,1} & \beta_{c,1,2} \\ \beta_{s,1,1} & \beta_{s,1,2} \end{array} \right) \left( \begin{array}{c} \cos \theta_{t-1} \\ \sin \theta_{t-1} \end{array} \right) \\ + \left( \begin{array}{cc} \beta_{c,2,1} & \beta_{c,2,2} \\ \beta_{s,2,1} & \beta_{s,2,2} \end{array} \right) \left( \begin{array}{c} \cos \theta_{t-2} \\ \sin \theta_{t-2} \end{array} \right) + \cdots + \\ \left( \begin{array}{cc} \beta_{c,p,1} & \beta_{c,p,2} \\ \beta_{s,p,1} & \beta_{s,p,2} \end{array} \right) \left( \begin{array}{c} \cos \theta_{t-p} \\ \sin \theta_{t-p} \end{array} \right) + \left( \begin{array}{c} \varepsilon_{c,t} \\ \varepsilon_{s,t} \end{array} \right) \]

where \(\boldsymbol{\alpha}, \boldsymbol{\beta}\) is coefficient parameters and \(\boldsymbol{\varepsilon}\) is variance parameter.

\[ \left( \begin{array}{c} \varepsilon_{c,t} \\ \varepsilon_{s,t} \end{array} \right) \sim N \left( \left( \begin{array}{c} 0 \\ 0 \end{array} \right), \left( \begin{array}{cc} \sigma_c^2 & \rho \sigma_c \sigma_s \\ \rho \sigma_c \sigma_s & \sigma_s^2 \end{array} \right) \right) \]

Circular State-space model

  • Easy to write using rstan

MCMC code for Circular State-space model

functions{
    // PN distribution (observation model)
  real circular_obs_lpdf(real theta, int P, vector pre_theta, vector alpha_0, matrix alpha_1, matrix sigma){
    vector[2] u; vector[2] mu; vector[2*P] tmp; real A; real B; real C; real D; real p;
    for(k in 1:P){
      tmp[2*k-1] = cos(pre_theta[k]); tmp[2*k] = sin(pre_theta[k]);
    }
    mu = alpha_0 + ( alpha_1 * tmp); u[1] = cos(theta); u[2] = sin(theta);
    A = quad_form(inverse_spd(sigma), u); B = u' * inverse_spd(sigma) * mu;
    C = (-0.5) * quad_form(inverse_spd(sigma), mu); D = B/sqrt(A);
    p = - log(A) - 0.5*log(determinant(sigma)) + C + log(1+(D * (normal_cdf(D,0,1)/ (exp(-D^2 /2)/sqrt(2*pi()))))); 
    return p;
  }
  // Multi Normal distribution (state model)
  real circular_state_lpdf(vector u, int P, vector pre_theta, vector alpha_0, matrix alpha_1, matrix sigma){
    vector[2*P] tmp; vector[2] mu; real p;
    for(k in 1:P){
      tmp[2*k-1] = cos(pre_theta[k]); tmp[2*k] = sin(pre_theta[k]);
    }
    mu = alpha_0 + ( alpha_1 * tmp); 
    p = multi_normal_lpdf(u|mu,sigma);
    return p;
  }
}

data{
  // data paramter
  int N; // sample size
  int P; // VAR(P) 
  vector<lower=-pi(),upper=pi()>[N] theta; // data
}

transformed data{
  vector[2] u[N];
  for(k in 1:N){
    u[k,1] = cos(theta[k]);
    u[k,2] = sin(theta[k]);
  }
}

parameters{
  vector[2] alpha_0;
  matrix[2,2*P] alpha_1;
  cov_matrix[2] sigma;
}

model{
  alpha_0[1] ~ normal(0,100); alpha_0[2] ~ normal(0,100);
  for(i in 1:2*P){
    alpha_1[1,i] ~ normal(0,100); // N(0,100)
    alpha_1[2,i] ~ normal(0,100); // N(0,100)
  }
  sigma ~ inv_wishart(4,diag_matrix(rep_vector(1,2)));
  for(n in (1+P):N){
    vector[P] pre_theta;
    for(k in 1:P){
      pre_theta[k] = theta[n-k];
    }
    target += circular_state_lpdf(u[n]|P,pre_theta, alpha_0,alpha_1,sigma);
    target += circular_obs_lpdf(theta[n]|P,pre_theta,alpha_0,alpha_1,sigma);
  }
}

generated quantities{
  vector[N-P] log_likelihood;
  for(n in (1+P):N){
    vector[P] pre_theta;
    for(k in 1:P){
      pre_theta[k] = theta[n-k];
    }
    log_likelihood[n-P] = circular_state_lpdf(u[n]|P,pre_theta, alpha_0,alpha_1,sigma) + 
                          circular_obs_lpdf(theta[n]|P, pre_theta, alpha_0, alpha_1, sigma);
  } 
}

Analysis for the wind direction data

Wind Direction in Col de la Roa

  • In a place named “Col de la Roa” in the Italian Alps, It reported the wind direction records (wind data set)
  • Total of 310 measures

Preprocessing and MCMC settings

  • change the domain of \(\theta\) and make a data list for MCMC
data(wind) 
wind_data <- wind %>%  
  data.frame(t = seq(1,310,1), tmp=.) %>% # put label
  mutate(theta_real = dplyr::if_else(tmp>pi,tmp - 2*pi, tmp)) %>% 
  dplyr::select(-tmp) 
d.dat5 <- list(N=length(wind_data$theta_real),
               P=5,theta=wind_data$theta_real) # data list
  • \(1000\) warm up, \(4000\) iteration, \(4\) chains
rstan_options(auto_write=TRUE) # auto save
options(mc.cores=parallel::detectCores()) # multi core

state_obs_model <- readRDS("model/state_obs_circular_VAR.rds") # complie model
fit_state_obs_5 <- sampling(state_obs_model,data = d.dat5, 
                            iter=5000, chains=4) # sampling by model

Autocorrelation of Markov Chains

Posterior Distributions

Select Circular State-space model(\(5\)) model by WAIC

  • \(\hat{R} < 1.01\) indicate all parameters convergence

p WAIC RMSE
1 444.2 0.915
2 442.5 0.902
3 435.0 0.908
4 436.6 0.899
5 431.9 0.892
6 437.4 0.884
7 449.6 0.886
8 456.4 0.885
9 466.6 0.886
10 461.3 0.882

Off Topic

  • rstan uses ggplot2, so you change the option easily
stan_plot(
  fit_state_obs_5,fill_color ="blue",point_est ="median",show_density=T, 
  ci_level = 0.95, outer_level=1.00, show_outer_line =T,
  pars = c("alpha_0[1]","alpha_0[2]","alpha_1[1,1]","alpha_1[1,2]",
            "alpha_1[2,1]","alpha_1[2,2]","alpha_1[1,3]",
            "alpha_1[1,4]","alpha_1[2,3]","alpha_1[2,4]")) + 
  scale_y_continuous(
    labels= c(expression(alpha[paste("c,",0)]),expression(alpha[paste("s,",0)]),
              expression(beta[paste("c,",1,",",1)]),expression(beta[paste("c,",1,",",2)]),
              expression(beta[paste("s,",1,",",1)]),expression(beta[paste("s,",1,",",2)]),
              expression(beta[paste("c,",2,",",1)]),expression(beta[paste("c,",2,",",2)]),
              expression(beta[paste("s,",2,",",1)]),expression(beta[paste("s,",2,",",2)])) %>% 
      rev(), breaks = 1:10) 

Summary

  • Amazing ggplot2 and rstan!!

  • Fantastic option facet_zoom !!

  • Useful reveal.js for making slide!!

Thank you for listening !!

Enjoy!

Appendix

Model Selection Criteria (WAIC)

  • WAIC can be viewed as an improvement on the DIC for Bayesian models. WAIC can be interpreted as a computationally convenient approximation to cross-validation and is defined as

where \(n\) is the number of data \(\theta\), \(S\) is MCMC sampling number of each parameter, \(\eta^{m, S}\) is the \(S\)-th parameter set of the \(m\)-th chain.

Projected Normal Distribution

  • Distributions of the circle can be obtained by radial projection of distributions on the plane.
  • The density of \(\Theta\) in \(U = (\cos\Theta, \sin\Theta)^T\) can be represented as

\[ \begin{eqnarray*} p(\theta|\mu, \Sigma) = \frac{1}{2\pi A(\theta)} |\Sigma|^{-\frac{1}{2}} \exp(C)\left\{1 + \frac{B(\theta)}{\sqrt{A(\theta)}} \frac{\Phi \left(\frac{B(\theta)}{\sqrt{A(\theta)}}\right)}{\varphi \left(\frac{B(\theta)}{\sqrt{A(\theta)}}\right)}\right\}, \end{eqnarray*} \]

where \(u^T = (\cos\theta,\sin\theta), A(\theta) = u^T\Sigma^{-1} u, \ B(\theta) = u^T \Sigma^{-1} \mu\) and \(\ C = -\frac{1}{2} \mu^T \Sigma^{-1} \mu\). \(\Phi(\cdot), \varphi(\cdot)\) are the standard normal distribution and density functions.