How to predict the wind direction of tomorrow

Tokyo.R #71

Takuto Kotsubo

15 July 2018

Table of content

Introduction
Circular Package and Circular statistics
Autoregressive model (AR model)
Apply the AR model on the Circular data
Make the Autoregressive model based on the Circular
Analysis for the wind direction data

Introduction

My profile

Name: Takuto Kotsubo
Twitter: airspace_nobo
Affiliation: Graduate School of Engineering
Major: Time series analysis
Hobby: Shogi, Bouldering
Favorite ggplot2 theme : theme_classic

Meeting circular statistics

Why did I come here ?

Thorough my research, I will introduce you that various things can be done in rstan !!!!

Circular Package and Circular statistics

Circular Statistics

Circular data refers to data recorded as points which directions are measured and arises in biology, geography, medicine, astronomy, and many other areas.
The usual summary statistics, such as the mean and standard deviation, cannot be used with angular values.

Circular plot

Use circular package

par(mfcol = c(1,2))
plot(x,main = "Circular plot")
rose.diag(x,main = "Rose diagrame")

Circular plot

Of course, ggplot can do it and more beautiful…

ggplot(x, aes(x =theta)) + 
  geom_histogram(bins = 50, fill = "steelblue") + 
  coord_polar(start = 0)  + scale_fill_brewer() + 
  scale_x_continuous("", limits = c(0, 2*pi), breaks = seq(0, 2*pi)) +
  theme_bw() + labs(title = "Circular histogram")

Circular mean direction

The average of the two radians \(\pi/3\) and \(\pi/5\) (black line)
Arithmetic average is red line, circular mean direction is blue line

mean(y)

[1] 3.141593

mean.circular(y)

Circular Data: 
Type = angles 
Units = radians 
Template = none 
Modulo = asis 
Zero = 0 
Rotation = counter 
[1] 0

Circular Standard Deviation

circular standard deviation is defined by \(\sqrt{-2 \log(R)}\), \(R\) implies “mean resultant length”

# same data
x <- runif(1000,0,2*pi) 
# normal standard deviation
sd(x)

[1] 1.740852

# circular standard deviation
sd.circular(circular(x))

[1] 2.383737

Circular RMSE

RMSE is a measure of the differences between observed values (\(\theta_t\)) and predicted values (\(\hat{\theta_t}\)) by the model,

\[\textrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^n (\theta^{dif}_t)^2}, \]

where \(\theta^{dif}_t\) consider the circular rotations,

\[ \theta^{dif}_t = \begin{cases} \theta_t - \hat{\theta_t} + 2 \pi & (\theta_t - \hat{\theta_t}< - \pi) \\ \theta_t - \hat{\theta_t} & (- \pi \leq \theta_t - \hat{\theta_t}\leq \pi) \\ \theta_t - \hat{\theta_t}- 2 \pi & (\theta_t - \hat{\theta_t}> \pi) \end{cases}. \]

Circular RMSE with plot

Difference betweeen \(\frac{11 \pi}{6}\) and \(\frac{\pi}{3}\) is not \(\frac{3\pi}{2}\) but \(\frac{\pi}{2}\) !!

\[ \frac{11 \pi}{6} - \frac{\pi}{3} = \frac{3\pi}{2}??\]

Circular test

Rayleigh test : Whether the angle data is biased …
Kupier test : Whether angle data follow the von Mises distribution …
Mardia-Watoson-Wheeler test : Whether the angle data of the two groups follow the same distribution …

Autoregressive model (AR model)

Autoregressive process

Autoregressive process means that the current data is represented by past data, AR(\(p\)) model can be written as

\[y_t = c + \sum_{i=1}^p \phi_i y_{t-i} + \varepsilon_t,~~ \varepsilon_t \sim \mathcal{N}(0,~\sigma^2).\]

Joint probability distribution can be written as follows, using value before the time point \(t\)

\[f(y_1, y_2, ...|c, \phi_i, \varepsilon) = \prod_{t=1+p}^{n}f(y_t|y_{t-1}, ...,y_{t-p},c, \phi_i, \varepsilon). \]

Maximize log likelihood method

From the time point \(t-1\), the value at the time point \(t\) as follows,

\[ y_t|y_{t-1},...,y_{t-p} \sim \mathcal{N}(c + \sum_{i=1}^p \phi_i y_{t-i},~\sigma^2). \]

Consider the log likelihoods at each time point

\[ \begin{eqnarray*} \mathcal{L} &=& \log f(y_1, y_2, ...|c, \phi_i, \varepsilon) \\ &=& \log f(y_{1+p}|y_{p},...,y_{1}) + ... + \log f(y_{n}|y_{n-1},...,y_{n-p}). \end{eqnarray*} \]

Parameter estimation method

My rstan code (AR(\(p\)) model)

data{
  int N; // data size
  int P; // AR(P) 
  vector[N] theta; // data set
}
parameters{
  real alpha_0; // const parameter
  row_vector[P] alpha_1; // coef parameter
  real<lower=0> sigma; // Variance 
}
model{
  alpha_0 ~ normal(0,100); sigma ~ normal(0,100);
  for(i in 1:P){
    alpha_1[i] ~ normal(0,100); // N(0,100)
  }
  for(n in (1+P):N){
    vector[P] pre_theta; 
    for(k in 1:P){
      pre_theta[k] = theta[n-k];
    }
    target += 
      normal_lpdf(theta[n]|alpha_0 + ( alpha_1 * pre_theta),sigma);
  }
}
generated quantities{
  vector[N-P] log_likelihood;
  for(n in (1+P):N){
    vector[P] pre_theta;
    for(k in 1:P){
      pre_theta[k] = theta[n-k];
    }
    log_likelihood[n-P] = 
      normal_lpdf(theta[n]|alpha_0 + ( alpha_1 * pre_theta),sigma);
  } 
}

Sample data set

The luteinzing hormone in blood (lh data set)
10 minutes interbals, 48 samples

data.frame(lh) %>% 
    mutate(index = row_number()) %>%
    ggplot(aes(x=index,y=lh)) +
    geom_line()

MCMC settings

\(1000\) warm up, \(1000\) iteration, \(4\) chains

rstan_options(auto_write=TRUE) # auto save
options(mc.cores=parallel::detectCores()) # multi core

ar2_model <- readRDS("model/circular_AR2.rds") # complie model
fit3 <- sampling(ar2_model,data = list(N=length(lh),P=3,theta=lh),
                 iter=2000, chains=4) # sampling by model

Select AR(\(3\)) model by WAIC

\(\hat{R} < 1.01\) indicate all parameters convergence

p	WAIC	RMSE
1	64.6	0.449
2	63.8	0.442
3	63.7	0.436
4	65.2	0.438
5	66.4	0.441
6	65.4	0.430
7	65.4	0.425

Autocorrelation of Markov Chains

Posterior Distributions

Apply the Autoregressive model to the Circular data

Wind Direction in Col de la Roa

In a place named “Col de la Roa” in the Italian Alps, It reported the wind direction records (wind data set)
Total of 310 measures

Preprocessing and MCMC settings

change the domain of \(\theta\) and make a data list for MCMC

data(wind) 
wind_data <- wind %>%  
  data.frame(t = seq(1,310,1), tmp=.) %>% # put label
  mutate(theta_real = dplyr::if_else(tmp>pi,tmp - 2*pi, tmp)) %>% 
  dplyr::select(-tmp) 
d.dat7 <- list(N=length(wind_data$theta_real), 
               P=7,theta=wind_data$theta_real) # data list

\(1000\) warm up, \(1000\) iteration, \(4\) chains

rstan_options(auto_write=TRUE) # auto save
options(mc.cores=parallel::detectCores()) # multi core

ar2_model <- readRDS("model/circular_AR2.rds") # complie model
fit_ar_7 <- sampling(ar_model,data = d.dat7, 
                     iter=2000, chains=4) # sampling by model

Autocorrelation of Markov Chains

Posterior Distributions

Select AR(\(7\)) model by WAIC

\(\hat{R} < 1.01\) indicate all parameters convergence
but fitting is wrong …

p	WAIC	RMSE
1	890.1	0.993
2	888.2	0.987
3	888.8	0.989
4	889.8	0.989
5	885.2	0.984
6	888.3	0.980
7	882.5	0.979

Make the Autoregressive model based on the Circular

State-space model

\[\mu_t = \left( \begin{array}{c} \cos \theta_t \\ \sin \theta_t \end{array} \right), ~~ \theta_t \sim \mathcal{PN}(\mu_t, \sum)\]

\(\mathcal{PN}\) distribution

If rstan, you can decsribe a complex distribution

Circular State-space model

\[ \left( \begin{array}{c} \cos \theta_t \\ \sin \theta_t \end{array} \right) = \left( \begin{array}{c} \alpha_{c,0} \\ \alpha_{s,0} \end{array} \right) + \left( \begin{array}{cc} \beta_{c,1,1} & \beta_{c,1,2} \\ \beta_{s,1,1} & \beta_{s,1,2} \end{array} \right) \left( \begin{array}{c} \cos \theta_{t-1} \\ \sin \theta_{t-1} \end{array} \right) \\ + \left( \begin{array}{cc} \beta_{c,2,1} & \beta_{c,2,2} \\ \beta_{s,2,1} & \beta_{s,2,2} \end{array} \right) \left( \begin{array}{c} \cos \theta_{t-2} \\ \sin \theta_{t-2} \end{array} \right) + \cdots + \\ \left( \begin{array}{cc} \beta_{c,p,1} & \beta_{c,p,2} \\ \beta_{s,p,1} & \beta_{s,p,2} \end{array} \right) \left( \begin{array}{c} \cos \theta_{t-p} \\ \sin \theta_{t-p} \end{array} \right) + \left( \begin{array}{c} \varepsilon_{c,t} \\ \varepsilon_{s,t} \end{array} \right) \]

where \(\boldsymbol{\alpha}, \boldsymbol{\beta}\) is coefficient parameters and \(\boldsymbol{\varepsilon}\) is variance parameter.

\[ \left( \begin{array}{c} \varepsilon_{c,t} \\ \varepsilon_{s,t} \end{array} \right) \sim N \left( \left( \begin{array}{c} 0 \\ 0 \end{array} \right), \left( \begin{array}{cc} \sigma_c^2 & \rho \sigma_c \sigma_s \\ \rho \sigma_c \sigma_s & \sigma_s^2 \end{array} \right) \right) \]

Circular State-space model

where \(\boldsymbol{\alpha}, \boldsymbol{\beta}\) is coefficient parameters and \(\boldsymbol{\varepsilon}\) is variance parameter.

Circular State-space model

Easy to write using rstan

MCMC code for Circular State-space model

functions{
    // PN distribution (observation model)
  real circular_obs_lpdf(real theta, int P, vector pre_theta, vector alpha_0, matrix alpha_1, matrix sigma){
    vector[2] u; vector[2] mu; vector[2*P] tmp; real A; real B; real C; real D; real p;
    for(k in 1:P){
      tmp[2*k-1] = cos(pre_theta[k]); tmp[2*k] = sin(pre_theta[k]);
    }
    mu = alpha_0 + ( alpha_1 * tmp); u[1] = cos(theta); u[2] = sin(theta);
    A = quad_form(inverse_spd(sigma), u); B = u' * inverse_spd(sigma) * mu;
    C = (-0.5) * quad_form(inverse_spd(sigma), mu); D = B/sqrt(A);
    p = - log(A) - 0.5*log(determinant(sigma)) + C + log(1+(D * (normal_cdf(D,0,1)/ (exp(-D^2 /2)/sqrt(2*pi()))))); 
    return p;
  }
  // Multi Normal distribution (state model)
  real circular_state_lpdf(vector u, int P, vector pre_theta, vector alpha_0, matrix alpha_1, matrix sigma){
    vector[2*P] tmp; vector[2] mu; real p;
    for(k in 1:P){
      tmp[2*k-1] = cos(pre_theta[k]); tmp[2*k] = sin(pre_theta[k]);
    }
    mu = alpha_0 + ( alpha_1 * tmp); 
    p = multi_normal_lpdf(u|mu,sigma);
    return p;
  }
}

data{
  // data paramter
  int N; // sample size
  int P; // VAR(P) 
  vector<lower=-pi(),upper=pi()>[N] theta; // data
}

transformed data{
  vector[2] u[N];
  for(k in 1:N){
    u[k,1] = cos(theta[k]);
    u[k,2] = sin(theta[k]);
  }
}

parameters{
  vector[2] alpha_0;
  matrix[2,2*P] alpha_1;
  cov_matrix[2] sigma;
}

model{
  alpha_0[1] ~ normal(0,100); alpha_0[2] ~ normal(0,100);
  for(i in 1:2*P){
    alpha_1[1,i] ~ normal(0,100); // N(0,100)
    alpha_1[2,i] ~ normal(0,100); // N(0,100)
  }
  sigma ~ inv_wishart(4,diag_matrix(rep_vector(1,2)));
  for(n in (1+P):N){
    vector[P] pre_theta;
    for(k in 1:P){
      pre_theta[k] = theta[n-k];
    }
    target += circular_state_lpdf(u[n]|P,pre_theta, alpha_0,alpha_1,sigma);
    target += circular_obs_lpdf(theta[n]|P,pre_theta,alpha_0,alpha_1,sigma);
  }
}

generated quantities{
  vector[N-P] log_likelihood;
  for(n in (1+P):N){
    vector[P] pre_theta;
    for(k in 1:P){
      pre_theta[k] = theta[n-k];
    }
    log_likelihood[n-P] = circular_state_lpdf(u[n]|P,pre_theta, alpha_0,alpha_1,sigma) + 
                          circular_obs_lpdf(theta[n]|P, pre_theta, alpha_0, alpha_1, sigma);
  } 
}

Analysis for the wind direction data

Wind Direction in Col de la Roa

In a place named “Col de la Roa” in the Italian Alps, It reported the wind direction records (wind data set)
Total of 310 measures

Preprocessing and MCMC settings

change the domain of \(\theta\) and make a data list for MCMC

data(wind) 
wind_data <- wind %>%  
  data.frame(t = seq(1,310,1), tmp=.) %>% # put label
  mutate(theta_real = dplyr::if_else(tmp>pi,tmp - 2*pi, tmp)) %>% 
  dplyr::select(-tmp) 
d.dat5 <- list(N=length(wind_data$theta_real),
               P=5,theta=wind_data$theta_real) # data list

\(1000\) warm up, \(4000\) iteration, \(4\) chains

rstan_options(auto_write=TRUE) # auto save
options(mc.cores=parallel::detectCores()) # multi core

state_obs_model <- readRDS("model/state_obs_circular_VAR.rds") # complie model
fit_state_obs_5 <- sampling(state_obs_model,data = d.dat5, 
                            iter=5000, chains=4) # sampling by model

Autocorrelation of Markov Chains

Posterior Distributions

Select Circular State-space model(\(5\)) model by WAIC

\(\hat{R} < 1.01\) indicate all parameters convergence

p	WAIC	RMSE
1	444.2	0.915
2	442.5	0.902
3	435.0	0.908
4	436.6	0.899
5	431.9	0.892
6	437.4	0.884
7	449.6	0.886
8	456.4	0.885
9	466.6	0.886
10	461.3	0.882

Off Topic

rstan uses ggplot2, so you change the option easily

stan_plot(
  fit_state_obs_5,fill_color ="blue",point_est ="median",show_density=T, 
  ci_level = 0.95, outer_level=1.00, show_outer_line =T,
  pars = c("alpha_0[1]","alpha_0[2]","alpha_1[1,1]","alpha_1[1,2]",
            "alpha_1[2,1]","alpha_1[2,2]","alpha_1[1,3]",
            "alpha_1[1,4]","alpha_1[2,3]","alpha_1[2,4]")) + 
  scale_y_continuous(
    labels= c(expression(alpha[paste("c,",0)]),expression(alpha[paste("s,",0)]),
              expression(beta[paste("c,",1,",",1)]),expression(beta[paste("c,",1,",",2)]),
              expression(beta[paste("s,",1,",",1)]),expression(beta[paste("s,",1,",",2)]),
              expression(beta[paste("c,",2,",",1)]),expression(beta[paste("c,",2,",",2)]),
              expression(beta[paste("s,",2,",",1)]),expression(beta[paste("s,",2,",",2)])) %>% 
      rev(), breaks = 1:10)

Summary

Amazing ggplot2 and rstan!!
Fantastic option facet_zoom !!
Useful reveal.js for making slide!!

Thank you for listening !!

Enjoy!

Appendix

Model Selection Criteria (WAIC)

WAIC can be viewed as an improvement on the DIC for Bayesian models. WAIC can be interpreted as a computationally convenient approximation to cross-validation and is defined as

where \(n\) is the number of data \(\theta\), \(S\) is MCMC sampling number of each parameter, \(\eta^{m, S}\) is the \(S\)-th parameter set of the \(m\)-th chain.

Projected Normal Distribution

Distributions of the circle can be obtained by radial projection of distributions on the plane.
The density of \(\Theta\) in \(U = (\cos\Theta, \sin\Theta)^T\) can be represented as

\[ \begin{eqnarray*} p(\theta|\mu, \Sigma) = \frac{1}{2\pi A(\theta)} |\Sigma|^{-\frac{1}{2}} \exp(C)\left\{1 + \frac{B(\theta)}{\sqrt{A(\theta)}} \frac{\Phi \left(\frac{B(\theta)}{\sqrt{A(\theta)}}\right)}{\varphi \left(\frac{B(\theta)}{\sqrt{A(\theta)}}\right)}\right\}, \end{eqnarray*} \]

where \(u^T = (\cos\theta,\sin\theta), A(\theta) = u^T\Sigma^{-1} u, \ B(\theta) = u^T \Sigma^{-1} \mu\) and \(\ C = -\frac{1}{2} \mu^T \Sigma^{-1} \mu\). \(\Phi(\cdot), \varphi(\cdot)\) are the standard normal distribution and density functions.