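[ML] Paper Review: Real-Time Eye Blink Detection using Facial Landmarks

18 Jul 2020 in Study on ML

I have recently been working on a project that detects the liveness of a face. Besides the texture and frequency analysis method covered earlier, subtle human biological activity is also a good cue for detecting liveness. Among these, blinking is something every human does, and does unconsciously. I figured that if we can tell whether a blink occurred, we can also judge liveness. So in this post I will look at the paper Real-Time Eye Blink Detection using Facial Landmarks, which proposes an eye blink detection algorithm.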

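[ML] Paper Review: Face Liveness Detection based on Texture and Frequency Analysis

31 May 2020 in Study on ML

Recently, authentication through biometric information such as the iris, face, and fingerprint, rather than the traditional ID and password, has become popular through FIDO (Fast IDentity Online). Face recognition in particular is in the spotlight because it is convenient. As with every security problem, spoofing, a malicious attempt to get past face authentication, has also appeared. Spoofing passes face authentication by presenting a fake face; the specific methods are as follows.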

[Kaggle] Pseudo Labeling

24 Apr 2020 in Study on ML

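While preparing for a Kaggle competition recently, I came across various machine learning training techniques for improving performance. In today's post I will look at Pseudo Labeling, the most intuitive of them, which still gave good results.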

[ML-20]Hierarchical Clustering

12 Oct 2019 in Study on ML

20. Hierarchical Clustering

[ML-19]K-Means Clustering

11 Oct 2019 in Study on ML

19. K-Means Clustering

[ML-18]PCA(Principal Component Analysis)

10 Oct 2019 in Study on ML

18. PCA(Principal Component Analysis)

[ML-17]Stacking

09 Oct 2019 in Study on ML

17. Stacking

what?

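In ensemble learning, stacking trains a new meta model (meta learner) on the predictions of the individual models to build the final predictor.
The predictions produced by the base-level classifiers are used as the input data for training the meta model.

stacking

why?

It often achieves higher accuracy than a single model.
Previously trained models can be reused, which makes collaboration easier.

why not?

It is computationally expensive and slow, which makes it hard to use in production.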

how?

`Input`

1) Data $D = \{(x_i, y_i)\}_{i=1}^m$

`Step 1` for t = 1 to T (number of base-level classifiers), train $h_t$ based on $D$

Train the base-level classifiers (e.g. SVM, KNN, Random Forest…).

`Step 2` train meta classifier

1) for i = 1 to m (number of samples), construct the new data set $D_h = \{(x_i', y_i)\}_{i=1}^m$, where $x_i' = (h_1(x_i), ..., h_T(x_i))$

The predictions of the base-level classifiers are used as the input data for training the meta classifier.

2) train meta classifier

learn $H$ based on $D_h$

`Step 3` output ensemble classifier $H$
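The three steps above can be sketched by hand with scikit-learn. This is only an illustration under assumptions that are not in the original post: the base models (KNN, Random Forest, Gaussian Naive Bayes), the logistic regression meta model, and the use of out-of-fold predictions via `cross_val_predict` (a common variation that keeps the meta model from seeing predictions made on data the base models were trained on) are all illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Step 1: base-level classifiers h_1..h_T (illustrative choices, not from the post).
base_models = [
    KNeighborsClassifier(n_neighbors=5),
    RandomForestClassifier(n_estimators=50, random_state=0),
    GaussianNB(),
]

# Step 2-1: build D_h. Each sample's new features are the class probabilities
# predicted by every base model (out-of-fold, so no sample is predicted by a
# model that saw it during training).
train_meta = np.hstack([
    cross_val_predict(model, X_train, y_train, cv=4, method="predict_proba")
    for model in base_models
])

# Refit each base model on the full training set to build test-time features.
test_meta = np.hstack([
    model.fit(X_train, y_train).predict_proba(X_test) for model in base_models
])

# Step 2-2: train the meta classifier H on D_h.
meta_model = LogisticRegression(max_iter=1000).fit(train_meta, y_train)

# Step 3: the ensemble is "base models -> meta model".
stacking_accuracy = meta_model.score(test_meta, y_test)
print(f"stacking accuracy: {stacking_accuracy:.3f}")
```

Each base model contributes three probability columns on iris, so the meta model is trained on nine features per sample.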

Code usage

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score 

from sklearn.ensemble import ExtraTreesClassifier 

from sklearn.ensemble import RandomForestClassifier 

from xgboost import XGBClassifier 

from vecstack import stacking

iris = load_iris()

X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state=0)

models = [ ExtraTreesClassifier(random_state = 0, n_jobs = -1, n_estimators = 100, max_depth = 3), 

          RandomForestClassifier(random_state = 0, n_jobs = -1, n_estimators = 100, max_depth = 3), 

          XGBClassifier(seed = 0, n_jobs = -1, learning_rate = 0.1, n_estimators = 100, max_depth = 3)]

S_train, S_test = stacking(models, X_train, y_train, X_test, 

                           regression = False, metric = accuracy_score, 

                           n_folds = 4, stratified = True, shuffle = True, 

                           random_state = 0, verbose = 2)

model = XGBClassifier(seed = 0, n_jobs = -1, learning_rate = 0.1, n_estimators = 100, max_depth = 3) 

# Fit 2-nd level model 

model = model.fit(S_train, y_train) 

# Predict 

y_pred = model.predict(S_test) 

# Final prediction score 

print('Final prediction score: [%.8f]' % accuracy_score(y_test, y_pred))

Reference

Hands-On Machine Learning

stacking code

stacking explanation

[ML-16]Catboost

08 Oct 2019 in Study on ML

16. Catboost(Categorical Boost)

[ML-15]LightGBM

07 Oct 2019 in Study on ML

15. LightGBM(Light Gradient Boosting Machine)

[ML-14]XGBoost

06 Oct 2019 in Study on ML

14. XGBoost

[ML-13]Gradient Boosting

05 Oct 2019 in Study on ML

13. Gradient Boosting

[ML-12]Adaboost

04 Oct 2019 in Study on ML

12. Adaboost

[ML-11]Random Forest

03 Oct 2019 in Study on ML

11. Random Forest

[ML-10]Voting and Bagging

02 Oct 2019 in Study on ML

10. Voting and Bagging

10.1 Voting

[ML-09]Decision Tree

01 Oct 2019 in Study on ML

9. Decision Tree

[ML-08]K-Nearest Neighbors

30 Sep 2019 in Study on ML

8. K-Nearest Neighbors

[ML-07]Support Vector Machine

29 Sep 2019 in Study on ML

7. Support Vector Machine

7.1 Support Vector Machine for Classification

what?

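A classification algorithm that maximizes the distance between the hyperplane separating data of different classes and the support vectors (the data points at the front line of each class). Also called a Large Margin Classifier.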


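1) Differences from Logistic Regression

Logistic regression loss function

\[-\frac 1 M \left[\sum_{i=1}^M\left(y_i\log(h_\theta(x_i)) + (1-y_i)\log(1-h_\theta(x_i))\right)\right] + \frac {\lambda} {2M}\sum_{j=0}^N\theta_j^2\]

Prediction: if $\theta^Tx \ge 0$, then $y=1$, else $y=0$

Support Vector Machine loss function, with $h(x_i) = \theta^Tx_i$

\[C\sum_{i=1}^M\left(y_i\,cost_1(h(x_i)) + (1-y_i)\,cost_0(h(x_i))\right) + \frac 1 2 \sum_{j=0}^N\theta_j^2\]

$cost_1$ is zero only when $\theta^Tx \ge 1$ and $cost_0$ is zero only when $\theta^Tx \le -1$, so training pushes for $\theta^Tx \ge 1$ when $y=1$ and $\theta^Tx \le -1$ when $y=0$.

Because of this stricter condition, the support vector machine produces a larger margin than logistic regression.
The regularization weight also moves to the other term: logistic regression weights the penalty term B with $\lambda$, while SVM weights the data term A with $C$.

logistic regression : A + $\lambda$B

support vector machine : CA + B (C = $\frac 1 {\lambda}$)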
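To make the difference between the two per-sample costs concrete, here is a small sketch in plain Python. It assumes the usual hinge form for the SVM costs, $cost_1(z) = \max(0, 1 - z)$ and $cost_0(z) = \max(0, 1 + z)$ with $z = \theta^Tx$; the post itself does not spell these out, so treat the exact slopes as an assumption.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_cost(z, y):
    """Per-sample logistic regression cost for score z = theta^T x."""
    h = sigmoid(z)
    return -(y * math.log(h) + (1 - y) * math.log(1 - h))

def cost_1(z):
    """Assumed hinge cost for y = 1: zero once z >= 1."""
    return max(0.0, 1.0 - z)

def cost_0(z):
    """Assumed hinge cost for y = 0: zero once z <= -1."""
    return max(0.0, 1.0 + z)

def svm_cost(z, y):
    return y * cost_1(z) + (1 - y) * cost_0(z)

# Compare both costs for a positive sample (y = 1) at several scores.
for z in (-2.0, 0.0, 0.5, 1.0, 2.0):
    print(f"z={z:+.1f}  logistic={logistic_cost(z, 1):.3f}  svm={svm_cost(z, 1):.3f}")
```

At $z = 0.5$ a positive sample is already classified correctly, yet the SVM cost is still 0.5; it only vanishes from $z = 1$ onward, which is where the margin comes from. The logistic cost never reaches exactly zero.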

2) Mathematics behind Support Vector Machine

Setting aside the $C$-weighted term (it is driven toward zero when $C$ is large), what remains to be minimized is $\frac 1 2 \sum_{j=0}^N\theta_j^2$

$= \frac 1 2 (\theta_0^2 + \theta_1^2 + ... + \theta_n^2)$

e.g. n = 2, $\frac 1 2 (\theta_0^2 + \theta_1^2)$


For $\theta^Tx = p \cdot ||\theta|| \ge 1$ to hold, where $p$ is the length of the projection of $x$ onto $\theta$, the SVM algorithm minimizes $||\theta||$ by making $p$ as large as possible.
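The identity $\theta^Tx = p \cdot ||\theta||$ can be checked numerically. The vectors below are made-up values chosen only so the arithmetic is easy to follow; `min_norm_for_margin` is a hypothetical helper name.

```python
import math

theta = (3.0, 4.0)   # illustrative parameter vector
x = (2.0, 1.0)       # illustrative sample

dot = sum(t * xi for t, xi in zip(theta, x))       # theta^T x
norm_theta = math.sqrt(sum(t * t for t in theta))  # ||theta||
p = dot / norm_theta                               # signed projection of x onto theta

print(f"theta^T x = {dot}, ||theta|| = {norm_theta}, p = {p}")

def min_norm_for_margin(p):
    """Smallest ||theta|| that still satisfies p * ||theta|| >= 1."""
    return 1.0 / p

# A larger projection p allows a smaller ||theta||.
print(min_norm_for_margin(0.5), min_norm_for_margin(2.0))
```

Since the constraint is $p \cdot ||\theta|| \ge 1$, a sample with projection 2.0 allows $||\theta||$ as small as 0.5, while a projection of 0.5 forces $||\theta|| \ge 2$. That is why minimizing $||\theta||$ favors boundaries with large projections, that is, large margins.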

3) Non-linear classification

3-1) Possible by adding polynomial features

e.g. $x_1 \rightarrow x_1^2 + 2$

3-2) Adding similarity features

Each data point is designated as a landmark, and the distance between the landmark and the data is added as a feature.
$l$ = landmark, $f_1$ = similarity($x, l$)
There are several ways to define similarity. One of them, the Gaussian kernel, is as follows.

Gaussian kernel = $$\exp\left(-\frac {\|x - l\|^2} {2\sigma^2}\right)$$

(if $x \approx l$ : $\exp(\frac {-0} {2\sigma^2}) = 1$,

else $\exp(\frac {-\text{large}} {2\sigma^2}) \approx 0$)

\[\theta_0 + \theta_1x_1 + ... + \theta_nx_n \rightarrow \theta_0 + \theta_1f_1 + ... + \theta_nf_n\]
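A minimal sketch of the similarity features in plain Python. The landmarks and the sample are made-up values, and `gaussian_similarity` and `similarity_features` are hypothetical helper names used only for this illustration.

```python
import math

def gaussian_similarity(x, landmark, sigma=1.0):
    """exp(-||x - l||^2 / (2 * sigma^2))"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-squared_distance / (2 * sigma ** 2))

def similarity_features(x, landmarks, sigma=1.0):
    """Map x to (f_1, ..., f_k), one similarity per landmark."""
    return [gaussian_similarity(x, l, sigma) for l in landmarks]

landmarks = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
features = similarity_features((1.0, 1.0), landmarks)
print(features)
```

The sample coincides with the second landmark, so that feature is exactly 1; the distant third landmark yields a value close to 0.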

why?

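It can classify non-linear data.
It is effective when the number of features exceeds the number of samples.
It performs well on high-dimensional data.
Because the decision boundary depends only on the support vectors, outliers have little influence on it.

why not?

It is computationally expensive, so producing a result can take a long time on large datasets.
It does not perform well when the classes overlap.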

how?

`Input`

1) Data $\{(x_i, y_i)\}$, M rows (samples) and n columns (features)

2) Model : $h_\theta(x) =\theta_0 + \theta_1x_1 + ... + \theta_nx_n$

3) Loss function $C\sum_{i=1}^M(y_i\,cost_1(h_\theta(x_i)) + (1-y_i)\,cost_0(h_\theta(x_i))) + \frac 1 2 \sum_{j=0}^N\theta_j^2$

4) Parameters: C, $\gamma$ (if using the Gaussian kernel)

`Step 1` initialize parameters $\theta_0, \theta_1,..., \theta_n$ for Model

`Step 2` find optimal parameters

1) Compute the loss function $J(\theta)$
2) Optimize the parameters with gradient descent (mind the order: compute every temporary value first and only then assign, so that all parameters are updated simultaneously)

$temp0 := \theta_0 - \alpha \frac {\partial J(\theta)} {\partial \theta_0}$

$temp1 := \theta_1 - \alpha \frac {\partial J(\theta)} {\partial \theta_1}$

$tempn := \theta_n - \alpha \frac {\partial J(\theta)} {\partial \theta_n}$

$\theta_0 := temp0$

$\theta_1 := temp1$

$\theta_n := tempn$

3) Recompute the loss function with the updated parameters
4) Repeat step 2 until the loss function is minimized

`Step 3` Output : optimal hyperplane $h_\theta(x)$
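The procedure above can be sketched with NumPy for the linear case. Several things here are assumptions rather than the post's method: the hinge form of $cost_1$ and $cost_0$, the use of a subgradient (the hinge is not differentiable at the kink), the toy data, and the values of `alpha` and `n_iters`. Library implementations such as scikit-learn's `SVC` use a different solver.

```python
import numpy as np

def svm_loss_and_grad(theta, Xb, y, C):
    """Loss C * sum(hinge) + 0.5 * sum(theta_j^2) and its subgradient.

    Labels are in {0, 1}; s maps them to {-1, +1} so that
    y*cost_1(z) + (1-y)*cost_0(z) == max(0, 1 - s*z).
    """
    z = Xb @ theta
    s = 2 * y - 1
    margins = 1 - s * z
    active = margins > 0                      # samples that violate the margin
    loss = C * margins[active].sum() + 0.5 * np.sum(theta ** 2)
    grad = -C * (Xb[active].T @ s[active]) + theta
    return loss, grad

def fit_linear_svm(X, y, C=1.0, alpha=0.001, n_iters=2000):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # x_0 = 1 for theta_0
    theta = np.zeros(Xb.shape[1])                  # Step 1: initialize
    for _ in range(n_iters):                       # Step 2: iterate
        _, grad = svm_loss_and_grad(theta, Xb, y, C)
        # The whole gradient is computed before theta changes, so this
        # vectorized assignment is the simultaneous temp0..tempn update.
        theta = theta - alpha * grad
    return theta                                   # Step 3: hyperplane

# Toy data: two well-separated clusters (made up for this example).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

theta = fit_linear_svm(X, y)
Xb = np.hstack([np.ones((40, 1)), X])
predictions = (Xb @ theta >= 0).astype(int)
accuracy = (predictions == y).mean()
print("theta:", theta, "accuracy:", accuracy)
```

Following the loss as written in this post, $\theta_0$ is included in the penalty term; many formulations leave the bias unregularized.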

Code usage

1) Linear Classification

from sklearn.svm import SVC

from sklearn.datasets import load_iris

iris = load_iris()

X = iris['data'][:, (2,3)]

y = iris['target']

setosa_or_versicolor = (y==0) | (y==1)

X = X[setosa_or_versicolor]

y = y[setosa_or_versicolor]

svm_clf = SVC(kernel = 'linear', C=float('inf'))

svm_clf.fit(X, y)

2) Non-linear Classification by adding polynomial features

# The polynomial kernel gives the effect of polynomial features without adding them explicitly

from sklearn.datasets import make_moons

from sklearn.pipeline import Pipeline

from sklearn.preprocessing import StandardScaler

from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

poly_kernel_svm_clf = Pipeline([

    ('scaler', StandardScaler()),

    ('svm_clf', SVC(kernel='poly', degree=3, coef0=1, C=5))

])

poly_kernel_svm_clf.fit(X, y)

3) Non-linear Classification by adding Gaussian similarity features

from sklearn.datasets import make_moons

from sklearn.pipeline import Pipeline

from sklearn.preprocessing import StandardScaler

from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

rbf_kernel_svm_clf = Pipeline([

    ('scaler', StandardScaler()),

    ('svm_clf', SVC(kernel='rbf', gamma=5, C=0.001))

])

rbf_kernel_svm_clf.fit(X, y)

7.2 Support Vector Machine for Regression

what?

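An algorithm trained so that as many data points as possible fall inside the space between the fitted hyperplane and the boundary lines, which pass through the support vectors and run parallel to the hyperplane.
The epsilon parameter controls the distance between the hyperplane and the boundary lines. The larger epsilon is, the more data points are included.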
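The effect of epsilon can be checked with a rough experiment: fit the same `SVR` with several epsilon values and count how many training points end up inside the tube. The data follows the quadratic toy setup used in this post's SVR example, but the specific epsilon values are illustrative choices.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1)) - 1
y = 0.2 + 0.1 * X[:, 0] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 0.1, 100)

inside_counts = {}
for epsilon in (0.01, 0.1, 0.3):
    model = SVR(kernel="poly", degree=2, gamma="auto", C=100, epsilon=epsilon)
    model.fit(X, y)
    residuals = np.abs(y - model.predict(X))
    # Points whose residual is within epsilon lie inside the tube.
    inside_counts[epsilon] = int(np.sum(residuals <= epsilon))
    print(f"epsilon={epsilon}: {inside_counts[epsilon]} of {len(y)} points inside the tube")
```

Points inside the tube contribute no loss, so a wider tube leaves fewer points to constrain the fit.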


why?

Same as Support Vector Machine for Classification

why not?

Same as Support Vector Machine for Classification

how?

Same as Support Vector Machine for Classification

Code usage

import numpy as np

from sklearn.svm import SVR

np.random.seed(42)

m = 100

X = 2 * np.random.rand(m, 1) - 1

y = (0.2 + 0.1 * X + 0.5 * X**2 + np.random.randn(m, 1)/10).ravel()

svm_poly_reg = SVR(kernel="poly", gamma='auto', degree=2, C=100, epsilon=0.1)

svm_poly_reg.fit(X, y)

Tips

Support Vector Machine Parameters

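| parameter | meaning | higher | lower |
|:---:|:---:|:---:|:---:|
| C | Determines how many samples are allowed to fall on the wrong side | Treats outliers as unlikely, hard margin, tends to overfit | Treats outliers as likely, soft margin, tends to underfit |
| gamma | Determines the reach of a single data sample | Small standard deviation, short reach, tends to overfit | Large standard deviation, long reach, tends to underfit |
| epsilon | Determines how many samples fall inside the margin | Wider range in which samples fall inside the margin, tends to underfit | Narrower range in which samples fall inside the margin, tends to overfit |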
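One way to sanity-check the gamma row is to compare training accuracy across gamma values on the moons data used earlier. Training accuracy is only a rough proxy for how tightly the model fits the training set; the gamma values and `C=1` are illustrative assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

train_accuracy = {}
for gamma in (0.01, 1, 100):
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=gamma, C=1))
    clf.fit(X, y)
    # Accuracy on the data the model was trained on.
    train_accuracy[gamma] = clf.score(X, y)
    print(f"gamma={gamma}: training accuracy {train_accuracy[gamma]:.2f}")
```

A very large gamma gives each sample a short reach, so the boundary wraps closely around individual training points; a held-out set would be needed to confirm whether that fit generalizes.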

Reference

Hands-On Machine Learning

Coursera : Machine Learning by Andrew Ng

Pros and cons of Support Vector Machine

Explanation of Support Vector Machine for Regression

[ML-06]Naive Bayes

28 Sep 2019 in Study on ML

6. Naive Bayes

[ML-05]Logistic Regression

27 Sep 2019 in Study on ML

5. Logistic Regression

5.1 Logistic Regression

[ML-04]Regularized Linear Regression

26 Sep 2019 in Study on ML

4. Regularized Linear Regression

4.1 Lasso Regression

[ML-03]Complex Linear Regression

25 Sep 2019 in Study on ML

3. Complex Linear Regression

3.1 Multivariate Linear Regression

[ML-02]Linear Regression

24 Sep 2019 in Study on ML

2. Linear Regression

[ML-01]Machine learning

23 Sep 2019 in Study on ML

1. What Is Machine Learning?
