Time-Series-ARIMA-XGBOOST-RNN

Time series forecasting for individual household power prediction: ARIMA, xgboost, RNN

  • Owner: Jenniferz28/Time-Series-ARIMA-XGBOOST-RNN
  • Platform: Linux, Mac, Windows

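Key metrics

Overview
  • Name and owner: Jenniferz28/Time-Series-ARIMA-XGBOOST-RNN
  • Primary language: Python
  • Languages: Python (1 language)
  • Platform: Linux, Mac, Windows
  • License:
Owner activity
  • Created: 2017-11-13 19:11:27
  • Pushed: 2019-10-02 15:51:32
  • Last commit: 2018-01-29 16:44:28
  • Releases: 0
User engagement
  • Stars: 704
  • Watchers: 16
  • Forks: 221
  • Commits: 51
  • Issues enabled?
  • Issues: 3
  • Open issues: 2
  • Pull requests: 0
  • Open pull requests: 3
  • Closed pull requests: 0
Project settings
  • Wiki enabled?
  • Archived?
  • Fork?
  • Locked?
  • Mirror?
  • Private?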

Time Series Prediction for Individual Household Power

Dataset: https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption

The data was collected at a one-minute sampling rate between Dec 2006 and Nov 2010 (47 months). Six independent variables (electrical quantities and sub-metering values) and one numerical dependent variable, Global active power, are available, with 2,075,259 observations. Our goal is to predict future values of Global active power.

Here, missing values are dropped for simplicity. Furthermore, we find that not all observations are ordered by date and time, so we analyze the data with an explicit timestamp as the index. In the preprocessing step, we bucket-average the raw data to reduce the noise from the one-minute sampling rate. For simplicity, we focus only on the last 18000 rows of the raw dataset (the most recent data, from Nov 2010).
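The preprocessing described above can be sketched with pandas as follows. This is a minimal illustration on synthetic data, not the code in util.py: the function name `preprocess`, the `timestamp` column, and the order of the steps (drop missing values, sort, take the tail, then average) are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def preprocess(raw: pd.DataFrame, bucket: str = "30min", last_rows: int = 18000) -> pd.Series:
    """Drop missing values, sort by timestamp, keep the tail, then bucket-average.

    Hypothetical helper: column names and step order are assumptions.
    """
    df = raw.dropna(subset=["Global_active_power"])
    # Use the explicit timestamp as the index and restore chronological order.
    df = df.set_index("timestamp").sort_index()
    tail = df["Global_active_power"].iloc[-last_rows:]
    # Average all readings that fall into the same time bucket.
    return tail.resample(bucket).mean().dropna()


# Synthetic stand-in for the raw one-minute readings (not real data).
rng = np.random.default_rng(0)
stamps = pd.date_range("2010-11-01", periods=240, freq="min")
demo = pd.DataFrame({
    "timestamp": stamps,
    "Global_active_power": rng.uniform(0.2, 4.0, size=240),
})
demo.loc[5, "Global_active_power"] = np.nan  # one missing reading
demo = demo.sample(frac=1.0, random_state=0).reset_index(drop=True)  # shuffle the rows

averaged = preprocess(demo, bucket="30min", last_rows=180)
print(averaged)
```

Changing `bucket` to "5min" or "15min" gives the coarser or finer averages used by the other models.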

Python files:

  • Gpower_Arima_Main.py: executable Python program for the univariate ARIMA model.
  • myArima.py: implements a class with callable methods used by the ARIMA model.
  • Gpower_Xgb_Main.py: executable Python program for the tree-based model (xgboost).
  • myXgb.py: implements functions used by the xgboost model.
  • lstm_Main.py: executable Python program for the LSTM model.
  • lstm.py: implements a time series model class using an LSTMCell. Credit goes to https://github.com/hzy46/TensorFlow-Time-Series-Examples/blob/master/train_lstm.py
  • util.py: implements various functions for data preprocessing.
  • Exploratory_analysis.py: exploratory analysis and plots of the data.

Environment: Python 3.6, TensorFlow 1.4.

Here, I used three different approaches to model the pattern of power consumption.

  • Univariate time series ARIMA. (A 30-min average was applied to the data to reduce noise.)
    Figures: onestep, dynamic, forecast
  • Regression tree-based xgboost. (A 5-min average was applied.)
    Figure: xgbManual
  • Recurrent neural network: univariate LSTM (long short-term memory) model. (A 15-min average was applied to reduce noise.)
    Figure: predict_result

Possible approaches for future work:

(i) Dynamic Regression Time Series Model

Given the strong correlations between Sub metering 1, Sub metering 2, and Sub metering 3 and our target variable,
these variables could be included in a dynamic regression model or a regression time series model.
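As a rough illustration of the idea, the sketch below checks the correlation of each sub-metering column with the target and then fits a plain least-squares regression on those columns plus one lagged target value. This is a simplified stand-in, not a full dynamic regression (it does not model ARIMA errors), and it runs on synthetic data; the column names and coefficients are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for the real readings (assumed column names).
rng = np.random.default_rng(1)
n = 500
frame = pd.DataFrame({
    "Sub_metering_1": rng.uniform(0, 40, n),
    "Sub_metering_2": rng.uniform(0, 40, n),
    "Sub_metering_3": rng.uniform(0, 20, n),
})
# The target is built from the regressors so the example has a known answer.
frame["Global_active_power"] = (
    0.02 * frame["Sub_metering_1"]
    + 0.02 * frame["Sub_metering_2"]
    + 0.03 * frame["Sub_metering_3"]
    + rng.normal(0, 0.05, n)
)

# Correlation of each candidate regressor with the target.
corr = frame.corr()["Global_active_power"].drop("Global_active_power")
print(corr)

# Least-squares fit with the exogenous columns plus the previous target value.
frame["lag_1"] = frame["Global_active_power"].shift(1)
fit = frame.dropna()
regressors = ["Sub_metering_1", "Sub_metering_2", "Sub_metering_3", "lag_1"]
X = np.column_stack([np.ones(len(fit)), fit[regressors].to_numpy()])
y = fit["Global_active_power"].to_numpy()
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["intercept"] + regressors, coef.round(4))))
```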

(ii) Dynamic Xgboost Model

Include timestep-shifted Global active power columns as features. The target variable is the current Global active power.
The recent history of Global active power up to the current timestamp (say, the previous 100 timesteps) should be included
as extra features.
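A minimal sketch of how such a lagged feature table could be built with pandas is shown below. The helper name `make_lag_features` and the column names `lag_k` and `target` are hypothetical, and a tiny series with 3 lags is used so the result is easy to inspect.

```python
import numpy as np
import pandas as pd


def make_lag_features(series: pd.Series, n_lags: int = 100) -> pd.DataFrame:
    """Return columns lag_1..lag_n (shifted copies) plus the current value as 'target'."""
    cols = {f"lag_{k}": series.shift(k) for k in range(1, n_lags + 1)}
    cols["target"] = series
    # The first n_lags rows lack a full history, so they are dropped.
    return pd.DataFrame(cols).dropna()


# Toy series for illustration only.
power = pd.Series(np.arange(10, dtype=float), name="Global_active_power")
table = make_lag_features(power, n_lags=3)
print(table)
```

The lag columns (`table.drop(columns="target")`) and `table["target"]` could then be passed as features and labels to a regressor such as xgboost's `XGBRegressor`.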

(iii) Multivariate LSTM

Include the per-timestamp features Sub metering 1, Sub metering 2, and Sub metering 3, the date, the time, and our target variable in the RNNCell of a multivariate time-series LSTM model.
Figure: multivariate
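Before a multivariate series can be fed to a recurrent model, it is usually cut into fixed-length windows. The sketch below shows one way to do that with NumPy, under the assumption that each sample is a window of past feature rows and the label is the target value right after the window. The helper name `make_windows` and the toy array are hypothetical, and the model itself is not shown.

```python
import numpy as np


def make_windows(features: np.ndarray, target: np.ndarray, window: int):
    """Slice a (time, n_features) array into overlapping windows.

    Returns X with shape (samples, window, n_features) and y with shape
    (samples,), where y[i] is the target value right after window i.
    """
    n_samples = len(features) - window
    X = np.stack([features[i:i + window] for i in range(n_samples)])
    y = target[window:]
    return X, y


# Toy array: 12 timesteps, 4 features per timestep; the last column plays the target.
steps, n_features = 12, 4
features = np.arange(steps * n_features, dtype=float).reshape(steps, n_features)
target = features[:, -1]

X, y = make_windows(features, target, window=5)
print(X.shape, y.shape)
```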