A Deep Learning Case Study: Li Battery Life Prediction Based on LSTM

This article is a hands-on deep learning case study on Li battery life prediction with a Long Short-Term Memory (LSTM) model, covering the Python libraries and data-processing steps needed for machine learning modeling.

Official Account: 尤而小屋
Author: Peter
Editor: Peter

Hi everyone, I'm Peter~

This article is a hands-on deep learning case study: predicting Li battery life with a Long Short-Term Memory (LSTM) model.

Importing Libraries

Machine learning modeling in Python typically relies on the following libraries:

  • NumPy: a foundational Python library for working with large multi-dimensional arrays and matrices, which is essential for machine learning algorithms.
  • Pandas: provides flexible data structures for conveniently handling and analyzing data, including data cleaning and preprocessing.
  • Matplotlib: a data visualization library for creating a wide variety of charts and plots.
  • Seaborn: a higher-level wrapper around Matplotlib that offers more advanced visualization features.
  • scikit-learn: one of the most widely used machine learning libraries, containing many classic algorithms and utilities.
  • TensorFlow or PyTorch: both are deep learning frameworks for building and training neural networks. Depending on your needs and project type, you may choose one of them.
  • Keras: a high-level neural network API that runs on top of TensorFlow (and historically also Theano and CNTK), making it easy to build and train deep learning models.
  • Statsmodels: for fitting and predicting with statistical models, including linear regression, logistic regression, and more.

In [1]:

import numpy as np
import pandas as pd
pd.set_option('display.float_format', lambda x: '%.8f' % x)

# Visualization
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr
%matplotlib inline

import seaborn as sns
sns.set_context("paper", font_scale=1.3)
sns.set_style('white')

import plotly.express as px
import plotly.graph_objects as go

from time import time

import math
from scipy import stats
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.stattools import pacf

# sklearn
from sklearn import preprocessing
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

# TensorFlow-Keras
import keras
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
from keras.callbacks import EarlyStopping

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

Importing Data

In [2]:

df = pd.read_csv('data.csv')
df

Out[2]:

                      t          v             q           T
0       736374.62276940 2.74435500   -0.00004110 41.49570100
1       736374.62278098 2.87145950    0.19542568 41.54560500
2       736374.62279255 2.90456840    0.40105436 41.62048000
3       736374.62280412 2.92857840    0.60668314 41.55807100
4       736374.62281570 2.94793870    0.81231222 41.62048000
...                 ...        ...           ...         ...
398325  736088.79050114 2.75599150 -673.31840507 40.98415400
398326  736088.79051271 2.74112060 -673.52392141 41.03405400
398327  736088.79052429 2.72440580 -673.72943744 41.00908700
398328  736088.79053586 2.70612760 -673.93495363 41.04652000
398329  736088.79053983 2.69991470 -674.00544561 41.03405400

398330 rows × 4 columns

The overall size of the dataset:

In [3]:

df.shape

Out[3]:

(398330, 4)

In [4]:

df.dtypes

Out[4]:

t    float64
v    float64
q    float64
T    float64
dtype: object

Check for missing values:

In [5]:

df.isnull().sum()

Out[5]:

t    0
v    0
q    0
T    0
dtype: int64

The result shows that there are no missing values in this dataset.

In [6]:

df.describe()  # Descriptive statistics of the data

Out[6]:

                     t               v               q               T
count  398330.00000000 398330.00000000 398330.00000000 398330.00000000
mean   736242.01817497      3.79064920    -12.73245818     40.37094253
std       185.23179234      0.23120589    357.85993286      0.50859680
min    735954.81583877      2.69965200   -734.81896570     38.20174400
25%    736071.60333095      3.68081355   -318.52437892     40.06082500
50%    736230.62326857      3.81470610    -13.14100665     40.28541600
75%    736407.73875342      3.94288035    292.40127446     40.67221800
max    736564.74362440      4.20021440    718.74129793     42.03221900

  • t: the test time
  • v: the measured voltage
  • q: the positive/negative electrode capacity
  • T: the ambient temperature during testing

Sort the data in ascending order by test time t:

In [7]:

df.sort_values("t", inplace=True, ignore_index=True)
df.head()

Out[7]:

                 t          v           q           T
0  735954.81583877 4.18963380  0.00000000 40.11072900
1  735954.81585035 4.17384050 -0.19524245 40.12323400
2  735954.81586192 4.17215730 -0.40063316 40.13570000
3  735954.81587350 4.17087460 -0.60602176 40.13570000
4  735954.81588507 4.16947170 -0.81141119 40.02338800

Data Distributions

Distribution of time t

In [8]:

# Sort by index in ascending order

df["t"].value_counts().sort_index()  

Out[8]:

735954.81583877    1
735954.81585035    1
735954.81586192    1
735954.81587350    1
735954.81588507    1
                  ..
736564.74358940    1
736564.74360097    1
736564.74361255    1
736564.74362412    1
736564.74362440    1
Name: t, Length: 398330, dtype: int64

As we can see, every timestamp is unique.

In [9]:

print("start time: ", df["t"].min())
print("end time: ", df["t"].max())
start time:  735954.8158387741
end time:  736564.743624398

Distribution of voltage v

In [10]:

df["v"].describe()

Out[10]:

count   398330.00000000
mean         3.79064920
std          0.23120589
min          2.69965200
25%          3.68081355
50%          3.81470610
75%          3.94288035
max          4.20021440
Name: v, dtype: float64

In [11]:

print("V-min: ", df["v"].min())
print("V-max: ", df["v"].max())
V-min:  2.699652
V-max:  4.2002144

In [12]:

fig = px.box(df, y="v")

fig.show()

[Figure: box plot of v]

Check whether the data follows a Gaussian distribution:

In [13]:

sns.distplot(df.v)  # distplot is deprecated in newer seaborn; histplot(df.v, kde=True) is the modern equivalent

plt.show()

[Figure: distribution plot of v]

Distribution of temperature T

In [14]:

print("T-min: ", df["T"].min())
print("T-max: ", df["T"].max())
T-min:  38.201744
T-max:  42.032219

In [15]:

fig = px.box(df, y="T")

fig.show()

[Figure: box plot of T]

Distribution of capacity q

Plot the density of the data distribution, and compute kurtosis and skewness to check for normality.

  1. Kurtosis measures the peakedness of the probability distribution of a real-valued random variable.
    • It characterizes the height of the probability density curve's peak at the mean and describes how heavy or light the tails of the distribution are.
    • For a normal distribution, the (excess) kurtosis is close to 0. A kurtosis greater than 0 indicates heavier tails; less than 0 indicates lighter tails.
  2. Skewness measures the asymmetry of the probability distribution of a real-valued random variable.
    • It reflects the direction and degree to which the distribution is skewed, i.e., its asymmetry.
    • If the skewness is between -0.5 and 0.5, the data is fairly symmetric. Between -1 and -0.5 or between 0.5 and 1, the data is moderately skewed. Below -1 or above 1, the data is highly skewed.
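As a quick illustration of these rules of thumb (a sketch on synthetic data, not the battery dataset; `normal_data` and `heavy_tails` are made-up names):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_data = rng.normal(0, 1, 100_000)           # symmetric, light tails
heavy_tails = rng.standard_t(df=5, size=100_000)  # heavier tails than normal

# scipy uses the Fisher definition, so a normal distribution gives ~0
print("normal kurtosis:", stats.kurtosis(normal_data))  # close to 0
print("t(5) kurtosis:  ", stats.kurtosis(heavy_tails))  # clearly > 0
print("normal skewness:", stats.skew(normal_data))      # close to 0
```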

In [16]:

print('Kurtosis of q: {}'.format(stats.kurtosis(df.q)))
print('Skewness of q: {}'.format(stats.skew(df.q)))
Kurtosis of q: -1.1309840344769533
Skewness of q: 0.0058767955337298994

In [17]:

sns.distplot(df.q)

plt.show()

[Figure: distribution plot of q]

One way to test whether data follows a normal distribution: D'Agostino's K^2 Test

This test checks whether a set of observations follows a normal distribution. It combines the sample skewness and kurtosis into a single statistic and measures how far the data's distribution departs from the theoretical (normal) distribution; Shapiro-Wilk and Anderson-Darling are alternative normality tests. It can be run with the scipy library.

In [18]:

import numpy as np
from scipy.stats import normaltest

# Generate random data
data = np.random.normal(0, 1, 1000)

# Run D'Agostino's K^2 test; it returns the K^2 statistic and a p-value.
# If the p-value is above the significance level (typically 0.05), we cannot
# reject the null hypothesis that the data follows a normal distribution.
k2, p = normaltest(data)

# Print the results
print("K^2 statistic:", k2)
print("P-value:", p)
K^2 statistic: 1.5222953698760753
P-value: 0.4671300011784085

In [19]:

sns.distplot(data)  # clearly normal

Out[19]:

<AxesSubplot:ylabel='Density'>

[Figure: distribution plot of the simulated normal data]

Distribution of t & q

The relationship between t and q:

In [20]:

df1 = df.loc[:, ['t', 'q']]
df1.set_index('t', inplace=True)
df1.plot(figsize=(12, 5))
plt.ylabel('q')
plt.legend().set_visible(False)  # hide the legend

plt.tight_layout()
plt.title('Lithium-ion battery Time Series')
sns.despine(top=True)  # remove the top spine
plt.show();

[Figure: q vs. t time-series plot]

Modeling and Prediction with LSTM

Extracting the Data

In [21]:

dataset = df.q.values
dataset = dataset.astype("float32")
dataset = np.reshape(dataset, (-1,1))

dataset

Out[21]:

array([[ 0.0000000e+00],
       [-1.9524245e-01],
       [-4.0063316e-01],
       ...,
       [ 5.0550665e+02],
       [ 5.0571231e+02],
       [ 5.0571719e+02]], dtype=float32)

Data Normalization

In [22]:

mm = MinMaxScaler(feature_range=(0, 1))  # scale values into the 0-1 range
dataset = mm.fit_transform(dataset)
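For reference, `MinMaxScaler` with `feature_range=(0, 1)` applies x' = (x - min) / (max - min) per feature. A minimal sketch with toy numbers (not the battery data):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x = np.array([[2.0], [5.0], [11.0]])        # toy single-feature data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(x)

# same result as the closed-form min-max formula
manual = (x - x.min()) / (x.max() - x.min())
print(scaled.ravel())  # 0.0, 0.333..., 1.0
```

Keeping the fitted scaler around (as `mm` here) matters: its `inverse_transform` is what maps model predictions back to the original scale later.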

Train and Test Sets

Split the data into training and test sets:

In [23]:

train_size = int(len(dataset) * 0.8)  # training set size
test_size = len(dataset) - train_size  # test set size

train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]

Data Transformation

Convert the array into matrix form:

In [24]:

def convert_dataset(dataset, k=1):
    X,Y = [],[]
    
    for i in range(len(dataset) - k - 1):
        x = dataset[i:(i+k), 0]
        y = dataset[i+k, 0]
        X.append(x)
        Y.append(y)
            
    return np.array(X), np.array(Y)
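A toy run of `convert_dataset` (a hypothetical six-point series; the article itself uses k=30) makes the sliding-window behavior concrete:

```python
import numpy as np

def convert_dataset(dataset, k=1):
    X, Y = [], []
    for i in range(len(dataset) - k - 1):
        X.append(dataset[i:(i + k), 0])  # window of k past values
        Y.append(dataset[i + k, 0])      # the value right after the window
    return np.array(X), np.array(Y)

toy = np.arange(6, dtype=float).reshape(-1, 1)  # [[0], [1], ..., [5]]
X, Y = convert_dataset(toy, k=2)
print(X)  # [[0. 1.] [1. 2.] [2. 3.]]
print(Y)  # [2. 3. 4.]
```

Note that the `- 1` in the range drops the last usable window ([3, 4] → 5 here); with nearly 400,000 rows that single lost sample is negligible.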

Apply the conversion to the training and test sets:

In [25]:

k = 30

X_train, Y_train = convert_dataset(train, k)
X_test, Y_test = convert_dataset(test, k)

In [26]:

X_train.shape  

Out[26]:

(318633, 30)

In [27]:

Y_train.shape

Out[27]:

(318633,)

In [28]:

X_test.shape

Out[28]:

(79635, 30)

In [29]:

Y_test.shape

Out[29]:

(79635,)

In [30]:

# Reshape: insert a middle dimension of size 1

X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))  
X_test = np.reshape(X_test, (X_test.shape[0], 1, X_test.shape[1]))
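Keras LSTM layers expect input of shape (samples, timesteps, features). The reshape above uses timesteps=1 and features=30, so each 30-point window enters the LSTM as one 30-dimensional vector in a single time step. A sketch with toy shapes (the alternative layout shown is an aside, not what this article does):

```python
import numpy as np

X = np.zeros((4, 30))  # toy stand-in for (samples, window length)

# this article's layout: one time step with 30 features
X_3d = np.reshape(X, (X.shape[0], 1, X.shape[1]))
print(X_3d.shape)  # (4, 1, 30)

# an alternative layout: 30 time steps of a single feature
X_alt = np.reshape(X, (X.shape[0], X.shape[1], 1))
print(X_alt.shape)  # (4, 30, 1)
```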

In [31]:

X_train.shape

Out[31]:

(318633, 1, 30)

Building the Model

In [32]:

model = Sequential()
model.add(LSTM(100, input_shape=(X_train.shape[1], X_train.shape[2])))  # input_shape excludes the sample dimension
model.add(Dropout(0.2))
model.add(Dense(1))

Compiling the Model

In [33]:

model.compile(loss="mean_squared_error", optimizer="adam")

Training the Model

In [34]:

history = model.fit(X_train, Y_train,
                   epochs=20,
                   batch_size=70,
                   validation_data=(X_test, Y_test),
                   callbacks=[EarlyStopping(monitor="val_loss", patience=10)],  # metric monitored for early stopping
                   verbose=1,
                   shuffle=False
                   )

model.summary()
Epoch 1/20
4552/4552 [==============================] - 12s 2ms/step - loss: 0.0071 - val_loss: 0.0043
Epoch 2/20
4552/4552 [==============================] - 11s 2ms/step - loss: 0.0045 - val_loss: 0.0059
Epoch 3/20
4552/4552 [==============================] - 11s 2ms/step - loss: 0.0029 - val_loss: 0.0041
Epoch 4/20
4552/4552 [==============================] - 10s 2ms/step - loss: 0.0020 - val_loss: 0.0028
Epoch 5/20
4552/4552 [==============================] - 10s 2ms/step - loss: 0.0016 - val_loss: 0.0028
Epoch 6/20
4552/4552 [==============================] - 10s 2ms/step - loss: 0.0014 - val_loss: 0.0022
Epoch 7/20
4552/4552 [==============================] - 10s 2ms/step - loss: 0.0011 - val_loss: 0.0040
Epoch 8/20
4552/4552 [==============================] - 10s 2ms/step - loss: 9.6566e-04 - val_loss: 0.0014
Epoch 9/20
4552/4552 [==============================] - 10s 2ms/step - loss: 9.4363e-04 - val_loss: 0.0036
Epoch 10/20
4552/4552 [==============================] - 10s 2ms/step - loss: 8.5776e-04 - val_loss: 2.2590e-04
Epoch 11/20
4552/4552 [==============================] - 10s 2ms/step - loss: 8.4321e-04 - val_loss: 3.7745e-04
Epoch 12/20
4552/4552 [==============================] - 10s 2ms/step - loss: 7.1527e-04 - val_loss: 7.2602e-04
Epoch 13/20
4552/4552 [==============================] - 8s 2ms/step - loss: 7.3660e-04 - val_loss: 7.1454e-04
Epoch 14/20
4552/4552 [==============================] - 9s 2ms/step - loss: 6.6346e-04 - val_loss: 4.9174e-04
Epoch 15/20
4552/4552 [==============================] - 9s 2ms/step - loss: 6.3772e-04 - val_loss: 0.0011
Epoch 16/20
4552/4552 [==============================] - 10s 2ms/step - loss: 5.4255e-04 - val_loss: 5.4382e-04
Epoch 17/20
4552/4552 [==============================] - 9s 2ms/step - loss: 6.4824e-04 - val_loss: 3.9280e-04
Epoch 18/20
4552/4552 [==============================] - 10s 2ms/step - loss: 5.3917e-04 - val_loss: 3.5105e-04
Epoch 19/20
4552/4552 [==============================] - 9s 2ms/step - loss: 5.4563e-04 - val_loss: 3.9737e-04
Epoch 20/20
4552/4552 [==============================] - 10s 2ms/step - loss: 5.1043e-04 - val_loss: 2.2002e-04
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm (LSTM)                 (None, 100)               52400     
                                                                 
 dropout (Dropout)           (None, 100)               0         
                                                                 
 dense (Dense)               (None, 1)                 101       
                                                                 
=================================================================
Total params: 52,501
Trainable params: 52,501
Non-trainable params: 0
_________________________________________________________________
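The parameter counts in the summary above can be reproduced by hand with the standard LSTM parameter formula (a sketch for verification; `units` and `features` match the model built earlier):

```python
units, features = 100, 30  # LSTM(100) on inputs with 30 features

# 4 gates, each with an input kernel (features x units),
# a recurrent kernel (units x units), and a bias (units)
lstm_params = 4 * (units * (features + units) + units)
dense_params = units * 1 + 1  # Dense(1): weights + bias

print(lstm_params)                 # 52400
print(dense_params)                # 101
print(lstm_params + dense_params)  # 52501
```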

Making Predictions

In [35]:

train_pred = model.predict(X_train)
test_pred = model.predict(X_test)

# Map predictions back to the original scale with inverse_transform
train_pred = mm.inverse_transform(train_pred)
Y_train = mm.inverse_transform([Y_train])

test_pred = mm.inverse_transform(test_pred)
Y_test = mm.inverse_transform([Y_test])

9958/9958 [==============================] - 8s 753us/step
2489/2489 [==============================] - 2s 779us/step

In [36]:

train_pred[:5]

Out[36]:

array([[-25.417528],
       [-25.618053],
       [-25.818405],
       [-26.018715],
       [-26.219025]], dtype=float32)

In [37]:

test_pred[:5]

Out[37]:

array([[339.21115],
       [339.40436],
       [339.59738],
       [339.7904 ],
       [339.98343]], dtype=float32)

Evaluation Metrics

In [38]:

print("Train-MAE:", mean_absolute_error(Y_train[0], train_pred[:, 0]))
print("Train-RMSE:", np.sqrt(mean_squared_error(Y_train[0], train_pred[:, 0])))

print("Test-MAE:", mean_absolute_error(Y_test[0], test_pred[:, 0]))
print("Test-RMSE:", np.sqrt(mean_squared_error(Y_test[0], test_pred[:, 0])))
Train-MAE: 22.344867164491458
Train-RMSE: 32.17790188891644
Test-MAE: 15.700488038320737
Test-RMSE: 21.560603609538187
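As a sanity check on these metrics (toy numbers, not this article's predictions): MAE is the mean absolute residual, and RMSE is the square root of the mean squared residual.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# same values computed by hand
print(mae, np.mean(np.abs(y_true - y_pred)))        # both 0.5
print(rmse, np.sqrt(np.mean((y_true - y_pred)**2)))
```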

In [39]:

Y_test.shape

Out[39]:

(1, 79635)

Visualizing the Model Loss

In [40]:

history_dict = history.history

for key in history_dict.keys():
    print(key)
loss
val_loss

In [41]:

plt.figure(figsize=(10,6))
plt.plot(history_dict['loss'], label='Train Loss')
plt.plot(history_dict['val_loss'], label='Test Loss')

plt.title('LSTM Loss')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.legend(loc='upper right')
plt.show()

[Figure: training and validation loss curves]

Actual vs. Predicted Values

Compare the actual and predicted values on the test set:

In [42]:

x = [i for i in range(70000)]

plt.figure(figsize=(10,5))

plt.plot(x, Y_test[0][:70000], marker=".",label="actual")
plt.plot(x, test_pred[:,0][:70000], "r",label="predict")

plt.tight_layout()
sns.despine(top=True)

plt.subplots_adjust(left=0.07)
plt.xlabel("Time",size=15)
plt.ylabel("Charge",size=15)
plt.legend(fontsize=15)

plt.show()

[Figure: actual vs. predicted q on the test set]
