# THE AITREE – PROOF OF CONCEPT

In the feature engineering process, the authors found a single feature to have a linear correlation factor of -0.92 with the log of RUL. This feature was the log of the variance of the difference between discharge capacity curves, as a function of voltage, between the 100th and the 10th cycle. Thus, the engineered feature could be used by itself to achieve great prediction results of RUL after the 100th charge/discharge cycle.

Using Elastic Nets, with default parameters, we obtained the following prediction results after the 100th cycle.

Note that we decided to use the first cycle as a reference when creating the feature instead of the 10th suggested in the paper. Next, the degradation of five (randomly selected) individual batteries is shown together with their predicted RUL. Its exact value is difficult to predict correctly but may be of sufficient precision for simple classification tasks.

While the engineered feature used has an outstanding correlation with RUL it is nevertheless very restrictive. Using such a feature basically means performing 100 charge/discharge cycles in a controlled environment before a prediction is possible. In a commercial setting, such a setup will be too time-consuming, costly and thus impractical. Therefore, finding other features that are less restrictive but still offer good predictive performance is important. For example, we found that using the first cycle as a reference instead of the 10th is a suitable candidate for predicting RUL, increasing the commercial viability of the prediction method. The figure below visualizes the correlation coefficient between the feature and RUL for each cycle up to 200 cycles.

The figure tells us that the linear correlation is the largest around 100 cycles. It might still be possible to use the feature already after 40-50 cycles with a smaller reduction in performance. Keep in mind that this is the linear correlation most indicative of prediction performance for linear machine learning methods, while non-linear methods can find useful information for prediction even though the linear correlation is low.

Until now, we have only considered one feature for predicting RUL, but several more can be engineered from the charge-/discharge cycle data. Introducing more features leads us to another problem, namely, feature selection. There are regression methods that can report on the feature importance for prediction following their training. An example of such a method is the Random Forest Regressor (RFR), which is also a non-linear estimator. Supplied below is an example of feature importance an RFR after fitting.

Using the smallest subset with a joined total importance of more than 0.8 the top 6 features were selected and the following prediction chart was obtained on test data.

As can be seen, the predictions are the best when the RUL is smaller than 200, but between 250-1500 the mean prediction is close to the RUL at the cost of increasing variance. Only half of the battery cells live longer than 850 cycles, which decreases the amount of training data for larger RULs and will introduce a bias towards better predictions for smaller RULs.

The eager reader might be wondering about the elephant in the room by now – what about actual batteries? After all, they would be the focus of any real battery lifetime prediction efforts. The discussion above, however, has considered individual cells, tens or even hundreds of which are usually combined into battery packs to obtain the necessary voltage and current in commercial applications. Unfortunately, no public dataset of the same magnitude as the Stanford study is available. As a stopgap measure, we can construct a collection of “virtual” batteries by simply aggregating the cells to mimic existing products – e.g. a 72Ah LiFePO_{4} pack. This method is akin to bootstrapping, in this case choosing 68 cells with replacement from the dataset to obtain a collection of training and testing packs. It is also a poor approximation of reality, since it does not model any of the possible complex interactions within a pack. It thus servers more as a preview of possible future studies. We can then train a Random Forest Regressor on the whole lifetime until failure of the training selection and predict RUL of the testing collection. The figure below presents the resulting predictions in orange against the blue RUL lines of five selected packs, showing not only their striking similarity but also a high accuracy of our predictions (r^{2} ≈ 0.99). It serves as a depiction of our main idea – by aggregating the cells together we effectively “smooth” over their individual impurities, removing uncommon outliers and thus enhancing the predictive capabilities of our model.

Of course, the applicability of this approach needs to be tested extensively against real-world battery health data. Gathering and analyzing it will be the next exciting step on our journey and our contribution toward saving the world. After all, we wholeheartedly agree with Mr. Anderson, and humbly add – the future is electric or not at all.

ENTER THE NEXT LEVEL

## Visit us!

#### Address

Järntorget 4

413 04 Gothenburg