HIL test platform setup
To verify the effectiveness of the DDQN strategy, a HIL test platform was built, as shown in Fig. 9, and the proposed energy management strategy was tested on it. The platform consists mainly of the vehicle ECU controller PowerECU-57A, produced by Shandong Hydrogen Exploration New Energy Technology Co., Ltd. (China), and a HIL test cabinet manufactured by National Instruments (United States). The test process, also illustrated in Fig. 9, is as follows. The ECU exchanges data with the NI real-time simulator through analog voltage signals. Using the corresponding MATLAB/Simulink plugins (PowerECU-Toolbox for the ECU and NI-VeriStand Blocks for the NI real-time simulator), the appropriate analog I/O communication modules are added to the tractor model and the energy management strategy model built in this paper. The two models are then compiled into C code with the MATLAB/RTW code generator and the target language compiler (TLC) files for the ECU and the NI real-time simulator. The compiled code is flashed to the ECU with PowerBOOT V1.10 and deployed to the NI real-time simulator with NI-VeriStand 2020. Finally, software configuration and data monitoring are performed with Power-CAL V1.32 for the ECU and NI-VeriStand 2020 for the NI real-time simulator.
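Because the ECU and the real-time simulator exchange data as analog voltages, every physical quantity has to be scaled to a voltage on one side and scaled back on the other. The following is a minimal sketch of such a mapping, not the configuration used on the platform: the 0 to 5 V span, the 0 to 120 kW signal range, the 12-bit resolution, and all function names are assumptions chosen only for illustration.

```python
# Hypothetical signal range and converter parameters (assumptions, not platform settings).
V_MIN, V_MAX = 0.0, 5.0      # assumed analog voltage span (V)
P_MIN, P_MAX = 0.0, 120.0    # assumed physical signal span (kW)
ADC_BITS = 12                # assumed converter resolution


def to_voltage(value, lo=P_MIN, hi=P_MAX):
    """Map a physical value linearly onto the voltage span, clamping at the limits."""
    value = min(max(value, lo), hi)
    return V_MIN + (value - lo) / (hi - lo) * (V_MAX - V_MIN)


def quantize(voltage, bits=ADC_BITS):
    """Round a voltage to the nearest level of an ideal converter with the given resolution."""
    levels = 2 ** bits - 1
    code = round((voltage - V_MIN) / (V_MAX - V_MIN) * levels)
    return V_MIN + code / levels * (V_MAX - V_MIN)


def from_voltage(voltage, lo=P_MIN, hi=P_MAX):
    """Invert to_voltage: recover the physical value from a measured voltage."""
    return lo + (voltage - V_MIN) / (V_MAX - V_MIN) * (hi - lo)


demand_kw = 35.62  # sample power value (kW)
received_kw = from_voltage(quantize(to_voltage(demand_kw)))
round_trip_error_kw = abs(received_kw - demand_kw)
```

Under these assumed settings the round-trip error is bounded by half a quantization step, about 0.015 kW, which gives a feel for how converter resolution limits the accuracy of the exchanged signals.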
Result analysis
Plowing condition
Under plowing conditions, the vehicle velocity tracking performance during HIL testing is shown in Fig. 10. The average error between the desired and actual vehicle velocity is 0.028 km/h, indicating that the models exchange data accurately during the HIL test.

Vehicle velocity tracking effect under plowing condition.
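The average tracking error can be read as a mean of the sample-wise deviations between the desired and actual velocity. The sketch below assumes a mean absolute error; the text does not state which error definition was used, and the velocity samples are made up for illustration and are not test data.

```python
def mean_abs_error(desired, actual):
    """Mean absolute deviation between two equally long velocity traces (km/h)."""
    if not desired or len(desired) != len(actual):
        raise ValueError("traces must be non-empty and of equal length")
    return sum(abs(d - a) for d, a in zip(desired, actual)) / len(desired)


# Made-up samples, not measurements from the HIL test.
desired_kmh = [5.00, 5.00, 5.00, 5.00, 5.00]
actual_kmh = [4.95, 5.05, 5.10, 5.00, 4.95]

avg_error_kmh = mean_abs_error(desired_kmh, actual_kmh)
```

Using absolute deviations prevents positive and negative errors from cancelling, which a plain mean of signed errors would allow.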
Under plowing conditions, the iteration results of the DDQN algorithm are shown in Fig. 11. The iteration reward and the average iteration reward begin to converge at the 74th and 81st training iterations, respectively. By 100 training iterations the reward curve has stabilized, which supports the reliability of the algorithm’s iteration results.

Iteration results of the DDQN algorithm under plowing condition.
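One way to make "begins to converge" concrete is to find the first iteration after which a curve stays within a small band around its final value. The sketch below applies that settling criterion to the raw reward and to its running average. The criterion, the 2% band, and the reward values are all assumptions for illustration; the convergence criterion behind the reported iteration numbers is not stated.

```python
def running_average(rewards):
    """Cumulative mean of the rewards up to and including each iteration."""
    averages, total = [], 0.0
    for count, reward in enumerate(rewards, start=1):
        total += reward
        averages.append(total / count)
    return averages


def settling_iteration(values, rel_tol=0.02):
    """1-based index of the first iteration from which all later values stay
    within rel_tol of the final value (assumed convergence criterion)."""
    final = values[-1]
    band = rel_tol * abs(final)
    iteration = len(values)
    for i in range(len(values) - 1, -1, -1):
        if abs(values[i] - final) > band:
            break
        iteration = i + 1
    return iteration


# Made-up reward trace, not training data from the paper.
rewards = [-50.0, -30.0, -20.0, -12.0, -10.0, -10.0, -10.0, -10.0]

reward_settles_at = settling_iteration(rewards)
average_settles_at = settling_iteration(running_average(rewards))
```

On this toy trace the running average settles later than the raw reward, because early low rewards keep weighing on the mean. The same ordering appears in the reported results, where the average reward converges a few iterations after the iteration reward.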
As shown in Fig. 12a, under the plowing condition, the drive motor power is concentrated between 40 and 50 kW, with an average load power of 40.90 kW and a total energy consumption of approximately 34.09 kWh. As illustrated in Fig. 12b, under the DDQN strategy the engine starts and begins charging the battery within approximately 15 s, and thereafter the battery power remains below 23.44 kW. Under the PF strategy, the battery power equals the drive motor power on multiple occasions, indicating that the engine stops and restarts several times to charge the battery.

Plowing condition, (a) drive motor power; (b) battery power.
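The reported average load power and total energy together imply the length of the plowing cycle, since energy equals average power multiplied by time. The short check below derives that duration from the two reported figures. The duration itself is a derived value, not one stated in this section, and the function name is illustrative.

```python
def duration_s(energy_kwh, avg_power_kw):
    """Cycle duration in seconds implied by total energy and average power."""
    return energy_kwh / avg_power_kw * 3600.0


# Reported drive motor figures for the plowing condition.
plowing_duration_s = duration_s(energy_kwh=34.09, avg_power_kw=40.90)
```

The result is close to 3000 s, or about 50 minutes of plowing.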
As shown in Fig. 13a, under the DDQN strategy the engine power is concentrated between 23 and 52 kW, with an average power of 35.62 kW and a total energy of approximately 29.70 kWh delivered for electricity generation. Under the PF strategy, the engine power is concentrated between 47 and 58 kW, with an average power of 36.19 kW and a total energy of approximately 30.17 kWh delivered for electricity generation; the average lies below the concentration range, consistent with the intervals in which the engine is stopped. As illustrated in Fig. 13b, the engine operating points selected by both strategies are concentrated near the OOL curve, but the engine’s operating range is wider under the DDQN strategy.

Plowing condition, (a) engine power; (b) engine operating point.
As shown in Fig. 14a, the remaining SOC is approximately 60.04% under the PF strategy and about 60.75% under the DDQN strategy, a relative difference of approximately 1.18%. The SOC curve is smoother under the DDQN strategy than under the PF strategy, with no sustained large increases or decreases in SOC. As illustrated in Fig. 14b, the equivalent fuel consumption of the PF strategy and the DDQN strategy is approximately 13.84 L and 12.40 L, respectively, and the diesel consumption is approximately 8.27 L and 7.89 L, respectively. Compared to the PF strategy, the DDQN strategy reduces equivalent fuel consumption by 10.40% and diesel consumption by 4.59%.

Plowing condition, (a) SOC change curve; (b) fuel consumption.
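The percentages quoted for the plowing condition are consistent with a relative change referenced to the PF value. This reading is inferred from the reported numbers rather than stated explicitly, and the symbols $x_{\mathrm{PF}}$ and $x_{\mathrm{DDQN}}$ are introduced here only for the worked example:

```latex
\[
\Delta = \frac{x_{\mathrm{PF}} - x_{\mathrm{DDQN}}}{x_{\mathrm{PF}}} \times 100\%
\]
\[
\Delta_{\text{equiv. fuel}} = \frac{13.84 - 12.40}{13.84} \times 100\% \approx 10.40\%,
\qquad
\Delta_{\text{diesel}} = \frac{8.27 - 7.89}{8.27} \times 100\% \approx 4.59\%
\]
\[
\left| \Delta_{\text{SOC}} \right| = \frac{60.75 - 60.04}{60.04} \times 100\% \approx 1.18\%
\]
```

The SOC figure is therefore a relative difference; in absolute terms the two strategies end 0.71 percentage points apart.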
Rotary tillage condition
Under rotary tillage conditions, the vehicle velocity tracking performance during HIL testing is shown in Fig. 15. The average error between the desired and actual vehicle velocity is 0.030 km/h.

Vehicle velocity tracking effect under rotary tillage condition.
Under rotary tillage conditions, the iteration results of the DDQN algorithm are shown in Fig. 16. The iteration reward and the average iteration reward begin to converge at the 68th and 73rd training iterations, respectively. By 100 training iterations the reward curve has stabilized, which supports the reliability of the algorithm’s iteration results.

Iteration results of the DDQN algorithm under rotary tillage condition.
As shown in Fig. 17a, under rotary tillage conditions, the drive motor power is concentrated between 40 and 50 kW, with an average load power of 42.10 kW and a total energy consumption of approximately 35.09 kWh. As shown in Fig. 17b, under the DDQN strategy the battery power varies much as it does under plowing conditions: the engine begins charging the battery approximately 11 s after it starts, and thereafter the battery power remains below 29.08 kW. Under the PF strategy, the battery power again equals the drive motor power on multiple occasions.

Rotary tillage condition, (a) drive motor power; (b) battery power.
As shown in Fig. 18a, under the DDQN strategy the engine power is concentrated between 23 and 51 kW, with an average power of 34.17 kW and a total energy of approximately 28.48 kWh delivered for electricity generation. Under the PF strategy, the engine power is concentrated between 47 and 60 kW, with an average power of 35.04 kW and a total energy of approximately 29.21 kWh delivered for electricity generation. As shown in Fig. 18b, the engine operating points selected by both strategies are concentrated near the OOL curve. As under plowing conditions, the engine operating range is wider under the DDQN strategy.

Rotary tillage condition, (a) engine power; (b) engine operating point.
As shown in Fig. 19a, the remaining SOC is approximately 53.1% under the PF strategy and approximately 51.46% under the DDQN strategy, a relative difference of about 3.09%. As under plowing conditions, the SOC curve is smoother under the DDQN strategy than under the PF strategy. As shown in Fig. 19b, the equivalent fuel consumption under the PF strategy and the DDQN strategy is approximately 13.40 L and 12.09 L, respectively, with diesel consumption of approximately 7.97 L and 7.59 L, respectively. Compared to the PF strategy, the DDQN strategy reduces equivalent fuel consumption by 9.78% and diesel consumption by 4.77%.

Rotary tillage condition, (a) SOC change curve; (b) fuel consumption.
Transportation condition
Under transportation conditions, the vehicle velocity tracking performance during HIL testing is shown in Fig. 20. The average error between the desired and actual vehicle velocity is 0.070 km/h.

Vehicle velocity tracking effect under transportation condition.
Under transportation conditions, the iteration results of the DDQN algorithm are shown in Fig. 21. The iteration reward and the average iteration reward begin to converge at the 66th and 72nd training iterations, respectively. By 100 training iterations the reward curve has stabilized, which supports the reliability of the algorithm’s iteration results.

Iteration results of the DDQN algorithm under transportation condition.
As shown in Fig. 22a, under transportation conditions, the drive motor power is concentrated between 50 and 105 kW, with an average load power of 65.63 kW and a total energy consumption of approximately 29.19 kWh. As shown in Fig. 22b, the maximum battery charging power is 70.92 kW under the DDQN strategy and 58.90 kW under the PF strategy, and the two battery power curves follow largely the same trend.

Transportation condition, (a) drive motor power; (b) battery power.
As shown in Fig. 23a, under the DDQN strategy the engine power is concentrated between 23 and 76 kW, with an average power of 53.61 kW and a total energy of approximately 23.84 kWh delivered for electricity generation. Under the PF strategy, the engine power is concentrated between 50 and 80 kW, with an average power of 53.48 kW and a total energy of approximately 23.78 kWh delivered for electricity generation. As shown in Fig. 23b, the engine operating points selected by both strategies are concentrated near the OOL curve. Compared with the plowing and rotary tillage conditions, the engine operating range is wider under both strategies.

Transportation condition, (a) engine power; (b) engine operating point.
As shown in Fig. 24a, the remaining SOC is approximately 56.35% under the PF strategy and approximately 55.51% under the DDQN strategy, a relative difference of about 1.49%. As shown in Fig. 24b, the equivalent fuel consumption under the PF strategy and the DDQN strategy is approximately 6.48 L and 5.86 L, respectively, with diesel consumption of approximately 6.14 L and 6.18 L, respectively. Compared to the PF strategy, the DDQN strategy reduces equivalent fuel consumption by 9.57% but increases diesel consumption by 0.65%.

Transportation condition, (a) SOC change curve; (b) fuel consumption.
Comparative discussion
The HIL test results for the DDQN strategy and PF strategy under different operating conditions are shown in Fig. 25.

HIL test results, (a) equivalent fuel consumption and diesel consumption; (b) remaining SOC.
Under plowing conditions, the DDQN strategy retained 1.18% more SOC, reduced equivalent fuel consumption by 10.40%, and reduced diesel consumption by 4.59% compared to the PF strategy.
Under rotary tillage conditions, the DDQN strategy retained 3.09% less SOC, reduced equivalent fuel consumption by 9.78%, and reduced diesel consumption by 4.77% compared to the PF strategy.
Under transportation conditions, the DDQN strategy retained 1.49% less SOC, reduced equivalent fuel consumption by 9.57%, and increased diesel consumption by 0.65% compared to the PF strategy.
The HIL test data show that, across the three operating conditions of plowing, rotary tillage, and transportation, the DDQN strategy ends with at most 3.09% less remaining SOC and at most 0.65% higher diesel consumption than the PF strategy. With energy consumption therefore similar between the two strategies, the DDQN strategy reduces equivalent fuel consumption by 9.57% to 10.40% compared to the PF strategy.
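The comparison can be recomputed from the values reported for each condition. The sketch below assumes that every percentage is a relative change referenced to the PF value, which matches the reported figures; the dictionary layout and names are illustrative only.

```python
# Reported (PF, DDQN) values: remaining SOC in %, fuel and diesel in L.
results = {
    "plowing": {"soc": (60.04, 60.75), "equiv_fuel": (13.84, 12.40), "diesel": (8.27, 7.89)},
    "rotary tillage": {"soc": (53.1, 51.46), "equiv_fuel": (13.40, 12.09), "diesel": (7.97, 7.59)},
    "transportation": {"soc": (56.35, 55.51), "equiv_fuel": (6.48, 5.86), "diesel": (6.14, 6.18)},
}


def relative_change_pct(pf, ddqn):
    """Change of the DDQN value relative to the PF value, in percent (negative = lower)."""
    return (ddqn - pf) / pf * 100.0


changes = {
    condition: {metric: round(relative_change_pct(*pair), 2) for metric, pair in metrics.items()}
    for condition, metrics in results.items()
}

largest_soc_deficit = min(c["soc"] for c in changes.values())
largest_diesel_increase = max(c["diesel"] for c in changes.values())
fuel_savings = sorted(-c["equiv_fuel"] for c in changes.values())
```

Negative entries in `changes` mean the DDQN value is lower than the PF value. The largest SOC deficit and the largest diesel increase reproduce the 3.09% and 0.65% bounds, and `fuel_savings` spans the 9.57% to 10.40% range of equivalent fuel savings.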
