At the moment we need more charging ports for electric cars and buses, and the electric grid is under enormous pressure because of the electric vehicles that have entered the market. The electricity grid connects different voltage levels using transformers. A transformer couples two coils (windings) with different numbers of turns to step the voltage up or down. Putting a lot of strain on a transformer produces heat, which can damage it, requires more maintenance, and increases energy losses. Because of this, more transformers would have to be added to the grid (which is expensive) unless charging is managed more intelligently.

**Problem:** given a grid with limited capacity and EV charging demands, fulfill the demands as soon as possible while keeping transformer overload to a minimum.

- Cumulative load $P_{F,t}=P_{H,t}+P_{\text{EV},t}+P_{\text{Line},t}$ (household, EV, and line contributions)
- Transformer load ratio $\Lambda_t$: the cumulative load $P_{F,t}$ relative to the transformer's rated capacity
- Bidirectional charging infrastructure with a continuous charging rate
- Planned departure time and desired state of charge (SOC) at departure are given
- Impact of the charging rate on the SOC over a timestep:
  - $\alpha_{i,t}$ = actual charging rate, a control signal between $-1$ and $1$
  - $\delta t^{\text{slot}}$ = timestep duration
- To quantify the charging demand urgency:
  - Time steps left until departure: $\Delta t_{i,t}^{\text{depart}}=t_i^{\text{depart}}-t+1$

# Reinforcement learning approach

Imagine an agent in an environment: given an action, the environment returns a new state and a reward. Offline optimization is difficult to model and does not generalize; online methods have no knowledge of the future.

**Problem formulation:**

- Giving each EV an individual control signal is not scalable with reinforcement learning
- We need a scalable state and action representation, so EVs are aggregated into groups

In the graph we see peaks that correspond to EVs arriving and leaving in the morning and evening.

**Reinforcement learning algorithm**

EV charging management is a sequential decision-making problem, formulated as a Markov Decision Process (MDP) with unknown transition probabilities. The reward function $r_t$ should meet EV demand as soon as possible while avoiding transformer overload, without prior knowledge of future EV arrivals or household power consumption. This is a multi-objective RL framework with conflicting objectives.

# Case-study and simulation results

Data from the Kerber network: an LV distribution grid with 144 households, fed by a single transformer. EV penetration is 80%, giving 115 private charging points. The training process converges after 5000 episodes. The RL agent's intervention shifts charging to off-peak hours and applies a negative charging rate (V2G) to some EV groups to reduce overload risk. On days twelve and thirteen the agent reacts with fluctuating charging rates, but the overload risk stays low.

# Points of discussion

- Determine the optimal grouping criterion
- Include additional information from EV customers
- Battery deterioration
- A strong point: the evaluation uses data from a winter period
- Real-life applications
- Scalability
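
To make the quantities from the problem setup concrete, here is a minimal sketch of the charging dynamics and the transformer load ratio. It is only an illustration under stated assumptions: the timestep duration, the rated transformer capacity, and the parameter names `p_max_kw` and `capacity_kwh` are not given in the notes.

```python
# Minimal sketch (illustrative assumptions): SOC update, cumulative load, and
# transformer load ratio for one timestep. DT_SLOT_H and P_RATED_KW are assumed
# values, not taken from the source.

DT_SLOT_H = 0.25      # delta t^slot: timestep duration in hours (assumed 15 min)
P_RATED_KW = 400.0    # assumed rated transformer capacity


def soc_update(soc, alpha, p_max_kw, capacity_kwh, dt_h=DT_SLOT_H):
    """Advance the state of charge of one EV over one timestep.

    alpha is the control signal in [-1, 1]; negative values mean
    discharging back into the grid (V2G).
    """
    energy_kwh = alpha * p_max_kw * dt_h               # energy moved this slot
    return min(1.0, max(0.0, soc + energy_kwh / capacity_kwh))


def transformer_load_ratio(p_household_kw, p_ev_kw, p_line_kw, p_rated_kw=P_RATED_KW):
    """Lambda_t: cumulative load P_F,t relative to the rated transformer capacity."""
    p_cumulative = p_household_kw + p_ev_kw + p_line_kw   # P_F,t
    return p_cumulative / p_rated_kw


def steps_until_departure(t, t_depart):
    """Delta t^depart_{i,t}: remaining timesteps before the EV leaves."""
    return t_depart - t + 1
```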
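
The notes mention a scalable state and action representation based on EV groups, while the optimal grouping criterion is listed as an open discussion point. The sketch below therefore assumes one plausible criterion, grouping EVs by their remaining timesteps until departure; the bin edges and the per-EV data layout are assumptions, not the paper's method.

```python
# Hypothetical grouping sketch: aggregate EVs into a fixed number of groups so
# the state/action size does not grow with the number of connected EVs.
from collections import defaultdict

GROUP_BINS = (4, 12, 24, 48)   # assumed bin edges in timesteps until departure


def group_index(steps_left, bins=GROUP_BINS):
    """Map an EV to a group; the most urgent EVs end up in group 0."""
    for i, edge in enumerate(bins):
        if steps_left <= edge:
            return i
    return len(bins)


def grouped_state(evs):
    """Aggregate per-EV info into one fixed-size entry per group.

    `evs` is a list of dicts with 'steps_left', 'soc' and 'soc_target' keys
    (an assumed data layout, not from the source).
    """
    groups = defaultdict(lambda: {"count": 0, "missing_soc": 0.0})
    for ev in evs:
        g = groups[group_index(ev["steps_left"])]
        g["count"] += 1
        g["missing_soc"] += max(0.0, ev["soc_target"] - ev["soc"])
    return dict(groups)
```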
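
The reward $r_t$ trades off meeting EV demand quickly against avoiding transformer overload. As a rough illustration of that trade-off, the sketch below weights unmet SOC by urgency and penalizes load above the rated capacity; the weights, the penalty shape, and the function signature are assumptions, not the reward used in the paper.

```python
# Hypothetical per-timestep reward combining the two conflicting objectives:
# demand fulfilment (weighted by urgency) and transformer overload risk.

def reward(soc_list, soc_target_list, steps_left_list, load_ratio,
           w_charge=1.0, w_overload=5.0):
    """Return r_t for one timestep; higher is better."""
    # Demand term: unmet SOC counts more when departure is near.
    demand_penalty = sum(
        max(0.0, target - soc) / max(steps_left, 1)
        for soc, target, steps_left in zip(soc_list, soc_target_list, steps_left_list)
    )
    # Overload term: only penalize load above the transformer's rated capacity.
    overload_penalty = max(0.0, load_ratio - 1.0)
    return -(w_charge * demand_penalty + w_overload * overload_penalty)
```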