Relative performance improvement of Algorithm 1 over previous work for different numbers of DMRG kept states D. The optimization on the CPU (□) improves performance by about 20% for our largest D. The total improvement (○) exceeds 40% for D = 4096 and decreases as D increases.