[1] F. J. Pollack, “New microarchitecture challenges in the coming generations of CMOS process technologies”, in Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture, 1999, p. 2.
[2] G. E. Moore, “Cramming more components onto integrated circuits”, Proceedings of the IEEE, Vol. 86, No. 1, 1998, pp. 82-85.
[3] C. McNairy and R. Bhatia, “Montecito: a dual-core, dual-thread Itanium processor”, IEEE Micro, Vol. 25, No. 2, 2005, pp. 10–20.
[4] S. Naffziger, B. Stackhouse, T. Grutkowski, D. Josephson, J. Desai, E. Alon and M. Horowitz, “The implementation of a 2-core, multi-threaded Itanium family processor”, IEEE Journal of Solid-State Circuits, Vol. 41, No. 1, 2006, pp. 197–209.
[5] A. Carbine and D. Feltham, “Pentium Pro processor design for test and debug”, IEEE Design & Test of Computers, Vol. 15, No. 3, 1998, pp. 77–82.
[6] J. W. Langston and X. He, “Multi-core processors and caching: a survey”, http://blogs.cae.tntech.edu/jwlangston21/files/2008/08/multi-core-processors-and-caching-a-survey-ieee-format.pdf
[7] V. Romanchenko, “Evaluation of the multi-core processor architecture Intel Core: Conroe, Kentsfield...”, in Digital-Daily.com, 2006.
[8] V. P. Heuring and H. F. Jordan, Computer Systems Design and Architecture, Prentice Hall, 2004.
[9] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, 2007.
[10] D. Tam, R. Azimi, L. Soares and M. Stumm, “Managing shared L2 caches on multi-core systems in software”, in Workshop on the Interaction between Operating Systems and Computer Architecture, 2007, pp. 26-33.
[11] F. Guo and Y. Solihin, “An analytical model for cache replacement policy performance”, ACM SIGMETRICS Performance Evaluation Review, Vol. 34, No. 1, 2006, pp. 228-229.
[12] H. Kannan, F. Guo, L. Zhao, R. Illikkal, R. Iyer, D. Newell, Y. Solihin and C. Kozyrakis, “From chaos to QoS: case studies in CMP resource management”, ACM SIGARCH Computer Architecture News, Vol. 35, No. 1, 2007, pp. 21-30.
[13] M. Qureshi and Y. Patt, “Utility-based cache partitioning: a low overhead, high-performance, runtime mechanism to partition shared caches”, in Micro 39, 2006, pp. 422-432.
[14] S. Cho and L. Jin, “Managing distributed, shared L2 caches through OS-level page allocation”, in Micro 39, 2006, pp. 455-468.
[15] L. Jin and S. Cho, “Better than the two: exceeding private and shared caches via two-dimensional page coloring”, in Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2007.
[16] A. Asaduzzaman, F. N. Sibai and M. Rani, “Impact of level-2 cache sharing on the performance and power requirements of homogeneous multi-core embedded systems”, Microprocessors and Microsystems, Embedded Hardware Design, Vol. 33, No. 5, 2009, pp. 388-397.
[17] R. Manikantan, K. Rajan and R. Govindarajan, “NUcache: an efficient multi-core cache organization based on next-use distance”, in Proc. of the 17th International Symposium on High Performance Computer Architecture (HPCA), 2011, pp. 243-253.
[18] M. D. Hill and A. J. Smith, “Evaluating associativity in CPU caches”, IEEE Transactions on Computers, Vol. 38, No. 12, 1989, pp. 1612–1630.
[19] D. Chandra, F. Guo, S. Kim and Y. Solihin, “Predicting inter-thread cache contention on a chip multi-processor architecture”, in HPCA, 2005, pp. 340–351.
[20] R. Iyer, “CQOS: A framework for enabling QoS in shared caches of CMP platforms”, in Proc. Annual International Conference on Supercomputing, 2004, pp. 257–266.
[21] C. Xu, X. Chen, R. P. Dick and Z. M. Mao, “Cache contention and application performance prediction for multi-core systems”, in Performance Analysis of Systems & Software (ISPASS), 2010, pp. 76-86.
[22] D. K. Tam, R. Azimi, L. B. Soares and M. Stumm, “RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations”, ACM SIGARCH Computer Architecture News, Vol. 37, No. 1, 2009, pp. 121–132.
[23] D. Kaseridis, M. F. Iqbal and L. K. John, “Cache friendliness-aware management of shared last-level caches for high performance multi-core systems”, IEEE Transactions on Computers, Vol. 63, 2014, pp. 874-887.
[24] N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi and S. K. Reinhardt, “The M5 simulator: modeling networked systems”, IEEE Micro, Vol. 26, No. 4, 2006, pp. 52–60.
[25] T. Austin, E. Larson and D. Ernst, “SimpleScalar: an infrastructure for computer system modeling”, IEEE Computer, Vol. 35, No. 2, 2002, pp. 59–67.
[26] Compaq. Alpha 21264 Microprocessor Hardware Reference Manual. Technical report, Compaq Computer Corporation, 1999.
[27] The Standard Performance Evaluation Corporation. http://www.spec.org/.
[28] C. Lee, M. Potkonjak and W. H. Mangione-Smith, “MediaBench: a tool for evaluating and synthesizing multimedia and communications systems”, in MICRO 30: Proceedings of the 30th Annual ACM/IEEE International Symposium on Microarchitecture, 1997, pp. 330–335.
[29] S. M. Khan, A. R. Alameldeen, C. Wilkerson, J. Kulkarni and D. A. Jimenez, “Improving multi-core performance using mixed-cell cache architecture”, in IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), 2013, pp. 119-130.
[30] W. Zang and A. Gordon-Ross, “A single-pass cache simulation methodology for two-level unified caches”, in IEEE International Symposium on Performance Analysis of Systems & Software, 2012, pp. 168-177.
[31] A. Asaduzzaman, V. R. Suryanarayana and F. N. Sibai, “On level-1 cache locking for high performance low-power real-time multi-core systems”, Computers and Electrical Engineering, Vol. 39, 2013, pp. 1333-1345.
[32] I. Kotra, “Performance and power aware cache memory architectures”, Ph.D. thesis, Department of Computer and Mathematical Sciences, Tohoku University, Sendai, Japan, 2009.