Increasing Compiler Speed by Over 75%
By Walter Bright, Dr. Dobb's Bloggers
July 25, 2013

D is designed to be a language that is amenable to fast compilation.

I began implementing compilers on painfully slow machines like the IBM PC, so having the compiler run fast was always a big deal. It got a lot of attention, and the compilers I wrote were known for being very fast. I naturally built the techniques learned along the way into the DMD D compiler, and D itself is designed to be a language that is amenable to fast compilation.

This has been a big success for D, and many engineers have told me that D's compile speed was a huge factor in being seduced by D. Every big boost in compile speed has a transformative effect on the development process.

It's easy to get complacent and focus on other issues with DMD. Subtle degradations in speed can creep in, especially with a lot of people contributing to the codebase. Compiling the D test suite was taking too much time, and making that faster has a lot of leverage in improving productivity. In particular, some D code makes very heavy use of templates, and these were not compiling as fast as I'd like. The library module std.algorithm stood out.

The obvious low-hanging fruit was that some templates were being instantiated 70,000 times or more. This is a problem well known among C++ compiler guys: given a template and the arguments to instantiate it, look to see if it's already been instantiated and simply point to the existing one. The lookup for this was, I'm ashamed to admit, a linear list. It's like dragging a block of concrete behind your sports car. Replacing this with a hash table netted a mere 10% improvement; I expected more. How disappointing.

My next idea revolved around the issue that templates, when instantiated, are given a string identifier that uniquely identifies them. You can see these names if you disassemble any template-using code written in C++ or D. They tend to be long, as they incorporate a "mangled" version of each and every template argument. The thing is, an awful lot of templates are generated that never appear in the object code, as they are templates that produce a value or a type. These don't need generated identifiers. I switched the identifier generation to being lazy; i.e., only generated if needed by the object code generation. This also produced a perceptible improvement, but much more modest than I'd anticipated.
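To make those two changes concrete, here is a minimal, self-contained sketch of the same ideas: cache instantiations in a hash table keyed on the template plus its arguments, and defer building the long mangled identifier until code generation actually asks for it. This is not DMD's actual code; the TemplateInstance struct, the toy mangling scheme, and the cache key are inventions purely for illustration.

#include <string>
#include <unordered_map>
#include <cstdio>

// Hypothetical stand-in for a template instantiation node.
struct TemplateInstance
{
    std::string templateName;
    std::string argsKey;        // canonical encoding of the template arguments
    std::string mangledName;    // built lazily, only if object code needs it

    // Lazy identifier generation: only pay for mangling when asked.
    const std::string &getMangledName()
    {
        if (mangledName.empty())
            mangledName = "_D" + templateName + "__" + argsKey;    // toy mangling
        return mangledName;
    }
};

// Cache of instantiations keyed by (template, arguments). A hash table makes
// the "has this already been instantiated?" check O(1) instead of a linear scan.
static std::unordered_map<std::string, TemplateInstance *> instanceCache;

TemplateInstance *instantiate(const std::string &templateName, const std::string &argsKey)
{
    std::string key = templateName + '\x01' + argsKey;
    auto it = instanceCache.find(key);
    if (it != instanceCache.end())
        return it->second;                         // reuse the existing instantiation
    auto *ti = new TemplateInstance{templateName, argsKey, ""};
    instanceCache.emplace(std::move(key), ti);     // remember it for next time
    return ti;
}

int main()
{
    TemplateInstance *a = instantiate("map", "int,string");
    TemplateInstance *b = instantiate("map", "int,string");
    std::printf("same instance: %s\n", a == b ? "yes" : "no");
    std::printf("mangled: %s\n", a->getMangledName().c_str());
}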
Time to stop guessing where the speed problems were, and start instrumenting. Time to trot out gprof, the GNU profiler. I fired it up on the slow example, and waited. And waited, and waited, and waited. I waited overnight. It pole-axed my Ubuntu box; I had to cold boot it. The test case was so big that it plus gprof was too much.

gprof slows things down a lot, so I had to cut things way down to get it to work. The top three time-eating tasks were:

1. lexing
2. hash table lookups
3. storage allocation

Lexing has always been the succubus leeching the cycles out of a compiler. It's written with that in mind (and so is D's grammar), and I didn't see much more oil to be pressed out of that olive.

I figured hash table lookups would be high on the list, but not that high. DMD makes heavy use of hash tables internally. The symbol tables are all hash tables, the template instantiations are (now!), etc. Looking at the code, nothing too obvious stuck out:

if (aa)
{
    size_t len = aa->b_length;
    size_t i = (size_t)key % len;
    aaA* e = aa->b[i];
    while (e)
    {
        if (key == e->key)
            return e->value;    // found
        e = e->next;
    }
}
return NULL;                    // not found

But there is something interesting about it: the key % len operation. Division is a notoriously expensive operation. I could replace it with a mask, but that supposedly leads to a not-so-uniform distribution. A few months back, I implemented in the compiler back end the optimization of replacing an integral divide by a constant with a multiply by the reciprocal [4]. It's a lot faster, but here the denominator is a variable, and so I can't use that optimization. Or can I?

The denominator is a prime number pulled out of a table of primes:

static const size_t prime_list[] =
{
    31UL, 97UL, 389UL, 1543UL,
    6151UL, 24593UL, 98317UL, 393241UL,
    1572869UL, 6291469UL, 25165843UL, 100663319UL,
    402653189UL, 1610612741UL, 4294967291UL,
};

If the hash table chains get too long, it is rehashed using a larger prime from this table. But most of the time, the symbol tables are small (every scope has its own symbol table). So I rewrote the key % len as:

size_t i;
size_t len = aa->b_length;
if (len == 4)
    i = (size_t)key & 3;
else if (len == 31)
    i = (size_t)key % 31;
else
    i = (size_t)key % len;

The smallest bucket has a length of 4; the next larger size is 31. So, for the most common cases, the compiler is able to generate the fast modulo code because it's dividing by a known constant. This change alone was worth a 5% speed boost. Not bad for 3 or 4 lines of code!
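To see why pinning the divisor down to a known constant helps, here is a small standalone check (not DMD code; the function names are made up) comparing the generic variable-divisor path with the two specialized ones. Built with optimization, bucket_4 compiles to a single AND, and bucket_31 compiles to a multiply by a precomputed reciprocal plus shifts in the style of Granlund and Montgomery, with no divide instruction; the generic version has to keep the divide.

#include <cassert>
#include <cstddef>
#include <cstdio>

// Generic path: the divisor is only known at run time, so the compiler must
// emit an actual division instruction.
size_t bucket_generic(size_t key, size_t len) { return key % len; }

// Specialized paths: the divisor is a compile-time constant, so the compiler
// is free to strength-reduce it. % 4 becomes a mask; % 31 becomes a
// multiply-by-reciprocal plus shifts, with no divide.
size_t bucket_4(size_t key)  { return key & 3; }
size_t bucket_31(size_t key) { return key % 31; }

int main()
{
    // Sanity check that the specialized bucket indices match the generic ones.
    for (size_t key = 0; key < 100000; ++key)
    {
        assert(bucket_4(key)  == bucket_generic(key, 4));
        assert(bucket_31(key) == bucket_generic(key, 31));
    }
    std::puts("specialized bucket indices match the generic ones");
}

Inspecting the optimized assembly of bucket_31 (with objdump or a compiler explorer) is an easy way to confirm that the divide really disappears.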
That left storage allocation. Storage allocation is one of the great unsolved problems in programming. You can do manual allocation at the expense of endless pointer bugs. You can do reference counting with its performance problems and bloat. You can do garbage collection with its pausing and excessive memory consumption.

DMD does memory allocation in a bit of a sneaky way. Since compilers are short-lived programs, and speed is of the essence, DMD just mallocs away, and never frees. This eliminates the scaffolding and complexity of figuring out who owns the memory and when it should be released. (It has the downside of consuming all the resources of your machine if the module being compiled is big enough.) But malloc() itself is designed with the presumption that the memory allocated will eventually get freed.

Since it's not in DMD, I tried replacing the storage allocator with a dead simple bump-the-pointer greedy allocator:

#define CHUNK_SIZE (4096 * 16)

static size_t heapleft = 0;
static void *heapp;

void *operator new(size_t m_size)
{
    // 16 byte alignment is better
    // (and sometimes needed) for doubles
    m_size = (m_size + 15) & ~15;

    // The layout of the code is selected so the
    // most common case is straight through
    if (m_size <= heapleft)
    {
     L1:
        heapleft -= m_size;
        void *p = heapp;
        heapp = (void *)((char *)heapp + m_size);
        return p;
    }

    if (m_size > CHUNK_SIZE)
    {
        void *p = malloc(m_size);
        if (p)
            return p;
        printf("Error: out of memory\n");
        exit(EXIT_FAILURE);
        return p;
    }

    heapleft = CHUNK_SIZE;
    heapp = malloc(CHUNK_SIZE);
    if (!heapp)
    {
        printf("Error: out of memory\n");
        exit(EXIT_FAILURE);
    }
    goto L1;
}

void operator delete(void *p)
{
}

It's like I poured nitrous down the compiler's throat. Along with the other improvements, this achieved a near doubling of the compiler speed. (It even reduced memory consumption somewhat, apparently because malloc() needed memory for its own data structures.)

The benchmark used was compiling Phobos' std.algorithm for unit tests, which is:

dmd std\algorithm -unittest -main

dmd 2.063 does the nasty in 21.56 seconds, and the latest development version does it in 12.19. This was done on Windows. There's a little bit of apples versus oranges here, because the latest Phobos is larger than the 2.063 one. (DMD 2.064 with this improvement hasn't been released yet, but of course you can try it out from GitHub.)
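The version above hooks the global operator new, which works well inside a short-lived compiler process but is awkward to reuse or reset. For experimenting with the same bump-the-pointer idea outside DMD, here is a sketch of it packaged as an ordinary arena class; the Arena name and interface are mine, not DMD's, and production code would also want to handle over-aligned types.

#include <cstdlib>
#include <cstdio>
#include <cstring>
#include <vector>

// Same idea as the allocator above, but as a reusable arena object:
// carve allocations out of big malloc'd chunks, never free individually,
// and release everything at once when the arena is destroyed.
class Arena
{
    static constexpr size_t CHUNK_SIZE = 4096 * 16;
    std::vector<void *> chunks;    // every block we ever malloc'd
    char *p = nullptr;             // bump pointer into the current chunk
    size_t left = 0;               // bytes remaining in the current chunk

public:
    void *alloc(size_t n)
    {
        n = (n + 15) & ~size_t(15);    // keep 16-byte alignment
        if (n > left)
        {
            size_t chunk = CHUNK_SIZE;
            if (n > chunk)
                chunk = n;             // oversized request gets its own block
            p = static_cast<char *>(std::malloc(chunk));
            if (!p)
            {
                std::fprintf(stderr, "Error: out of memory\n");
                std::exit(EXIT_FAILURE);
            }
            chunks.push_back(p);
            left = chunk;
        }
        void *result = p;
        p += n;
        left -= n;
        return result;
    }

    ~Arena()                           // one bulk release at the end
    {
        for (void *c : chunks)
            std::free(c);
    }
};

int main()
{
    Arena a;
    char *s = static_cast<char *>(a.alloc(32));
    std::strcpy(s, "allocated from the arena");
    std::puts(s);
}   // everything the arena handed out is freed here in one sweep

The design choice is the same as DMD's: individual frees are a no-op, and all the memory comes back in one sweep when the arena goes away.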
Conclusion

Even if you know your code well, you're likely wrong about where the performance bottlenecks are. Use a profiler. If you haven't used one on your codebase in a while, it's highly likely there's a bottleneck in there that's fixable with just a few lines of code.

References

1. DMD source code
2. DMD hash tables
3. DMD storage allocator
4. Torbjörn Granlund and Peter Montgomery, "Division by Invariant Integers using Multiplication"
5. std.algorithm source code

Acknowledgments

Thanks to Jason House and Andrei Alexandrescu for their helpful comments on this article.