spssread.pl: a Perl script to parse SPSS SAV file metadata

I have been spending some time trying to figure out why R’s read.spss() function won’t read Qualtrics-generated SPSS SAV files. (Qualtrics is a very nice online survey system which we have been using with one of our partners.)

I have to admit that I have no interest in the structure of SPSS files (or most others, for that matter), so I was very glad to find Scott Czepiel’s spssread.pl Perl script to parse and display metadata.

So far I can tell that R’s read.spss() is croaking on null characters (as in ASCII 0) at the end of variable names. What was puzzling is that the open source PSPP seems to read these Qualtrics files just fine and the read.spss() code was originally based on PSPP.
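For anyone who wants to check a file for this problem directly, here is a minimal Python sketch that walks the variable records and flags any name field containing a NUL byte. It is not how spssread.pl or read.spss() does it. The record layout (a 176-byte file header, then 32-byte variable records with type code 2, an optional label padded to a multiple of 4 bytes, and 8 bytes per missing value) is my assumption from the publicly documented SAV layout. The function names are hypothetical.

```python
import struct

HEADER_SIZE = 176     # assumed size of the fixed file header record
VARIABLE_RECORD = 2   # assumed record type code for a variable record


def raw_variable_names(data, order="<"):
    """Return the raw 8-byte name field of each variable record.

    `order` is the struct byte-order prefix ("<" little-endian, ">" big-endian).
    """
    names = []
    pos = HEADER_SIZE
    while pos + 32 <= len(data):
        # Fixed part: rec_type, type, has_label, n_missing, print, write, name
        rec_type, var_type, has_label, n_missing, _print, _write, name = \
            struct.unpack_from(order + "6i8s", data, pos)
        if rec_type != VARIABLE_RECORD:
            break
        pos += 32
        if has_label:
            (label_len,) = struct.unpack_from(order + "i", data, pos)
            # The label is stored padded up to a multiple of 4 bytes
            pos += 4 + (label_len + 3) // 4 * 4
        pos += 8 * abs(n_missing)
        # Type -1 is assumed to mark a continuation slot of a long string
        if var_type != -1:
            names.append(name)
    return names


def names_with_nul(data, order="<"):
    """Return only the name fields that contain an ASCII 0 byte."""
    return [n for n in raw_variable_names(data, order) if b"\x00" in n]
```

Calling `names_with_nul(open("qualtrics_short.sav", "rb").read())` would then list the offending name fields, if the layout assumptions hold for that file.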

To read these files from R, I have been reading them into PSPP first and saving new copies.

Thanks to spssread.pl, I can now see that PSPP doesn't like these variable names either. But instead of croaking, PSPP simply assigns new variable names, as spssread.pl -r shows:

$ ./spssread.pl -r qualtrics_short.sav 
Name	Type	Label
A_1	String (20)	ResponseID  
A_2	String (20)	ResponseSet 
A_3	String (255)	Name
A_4	String (255)	ExternalDataReference   
A_5	String (255)	Email   
A_6	String (255)	IPAddress   
A_7	String (255)	StartDate   
A_8	String (255)	EndDate 
A_9	Numeric	Finished
A_10	Numeric	Many airlines are involved in a continui
A_11	Numeric	Please check which applies to this trip.
A_12	Numeric	About how full was your cabin of the air
A_13	Numeric	What was the primary purpose of this fli
A_14	Numeric	Who made the decision regarding the airp
A_15	Numeric	Please divide 100 points among the five -Schedule convenience   
A_16	Numeric	Please divide 100 points among the five -Preference for airline 
A_17	Numeric	Please divide 100 points among the five -Frequent flyer/Mileage program 
A_18	Numeric	Please divide 100 points among the five -Ticket price   
A_19	Numeric	Please divide 100 points among the five -Company policy 
A_20	Numeric	How close to the scheduled departure tim
A_21	Numeric	Please rate the services you received fr-Speed in getting through to Agent  
A_22	Numeric	Please rate the services you received fr-Helpfulness of Agent   
A_23	Numeric	Please rate the services you received fr-Courtesy of Reservation Agent  
A_24	Numeric	Please rate the services you received fr-Accuracy of flight information 
A_25	Numeric	Please rate the services you received fr-Accuracy of fare information   
A_26	Numeric	Please rate the services you received fr-Value for the money
A_27	Numeric	Please rate the services you received fr-Overall rating of the flight   
A_28	String (255)	Including this trip how many air trips fBusiness
A_29	String (255)	Including this trip how many air trips fPleasure
A_30	Numeric	For classification purposes are you...  
A_31	String (255)	Approximate age:
A_32	Numeric	Occupation  
A_33	Numeric	Approximately how many people are employ
A_34	String (255)	City and state of residence:
A_35	Numeric	THANK YOU FOR YOUR COOPERATION. 

$ ./spssread.pl -r pspp_short.sav 
Name	Type	Label
V1      	String (20)	ResponseID  
V2      	String (20)	ResponseSet 
V3      	String (255)	Name
V4      	String (255)	ExternalDataReference   
V5      	String (255)	Email   
V6      	String (255)	IPAddress   
V7      	String (255)	StartDate   
V8      	String (255)	EndDate 
V9      	Numeric	Finished
A22777  	Numeric	Many airlines are involved in a continui
A22778  	Numeric	Please check which applies to this trip.
A22779  	Numeric	About how full was your cabin of the air
A22780  	Numeric	What was the primary purpose of this fli
A22781  	Numeric	Who made the decision regarding the airp
A22782_1	Numeric	Please divide 100 points among the five -Schedule convenience   
A22782_2	Numeric	Please divide 100 points among the five -Preference for airline 
A22782_3	Numeric	Please divide 100 points among the five -Frequent flyer/Mileage program 
A22782_4	Numeric	Please divide 100 points among the five -Ticket price   
A22782_5	Numeric	Please divide 100 points among the five -Company policy 
A22783  	Numeric	How close to the scheduled departure tim
A_21    	Numeric	Please rate the services you received fr-Speed in getting through to Agent  
A_22    	Numeric	Please rate the services you received fr-Helpfulness of Agent   
A_23    	Numeric	Please rate the services you received fr-Courtesy of Reservation Agent  
A_24    	Numeric	Please rate the services you received fr-Accuracy of flight information 
A_25    	Numeric	Please rate the services you received fr-Accuracy of fare information   
A_26    	Numeric	Please rate the services you received fr-Value for the money
A_27    	Numeric	Please rate the services you received fr-Overall rating of the flight   
A22823_0	String (255)	Including this trip how many air trips fBusiness
A22823_1	String (255)	Including this trip how many air trips fPleasure
A22825  	Numeric	For classification purposes are you...  
A22826_0	String (255)	Approximate age:
A22827  	Numeric	Occupation  
A22828  	Numeric	Approximately how many people are employ
A22829_0	String (255)	City and state of residence:
Q16     	Numeric	THANK YOU FOR YOUR COOPERATION. 

File header information can be displayed with spssread.pl -h:

$ ./spssread.pl -h qualtrics_short.sav 

Record type         $FL2
Product name        @(#) SPSS DATA FILE PHP Writer (c) Qualtrics - 0.9.0        
Layout code         2
Case Size           349
Compression         1
Weight index        0
Number of cases     -1
Bias                100.000000
Creation date       21 Jul 10
Creation time       10:28:27
File label                                                                          

$ ./spssread.pl -h pspp_short.sav 

Record type         $FL2
Product name        @(#) SPSS DATA FILE GNU pspp 0.7.6-g55e6e7 - i386-apple-darw
Layout code         2
Case Size           349
Compression         1
Weight index        0
Number of cases     1
Bias                100.000000
Creation date       15 Dec 10
Creation time       12:57:31
File label                                                                          
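The fields above map onto a fixed-size header record at the start of the file. As a rough illustration of what a header dump involves, here is a Python sketch that unpacks those fields with the struct module. It is not spssread.pl's actual code. The 176-byte layout and the trick of using the layout code (2 or 3) to detect byte order are my assumptions from the documented SAV format, and `parse_header` and the field names are hypothetical.

```python
import struct

# Assumed layout: record type, product name, five 32-bit integers,
# a 64-bit float bias, date, time, file label, 3 bytes of padding.
HEADER_FORMAT = "4s60s5id9s8s64s3x"
FIELD_NAMES = (
    "record_type", "product_name", "layout_code", "case_size",
    "compression", "weight_index", "n_cases", "bias",
    "creation_date", "creation_time", "file_label",
)


def parse_header(raw):
    """Parse the first 176 bytes of a SAV file into a dict."""
    for order in ("<", ">"):  # try little-endian first, then big-endian
        values = struct.unpack(order + HEADER_FORMAT, raw[:176])
        header = dict(zip(FIELD_NAMES, values))
        if header["layout_code"] in (2, 3):
            # Decode the text fields; latin-1 never fails on arbitrary bytes
            for key, value in header.items():
                if isinstance(value, bytes):
                    header[key] = value.decode("latin-1").rstrip(" \x00")
            return header
    raise ValueError("layout code is not 2 or 3 in either byte order")
```

With a real file this would be used as `parse_header(open("qualtrics_short.sav", "rb").read(176))`. The one difference between the two dumps that matters is the number of cases: the Qualtrics file reports -1, which presumably means the writer did not record the case count.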

Thanks, Scott. spssread.pl sure beats the heck out of some quality time with od and the SAV file format docs!