*In this do file we present the retail econometric models for IAP in PR19 *This do file has the following stages *1.- DATA PREPARATION *1a. - Open master dataset, assign labels and clean up variables *1b. - Organise variables *1c. - Renaming and generating dependent variables *1d. - Set up panel and logs *1e. - Create the unbalanced panel data. *2. - SET UP GLOBALS FOR REGRESSIONS *2a. - Other costs *2b. - Bad debt costs *2b. - Total costs *3. - REGRESS AND EXPORT RESULTS *3a. - Regressions *3b. - Pooled OLS (pooled OLS regressions have been added for reference) *3c. - Random effects *4. - SAVE RESULTS *4a. - Export results *4b. - Export time and date of run ************************************************************************************************************ clear set more off set matsize 800 matrix drop _all estimates drop _all cd "O:\OFWSHARE\PR19 Modelling\Live models\Cost Assessment" /*Set so that all new variables that are created must be in double format (definition: this data storage option has about 16 digits of accuracy with a trade off of a larger do-file size) rather than float format (definition: this default data storage option has only 7 digits of accuracy) in order to ensure accuracy to the last decimal point (espeically important when doing the transformations from £ millions to £s) */ set type double, perm ************************************************************************************************************ * 1. DATA PREPARATION ************************************************************************************************************ *1a. - Open master dataset, assign labels to variables and clean up *Import import excel "FM_RR1.xlsx", /// cellrange(A2:AH231) sheet("data for modelling") firstrow * Labels foreach var of varlist * { label variable `var' "`=`var'[1]'" } drop in 1/3 * Renames rename A codecombine rename B companycode rename Histboncode financialyear * For retail data, only info from 2013-14 onwards is used (anything before is not always accurate) drop if financialyear == "2010-11" drop if financialyear == "2011-12" drop if financialyear == "2012-13" drop if financialyear == "2018-19" drop if financialyear == "2019-20" drop if financialyear == "2020-21" drop if financialyear == "2021-22" drop if financialyear == "2022-23" drop if financialyear == "2023-24" drop if financialyear == "2024-25" ************************************************************************************************************ *1b. - Variables organisation * Destring g start="" g end="" order codecombine companycode financialyear start destring start-end, replace * year g year=. replace year=2014 if financialyear=="2013-14" replace year=2015 if financialyear=="2014-15" replace year=2016 if financialyear=="2015-16" replace year=2017 if financialyear=="2016-17" replace year=2018 if financialyear=="2017-18" * Unit conversions global Vars_in_million /// /*Ofwat*/ DC_t sTC_tr sOC_tr /// global Vars_in_000 /// /*Ofwat*/ hh_t global Vars_in_perc /// /*Ofwat*/ hhm_hh hhdu_hh /// /*External*/ incomescore eq_lpcf62 counciltax totalmigration /// global Vars_in_units /// /*Ofwat*/ rev_hh /// * Million * 1,000,000 foreach v of varlist $Vars_in_million { summ `v' rename `v' `v'_million g `v' = `v'_million*1000000 label variable `v' "`v' (units)" } * Thousands * 1,000 foreach v of varlist $Vars_in_000 { summ `v' rename `v' `v'_000 g `v' = `v'_000*1000 label variable `v' "`v' (units)" } * Percs * 100 foreach v of varlist $Vars_in_perc { summ `v' rename `v' `v'_perc g `v' = `v'_perc label variable `v' "`v' (%)" } summ $Vars_in_perc * Units * 1 (no change, only label) foreach v of varlist $Vars_in_units { summ `v' rename `v' `v'_unit g `v' = `v'_unit*1 label variable `v' "`v' (£/hh)" } /*Our independent variables are denoted by the following acronyms: *hhdu_hh: % of dual service customers *hhm_hh: % of metered customers *lnhh_t: log (number of connected households) *lnrev_hh: log (average bill size £/hh) *eq_lpcf62: % of households with default *counciltax: % council tax collection rate *incomescore: % income deprived (IMD income domain) *totalmigration: % total internal and international migration */ ************************************************************************************************************ *1c. - Renaming and generating dependent variables * Generate unit cost dependent variables /* We model at 3 levels of aggregation i) Total costs per customer. The acronym is sTC_tr ii) Bad debt related costs per customer. The acronym is DC_t:. iii) Other costs per customer. The acronym is sOC_tr i) Total costs per customer are calculated in the following way. customer services + debt management + doubtful debt + meter reading + other operating expenditure + depreciation smoothed over 5 years + recharges costs net of recharges income + (local authority rates + exceptional items when reported separately) / number of connected households third party costs and pension deficit repair costs are excluded. ii) Bad debt related costs per customer = doubtful debt + debt management costs / number of connected households. iii) Other costs per customer = Total costs per customer - Bad debt related costs per customer */ g DC_hh = DC_t / hh_t label variable DC_hh "Bad debt related cost per household (£/hh)" g sTC_hh = sTC_tr / hh_t label variable sTC_hh "Total cost per household with smoothed depreciation(£/hh)" g sOC_hh = sOC_tr / hh_t label variable sOC_hh "Total cost less debt per household with smoothed depreciation(£/hh)" ************************************************************************************************************ * 1d. - Set up panel and logs * Set up panel encode companycode, g(id) xtset id year * Set natural logs foreach v of varlist hh_t rev_hh DC_hh sOC_hh sTC_hh { g ln`v' = ln(`v') } ************************************************************************************************************ * 1e. - Create the unbalanced panel data. Model SWT + BWH for 2013/14 to 2015/16 and SWB for 2016/17 and 2017/18 drop if companycode=="SWB" & year==2014 drop if companycode=="SWB" & year==2015 drop if companycode=="SWB" & year==2016 drop if companycode=="SWT" & year==2017 drop if companycode=="BWH" & year==2017 drop if companycode=="HDD" drop if companycode=="SVE" drop if companycode=="SVH" ************************************************************************************************************ * 2. SET UP GLOBALS FOR REGRESSIONS ************************************************************************************************************ *2a. OTHER (TOTEX MINUS DEBT ) COSTS /*i Totex less debt per household = proportion of dual households + proportion of metered households */ global regROC2 lnsOC_hh /// hhdu_hh hhm_hh /// /*ii Totex less debt per household = proportion of dual households + proportion of metered households + number of connected households */ global regROC4 lnsOC_hh /// hhdu_hh hhm_hh lnhh_t /// *2b. BAD DEBT RELATED COSTS /* i. bad debt per household = average bill size + probability of default */ global regRDC1 lnDC_hh /// lnrev_hh eq_lpcf62 global regRDC10 lnDC_hh /// lnrev_hh counciltax global regRDC20 lnDC_hh /// lnrev_hh incomescore totalmigration *2c. TOTAL COSTS /* i. totex per household = proportion of metered households + average bill size + probability of default */ /* ii. totex per household = proportion of metered households + average bill size + probability of default + number of households*/ global regRTC3 lnsTC_hh /// lnrev_hh hhm_hh eq_lpcf62 /// global regRTC4 lnsTC_hh /// lnrev_hh hhm_hh eq_lpcf62 lnhh_t /// global regRTC8 lnsTC_hh /// lnrev_hh hhm_hh incomescore lnhh_t /// global regRTC11 lnsTC_hh /// lnrev_hh hhm_hh counciltax ************************************************************************************************************ * 3. - REGRESS AND EXPORT RESULTS ************************************************************************************************************ *3a. - Regs global MODELS ROC2 ROC4 /// RDC1 RDC10 RDC20 /// RTC3 RTC4 RTC8 RTC11 *3b. - Pooled OLS (pooled OLS regressions have been added for reference) foreach v in $MODELS { /* Regs */ eststo ols`v' : reg ${reg`v'}, vce(cluster id) /* R2 adjusted */ estadd scalar Adj_R_squared = e(r2_a): ols`v' /* VIF test */ vif estadd scalar VIF_statistic = r(vif_1): ols`v' /* RESET */ ovtest estadd scalar RESET_P_value = r(p): ols`v' /* Econometric_model title */ estadd local Estimation_method = "OLS": ols`v' } *3c. - Random effects foreach v in $MODELS { /* Regs */ eststo re`v' : xtreg ${reg`v'}, re vce(cluster id) /* Yhats */ predict yhatre`v' , xb /* Resid */ predict ere`v' , ue /* RESET - the following lines calculate the RESET test /// as Stata does not have a command for random effects models */ quietly sum yhatre`v' g nmlz_xb`v' = (yhatre`v'-r(mean))/r(sd) g xb2`v' = (nmlz_xb`v')^2 g xb3`v' = (nmlz_xb`v')^3 g xb4`v' = (nmlz_xb`v')^4 xtreg ${reg`v'} xb2`v' xb3`v' xb4`v' test xb2`v' xb3`v' xb4`v' estadd scalar RESET_P_value = r(p): re`v' /* R2 adjusted */ estadd scalar Adj_R_squared = e(r2_o): re`v' /* Econometric_model title */ estadd local Estimation_method = "RE": re`v' } ************************************************************************************************************ * 4. - SAVE RESULTS ************************************************************************************************************* *4a. - Export results estout * using "O:\OFWSHARE\PR19 Modelling\Live models\Cost Assessment\Stata files\RR coefficients test.xls", replace cells(b(fmt(10))) estout * using "O:\OFWSHARE\PR19 Modelling\Live models\Cost Assessment\Stata files\RR coefficients test2.xls", replace /// cells(b(star fmt(3)) /// p(par({ }))) /// starlevels( * 0.10 ** 0.05 *** 0.010) /// stats ( /// Estimation_method /// depvar /// Adj_R_squared /// VIF_statistic /// RESET_P_value /// N /// ) /// mlabels(,titles) eststo clear *4b. - Export time and date of run g run_time_stamp = "$S_TIME" g run_date_stamp = "$S_DATE" export excel run_date_stamp run_time_stamp using "O:\OFWSHARE\PR19 Modelling\Live models\Cost Assessment\Stata files\RR coefficients datestamp.xls", cell (A1) sheetmodify firstrow(variables) export excel run_time_stamp using "O:\OFWSHARE\PR19 Modelling\Live models\Cost Assessment\Stata files\RR coefficients datestamp.xls", cell (B1) sheetmodify firstrow(variables) *************************************************************************************************************