Development and external validation of a logistic regression-derived algorithm to estimate 12-month post-open defecation free slippage risk

Keywords: Chiefdom, CLTS, Community Led Total Sanitation, District Health Information Software (DHIS2), Prognostic model, open defecation free status (ODF)


Appropriate open defecation free (ODF) sustainability interventions are key to mobilising communities to consume sanitation and hygiene products and services that enhance quality of life and result in embedded behavioural change. This study aims to develop a logistic regression-derived risk algorithm to estimate the risk of loss of ODF status over a 12-month period, and to externally validate the model using an independent data set. ODF status is lost when one or more toilet adequacy parameters are no longer present for one or more toilets in a community. The study used data collected in the Zambia district health information software for water, sanitation and hygiene management. Data sets for the Chungu and Chabula chiefdoms were selected, covering the 12 months from the date of attainment of ODF status (October 2016) until September 2017. The Chungu chiefdom data set was used as the development data set and the Chabula chiefdom data set as the validation data set. Data were assumed to be missing at random, and a complete case analysis was used. The events per variable were satisfactory for both the development and validation data sets. Multivariable regression with a backward selection procedure was used to choose among candidate predictor variables, with p values below 0.05 meriting inclusion. To correct for optimism, the study estimated the amount of heuristic shrinkage by comparing the model’s apparent C-statistic with the C-statistic computed by non-parametric bootstrap resampling. In the resulting model, increases in the covariates ‘months after ODF attainment’, ‘village population’ and ‘latrine built after CLTS’ were all associated with a higher probability of ODF status loss. Conversely, the covariate ‘presence of a handwashing station with soap’ was associated with a reduced probability of ODF status loss.
Applying the heuristic shrinkage factor of 0.988 improved the predictive performance of the model. External validation confirmed good predictive performance, with an area under the receiver operating characteristic curve of 0.85 and no significant lack of fit (Hosmer-Lemeshow test: p = 0.246). The results of this study must be interpreted with caution in contexts where ODF definitions, cultural factors and other factors differ from those described in the study.
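The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the study's code: the function names are hypothetical, the shrinkage formula is assumed to be the common heuristic (model chi-square minus degrees of freedom, divided by model chi-square), and the Hosmer-Lemeshow grouping into ten equal-sized risk groups is likewise an assumption.

```python
def c_statistic(y_true, y_prob):
    """C-statistic (area under the ROC curve): the share of (event, non-event)
    pairs in which the event has the higher predicted risk; ties count 0.5."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))


def heuristic_shrinkage(model_chi2, n_params):
    """Heuristic shrinkage factor, assumed here to be (chi2 - df) / chi2."""
    return (model_chi2 - n_params) / model_chi2


def shrink_coefficients(slopes, factor):
    """Multiply each slope (not the intercept) by the shrinkage factor."""
    return [factor * b for b in slopes]


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups.
    It is usually compared with a chi-square distribution on groups - 2 df."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed events in the group
        expected = sum(p for p, _ in chunk)   # expected events in the group
        if 0 < expected < size:
            stat += (observed - expected) ** 2 / expected
            stat += ((size - observed) - (size - expected)) ** 2 / (size - expected)
    return stat


# Hypothetical toy data, for illustration only (not the study's data).
outcomes = [1, 1, 0, 0]
risks = [0.9, 0.4, 0.5, 0.1]
print(c_statistic(outcomes, risks))
print(heuristic_shrinkage(100.0, 4))
```

In a workflow of this kind, the slopes would be shrunk on the development data, the intercept re-estimated, and the C-statistic and Hosmer-Lemeshow statistic then computed on the validation data. A shrinkage factor close to 1, such as the 0.988 reported above, changes the coefficients only slightly.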

Author Biography

Warren Mukelabai Simangolwa, SNV Netherlands Development Organisation
Warren M. Simangolwa, co-author of the chapter “CLTS and sanitation marketing: aspects to consider for a better integrated approach” in the book Sustainable Sanitation for All: Experiences, Challenges, and Innovations, is a seasoned researcher and practitioner. His work covers innovative WASH systems strengthening design, rural, urban and peri-urban faecal sludge management value chains, supply chains, inclusive business, sanitation marketing, inclusive technology development, governance and policy, infrastructure, emergency and fragile regions, CLTS for rural and peri-urban areas, and last-mile approaches.

