Statistical disclosure control
Main author: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Chichester, West Sussex, United Kingdom : Wiley, 2012. |
Series: | Wiley series in survey methodology. |
Subjects: | |
Available online: | Full Text via HEAL-Link |
Summary: | "This handbook provides technical guidance on statistical disclosure control and on how to approach the problem of balancing the need to provide users with statistical outputs and the need to protect the confidentiality of respondents. Statistical disclosure control is combined with other tools such as administrative, legal and IT in order to define a proper data dissemination strategy based on a risk management approach. The key concepts of statistical disclosure control are presented, along with the methodology and software that can be used to apply various methods of statistical disclosure control. Examples will also be used to illustrate methods described in the book. The handbook is based upon material prepared by the leading National Institute of Statistics in Europe. The context is relevant globally, not just within the EU." |
---|---|
Item description: | Machine generated contents note: Preface; Acknowledgements; 1 Introduction; 1.1 Concepts and Definitions; 1.1.1 Disclosure; 1.1.2 Statistical disclosure control; 1.1.3 Tabular data; 1.1.4 Microdata; 1.1.5 Risk and utility; 1.2 An approach to Statistical Disclosure Control; 1.3 The chapters of the handbook; 2 Ethics, Principles, Guidelines and Regulations, a general background; 2.1 Introduction; 2.2 Ethical codes and the new ISI code; 2.2.1 ISI Declaration on Professional Ethics; 2.2.2 New ISI Declaration on Professional Ethics; 2.2.3 European Statistics Code of Practice; 2.3 UNECE Principles and guidelines; 2.4 Laws; 2.4.1 Committee on Statistical Confidentiality; 2.4.2 European Statistical System Committee; 3 Microdata; 3.1 Introduction; 3.2 Microdata Concepts; 3.2.1 Stage 1: Assess need for confidentiality protection; 3.2.2 Stage 2: Key characteristics and uses of microdata; 3.2.3 Stage 3: Disclosure risk; 3.2.4 Stage 4: Protection methods; 3.2.5 Stage 5: Implementation; 3.3 Definitions of disclosure; 3.3.1 Definitions of disclosure scenarios; 3.4 Definitions of Disclosure Risk; 3.4.1 Disclosure risk for categorical quasi-identifiers; 3.4.2 Disclosure risk for continuous quasi-identifiers; 3.5 Estimating Re-identification Risk; 3.5.1 Individual risk based on the sample: threshold rule; 3.5.2 Estimating individual risk using sampling weights; 3.5.3 Estimating individual risk by Poisson model; 3.5.4 Further models that borrow information from other sources; 3.5.5 Estimating per record risk via heuristics; 3.5.6 Assessing risk via record linkage; 3.6 Non-Perturbative Microdata Masking; 3.6.1 Sampling; 3.6.2 Global recoding; 3.6.3 Top and bottom coding; 3.6.4 Local suppression; 3.7 Perturbative Microdata Masking; 3.7.1 Additive noise masking; 3.7.2 Multiplicative noise masking; 3.7.3 Microaggregation; 3.7.4 Data swapping and rank swapping; 3.7.5 Data shuffling; 3.7.6 Rounding; 3.7.7 Resampling; 3.7.8 PRAM; 3.7.9 MASSC; 3.8 Synthetic and Hybrid Data; 3.8.1 Fully synthetic data; 3.8.2 Partially synthetic data; 3.8.3 Hybrid data; 3.8.4 Pros and cons of synthetic and hybrid data; 3.9 Information Loss in Microdata; 3.9.1 Information loss measures for continuous data; 3.9.2 Information loss measures for categorical data; 3.10 Release of multiple files from the same microdata set; 3.11 Software; 3.11.1 μ-ARGUS; 3.11.2 sdcMicro; 3.11.3 IVEware; 3.12 Case Studies; 3.12.1 Microdata files at Statistics Netherlands; 3.12.2 The European Labour Force Survey Microdata for Research Purposes; 3.12.3 The European Structure of Earnings Survey Microdata for Research Purposes; 3.12.4 NHIS Linked Mortality Data Public Use File, USA; 3.12.5 Other real case instances; 4 Magnitude tabular data; 4.1 Introduction; 4.1.1 Magnitude Tabular Data: Basic Terminology; 4.1.2 Complex tabular data structures: hierarchical and linked tables; 4.1.3 Risk Concepts; 4.1.4 Protection Concepts; 4.1.5 Information Loss Concepts; 4.1.6 Implementation: Software, Guidelines and Case Study; 4.2 Disclosure Risk Assessment I: Primary Sensitive Cells; 4.2.1 Intruder Scenarios; 4.2.2 Sensitivity rules; 4.3 Disclosure Risk Assessment II: Secondary risk assessment; 4.3.1 Feasibility Interval; 4.3.2 Protection Level; 4.3.3 Singleton and multi cell disclosure; 4.3.4 Risk models for hierarchical and linked tables; 4.4 Non-Perturbative Protection Methods; 4.4.1 Global Recoding; 4.4.2 The Concept of Cell Suppression; 4.4.3 Algorithms for Secondary Cell Suppression; 4.4.4 Secondary Cell Suppression in Hierarchical and Linked Tables; 4.5 Perturbative Protection Methods; 4.5.1 A pre-tabular method: Multiplicative Noise; 4.5.2 A Post-tabular Method: Controlled Tabular Adjustment; 4.6 Information Loss Measures for Tabular Data; 4.6.1 Cell Costs for Cell Suppression; 4.6.2 Cell Costs for CTA; 4.6.3 Information Loss Measures to Evaluate the Outcome of Table Protection; 4.7 Software for Tabular Data Protection; 4.7.1 Empirical comparison of cell suppression algorithms; 4.8 Guidelines: Setting up an efficient table model systematically; 4.8.1 Defining Spanning Variables; 4.8.2 Response Variables and Mapping Rules; 4.9 Case Studies; 4.9.1 Response Variables and Mapping Rules of the Case Study; 4.9.2 Spanning Variables of the Case Study; 4.9.3 Analysing the Tables of the Case Study; 4.9.4 Software Issues of the Case Study; 5 Frequency tables; 5.1 Introduction; 5.2 Disclosure risks; 5.3 Methods; 5.4 Post-tabular methods; 5.4.1 Cell Suppression; 5.4.2 ABS Cell Perturbation; 5.4.3 Rounding; 5.5 Information loss; 5.6 Software; 5.6.1 Introduction; 5.7 Case Studies; 5.7.1 UK Census; 5.7.2 Australian and New Zealand Censuses; 6 Data Access Issues; 6.1 Introduction; 6.2 Research Data Centres; 6.3 Remote Execution; 6.4 Remote Access; 6.5 Licensing; 6.6 Guidelines on output checking; 6.6.1 Introduction; 6.6.2 General approach; 6.6.3 Rules for output checking; 6.6.4 Organizational/procedural aspects of output checking; 6.6.5 Researcher training; 6.7 Additional issues concerning data access; 6.7.1 Examples of disclaimers; 6.7.2 Output description; 6.8 Case Studies; 6.8.1 The U.S. Census Bureau Microdata Analysis System; 6.8.2 Remote Access at Statistics Netherlands; 7 Glossary; 8 Bibliography; References; Index. |
Physical description: | 1 online resource. |
Bibliography: | Includes bibliographical references and index. |
ISBN: | 9781118348215 1118348214 9781118348208 1118348206 9781118348222 1118348222 9781118348239 1118348230 1119978157 9781119978152 9781280879890 1280879890 |
DOI: | 10.1002/9781118348239 |