Some of the principles of the General Data Protection Regulation (GDPR) look good on paper, but they can be hard to implement in practice.
The principle of "data minimisation", for instance, states that personal data must be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed" (Art. 5(1)(c) GDPR). It is a principle that applies at every stage of the lifecycle of personal data: only collect the data that you need, only analyse or use the data that you need, only store the data that you need – and only as long as you actually need it (the sister principle of "storage limitation" has strong ties with data minimisation). But how can you identify what you actually need?
Likewise, the principle of "integrity and confidentiality" states that the processing must ensure "appropriate security of the personal data" (Art. 5(1)(f) GDPR), and the GDPR further mentions "the pseudonymisation and encryption of personal data" as possible measures to comply with this principle (Art. 32 GDPR). So then: what is appropriate security?
All of those data protection principles come together in the general obligation for controllers to implement "data protection by design" and "data protection by default" (Art. 25 GDPR). While the legal obligation only applies to controllers, nothing prevents them from contractually requiring their processors to comply with these principles as well. So again: what does this actually mean?
To give an indication of what these concepts mean and how to implement them, we will look at guidance from the EDPB (the European Data Protection Board), the Norwegian Data Protection Authority and ENISA (the EU Agency for Cybersecurity).
NOTE: some of the guidance discussed below is technical and requires some knowledge of information security or software development/management. We have therefore summarised the guidance itself in general terms, and have included at the bottom of the article technical notes that delve further into the technical content of the guidance in question. If these areas are not your strength, we suggest sharing those sections with your information security or IT colleagues to determine how you as an organisation can make the most out of the guidance.
1. EDPB guidelines on data protection by design & by default: almost practical
The EDPB has been working on new Guidelines on data protection by design & by default ("DPbDD" in its lingo) and has published them for public consultation. Here are some of the key takeaways of these guidelines:
- Organisations need to bear the cost of implementing DPbDD
("incapacity to bear the costs is no excuse for
non-compliance"), but effectiveness of solutions is
relevant in determining what cost is indeed "necessary"
(the guidelines state that low-cost solutions can sometimes be just
as or even more effective than expensive ones);
- A risk assessment is required as part of the process for
determining the relevant measures (controllers must take into
account "the risks of varying likelihood and severity for
rights and freedoms of natural persons posed by the
processing"). This "risk-based approach" is
consistent across Articles 24 [responsibility of the controller],
25 [DPbDD], 32 [security] and 35 [DPIAs] of the GDPR, so DPIA risk
assessments are relevant for assessing risk in the context of
DPbDD. However, the guidelines warn that any such risk assessment
must take into account the processing at hand (i.e. you cannot
simply copy over a generic risk assessment);
- DPbDD is a continuous obligation, starting at the very
beginning, and regular re-evaluation of measures and safeguards is
required (also at the level of processors – the controller
should regularly review and assess its own processor's
operations);
- "Data protection by default" means limiting
staff's access to data and minimising processing "out of
the box";
- Retention of data "should be objectively justifiable and
demonstrable";
- While anonymisation helps limit the risk in relation to certain
processing (and is a good application of both DPbDD and the
security measures under Art. 32 GDPR), it should not be viewed as
the end of the story in terms of compliance, as re-identification
might still be a risk as techniques evolve and other datasets are
created – therefore, a regular assessment of the likelihood
and severity of risk (including the risk of re-identification) is
still required;
- The guidelines contain useful input on how to implement key
principles. For instance, universal design and accessibility are
mentioned as examples of key design and default elements (with an
explicit reference to machine readable languages) in relation to
the principle of transparency. Non-discrimination is mentioned as
an example of a key design and default element for the principle of
fairness. Drop-down menus are also mentioned as a way to improve
the accuracy of personal data.
- DPbDD is a factor to be taken into account by supervisory authorities in determining the level of fines.
The guidelines provide a useful framework for organisations to understand how the principles of DPbDD work, but there is still little in the guidelines that is immediately actionable.
2. Norwegian Data Protection Authority guidance on data protection by design: actionable checklists
While Norway is not part of the European Union, it is part of the European Economic Area (EEA) and has been subject to the rules of the GDPR since July 2018 by virtue of the EEA Agreement. In this context, the Norwegian Data Protection Authority, Datatilsynet, is a member of the EDPB, although it does not have voting rights within the EDPB.
Over the past few years, the Norwegian DPA has maintained and regularly updated a guide on software development with data protection by design and by default, available online in English, which it prepared in cooperation with security experts and software developers.
While it focusses on software development, the Norwegian DPA's guidance is relevant in many non-software circumstances. In addition, it anticipates many of the recommendations contained in the EDPB's DPbDD guidelines, making it a useful baseline for the practical implementation of the principles of data protection by design and data protection by default.
The guidance covers seven stages or activities (training, requirements, design, coding, testing, release and maintenance), and for each of these activities the guidance includes a practical and actionable checklist.
We have included at the bottom of this article a summary of what the checklists cover in relation to each of these topics (see Technical Note 1).
As a result, the guidance can serve as a technical baseline for teams working on implementing the principles of data protection by design and by default, e.g. with the checklists as an annex to a more general (and less technical) "data protection by design policy".
3. ENISA recommendations on pseudonymisation techniques and best practices
On 3 December 2019, ENISA published a new report, "Pseudonymisation techniques and best practices – recommendations on shaping technology according to data protection and privacy provisions".
This report starts with a discussion of a number of pseudonymisation scenarios and analyses various adversarial models and attacking techniques used against pseudonymisation (e.g. brute force attack, dictionary search, guesswork). It also presents the main pseudonymisation techniques (e.g. counter, random number generator, cryptographic hash function, message authentication code and encryption) and pseudonymisation policies (e.g. deterministic, document-randomised and fully randomised pseudonymisation) available today.
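The distinction between the pseudonymisation policies listed above can be sketched in a few lines of code. The sketch below is our own illustration (the class and function names are not taken from the ENISA report): a deterministic policy maps an identifier to the same pseudonym everywhere, a document-randomised policy keeps the mapping consistent only within a single document, and a fully randomised policy generates a fresh pseudonym on every occurrence.

```python
import secrets

class DeterministicPseudonymiser:
    """Same identifier -> same pseudonym, across all datasets."""
    def __init__(self):
        # The mapping table must be secured and stored separately
        # from the pseudonymised data.
        self.mapping = {}

    def pseudonymise(self, identifier):
        if identifier not in self.mapping:
            self.mapping[identifier] = secrets.token_hex(8)
        return self.mapping[identifier]

class DocumentRandomisedPseudonymiser:
    """Same identifier -> same pseudonym within one document,
    but a different pseudonym in each new document."""
    def __init__(self):
        self.per_document = {}

    def pseudonymise(self, identifier, document_id):
        key = (identifier, document_id)
        if key not in self.per_document:
            self.per_document[key] = secrets.token_hex(8)
        return self.per_document[key]

def fully_randomised(identifier):
    """Fresh pseudonym on every occurrence: maximum unlinkability,
    minimum utility (no linking possible, even within a document)."""
    return secrets.token_hex(8)
```

The trade-off is visible in the code: the more randomisation, the less an attacker can link records, but also the less utility the pseudonymised dataset retains for analysis.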
The practical significance of this report lies, however, in its examination of these pseudonymisation techniques in specific scenarios, in particular the pseudonymisation of e-mail addresses.
As with the guidance of the Norwegian DPA, we have included a summary of this ENISA guidance at the bottom of this article, due to its technical nature (see Technical Note 2).
The report concludes that the best approach to pseudonymisation involves applying pseudonymisation to all data values, taking the whole dataset into account and ensuring that the resulting dataset keeps only the type of utility necessary for the purpose of processing.
The report is a useful addition to the toolset of the teams involved in assessing whether and how to deploy pseudonymisation techniques, and can assist organisations in determining how best to minimise their processing of personal data while improving the security of the personal data in question.
4. Conclusion
What then does data protection by design and by default mean? In non-technical language, it means embedding data protection into the culture of the organisation and its product development process. In practice, the best way forward for technical teams involves drawing up a list of requirements for every stage of development, and the checklists of the Norwegian DPA and the pseudonymisation guidance of ENISA can serve as a useful baseline for those requirements. These requirements must also be discussed with other roles within the organisation (business, operations, legal, data protection, ...) to ensure all appropriate controls are in place.
Whatever you decide, document it properly. After all, data protection by design and by default goes hand in hand with the principle of accountability, and proper documentation will make it easier for you to demonstrate that you are indeed compliant.
***
TECHNICAL NOTE 1: summary of the checklists of the Norwegian Data Protection Authority:
- Training: the guidance recommends training on
the GDPR itself, on related legislation (e.g. e-Privacy), on
information security frameworks (e.g. ISO 27001), on the framework
for software development (e.g. Microsoft Security Development
Lifecycle), on security testing (e.g. OWASP Top 10), on threat and
risk assessment documentation requirements (e.g. Microsoft Threat
Modelling Tool). It moreover recommends differentiated training
based on individuals' roles: a basic understanding of privacy
and information security is crucial for all employees, while
developers must be competent in e.g. the topic of secure coding.
The checklist for training even includes a reference to metaphors
and mnemonics, such as XKCD's "Little Bobby Tables" cartoon.
- Requirements: Organisations should define the
data protection and information security requirements for any given
project. The checklist for requirements contains an impressively
detailed (but non-exhaustive) list of action items on e.g. what
needs to be done before the requirements are set, requirements for
meeting the principles of data protection, requirements to protect
the rights of data subjects etc. In relation to security in
general, the checklist mentions five security principles:
confidentiality, integrity, accessibility, resilience and
traceability (C, I, A, R, T). The specific security requirements
will then typically be linked to one or more of those security
principles (e.g. identification of users in the context of access
control = T; strong password requirements = C, I, A). The checklist
mentions the OWASP Application Security Verification Standard as a
useful illustration of security requirements for use in software
development, as well as ISO 27034 as an example of how to determine
an acceptable level of risk.
- Design: The design-related checklist refers to
the subdivision introduced by ENISA (in its 2014 report on privacy and data protection by
design) between data-oriented design requirements
("minimise and limit", "hide and protect",
"separate", "aggregate", "data protection
by default") and process-oriented design requirements
("inform", "control", "enforce",
"demonstrate"), with practical implementation examples.
In addition, the checklist recommends (i) analysing and reducing
the attack surface of the software under development and (ii)
threat modelling, with notably a reference to the STRIDE (spoofing,
tampering, repudiation, information disclosure, denial of service
and elevation of privilege) and DREAD (damage, reproducibility,
exploitability, affected users and discoverability)
methodologies.
- Coding: The coding checklist focusses on four
main areas: (i) the use of approved tools and libraries, (ii)
scanning dependencies for known vulnerabilities or outdated
versions, (iii) manual code review and (iv) static code analysis
with security rules. The checklist includes useful recommendations
on e.g. what to include in a list of tools and libraries, as well
as examples of tools for static code analysis.
- Testing: At the testing stage, the checklist
includes general test recommendations as well as specific guidance
on security testing (dynamic testing, fuzz testing, penetration
testing or vulnerability analysis; testing in multiple instances;
automatic execution of test sets before release). In addition, the
checklist stresses the importance of reviewing the attack surface
of the software under development.
- Release: At the release stage, the focus
should lie on (i) an incident response plan, (ii) a full security
review of the software and (iii) a process involving approval of
release and archiving. In relation to the incident response plan,
the checklist sets out detailed recommendations on the life cycle
of deviations and related procedures for detecting, analysing and
verifying, reporting and handling incidents, followed by
normalisation (restoring management, operation and maintenance to
their normal state).
- Maintenance: In relation to maintenance, the key recommendation relates to incident response (which the release checklist already covers in detail). Beyond that, the checklist mentions topics such as continuous assessment of vulnerability detection measures, metrics etc.
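The "Little Bobby Tables" cartoon mentioned in the training checklist refers to SQL injection, a classic illustration of why secure-coding training matters. The sketch below (table and column names are our own illustration) shows the safe pattern: a parameterised query treats user input purely as data, so even a malicious string cannot rewrite the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

# The input Bobby's mother chose in the cartoon:
name = "Robert'); DROP TABLE students;--"

# Vulnerable pattern: building the query by string concatenation
# would let the input terminate the statement and drop the table:
# conn.executescript("INSERT INTO students (name) VALUES ('" + name + "')")

# Safe pattern: a parameterised query binds the input as a value.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

row = conn.execute("SELECT name FROM students").fetchone()
print(row[0])  # the malicious string is stored verbatim, not executed
```

This is exactly the kind of concrete, role-specific example the guidance suggests using in developer training.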
TECHNICAL NOTE 2: summary of the takeaways of the ENISA report on pseudonymisation:
- Counter and random number generator techniques are considered
strong as long as the mapping table is secured and
stored separately from the pseudonymised data, but counter
techniques are deemed weaker than random number generator
techniques as they allow for predictions (due to their sequential
nature);
- Cryptographic hash functions are considered to be a weak
technique for e-mail address pseudonymisation (EAP hereunder)
because a dictionary attack is trivial (due to the number of e-mail
addresses used today);
- Message authentication code (MAC) techniques: compared to
hashing, MAC presents significant data protection advantages also
for EAP, as long as the secret key is securely stored, and as long
as no recovery is needed (i.e. de-pseudonymising data – which
is difficult in the case of MAC). MAC is also suggested as a
possible technique for interest-based display advertising, where
unique pseudonyms are used but advertisers do not need to know the
user's original identity;
- Asymmetric (public key) encryption is not recommended for EAP,
because of the availability of the public key and because of the
possibility of dictionary attacks;
- Format-preserving encryption allows pseudonymised data to
retain some utility (which is notably useful for EAP), but it is
important to avoid the emergence of patterns (and therefore to be
careful in the configuration of the format preserving encryption
mechanism);
- The best approach to pseudonymisation involves applying pseudonymisation to all data values, taking the whole dataset into account and ensuring that the resulting dataset keeps only the type of utility necessary for the purpose of processing.
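The difference between the weak hashing approach and the stronger MAC approach described above can be demonstrated in a few lines. This is a minimal sketch using Python's standard library (the example address and candidate list are our own illustration): a plain hash is keyless, so an attacker can rebuild the mapping by hashing candidate addresses, whereas an HMAC requires a secret key that the attacker does not have.

```python
import hashlib
import hmac
import secrets

email = "alice@example.com"

# Plain cryptographic hash: deterministic and keyless, so anyone can
# mount a dictionary attack by hashing candidate e-mail addresses.
pseudonym = hashlib.sha256(email.encode()).hexdigest()

candidate_addresses = ["bob@example.com", "alice@example.com", "carol@example.com"]
recovered = None
for candidate in candidate_addresses:
    if hashlib.sha256(candidate.encode()).hexdigest() == pseudonym:
        recovered = candidate  # the attack succeeds without any secret

# Keyed MAC (HMAC): the same dictionary attack fails without the key,
# which must be generated and stored securely, separate from the data.
key = secrets.token_bytes(32)
mac_pseudonym = hmac.new(key, email.encode(), hashlib.sha256).hexdigest()
```

The sketch also shows why MAC-based pseudonymisation is hard to reverse even for the controller: recovering the original address from `mac_pseudonym` requires re-computing the MAC over candidate inputs with the key, as there is no decryption operation.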