An introduction to practical data privacy and security

30 October, 2022

Introduction

This post describes a series of practical actions you can take to increase the privacy and security of yourself and your data. It's not a comprehensive list, but it provides a good starting point for keeping your data safe, secure and recoverable.

There are usually several viable alternatives to the specific software or workflows that I outline here, and I'm not affiliated with any of them. Choose what makes the most sense for you.

Finally, I'm not going to dive into too much detail, but rather outline the high-level reasoning behind each component. I hesitate to use the term, but... do your own research if you're interested in learning more.

Threat model

Determining your threat model is the first prerequisite to designing a data privacy and security model. Basically, you ask yourself a few questions:

What am I trying to protect against?
What are the realistic threats against my security and privacy?
How much inconvenience am I willing to trade for increased security and privacy?

Your threat model is likely to be different to mine, which is likely different to that of a politician, a political dissident, a sportsperson, and so on. You don't need to define it explicitly; just keep in mind that there are always trade-offs, and you may want to be more or less strict than someone else in certain areas—there are no wrong answers, as long as you acknowledge the risks.

I believe that for most people, realistic risks include identity fraud (and associated financial or personal damages), compromised online accounts (including financial accounts, personal communications), and other data breaches. It's very easy to ignore these risks, but it's also pretty easy to implement processes that greatly reduce the probability that they occur. At a minimum, we just want to become a slightly harder target than the next person!

Password manager

A password manager, secured with a strong password and multifactor authentication, is the single most important component of your data security model. A password manager makes it practical to use a unique password for every account, which means that if one of your passwords is compromised, all of your other accounts are unaffected. Password managers also generate long, random passwords that are difficult to brute force.
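To illustrate what "long, random and difficult to brute force" means in numbers, here's a minimal Python sketch of random password generation. The 94-character alphabet (letters, digits and punctuation) and 20-character length are my own illustrative choices, not necessarily what Bitwarden or any other password manager uses. The entropy of a uniformly random password is its length multiplied by log2 of the alphabet size.

```python
import math
import secrets
import string

# Letters, digits and punctuation: 52 + 10 + 32 = 94 characters (illustrative choice).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20, alphabet: str = ALPHABET) -> str:
    """Return a password whose characters are chosen uniformly at random."""
    # `secrets` draws from the operating system's cryptographically secure source;
    # the `random` module is predictable and should never be used for passwords.
    return "".join(secrets.choice(alphabet) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a uniformly random password: length * log2(alphabet_size)."""
    return length * math.log2(alphabet_size)


password = generate_password()
print(password)
print(f"{entropy_bits(len(password), len(ALPHABET)):.1f} bits")  # 131.1 bits
```

Under these assumptions a 20-character password carries roughly 131 bits of entropy, far beyond anything that can be brute forced in practice.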

The password for your password manager should be a passphrase, like the famous "correct horse battery staple" from xkcd. (Please check out that comic if you haven't seen it before.)
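A passphrase's strength comes from picking words uniformly at random from a known list, not from the words themselves being obscure. A minimal sketch, assuming a tiny placeholder word list (`WORDS` is hypothetical; real generators use a large list such as the EFF's 7,776-word Diceware list):

```python
import math
import secrets

# Hypothetical placeholder list for illustration only. Real generators draw from a
# large list, such as the EFF's long Diceware list of 7,776 words.
WORDS = ("correct", "horse", "battery", "staple", "kettle", "moss", "violin", "orbit")


def generate_passphrase(n_words: int = 5, words: tuple[str, ...] = WORDS, sep: str = " ") -> str:
    """Join n_words chosen uniformly and independently at random from `words`."""
    return sep.join(secrets.choice(words) for _ in range(n_words))


def passphrase_entropy_bits(n_words: int, list_size: int) -> float:
    """Entropy in bits of n_words drawn uniformly from a list of list_size words."""
    return n_words * math.log2(list_size)


print(generate_passphrase())
print(passphrase_entropy_bits(4, 2048))            # 44.0 (the xkcd example)
print(round(passphrase_entropy_bits(6, 7776), 1))  # 77.5
```

The comic's figure of 44 bits corresponds to four words drawn from a list of 2,048 (2^11) words; each extra word adds another log2(list size) bits.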

Side note for interested nerds: check out the latest NIST password guidelines (SP 800-63B). Some fun takeaways to pass along to your workplace's IT department: Regular password resets hurt more than they help; password hints and security questions should be avoided; and no, you shouldn't require at least one uppercase letter, number, and punctuation symbol. Understanding what makes a strong password is a solved problem, but organisations very rarely implement best practice in this area.
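The NIST approach boils down to a couple of simple checks. Here's a minimal sketch of a password-acceptance rule in that spirit: a minimum length of 8 characters and a check against a list of common or breached passwords, with no composition rules. `BLOCKLIST` is a hypothetical stand-in for a real breached-password list, and this is a simplification rather than a full implementation of the guidelines.

```python
# Hypothetical blocklist for illustration. A real check would use a large list of
# breached and commonly used passwords, not a handful of entries.
BLOCKLIST = {"password", "12345678", "qwertyuiop", "correcthorsebatterystaple"}

MIN_LENGTH = 8  # SP 800-63B minimum for passwords chosen by the user


def password_problems(candidate: str) -> list[str]:
    """Return reasons to reject `candidate`; an empty list means it's acceptable."""
    problems = []
    if len(candidate) < MIN_LENGTH:
        problems.append(f"shorter than {MIN_LENGTH} characters")
    # My own normalisation choice: compare case-insensitively and ignore spaces,
    # so trivial variants of a blocklisted password are caught too.
    if candidate.lower().replace(" ", "") in BLOCKLIST:
        problems.append("appears in a list of common or breached passwords")
    # Deliberately absent: rules requiring uppercase letters, digits or symbols,
    # forced expiry, and any low maximum length that would rule out passphrases.
    return problems


print(password_problems("Password"))                  # ['appears in a list of common or breached passwords']
print(password_problems("kettle moss violin orbit"))  # []
```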

Bitwarden is an open-source password manager with a fully functional free tier, as well as a very cheap premium option. It has phone and computer applications (including a CLI if you're that way inclined), as well as browser integration.

Personally, I avoid browser integrations and add-ons in general, including password manager integrations. The risk is low, but I prefer not to rely on JavaScript for anything security-critical.

On a related note, you also have the option of saving your passwords in your browser. However, not every password you use will be entered into a browser. For that reason, the reason above, and for the sake of simplicity, I recommend using a single password manager and not allowing your browser to save any passwords.

Multifactor authentication

Types

Secure your password manager, email account and anything vaguely valuable (such as bank logins) with multifactor authentication. This means that even if your password is compromised, an attacker also needs access to your second factor, which typically generates a time-based one-time password (TOTP).
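Under the hood, a TOTP authenticator app and the service share a secret key (usually delivered via the QR code you scan during setup), and both derive a short code from that key and the current time. That shared secret is also why the same account can be set up on more than one device. Here's a minimal Python sketch of the algorithm from RFC 6238, assuming the common defaults of HMAC-SHA1, 30-second steps and 6 digits; it's for understanding only, so use a maintained authenticator app in practice.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) with HMAC-SHA1."""
    # Hash the 8-byte big-endian counter with the shared secret key.
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte choose a 4-byte window.
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    # Keep the last `digits` decimal digits, padding with leading zeros.
    return str(value % 10 ** digits).zfill(digits)


def totp(secret_b32: str, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP with counter = floor(time / step)."""
    cleaned = secret_b32.replace(" ", "").upper()
    # Authenticator secrets are base32, often shown without '=' padding; restore it.
    key = base64.b32decode(cleaned + "=" * (-len(cleaned) % 8))
    now = time.time() if at is None else at
    return hotp(key, int(now // step), digits)


# RFC 6238 test key: the ASCII bytes "12345678901234567890", base32-encoded.
secret = base64.b32encode(b"12345678901234567890").decode()
print(totp(secret, at=59, digits=8))  # 94287082
print(totp(secret))                   # the current 6-digit code
```

The `at=59` call reproduces a published RFC 6238 test vector, which is a handy sanity check for any implementation.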

The gold standard here is a physical hardware token, like the YubiKey. This requires you to physically possess a piece of hardware to authenticate yourself. The downsides of physical tokens are that they are more expensive than application-based authenticators and can be lost. But they are certainly the most secure multifactor authentication devices generally available.

There are a few standard options for application-based multifactor authentication, including Authy, Google Authenticator and Microsoft Authenticator.

Avoid SMS multifactor authentication, because it's extremely insecure. A malicious actor can often carry out a SIM-swap attack (convincing your mobile carrier to move your number to a SIM card they control) with surprisingly little effort, giving them control of your phone number and hence any multifactor authentication codes sent to it.

Recovery

The extra security afforded by using multifactor authentication comes with a cost: There's now an additional critical component for which you must be able to recover access if something goes wrong (such as your phone breaking).

For that reason, I recommend you install your multifactor authentication application on at least two devices if possible: your phone and your computer. That way, if one of them is inaccessible, you can still access your accounts using the other.

Failing that, you are generally provided with a set of recovery codes when you implement multifactor authentication on an account. Here, you're presented with another trade-off: These codes may be critical for recovering your account, but they are also a huge security risk if they become compromised. You'll have to decide where to store them.

The best-case scenario is that you can store them encrypted, but in an easy-to-access (and fully backed-up) place. Failing that, good old-fashioned printing them out and keeping them in a safe in your house is a pretty reasonable method.

Email

As the saying goes, if you aren't paying for it, you aren't the customer; you're the product. A Gmail account is free because Google makes its money by using (some of) your data to target advertising. Your emails are not end-to-end encrypted, so Google can access them, and they can be stored indefinitely on Google's servers. If you're serious about data security and privacy, you should be using a paid, end-to-end encrypted email provider, like Proton Mail or Tutanota.

A quick note on Google. If you're only interested in security, rather than privacy, then using Google services is a fine idea. I think the general consensus is that Google products are very secure—just not very private.

Domains

This one's a little more niche, for sure, but it's actually relatively easy to buy a domain and create an email address using that domain (for example, some-email@heds.nz). One of the main advantages of owning your own domain is that you can't lose access to your email address, as long as you keep renewing the domain. You may have heard horror stories of people (or businesses) suddenly and irreversibly losing access to their email or social media accounts. If you own your domain and email address, then even if your email provider closes your account, you can just switch providers and keep the same email address.

Data backup

Phew, okay, we've saved the biggest topic for last. At this point, a quick reminder to think about your threat model, because that's pretty important when designing your data backup strategy.

Perhaps your threat model is not concerned with your personal data being private, in which case you may be perfectly happy syncing all your data to Google Drive or some equivalent. At the other end of the spectrum, your threat model may include active surveillance by a state-sponsored entity, in which case your strategy will lean heavily toward encryption, at the cost of convenience and possibly some redundancy.

My threat model seeks to minimise the chance that my personal data is leaked, while ensuring that it can be restored in the case of fire, theft or other disaster. The most likely risk to my data security and privacy is identity fraud or a data breach caused by a compromised account, so my strategy is a compromise between security and practicality. I'm pretty sure I'm not under surveillance by the NSA or Unit 8200 (and if I were, well, I'm probably already compromised). In light of my threat model, then, my backup strategy requires encryption at rest and multiple copies.

3-2-1

It's good practice to adhere to the standard 3-2-1 advice:

3 copies of your data;
2 different types of storage media;
1 copy offsite.

One of these copies is straightforward: typically your data is stored on your computer's disk. The second copy is also often straightforward: typically an external hard drive stored in your house. The offsite copy is where you'll have to do some soul-searching.
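The 3-2-1 rule is easy to check mechanically. A minimal sketch, using a hypothetical `BackupCopy` record for each copy and modelling the setup described above (computer disk, external drive at home, external drive offsite):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of your data (hypothetical record for illustration)."""
    name: str
    medium: str    # e.g. "internal SSD", "external HDD", "cloud"
    offsite: bool  # stored somewhere other than your home?


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """True if there are at least 3 copies, on at least 2 media types, with at least 1 offsite."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


plan = [
    BackupCopy("laptop", "internal SSD", offsite=False),
    BackupCopy("home drive", "external HDD", offsite=False),
    BackupCopy("safe-deposit box drive", "external HDD", offsite=True),
]
print(satisfies_3_2_1(plan))      # True
print(satisfies_3_2_1(plan[:2]))  # False: only two copies, neither offsite
```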

Offsite: cloud or offline?

Maintaining an offsite backup of your data is important in case of physical disaster such as your house burning down. The offsite copy could be in the cloud, or it could be a physical, offline copy stored elsewhere (like an external hard drive stored in a safe-deposit box or at a friend's house).

Storing data in the cloud can be very safe (and convenient) if appropriate encryption is used, but in most circumstances it is unlikely to be more secure than equivalently encrypted data stored offline. The attack surface of cloud storage includes any malicious actor with an internet connection, whereas the attack surface of a physical hard drive is limited to people physically present where it's stored, a tiny fraction of the former.

Furthermore, if you make a mistake in your encryption, or use an old, outdated encryption algorithm that isn't very secure, there's little practical risk unless your backup is physically compromised. Conversely, using a vulnerable encryption algorithm on data stored in the cloud could be a significant risk.

Finally, if someone were to illegally obtain a physical copy of your data, and also have the requisite ability to brute-force decrypt it or exploit some sort of vulnerability in the algorithm, you're probably the specific target of a sophisticated criminal or state-sponsored organisation. In which case, why are you reading this? Personally, this scenario doesn't fit in with my threat model.

The main disadvantage of an offline-first approach is that it requires more work to update and store data using physical hardware, compared to syncing with cloud storage products (which can typically be fully automated).

There is not much difference in monetary cost between the two options. When storing data offline, you have the upfront cost of (ideally at least) two external drives, perhaps $150–200 in total. When storing data in the cloud, you'll likely pay an annual fee based on the space required (remember, if you aren't paying, you're the product...), which probably adds up to the cost of the external drives over several years.
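To make the comparison concrete, here's a small Python sketch of cumulative costs for each option. Only the $150–200 drive figure comes from this post; the $40 annual cloud fee and the five-year drive lifetime are hypothetical placeholders, so plug in real quotes before drawing any conclusions.

```python
import math


def cumulative_costs(years: int, drive_cost: float, drive_lifetime_years: int,
                     annual_cloud_fee: float) -> tuple[float, float]:
    """Return (offline, cloud) total spend after `years` years."""
    # Offline: buy a set of drives up front, then replace them each time they reach end of life.
    drive_purchases = math.ceil(years / drive_lifetime_years)
    offline = drive_cost * drive_purchases
    # Cloud: pay the annual fee every year.
    cloud = annual_cloud_fee * years
    return offline, cloud


# Hypothetical inputs: $175 for two drives (midpoint of $150-200),
# drives replaced every 5 years, and a $40/year cloud plan.
for year in range(1, 7):
    offline, cloud = cumulative_costs(year, 175, 5, 40)
    print(f"year {year}: offline ${offline}, cloud ${cloud}")
```

With these placeholder numbers, the cloud plan's running total ($200) passes the one-off drive cost ($175) in year 5, which is in line with the rough equivalence described above.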

If you do decide to use cloud storage, and are a bit more technically minded, I strongly recommend rsync.net as a cloud provider. They basically give you an empty Unix filesystem to do whatever you want with via SSH, it can connect to just about any other cloud provider, and they're well known for providing excellent customer service.

Encryption

This can seem intimidating at first, but there are some really good and relatively simple options out there. My recommendation is to use something like VeraCrypt. The idea is that you create an encrypted file container on an external hard drive, and regularly back up your data into the container. Anything transferred into the container is automatically encrypted, and you need to mount the container to decrypt its contents (which are decrypted on the fly in RAM). VeraCrypt has a pretty good tutorial to get started.

An excellent alternative is to use rclone with a crypt remote. This is more complex, but allows you, for example, to set up scheduled operations that encrypt and sync data to your cloud storage provider (or a connected external drive) every night/week/whatever. Andy Ibanez has written a great post on this.
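For the rclone route, the scheduled job can be as simple as running `rclone sync` against a crypt remote. A minimal Python sketch, assuming you've already created a crypt remote with `rclone config`; the remote name `secret`, the destination path and the source folder are hypothetical placeholders.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical placeholders: "secret" must be a crypt remote you've already created
# with `rclone config`, wrapping your cloud storage (or an external drive).
SOURCE = Path.home() / "Documents"
DEST = "secret:backup/documents"


def build_sync_command(source: Path, dest: str, dry_run: bool = False) -> list[str]:
    """Return the rclone command line; the crypt remote encrypts files as they're written."""
    cmd = ["rclone", "sync", str(source), dest]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without transferring anything
    return cmd


def run_backup(dry_run: bool = False) -> int:
    """Run the sync once and return rclone's exit code (0 means success)."""
    if shutil.which("rclone") is None:
        raise RuntimeError("rclone is not installed or not on PATH")
    result = subprocess.run(build_sync_command(SOURCE, DEST, dry_run), check=False)
    return result.returncode
```

Be aware that `rclone sync` makes the destination match the source, deleting files from the destination that no longer exist locally; `rclone copy` only adds and updates files. Calling `run_backup(dry_run=True)` first shows what would change. To run it every night, call `run_backup()` from a cron job or your operating system's task scheduler.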

If you do go with the offline method, you'll need to decide how often to update your offsite backup, given that it can't be done automatically via the cloud. This is a trade-off between convenience and coverage. Monthly is a reasonable compromise: It's not so frequent as to be annoying, and you probably don't generate so much data in one month that losing it would be devastating. But again, figure out what works for you.
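If you go with a manual schedule, a tiny reminder script can tell you when the offsite copy is due. A minimal sketch, assuming you record the date of the last offsite update yourself; the 31-day interval is a hypothetical stand-in for "monthly".

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical stand-in for "monthly"; change it to whatever frequency suits you.
OFFSITE_INTERVAL = timedelta(days=31)


def next_offsite_update(last_update: date, interval: timedelta = OFFSITE_INTERVAL) -> date:
    """Date on which the offsite copy should next be refreshed."""
    return last_update + interval


def offsite_update_due(last_update: date, today: Optional[date] = None,
                       interval: timedelta = OFFSITE_INTERVAL) -> bool:
    """True once at least `interval` has passed since the last offsite update."""
    today = today or date.today()
    return today >= next_offsite_update(last_update, interval)


last = date(2022, 9, 1)  # example: when the offsite drive was last refreshed
print(next_offsite_update(last))                     # 2022-10-02
print(offsite_update_due(last, date(2022, 10, 30)))  # True
```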

If your computer's operating system offers full disk encryption, use it! This means that if someone steals your computer while it's turned off, your files remain encrypted. They can't just remove the drive and plug it into another machine to recover your data.

Finally, if you do encrypt your data with a passphrase, the main weak point in your system is forgetting it. Without the passphrase, you will not be able to decrypt your data. The same considerations for storing multifactor authentication recovery codes apply here!