ifndefJOSH's homepage

Let me start with some reassurance. jeppson.dev will never require you to make an account. My blog and homepage will always be free and open access (to humans). My homelab stuff, Nextcloud, Joplin, etc., does require an account, but no, you can’t make one. That is for my use only. Not to mention, if I did open up Nextcloud to public use, I would become potentially liable if, heaven forbid, someone uploaded something illegal. That said, I do have the ability to see whatever anyone who uses my Nextcloud uploads, and if I did find anything that wasn’t above board I would go to the authorities about it. Gotta protect myself from liability after all.

But enough meandering. Let’s talk accounts.

TL;DR: avoid creating accounts if you can. Don’t re-use passwords if you do, and prefer access delegation like OAuth over actual account creation, though be wary with that, too.

OAuth verification vs. actual account creation

I’ve seen an increasing number of sites allow you to “sign in with Google” or “sign in with Facebook”. You can connect your Discord account to things, and on some sites, you can authenticate that you are, in fact, who you say you are, if you’re signed in with Outlook, GitHub, or ORCID. This is possible because of a technology called Open Authorization, or OAuth for short. The latest major version currently is OAuth 2.0 (often written OAuth2), and you’re probably using it without even realizing it.

Still, many sites have classic account databases. You choose a username, set a password, hopefully set up a TOTP code or passkey, and bam, you’re in. Maybe OAuth is even used on first sign-in as a way to prevent bots or spammers from creating accounts, but at the end of the day, your username and password (well, hopefully a salted and hashed password) are stored, along with the rest of your details, in a database somewhere.

Both OAuth/access delegation and classical account creation present different problems, and you should be wary of both, but for different reasons.

OAuth and Access Delegation

First, let’s talk about OAuth.

Essentially, OAuth is what’s called an access delegation protocol. Alice doesn’t know that the person claiming to be Bob is actually Bob, but Alice trusts Acme Corporation, and Bob has an account with Acme. Alice can ask Bob to verify his identity with Acme and, optionally, to let Acme share some information about him with her. In practice, this all occurs directly within the browser, sometimes without even having to leave Alice’s website.

I don’t inherently have a problem with OAuth. In fact, my friend code triangle has set up a number of sites which use OAuth2 to authenticate Discord accounts. These are fun little projects he does, and the only information they read is my Discord username. Basically, he just wants to prove people are who they say they are.

But sometimes, services using OAuth can request access to more information about an account. And sometimes, they can even use it to modify your account. Google provides an API for this, since there are many legitimate uses for it, but it’s something to be just as wary of as creating an account with a service, and potentially more so.

Consider, for a second, a service that automatically backs up your phone’s photos to iCloud. I personally don’t use iCloud, but many people do, and the utility for such a service is evident. Yes, I know Apple already provides this feature, but consider that there’s something about the backup service that either works better, or is in some way preferable to the built-in solution provided by Apple. The service would not only need to authenticate that a user is, in fact, an Apple customer with an iCloud account, but it would also need access to that iCloud in order to modify it. Generally, large OAuth providers will have a page that pops up when a user tries to authorize a new service, saying something to the effect of: “Would you like to grant [SERVICE] access to your account? It will be able to read/write the following data:” followed by a list of that data.
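To make the "what is this service asking for?" question concrete, here is a rough sketch of how a service might build the link that sends you to that consent page. The query parameters (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`) are the standard OAuth 2.0 authorization-request fields, but the endpoint, client ID, and scope names below are made up for illustration; every real provider defines its own.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical provider endpoint and client details, for illustration only.
AUTHORIZE_ENDPOINT = "https://auth.example.com/oauth2/authorize"
CLIENT_ID = "photo-backup-demo"
REDIRECT_URI = "https://backup.example.org/callback"


def build_authorization_url(scopes):
    """Return (url, state) for an OAuth 2.0 authorization-code request."""
    # Random value the service checks later to tie the response to this request.
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": " ".join(scopes),  # space-separated list of requested permissions
        "state": state,
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}", state


# "Just prove who I am" versus "let this service read and write my photos".
# The scope names are invented; check your provider's documentation for real ones.
identity_only_url, _ = build_authorization_url(["identify"])
read_write_url, _ = build_authorization_url(["identify", "photos.read", "photos.write"])

print(identity_only_url)
print(read_write_url)
```

The only difference between the two links is the `scope` value, and that is exactly the part the consent page spells out for you. It is worth the five seconds it takes to read.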

A lot of users, unfortunately, will click through these menus as quickly as possible. Most of the time, when someone’s Instagram/TikTok/Facebook or whatever gets “hacked”, they’ve just fallen victim to something like this. Basic social engineering. Or maybe just laziness.

Let’s contrive a more high-stakes example: say you manage a YouTube channel, or a very popular Instagram page, or a large Discord server. There are bots out there that you can connect to all of these services to filter out, for example, hate speech comments. Generally speaking, you don’t “sign into” these bots via vanilla OAuth (although it is still a form of access delegation), but you grant them access to the page/server/channel, which is similar in spirit. What happens if you connect a potentially malicious bot, or give it too much access?

Imagine the horrid possibilities.

As for the pros, though, most OAuth providers ensure that you can revoke access to a malicious service at any time. Google has a page in its account settings that lets you see all of the services you’ve connected, and what data they have access to. They even remind you sometimes to go “take out the trash” and revoke access to old services.

Also, access delegation reduces the chance that one of your passwords will be saved using an old hashing scheme, or, heaven forbid, in plain-text. You’re less likely to have that data in a data breach, and generally, OAuth providers are large enough to have teams of people doing cybersecurity audits.

At the end of the day, it is much safer to use OAuth than it is to create an account on every site you use. Although, that does mean Google, for example, can track every service you use them to authenticate with, and they can often glean some information about you from that.

But who among us hasn’t thought it convenient when our spying devices serve us up that oh-so-perfect ad?

User Accounts

In stark contrast, we have what sites have been doing since the dawn of the internet: letting you create user accounts.

When a site or app or service does this, your username and some information derived from your password are generally stored as an entry in a database table somewhere. The TOTP secret, which is used to generate your one-time passcodes, is there too.
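To see why that TOTP secret matters, here is a minimal sketch of how a one-time passcode is derived from it, following the published TOTP algorithm (RFC 6238, using HMAC-SHA-1 with a 30-second step). The function name and defaults are my own choices, and the secret is the RFC's public test value, not anything a real service would use.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, at_time=None, step=30, digits=6):
    """Compute a time-based one-time passcode (RFC 6238, HMAC-SHA-1)."""
    if at_time is None:
        at_time = time.time()
    # Number of whole time steps since the Unix epoch, as an 8-byte big-endian value.
    counter = int(at_time) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


shared_secret = b"12345678901234567890"  # RFC test secret, not a real one
print(totp(shared_secret))  # changes every 30 seconds
```

Anyone who holds the secret can generate valid codes forever, which is why a database leak that includes TOTP secrets undoes the second factor as well as the first.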

While your username is stored in plain text, your password is generally stored as a hash. Hashing is like encryption, but it’s one way. You can hash the same input over and over again, and get the same output, but you ideally can’t determine what the input was from a hashed output. There are a number of possible ways to hash a password, but purpose-built, deliberately slow password hashes such as bcrypt (which is built on the Blowfish cipher), scrypt, and Argon2, or an iterated construction such as PBKDF2 over SHA-512, are some “good enough” algorithms for now. Unfortunately, there are still sites out there using outdated/crackable hashing algorithms, like MD5, or even storing passwords in plain-text.
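A quick sketch of those properties, using Python's standard `hashlib`. SHA-256 is used here only because it is short to demonstrate; a single fast hash like this is not a recommendation for storing passwords, and the helper name is mine.

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 digest of a string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("correct horse battery staple")
second = sha256_hex("correct horse battery staple")
changed = sha256_hex("correct horse battery stapl3")

print(first == second)   # same input, same output
print(first == changed)  # one character changed, unrelated-looking output
# The output width is fixed no matter how long the input is.
print(len(first), len(sha256_hex("x" * 10_000)))
```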

By the way, a good way to guess what kind of password storage a service is using is the character limit on passwords. A hash function accepts input of any length and produces a fixed-width output, so a properly hashed password takes up the same space in the database whether it is 8 characters or 80. The fixed width does mean hash functions are subject to the birthday paradox, but for any modern output size the chance of two passwords colliding is negligible. Since you want longer, more random passwords that are harder to guess, a service has little reason to cap password length tightly.

Well, technically, you want passwords with higher information entropy (a measure of uncertainty that translates into how hard your password is to guess and is also affected by things like the inclusion of common words or patterns), but generally speaking, longer passwords have higher entropy.

A short cap therefore suggests the limit comes from somewhere else: a fixed-width database column holding the password itself, or a legacy system under the hood. Hashing collisions, where two inputs have the same hash, always exist in principle because you are mapping many possible passwords to fewer possible hashes, but with an old, small output space they become far more likely, which is one more reason to distrust outdated algorithms. Modern hashes have much larger output spaces, and are therefore safer with larger character limits. A limit of, say, 15 characters on a password is a pretty good indication that something outdated is being used under the hood. A good service should be able to support at least 64-character passwords (bcrypt, for what it’s worth, only reads the first 72 bytes).
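For a sense of scale on the birthday paradox, the chance that any two of n random inputs share a hash can be estimated with the usual approximation p ≈ 1 − exp(−n(n−1) / (2·2^bits)). This is a back-of-the-envelope sketch that assumes the hash behaves like a uniformly random function; the function name and the one-billion-password figure are just illustrative.

```python
import math


def collision_probability(num_inputs, hash_bits):
    """Birthday-bound estimate of at least one collision among num_inputs hashes."""
    space = 2 ** hash_bits
    exponent = -num_inputs * (num_inputs - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


# One billion distinct passwords against a few output sizes.
for bits in (32, 64, 128, 256):
    print(bits, collision_probability(10**9, bits))
```

The gap between a small output space and a modern one is enormous: the 32-bit case collides essentially every time, while the larger sizes come out vanishingly small.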

Also, pure hashing presents a problem: what if two users have the same password? If you have access to all of the hashes, and two users’ password hashes are the same, you can guess that they have the same password. To solve this problem, competent services add a salt to the hash: some random information, stored with the hash, that goes into the hashing function with the password. This means that two users with the same password won’t have the same password hash.
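Here is a minimal sketch of salting, using PBKDF2 from Python's standard library. The iteration count, salt length, and function names are illustrative assumptions on my part, not anyone's production settings; a real service would lean on a vetted library and tune the work factor for its hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, chosen to keep the demo quick


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Two users pick the same password but get different salts.
alice_salt, alice_hash = hash_password("hunter2")
bob_salt, bob_hash = hash_password("hunter2")

print(alice_hash == bob_hash)                              # hashes differ
print(verify_password("hunter2", alice_salt, alice_hash))  # correct password verifies
print(verify_password("hunter3", alice_salt, alice_hash))  # wrong password does not
```

The salt is not a secret. It sits right next to the hash in the database, and its job is only to make every stored hash unique so an attacker can’t crack all the matching passwords in one go.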

You can’t guarantee a site is using a modern hash, or that they’re salting the hash, or that they’re even using a hash at all. If they’re skipping any one of these, their service is insecure.

But even if they are doing everything right, there is still a non-zero probability your data will be involved in a breach, and hashing isn’t an excuse to use bad passwords. Tools like hashcat and John the Ripper can perform dictionary attacks matching on hundreds of thousands of hashed passwords at once. They can even try common substitutions, $uch 4s l33t-sp34k (such as leet-speak), which is commonly used in passwords.
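To show why leet-speak buys you so little, here is a toy version of the idea. It is nowhere near what hashcat or John the Ripper actually do (they are far faster and use much richer rules); the substitution table, function names, and the unsalted SHA-256 "leak" are all simplifications I made up for the demonstration.

```python
import hashlib
from itertools import product

# A few common character substitutions; each letter maps to its possible spellings.
LEET = {"a": "a4@", "e": "e3", "i": "i1!", "o": "o0", "s": "s5$", "t": "t7"}


def variants(word):
    """Yield every spelling of word with the substitutions above applied."""
    choices = [LEET.get(ch, ch) for ch in word.lower()]
    for combo in product(*choices):
        yield "".join(combo)


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dictionary_attack(leaked_hashes, wordlist):
    """Return {hash: password} for every leaked hash matched by a wordlist variant."""
    targets = set(leaked_hashes)
    cracked = {}
    for word in wordlist:
        for candidate in variants(word):
            digest = sha256_hex(candidate)
            if digest in targets:
                cracked[digest] = candidate
    return cracked


# A pretend leak: two "clever" passwords and one genuinely random one.
leaked = [sha256_hex("p4$$w0rd"), sha256_hex("dr4g0n"), sha256_hex("kX9#vQ2m!Lr7")]
found = dictionary_attack(leaked, ["password", "dragon", "letmein"])
print(sorted(found.values()))
```

The two dictionary-based passwords fall after a few hundred guesses, while the random one is untouched.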

And that doesn’t even begin to get into the assumption you’re making that the service’s authentication system correctly restricts sensitive information to authenticated users. Or how persistent sign-on is or is not (i.e., do authentication tokens generated on sign-in expire?), or…

Well, you get the idea.

I hope I’ve made it obvious that user accounts face several non-trivial problems. Non-exhaustively:

- You are trusting that the service you create your account on is storing your information in a secure way.
- You are trusting yourself not to use a password that is easily guessable, or near to your other passwords. If you change a few characters in your Website A password for Website B, and then Website B gets hacked, attackers have a much easier time getting into your account on Website A.
  - Side note: if you are not using a password manager, you’d better start. I like offline ones such as KeePass (or KeePassXC if you’re on Linux). It makes it so the single-point-of-failure that is your master password isn’t always accessible to be poked and prodded.
- You are trusting that whatever information you put into this service isn’t being sold to third parties (well, more accurately, you hope it isn’t being sold to too many third parties, or to less scrupulous ones).
- By using many accounts on many websites, you are increasing your “attack surface”, and with it the odds that you are going to be involved in a data breach. If there is a 1/100 chance that any one service breaches your information, the chance that your data is in at least one breach grows with every service you sign up for: with 50 accounts, it is 1 − 0.99^50, or about 39%.
- You are trusting that the service isn’t leaking your information in non data-breachy ways. If an input somewhere doesn’t sanitize properly, a simple little '; SELECT * FROM users; -- could leak your information to an attacker (though, something like that may be considered a full data breach).
  - There was a Tom Scott video a few years back about a service that used incremental user IDs, and so you could pretty easily guess what the IDs of people who signed up around the same time as you were.
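The unsanitized-input problem from the list above can be sketched with Python's built-in `sqlite3` module. The table, the data, and the function names are all invented for the example; the point is only the difference between pasting input into the query text and passing it as a parameter.

```python
import sqlite3

# Throwaway in-memory database with a couple of made-up users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)


def find_user_unsafe(name):
    # Vulnerable: user input is pasted straight into the SQL text.
    query = f"SELECT name, email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Parameterized: the driver passes the input as data, never as SQL.
    query = "SELECT name, email FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


malicious = "nobody' OR '1'='1"
print(find_user_unsafe(malicious))  # the injected OR clause matches every row
print(find_user_safe(malicious))    # treated as a (nonexistent) literal name
```

As a user you have no way of knowing which of these two functions a site wrote, which is rather the point.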

Okay, so having to create accounts on every service you use seems pretty crummy. I would honestly say avoid it if you can. If you can’t, use a randomly generated password, and put in user information that remains as anonymous as possible. But sometimes, unfortunately, it is unavoidable.
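If you are curious what "randomly generated" looks like under the hood, here is a small sketch using Python's `secrets` module, plus the entropy estimate for a password drawn uniformly at random (length × log2 of the alphabet size). The alphabet and the default length of 24 are my own picks; a password manager will do the same job with more options.

```python
import math
import secrets
import string

# Letters, digits, and punctuation: 94 printable ASCII characters.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=24):
    """Pick each character independently with a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length, alphabet_size=len(ALPHABET)):
    """Entropy of a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


password = generate_password()
print(password)
print(round(entropy_bits(len(password)), 1), "bits")
```

The entropy formula only holds because every character is chosen at random. A 24-character password built from dictionary words and a birth year has far less entropy than the same length suggests.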

The Job Market is Leaking Your Data

It seems like every job you apply for nowadays wants you to create an account on the company’s website. You’ve got a 1/30 chance of getting the job, but by golly, we’re gonna data-mine you for all you’re worth! Please put in your social security number, current address, phone number, email, and employment history. That’s not dangerous at all…

A year and a half ago (thereabouts) I applied for a highly competitive internship at some fancy New York bank. I think it was either JP Morgan or Goldman Sachs. Needless to say, I didn’t get it. Pity, too, because that salary looked really nice. But in order to apply, they made me create an account on their “Jobs” page. I had to put in my university email, indicate whether I had disabilities (I don’t), fill out information pertaining to a criminal record (I’m a law-abiding citizen, so that part went pretty quick), tell them my race/ethnicity, include all my previous employers, and a lot of other really personal stuff.

The reason I mention this anecdote specifically is that after I didn’t get the job, I tried to delete my account. And do you know what? They said “no, you can’t do that”. I thought I was temporarily increasing the “attack surface” of my personal information, but apparently I was permanently increasing it.

Someday, I’ll write about how the US needs a GDPR.

You know, I’m not saying that potential employers shouldn’t have access to this kind of information. But I am saying that if they’re storing it, and it’s tied to an account, that’s a lot that can be leaked. Stuff like citizenship history or disabled veteran disclosures, which are required by US law, as well as background check information such as criminal records, can be used to blackmail people. If a potential employer is storing that information about someone, and the databases they’re using get breached, there is a massive danger of highly-targeted phishing, cyber-blackmail, and scamming attacks.

All of this being said, a company you’re applying to probably wants you to create an account to filter out spammers and bots. In my opinion, though, there are much better ways to do that.

Conclusion

Hopefully, you’ve found my wandering thoughts on this subject enlightening. But bear in mind that these are just my opinions. If you disagree with them, that is your right. If you think it’s totally fine to create accounts on every site you come across, and use the same password for all of them, be my guest. But don’t be surprised when you’re inevitably hacked.

Accounts, Accounts, Accounts, Oh My!

Do we really need to sign in to every website?
