On March 22nd I gave a presentation to ASU's Software Developers Association which I called "What My Professors Didn't Teach Me - Developer Skills: What They Are, Why They're Important and What You Should Know About Them". The presentation provided an overview of seven different "developer skills" that I believe are necessary to succeed as a professional software developer. I define "developer skills" as those aptitudes between the "hard" technical skills of programming / hacking and the "soft" interpersonal skills of teamwork and communication. They are the things that a developer needs to know to succeed at their job outside of actually building software.
Over the next several weeks I will be writing a post on each of these proficiencies; each post will describe what the developer skill is, why I consider it to be necessary knowledge for the workplace, and what specific things you should learn. I plan to have links to tutorials or other websites that you can use to follow up.
The posts will be as follows:
Part 0: Introduction
Part 1: Version Control
Part 2: Ticketing Software
Part 3: Multi-Branch Development Workflow
Part 4: Libraries and Package Managers
Part 5: Working with Remote Computers
Part 6: Communication Between Software
Part 7: Securing Your Communication
This is the last post in our series, and it is a massive topic that really deserves its own series - that topic is, of course, security. It is a simple fact that anything you send across the wires of the Internet can be read by someone between you (the sender) and the recipient. The challenge of security is to keep you and your information safe in the face of this monumental threat.
I'm going to be talking about security from a very high-level perspective, and from the point of view of someone who is developing applications, not necessarily a security / cryptography expert. I don't think that every developer needs to be an expert in security - but I do think they need to know the basics. Even a consistent application of basic security principles will go a very long way towards keeping you and your data safe. To this end, I will discuss a few of the basic techniques for securing communication (both communication with a remote computer and communication between software).
What Is This?
What does it mean for communication to be secure? Security is a big topic and there are many metrics, but the main three are as follows:
- Authentication: This means that you can verify that you are who you say you are. Client authentication means that the system you're talking to can verify your identity, while server authentication means that you can verify that the person you're talking to really is who they say they are (i.e., nobody in the middle is intercepting your communication and pretending to be the server). Note that this is different from authorization, which is the practice of verifying that you are allowed to do what you are requesting.
- Encryption: as mentioned before, once your communication goes out on the wires, it can be read by anyone listening "in the middle." Encryption is the practice of encoding your message so that it is difficult or impossible for someone reading your message to understand what you are saying if they don't know how to decrypt the message.
- Consistency: consistency (often called integrity) means being able to confirm that a message you received was the message that was actually sent. This ensures that your message has not been altered as it traveled from the sender to the recipient. Note that both clients and servers have an interest in verifying the consistency of each other's messages.
Basic diagram showing encryption. Image source: https://www.digicert.com/
Implementing a security protocol that provides all three of these properties is difficult. Fortunately, not every situation requires all three, and other situations don't require a perfect implementation of one or more of them. For example, if you are creating a web service that will respond to requests with public data, then you probably don't care about encryption or authentication - the data is public, so you don't care who knows it. Another possible situation is a web service that requires users to authenticate solely for antispam purposes - in this case, you may not need to have the most secure authentication strategy, as any authentication requirement will probably be enough to defeat most spammers. This may no longer be true, however, if your website suddenly becomes a high value target. You can see that there are a number of different considerations when looking at security.
Why Is This Important?
It may seem odd to defend the importance of security, but I'll be the first to admit that sometimes it's very easy to overlook it. Often, securing systems makes them much more complex, both for the person building the software, and for the person using it. The fact that we see major websites getting hacked weekly should remind us that security is hard; the fact that many of these websites had barely any security whatsoever, or even none at all, should remind us that it is very easy to overlook security.
With that out of the way, I will say this to justify security's importance: there is no quicker, more reliable way to destroy a company from the outside than a security breach. A security breach is not solely users losing access to your site briefly, or some data being deleted, though these are both common effects of a breach. No, a security breach - any breach at all - is a fundamental destruction of public trust in your company. Very often, this trust cannot be rebuilt. So, all developers should proceed under the following assumption: if a security breach occurs due to your negligence in implementing proper protection, you will be fired. If the breach is big enough, your name might even become permanently blacklisted in the industry.
I am not necessarily trying to scare you (maybe a little) - security is just that important. So, I beg of you - learn it, and use it. If this causes your projects to slow down, or makes it difficult for other people to work with your software, so be it. Anything is better than a security breach.
Things You Should Know
Here's the tricky part: the actual security you should know could fill a book. So, I will here describe three commonly used security techniques and describe what they do at a high level. It is your responsibility to follow up on each of these - learn more about it, try it out, and see for yourself how it works.
The first topic I'll discuss is simple encoding, encryption and hashing algorithms. As the name implies, these are simple programs that transform a human-readable message like "Hello world" into unreadable text like "SGVsbG8gd29ybGQ=". Naturally, these algorithms only transform the message itself - on their own, they don't authenticate the sender.
These algorithms fall into three groups, which are often confused with one another:
- Encodings produce text that can easily be translated back into plain text, solely with another algorithm and without any secret. base64 is a good example of an encoding. If you copied the text "SGVsbG8gd29ybGQ=" into https://www.base64decode.org/, you could decode it right back into its original message. If you can do this, then the bad guys can of course do this as well, so encodings are not encryption and provide no security at all. They are used simply to package a string of text for convenient transport (note how the base64 text contains only letters, digits and a few symbols, with no spaces, which makes it easier to carry in a URL variable).
- Keyed encryption algorithms take both a message and a secret key to produce their ciphertext. This adds a layer of security - the only way to decode the message is to run the decryption algorithm on the ciphertext and use the same secret key as the message was encrypted with. If a different secret key is used, the output of the decryption algorithm will be unreadable. Theoretically, this means that the only people who can read the ciphertext are those with the secret key. AES is the most widely used example.
- One-way hashing algorithms turn a message of any length into a short, fixed-length "digest" that cannot be translated back into the original message at all. Since the same message always produces the same digest, hashes are used to check consistency (if the digest you compute doesn't match the one you were given, the message was altered) and to store passwords without storing the passwords themselves. The sha family of hashing algorithms are good examples.
I said "theoretically" above because, in practice, many of these algorithms have been "cracked". For an encryption algorithm, this means people without the secret key can read encrypted messages; for a hashing algorithm, it usually means someone can produce two different messages with the same digest. These crackings occur when someone finds a weakness either in the mathematical function that powers the algorithm, or in the software implementation of that mathematical function. Either way, there are several algorithms that are no longer considered secure after being cracked. sha1 and md5 are two well-known hashing algorithms that are no longer secure. sha1 has been replaced with sha2 and sha3, both of which are (for now) more secure.
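To make the difference between reversible and one-way algorithms concrete, here is a minimal Python sketch using only the standard library. It assumes Python 3; the variable names are just for illustration. It shows that base64 output can be turned back into the original message by anyone, while a sha2 digest (SHA-256 here) has a fixed length and offers no function to reverse it.

```python
import base64
import hashlib

message = b"Hello world"

# base64 is reversible by anyone: no key or secret is involved.
encoded = base64.b64encode(message)
decoded = base64.b64decode(encoded)

# SHA-256 (a member of the sha2 family) is one-way: the digest has a
# fixed length, and there is no function that turns it back into the
# original message.
digest = hashlib.sha256(message).hexdigest()

print(encoded.decode("ascii"))  # SGVsbG8gd29ybGQ=
print(decoded == message)       # True
print(len(digest))              # 64 (hex characters)
```

The same message always produces the same digest, which is what makes a hash useful for checking that a message has not changed.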
Most major programming languages have the major encryption algorithms built directly into the language, or available as libraries. You can use these encryption algorithms to obscure your data before sending it out to a web service, or you can build web services to require the data be encrypted in a certain way.
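As one illustration of what a web service might require from its callers, here is a small sketch that combines a shared secret key with a hash from the sha family, a construction known as an HMAC. Note that this is not encryption: the message stays readable, but the receiver can tell whether it was altered by someone who doesn't know the key. The key, the messages and the function names `sign` and `verify` are all made up for this example.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real service would load this from
# secure configuration rather than hard-coding it.
SECRET_KEY = b"example-shared-secret"


def sign(message: bytes, key: bytes = SECRET_KEY) -> str:
    """Return a hex tag that depends on both the message and the key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign(message, key)
    return hmac.compare_digest(expected, tag)


tag = sign(b"transfer 10 dollars")
print(verify(b"transfer 10 dollars", tag))    # True
print(verify(b"transfer 1000 dollars", tag))  # False
```

The sender transmits the message together with the tag; the receiver, who holds the same key, recomputes the tag and rejects the message if the two don't match.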
Basic two-way hash example. Image credit: http://www.unixwiz.net/
Private-public keypair cryptography (ppk) is an order of magnitude more complex than the aforementioned encryption algorithms. Not only does ppk-enabled communication encrypt messages, it also provides consistency to communication sent to the client. In other words, when you receive a message from a server encrypted with ppk technology, you know that the message has not been altered in transit. To some degree, ppk can also provide authentication.
How does it work? It all starts with the generation of the keys - you run special software on your computer to create both a private key file and a public key file. Both of these are simply long strings of text, but how they are used is very important. As the name indicates, the private key is a secret and should never be shared with anyone, while a public key can (and should) be sent out to anyone you wish to communicate with.
Messages sent with ppk encryption are encrypted with your public key before being sent to you. Due to the way the cryptography works, a message encrypted with your public key can only be decrypted with your private key. This means that only you can read it, since only you have your private key. Note that successfully decrypting a message does not tell you who sent it, because anyone who has your public key can encrypt a message to you. For that, the keys are used in the other direction: the sender signs the message with their own private key, and you check the signature with their public key. If the signature checks out, you know both that the message was not tampered with (as this would cause the check to fail) and that it came from the holder of that private key.
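To make the key pair idea concrete, here is a toy sketch based on RSA, one well-known public-key algorithm (this post doesn't depend on any particular one). The primes are tiny on purpose so the numbers are easy to follow; real keys use primes hundreds of digits long along with padding schemes, and you should always use a vetted library rather than code like this. The helper name `apply_key` is made up for the example, and `pow(e, -1, phi)` assumes Python 3.8 or later.

```python
# Toy RSA with tiny primes. For illustration only: this is NOT secure.
p, q = 61, 53
n = p * q                # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                   # public exponent
d = pow(e, -1, phi)      # private exponent: the modular inverse of e

public_key = (e, n)      # safe to hand out to anyone
private_key = (d, n)     # must be kept secret


def apply_key(value: int, key: tuple) -> int:
    """Raise value to the key's exponent, modulo the key's modulus."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65  # messages are numbers smaller than n in this toy scheme

# Encryption: anyone can encrypt with the public key,
# but only the private key recovers the message.
ciphertext = apply_key(message, public_key)
recovered = apply_key(ciphertext, private_key)

# Signing: only the private key can produce the signature,
# but anyone can check it with the public key.
signature = apply_key(message, private_key)
signature_ok = apply_key(signature, public_key) == message

print(recovered == message)  # True
print(signature_ok)          # True
```

The two halves of the example use the same math in opposite directions: encrypting with the public key keeps a message private, while signing with the private key proves where it came from.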
Due to its value, ppk is used in many places - perhaps the most well-known use is in key-protected ssh communication. You will often be shown a public key (or its fingerprint) the first time you attempt to ssh into a server; this is the server's public key. You can go a step further and give the server your own public key, so that the communication is safe both ways. In fact, this is the recommended method of authenticating yourself to a server - once you've given the server your public key, it knows that anyone who can prove they hold the matching private key must be you.
As with other security methods, ppk is not perfect - it can be cracked, or broken entirely if your private key is stolen. Still, it is one of the most well-used forms of security today, and it can be used not just for ssh communication but any communication. You can even build a web service that uses ppk for extra security.
Basic public-private key communication. Image credit: https://upload.wikimedia.org/
SSL / TLS:
If you've ever gone to a website that starts with "https", then you've used SSL / TLS without even knowing it. SSL / TLS is an advanced implementation of ppk with additional features that is used to encrypt, authenticate and ensure the consistency of data traveling over HTTP. SSL stands for "secure sockets layer" and TLS stands for "transport layer security".
These protocols are both very complex, too complex to explain here. I will thus mention one of the most important additions that SSL provides over plain ppk, which is the Certificate Authority (CA). A CA is a company that issues special certificates that are cryptographically guaranteed to have originated with that CA; that certificate is then presented by the website that owns it whenever you connect over HTTPS. The value of this is extra trust - to see why, consider standard ppk key pairs. While a key pair may allow you to trust that you are still talking to the same person who gave you the public key, it tells you nothing about who this person really is or how trustworthy they are. After all, anyone can easily generate a key pair.
A certificate from a CA, on the other hand, is only issued after a series of investigations and background checks to ensure that the person requesting the cert from the CA is who they say they are, really does own the website they want to use the cert for, and so on. Some CAs require you to submit personal documentation before they'll issue a cert. And most certificates cost a decent amount of money. All of these things provide additional trust when communicating with the owner of a valid CA-issued cert: you know this person was investigated, and likely put some money on the table. This is generally taken to mean that the person is trustworthy enough to safely communicate with; this is why SSL generally runs automatically, without requiring you to manually verify each website you visit.
Of course, all of this is worthless unless the CA itself is trustworthy. This is verified by so-called "certificate chains" which verify not only the trustworthiness of the cert owner, but the CA who issued that cert, and the CA that verified that CA, and so on.
This is all just a small part of how SSL works; but the good news is that for most web developers, you don't really need to know how everything works under the hood - you simply need to use it! That means obtaining a certificate for any site that you want the public to use. Not only does this make you trustworthy, it provides an easy way to protect the data travelling to and from your site.
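On the client side, "simply using it" can look like the following minimal sketch, which uses Python's standard `ssl` module. The function names `make_client_context` and `fetch_certificate_subject` are made up for this example, and the second function needs network access and a hostname of your choosing. The point is that the default context already checks the server's certificate chain against the CAs your system trusts and confirms that the certificate matches the hostname.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side context with certificate checks enabled."""
    # create_default_context() loads the system's trusted CA certificates,
    # requires the server to present a valid certificate, and checks that
    # the certificate matches the hostname being contacted.
    return ssl.create_default_context()


def fetch_certificate_subject(hostname: str, port: int = 443) -> dict:
    """Connect over TLS and return the subject of the server's certificate."""
    context = make_client_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        # wrap_socket performs the handshake; it raises an SSL error
        # if the certificate cannot be verified.
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
            # "subject" is a tuple of one-item tuples of (name, value) pairs.
            return dict(item[0] for item in cert["subject"])


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

If a site's certificate is expired, self-signed, or issued for a different hostname, the handshake fails instead of silently continuing, which is the behavior you want to keep.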
Basic SSL diagram. Image credit: http://vanish.org/
Those are the three major security protocols that I believe developers should know. In addition to knowing them and how to use them, devs need to keep up to date on their knowledge. Implementations of all three of these technologies have been cracked in different ways in the last few years. The only way to protect against this is to keep an eye on things and upgrade any insecure / cracked implementations as soon as possible. A developer's work is never done.
And with that, I conclude my series on skills that every developer should know, but won't learn in college. I encourage every developer reading this series to set a schedule for themselves to learn each of the technologies, systems and concepts mentioned herein in at least some detail. After you've learned about them, use them! Create a test project to try them out. Get to know them, and then implement them in the real world as part of your projects.
The good news is that a developer who has a good foundation in all of these skills is both a pretty good developer and very employable - the things mentioned here will make for a very good resume. The bad news is that this is necessary but not sufficient. The topics I've covered here are not that advanced - a good junior developer will have all of these, and a good senior developer will have all of these plus a whole lot more. So, to leave you with one more thing you probably won't learn in school: a developer can never stop learning. Your education isn't complete upon receiving your diploma. In many ways, it's simply beginning.
To those who have read the whole series, thank you for your time; and if you have any questions, or any other topics you would like me to write about, feel free to email me about them.