At Mindgrub, the engineering team, like many, has found itself wondering just how good AI is at writing code. What do these things really know, and can they make us better, stronger, and faster?
I’m here to tell you that it is pretty good, and in some cases damn good. If you need a quick code snippet or find yourself wanting to convert existing code into a different language, tools like ChatGPT do a fantastic job. We’re also actively using and exploring tools like GitHub’s Copilot, a development assistant that makes IntelliSense look antiquated. Our engineers are also investigating a metric ton of generative AI code tools like CodeWP for WordPress, AWS’s CodeWhisperer for, and X.
Generative code is impressive and quick, but is this code safe to run? After all, these AI tools open a brand new and wholly unexplored set of security concerns and unknown vectors for attacks by hackers. Most of these may not directly impact code security, but they point to the level of awareness we as technologists need to have as we explore the use of these platforms in our work lives.
Developers have already built AI tools that can brute force passwords or perform credential stuffing with impressive accuracy and speed. Much of this relies on open-source tools like PassGAN, tools that are getting better every day. Some researchers have gotten clever and jailbroken or bypassed AI safeguards to trick systems into writing code that uses known exploits or that serves nefarious purposes, such as a DoS (Denial of Service) attack. Others are building advanced phishing systems that generate highly personalized messages, making truth and reality increasingly hard to tell apart.
Can we trust these types of tools to our most junior engineers or non-engineers to create code for production? Do we have a choice? We all know the reality is this is already happening and only going to increase. What we really need is to find ways to keep our AI-assisted code safe and secure.
So to help, I will first explore a handful of these tools and give some hints along the way. I will also offer suggestions on what to look for and ways to keep most generative code secure.
Many in my age range have joked about Copilot being the replacement for Clippy, and both are assistants. However, Copilot, unlike Clippy, is an assistant powered by OpenAI’s GPT artificial intelligence and trained on bajillions of lines of code hosted by Microsoft’s GitHub. These days GitHub is the elephant in the room for public and private code repositories. It is also home to an incredible number of open-source projects. If code repositories are schools, GitHub is Xavier’s school for the gifted or Hogwarts, without the riff-raff.
Copilot integrates into many popular IDEs, such as IntelliJ and Visual Studio, to extend IntelliSense or auto-suggestion feedback. For most, it will feel like you’re getting a quick suggestion based on context, but these aren’t old-school suggestions. Often, you will find that Copilot suggests entire functions rather than finishing a line or two of code.
In this, Copilot and tools like ChatGPT can be very different. Copilot feels more like an assistant or pair programmer offering thoughts along the way. You pick and choose, but the architecture and direction of development are still very much yours.
As a code generator, the results are mixed. In 2021, a DevSec engineer reviewing early results provided multiple examples in which Copilot suggested code with several security issues. My own experience has been similar: I’ve witnessed code snippets with SQL injection vulnerabilities and other minor problems. The more significant concern, IMHO, was not the quality of the code but the speed at which I accepted that the code would do what I anticipated.
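To make the SQL injection concern concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table, the function names, and the payload are all hypothetical and only there for illustration; this is not code that any of these tools actually produced for me. It contrasts the string-building style that is easy to accept without a second look against a parameterized query.

```python
import sqlite3


def find_user_unsafe(conn, username):
    # Vulnerable: the input is pasted straight into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, username):
    # Parameterized: the value is passed separately from the SQL text
    # and is only ever treated as data.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()


def make_demo_db():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "nobody' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # the injected OR clause matches every row
    print(find_user_safe(conn, payload))    # no user has that literal name
```

Both functions look equally plausible when they show up as a suggestion, which is exactly why the speed of acceptance worries me more than the code itself.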
OpenAI’s ChatGPT is a general-purpose model, the kind of system often loosely associated with AGI, or Artificial General Intelligence, though it is not AGI in the strict sense. By contrast, GitHub’s Copilot has been trained primarily on concrete knowledge bases around programming and code, giving it intelligence that is limited to a very particular realm. In short, it’s like a toddler who can tell you everything about Pokemon and nothing about the general makings of our world.
That generality makes writing code more of a hobby for ChatGPT but also gives it the ability to be a bit more creative in how it answers questions. It can draw on its general knowledge, like we do, to come to sometimes surprising conclusions.
ChatGPT, as a development tool, is an excellent starter. It excels at transforming example code into your preferred programming language. It can take code snippets and re-write them with additional features or adjustments. It also does a fantastic job of creating starter applications.
Weirdly, the code responses differed. Each prompt could generate a wildly different answer. My first attempt at my prompt produced an application that hard-coded the database password and showed a noticeable lack of validation. That response also failed to complete, as if the AI had hit a point that it knew invalidated what it had already written. By leaving my prompt unchanged and allowing it to regenerate, I got a much better response that essentially fixed those issues without my editing or asking.
Like GitHub’s Copilot, the results came back mixed, but most of my code from ChatGPT required a bit more knowledge and cleanup to run. For example, ChatGPT suggested a Makefile and a SQL script for my user database table but did not help me actually do the task. It was much more of an accelerator, requiring me to reimplement a lot of what it provided.
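For reference, the two problems in that first response have well-worn fixes. The sketch below is my own illustration, not what ChatGPT returned: the environment variable name APP_DB_PASSWORD, the username rule, and both function names are assumptions I made up for the example.

```python
import os
import re

# Assumed rule for illustration: 3-32 letters, digits, or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,32}")


def get_db_password(env=None):
    # Read the secret from the environment instead of hard-coding it.
    # APP_DB_PASSWORD is a hypothetical variable name.
    env = os.environ if env is None else env
    password = env.get("APP_DB_PASSWORD")
    if not password:
        raise RuntimeError("APP_DB_PASSWORD is not set")
    return password


def validate_username(username):
    # Reject anything that does not match the expected shape
    # before it gets near the database.
    if not isinstance(username, str) or not USERNAME_PATTERN.fullmatch(username):
        raise ValueError("username must be 3-32 letters, digits, or underscores")
    return username
```

Passing a plain dictionary as `env` keeps the function easy to exercise in a test without touching the real environment.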
How do we keep it secure?
AI generative code tools are super accelerators. These tools are also trained on our own lousy code and suffer from the human mistakes we are all prone to. For junior engineers and non-engineers alike, these tools provide incredible power, but they do not guarantee that the end result is more secure or of higher quality.
So how do we keep them secure? We do what we should have been doing all along (or what we are already doing).
First, let’s keep following the best practices of software development. If you have a team, make sure you enforce peer reviews and merge reviews. As the saying goes, measure twice and cut once: more eyes, especially those of a lead or senior engineer, will only make your code better.
Second, good unit tests and code coverage are some of the best checks a developer can put in place. Unit tests require the engineer to understand the expected results of the code they write and to verify that the code behaves as anticipated. By requiring higher code coverage, we can let our engineers use more generative code while these tests safeguard the upper and lower limits of its operations.
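As a small illustration of testing the upper and lower limits, here is a sketch using Python’s standard unittest module. The function clamp_percentage is a hypothetical stand-in for the kind of helper an assistant might generate; the point is the tests around it, which pin down the boundaries regardless of who or what wrote the function.

```python
import unittest


def clamp_percentage(value):
    """Hypothetical generated helper: force a number into the 0-100 range."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("value must be a number")
    return max(0, min(100, value))


class ClampPercentageTests(unittest.TestCase):
    def test_lower_limit(self):
        self.assertEqual(clamp_percentage(-1), 0)
        self.assertEqual(clamp_percentage(0), 0)

    def test_upper_limit(self):
        self.assertEqual(clamp_percentage(100), 100)
        self.assertEqual(clamp_percentage(101), 100)

    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clamp_percentage(42.5), 42.5)

    def test_rejects_non_numbers(self):
        with self.assertRaises(TypeError):
            clamp_percentage("50")


if __name__ == "__main__":
    unittest.main()
```

Writing the boundary cases forces you to decide what the code should do before trusting what it does.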
License and dependency issues can accidentally pop into code. When using ChatGPT, it’s not uncommon for it to recommend libraries and incorporate those libraries into a larger code base. For production code, this can unexpectedly force code to accept a GPL license, open-sourcing a chunk of it, or introduce vulnerabilities through an older library. These days we can add analyzers to our CI/CD pipeline that check and warn for these scenarios and reduce unexpected risks.
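To give a feel for what such an analyzer does, here is a deliberately crude sketch built on Python’s standard importlib.metadata module. The function names and the "GPL" substring rule are my own assumptions for illustration. Real license scanners are far more thorough; many packages declare their license only in classifiers or a license file, which this sketch ignores.

```python
from importlib import metadata

# Assumption: treat any license string containing "GPL" as worth a review.
# This also matches LGPL and AGPL, which have different obligations.
COPYLEFT_MARKERS = ("GPL",)


def flag_copyleft(licenses):
    """Return the sorted package names whose license text mentions a marker.

    `licenses` maps a package name to its declared license string (or None).
    """
    flagged = []
    for name, license_text in licenses.items():
        text = (license_text or "").upper()
        if any(marker in text for marker in COPYLEFT_MARKERS):
            flagged.append(name)
    return sorted(flagged)


def installed_licenses():
    """Collect the declared License field of every installed distribution."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            found[name] = dist.metadata.get("License") or ""
    return found


if __name__ == "__main__":
    for package in flag_copyleft(installed_licenses()):
        print(f"review license for: {package}")
```

In a pipeline, a non-empty result would typically fail the build or open a review task rather than just print a line.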
Other tools in the CI/CD pipeline can also safeguard against bad code quality:
• Lints and code syntax checks help maintain code conformity and check for common mistakes in a language. These same tools can scan for passwords committed inline to a code repository and reject code not in the company’s agreed-upon format.
• Many companies offer code security analyzers that look for the common mistakes and prevent developers from
• Static code testers scan the executable binary generated from an application for
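As a rough idea of how the inline password scan mentioned in the first bullet works, here is a minimal sketch. The patterns and the function name find_secrets are assumptions for illustration; dedicated secret scanners use much larger rule sets and entropy checks, so treat this as a toy version of the technique.

```python
import re

# Illustrative patterns only; real scanners ship far more rules.
SECRET_PATTERNS = [
    # A secret-looking name assigned a quoted literal, e.g. password = "..."
    re.compile(
        r"(?i)(password|passwd|secret|api_key|token)\w*\s*[:=]\s*['\"][^'\"]{4,}['\"]"
    ),
    # The general shape of an AWS access key ID.
    re.compile(r"AKIA[0-9A-Z]{16}"),
]


def find_secrets(source):
    """Return the 1-based line numbers that look like they contain a secret."""
    hits = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append(line_number)
    return hits


if __name__ == "__main__":
    sample = (
        'db_host = "localhost"\n'
        'password = "hunter2!"\n'
        'key = os.environ["API_KEY"]\n'
    )
    print(find_secrets(sample))  # only the hard-coded password line is flagged
```

A check like this is cheap enough to run on every commit, which matters when generated code arrives faster than anyone can read it.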
If you still find yourself adamant that your company or project is not ready for generative AI, you can also look into several tools that help detect AI-generated code. As a warning, this can be a bit of an arms race. As new AI tools improve, the detectors will take time to adapt and identify the latest version of GPT or Copilot.
For many of our dev shops, AI will introduce a new wild card in how we build things, but that wild card can be a great accelerator that increases productivity and helps make junior engineers bigger contributors to production projects. Embracing the unknown can be scary, but with the proper safeguards in place, we can create a secure environment where our teams can thrive.
Reading about Anker’s recent security issues has been interesting. Along the way I came across this great comment on The Verge’s article:
“why did this happen at all when Anker said these cameras were exclusively local and end-to-end encrypted?” and “why did it first lie and then delete those promises when we asked those questions?”
As a software developer, I can tell you with about 95% certainty what happened. The Anker software team screwed up and didn’t know about this security hole. They didn’t test this scenario. They just didn’t know. They probably don’t have enough security engineers and checks. It’s probably not a huge company.
As for the lies, the Anker PR/marketing people you talked to have no clue. They are probably just fed information from whoever in the company. They probably didn’t “lie”. Maybe the engineers were still investigating and weren’t sure so they told them that “no chance there’s a security hole”. Maybe a dev manager wanted to cover his/her ass and said, “there’s no way we didn’t test this”. Whatever the case, there’s a gap between reality (i.e. source code) and how the product that the marketing team is responsible for selling (welcome to software development!).
So yes… it’s fun to think of conspiracy theories like the Chinese government ordering Anker to leave a backdoor so that it could keep an eye on the front porch of Americans… but Occam’s razor chalks this up as careless software development and unresponsive marketing/PR(likely both a result of being a small’ish company).
This. Yes, this right here is, in my opinion, the reality of the situation.
Mindgrub is not a huge company, but we spend a lot of time focused on the processes we need to create secure and scalable applications. We manage to do this because we are an engineering team of scale, and that requires us to set rules from branching strategy to mature continuous integration policies that our engineers can embrace as they move from project to project.
These processes are pretty good but nowhere near perfect. Even so, I can tell you that the way we build applications is light years beyond many organizations I have worked with in the past.
Why? Because many best practices collapse when not run at scale. A single developer cannot peer review his or her own code. Take almost any 2-4 person internal development shop and you will find cowboy coding happening on a regular basis. Like all humans, we make mistakes regardless of how amazing we may be as individual developers. I can’t begin to tell you how often a basic audit of code, infrastructure, or process makes it immediately obvious that this approach has created technical debt of huge magnitude.
What is even more common in these situations is a rift between the appointed chief engineer(s) and other teams like marketing and sales. Phrases you may hear are “this is what the customer really wants,” “we had to build it this way,” or “if you only understood how it worked.”
Keep an eye out, friends.
Every year all Mindgrub employees are required to complete our annual security training. This year we switched it up and moved to the well-received KnowBe4 training curriculum.
Watching and completing the ~45 min eLearning session seemed a bit surreal this holiday season. After all, LastPass completely failed, a House representative-elect lied about everything, and Anker was caught lying about its local-only cameras actually connecting to the cloud. All this without mentioning the many issues still circulating around FTX being hacked and its founder running a billion-dollar company with little to no processes in place.
It really makes you stop and realize how hard it increasingly is to keep yourself safe. It’s one thing when we know we need to protect ourselves from those we might label as unsavory, but it becomes much more difficult to protect ourselves from the entities that we expect to protect us.
When I arrived at Mindgrub we made heavy use of LastPass. While we liked the tool, we found it lacked certain enterprise features we wanted and migrated to a different enterprise password manager. That tool is the password manager that, combined with our security processes, helps us limit access to only those who need it while also preventing team members from sharing passwords as text in tools like Slack or email.
Having a tool like LastPass hacked, to the point that so many are now at the mercy of a master password as the last gatekeeper, one that hopefully can survive brute force attacks, is a difficult pill to swallow. LastPass’s customers did everything right and trusted a company whose charter is securing your data better than you could on your own.
The thing is, LastPass is just the most recent of these types of companies to let us down. Y’all remember Equifax, YouTube, Facebook, Marriott, Verizon, …? What is crazy is this is the list we know, and having spent decades working with security specialists, I can absolutely promise you that a very small percentage of companies ever publicly report most security incidents.
What we are facing is the reality that security is a team sport, and heck, maybe a village or country-wide sport. You or I can do everything correctly; however, as has been the case our entire lives, we all have dependencies on people, products, businesses, or governments, and we are all only as strong as the weakest link in that list. Just one chink in our combined armor, and the impacts are tremendous.
So consider this a reminder for all of us to keep being serious about the importance of security in our lives. Be diligent and make sure that we hold our IT and development teams to the security standards we expect of ourselves. Are you a developer? Find a security framework and make sure you and your team follow it.