In Part 1 of this article, we explored the benefits of adopting AI in software development, highlighting the most popular tools based on insights from our technical experts, industry research, and an analysis of feedback from 2,000 Reddit developers.
In Part 2, we’ll examine the potential downsides of using AI tools in development, including data security risks, concerns about system transparency, and their impact on DevOps metrics.
Introduction
Despite the widely discussed advantages of AI tools across many areas of software development, recent research reveals a steadily growing wave of skepticism. According to the 2024 study “I Don’t Use AI for Everything,” the majority of surveyed developers consider AI tools unreliable, citing poor reasoning capabilities, high hallucination rates, and a tendency to “memorize” rather than “generalize”.
The 2024 Stack Overflow global survey expands on this sentiment: only about 43% of developers say they trust the output of AI tools. The share of respondents with a generally positive attitude toward AI tools also dropped, from 77% in 2023 to 72% in 2024. Additionally, 45% of professional developers believe AI tools don’t perform well on complex tasks.
Analysts and research institutions share these assessments. While AI can accelerate development and reduce cognitive load in low- to mid-level tasks, its uncontrolled use, especially without architectural constraints or clear guidelines, often leads to mounting technical debt and other downstream issues. This risk is particularly pronounced in scalable systems or when AI-generated code is deployed without rigorous human oversight.
Many developers share these concerns on Reddit, citing real-world issues with the quality, maintainability, and long-term impact of AI-generated code.
In the following sections, we’ll take a closer look at their assessments.
Considering AI but have doubts?
Contact us – we can turn early AI experiments into stable, production-ready software
Insecure software: overlooked code errors and data leak risks
In 2024, nearly 80% of the 1,000 CIOs surveyed by Gartner reported concerns about security risks associated with adopting AI in software development. The shift in sentiment is significant: just a year earlier, only 23% were classified as AI skeptics.
Among the most pressing security issues cited in the new report were the challenges of managing AI use responsibly (72%) and data privacy (71%) – the latter being particularly relevant in light of the well-known Samsung case. Many also flagged a growing shortage of qualified talent capable of using AI effectively.
A research team at Stanford University, meanwhile, found that participants using AI assistants were significantly more likely to write insecure code than those in the control group in four out of five assigned tasks. Most of the AI users also overestimated the security of the code they produced.
Meanwhile, Deloitte reports that current DevSecOps frameworks show a marked lack of preparedness for AI integration, particularly in regulated and risk-sensitive domains such as healthcare and finance.
To illustrate, Reddit user ShelBulaDotCom posted a strongly worded comment about AI-powered low-code development. She described the ongoing need to fix security flaws in such products, including numerous cases where passwords were stored in plain text on the client side. One real-world example involved a Facebook login button that didn’t actually authenticate users but instead collected usernames and passwords for over a year. “High time for data theft, though!” the commenter concluded.
“I have yet to see a copilot spit out stuff that doesn’t need to be examined carefully. About 50% of the time, it needs correcting,” wrote another user, tazebot, in a discussion titled “AI Is Making Us Worse Programmers.” “The biggest problem I see is that it produces code that at first appears very credible, tempting inexperienced coders to just ‘take it’.”
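To make the risk concrete, here is a minimal TypeScript sketch contrasting the kind of flaw described above – credentials kept in plain text on the client – with a more conventional approach that sends them once over HTTPS and keeps only a short-lived token. The endpoint and function names are hypothetical.

```typescript
// Anti-pattern: credentials stored in plain text on the client,
// readable by any script running on the page (e.g. via XSS).
function insecureLogin(username: string, password: string): void {
  localStorage.setItem("username", username); // persists across sessions
  localStorage.setItem("password", password); // plain text, never do this
}

// Safer sketch: send credentials once over HTTPS to a hypothetical /api/login
// endpoint and keep only the short-lived session token the server returns.
// In practice, an HttpOnly cookie set by the server is preferable to any
// client-side storage at all.
async function login(username: string, password: string): Promise<void> {
  const response = await fetch("/api/login", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ username, password }),
  });
  if (!response.ok) {
    throw new Error(`Login failed: ${response.status}`);
  }
  const { token } = (await response.json()) as { token: string };
  sessionStorage.setItem("sessionToken", token); // never the raw password
}
```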
The imbalance between simplicity and complexity in AI-generated code
AI tools can produce code that is either overly simplistic or excessively complex, inconsistent, and non-standard. Attempting to automate such “difficult” code rapidly can lead to both technical debt and overengineering, which may weigh down the codebase in the long term. As these codebases grow exponentially, they risk compromising system stability under increased load, ultimately leading to financial losses.
Overly simplistic AI-generated code, lacking the necessary “margin of safety,” also poses hidden risks that tend to surface under stress. For example, code that performs well on local machines may fail in production environments under high user loads and frequent API calls, potentially resulting in outages.
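As a hypothetical illustration of that missing “margin of safety,” compare a bare API call that works fine on a local machine with a version that adds a timeout and bounded retries – exactly the kind of hardening simplistic generated code tends to omit. The endpoint, retry count, and timeout values below are invented for the example.

```typescript
// Naive version: fine on a developer machine, but with no timeout and no
// retry, a slow or overloaded API in production can hang or fail the caller.
async function fetchUserNaive(id: string): Promise<unknown> {
  const res = await fetch(`https://api.example.com/users/${id}`);
  return res.json();
}

// Hardened sketch: per-request timeout plus a small number of retries with
// exponential backoff, so transient spikes degrade gracefully.
async function fetchUser(id: string, retries = 3, timeoutMs = 2000): Promise<unknown> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fetch(`https://api.example.com/users/${id}`, {
        signal: controller.signal,
      });
      if (res.ok) return await res.json();
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err; // network error or timeout (AbortError)
    } finally {
      clearTimeout(timer);
    }
    if (attempt < retries) {
      // Exponential backoff before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 100));
    }
  }
  throw lastError;
}
```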
According to Deloitte’s report, the success rate of AI-generated code is closely tied to prompt complexity: simple prompts succeed about 90% of the time, while complex ones drop to around 42%. Less experienced developers are less likely to recognize this imbalance compared to their more seasoned peers. Several studies have also noted a decline in the effectiveness of AI tools, particularly ChatGPT, in generating functional code over time. Citing IEEE research, Deloitte reports that earlier versions of ChatGPT trained on 2021 data outperformed those trained on 2023 datasets, especially on non-standard tasks. Success rates dropped from 89% to 52% for simple tasks and from 40% to just 0.66% for complex ones, making manual code review a necessity, not an option.
Lack of standardization and unpredictability in AI output
While LLMs and similar tools can generate the expected templated output with precise prompts, it’s common for team members on the same project to receive up to 20 distinct variations of what is essentially the same response. This variability complicates the testing and evaluation of generative AI systems.
Trained prompt engineers may help address the issue in the future, though achieving fully standardized outputs is unlikely, and such expertise is still in short supply.
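One partial workaround is to test LLM-backed features against structural properties rather than exact strings, since no two runs are guaranteed to match verbatim. The sketch below assumes a hypothetical summary-generation function supplied by the caller; the invariants are illustrative.

```typescript
// The caller supplies its own LLM wrapper; the signature is an assumption.
type SummaryGenerator = (ticketText: string) => Promise<string>;

interface SummaryCheck {
  ok: boolean;
  reasons: string[];
}

// Property-based check: instead of comparing against one "golden" answer,
// verify invariants that every acceptable variation must satisfy.
function checkSummary(summary: string, ticketText: string): SummaryCheck {
  const reasons: string[] = [];
  if (summary.trim().length === 0) reasons.push("summary is empty");
  if (summary.length > 500) reasons.push("summary exceeds 500 characters");
  // A ticket ID such as "ABC-123" must be preserved verbatim if present.
  const ticketId = ticketText.match(/[A-Z]+-\d+/)?.[0];
  if (ticketId && !summary.includes(ticketId)) reasons.push(`missing ticket ID ${ticketId}`);
  return { ok: reasons.length === 0, reasons };
}

// Usage sketch: run the same prompt several times and assert the invariants,
// not string equality, so the test tolerates harmless wording differences.
async function testSummaryStability(generate: SummaryGenerator, ticketText: string): Promise<void> {
  for (let run = 0; run < 5; run++) {
    const summary = await generate(ticketText);
    const result = checkSummary(summary, ticketText);
    if (!result.ok) throw new Error(`Run ${run} failed: ${result.reasons.join("; ")}`);
  }
}
```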
Reddit user kibblerz shared his own experience using AI for code generation. Despite his skepticism, he tried “vibe coding” under pressure from a former manager who criticized him for relying on traditional dev tools instead of AI-based solutions.
The post’s author shared that trying to build a React drawer menu took 50 cents in credits and “it was highly problematic.” Any libraries that had changed after the model’s training data was collected were “a mess” and made for “altogether a very bumpy process.” According to kibblerz, all his routine work could have been automated with a few Vim macros, Grep, and Regex “in a sequential fashion that’s under full control.” In the end, the user called this vibe coding “a nightmarish dystopia.”
Maxim Leykin, Head of Engineering at Bamboo Agile
“In my view, a major issue is that AI doesn’t understand the coding standards specific to a company or project. Code isn’t written in a vacuum; it’s written for real clients and often follows concrete agreements. For example, a project might require all variable names to begin with an abbreviation that reflects the project name. So, while the AI-generated code might not be inherently bad, it has no awareness of such conventions.
Another issue is the high number of false positives when AI tools review code. They might report ten issues, and three of them aren’t actually errors but intentional deviations from generic standards with a specific purpose.”
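To make the point concrete, here is a minimal sketch of the kind of project-specific check Maxim describes: flagging variable names that don’t start with an agreed project prefix. The prefix and the regex-based scan are hypothetical – a real team would encode the rule in its linter.

```typescript
// Hypothetical convention: every variable name must start with the project
// abbreviation, e.g. "bmb" for a project called Bamboo.
const PROJECT_PREFIX = "bmb";

// Very rough scan for const/let/var declarations; a real implementation would
// use the TypeScript compiler API or a custom ESLint rule instead of a regex.
function findNamingViolations(source: string): string[] {
  const declaration = /\b(?:const|let|var)\s+([A-Za-z_$][\w$]*)/g;
  const violations: string[] = [];
  for (const match of source.matchAll(declaration)) {
    const name = match[1];
    if (!name.startsWith(PROJECT_PREFIX)) {
      violations.push(name);
    }
  }
  return violations;
}

// Example: AI-generated code that ignores the convention gets flagged.
const snippet = `const userCount = 10;\nlet bmbRetryLimit = 3;`;
console.log(findNamingViolations(snippet)); // ["userCount"]
```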
Architecture turns into a black box and loses the “human touch”
As GenAI becomes increasingly integrated into the workflows of product and engineering leaders – helping codify business requirements, standardize design, support knowledge transfer, and even guide architectural standards – experts warn that flexibility and creative thinking may erode. There is growing unease that teams may end up stuck in a “vacuum” of repetitive patterns and decisions – the very ones on which AI models were trained.
On the business side, this might lead to inefficient but fixable strategies. On the engineering side, however, the consequences can be more serious: the architecture may gradually become a black box; it works, but no one can clearly explain how it works or predict what might happen if it’s changed. As a result, maintenance, scaling, and auditing become significantly riskier.
It’s worth mentioning that even cybersecurity experts have begun voicing similar concerns, warning that junior developers are increasingly struggling with deep system-level understanding – a trend they view as potentially hazardous for the field. “Many can generate functional code snippets but struggle to explain the underlying logic or defend them against real-world attack scenarios,” notes Om Moolchandani, co-founder and CISO/CPO at Tuskira, in a comment to CSO.
Analyzing 211 million lines of code written between 2020 and 2024, GitClear researchers found that code quality has measurably declined since the introduction of AI-powered coding assistants. In 2024, for the first time on record, copied lines outnumbered moved lines – indicating a decline in refactoring practices. The share of moved code fell from 24% in 2020 to 9.5%, while copied code rose from 8.3% to 12.3%. The study also reported an eightfold increase in code blocks containing five or more duplicated lines; 57% of co-edited clones were associated with software defects, linking duplication to declining system stability. These findings align with Google’s DORA 2024 report, discussed in the section below.
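For teams that want to watch the same signal in their own repositories, the rough sketch below hashes sliding five-line windows and counts blocks that appear more than once. It is a crude proxy for the copied-code metric, not GitClear’s actual methodology.

```typescript
import { createHash } from "crypto";

// Count five-line blocks that occur more than once in a file – a rough proxy
// for the "copied lines" signal discussed above.
function countDuplicateBlocks(source: string, windowSize = 5): number {
  const lines = source
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0); // ignore blank lines

  // Hash each sliding window of `windowSize` lines and tally occurrences.
  const seen = new Map<string, number>();
  for (let i = 0; i + windowSize <= lines.length; i++) {
    const block = lines.slice(i, i + windowSize).join("\n");
    const key = createHash("sha1").update(block).digest("hex");
    seen.set(key, (seen.get(key) ?? 0) + 1);
  }

  // Every occurrence beyond the first counts as a duplicate.
  let duplicates = 0;
  for (const count of seen.values()) {
    if (count > 1) duplicates += count - 1;
  }
  return duplicates;
}
```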
Interdependence of multimodal AI systems
Many companies now rely on publicly available LLMs, which may increase the need for multimodal systems – solutions where multiple models or agents interact and share responsibilities, such as text generation, data processing, or visualization, depending on which agent performs each task most effectively.
In these setups, a critical error from one agent can propagate to another, potentially leading to system-wide failures or compromising data integrity.
According to Deloitte, organizations pursuing such strategies are already raising concerns about internal AI governance (62%), workforce training (60%), and auditability and compliance with regulations (55%).
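A common mitigation is to validate each agent’s output against an explicit contract before the next agent consumes it, so a malformed result fails fast instead of propagating. The agent functions and data shape below are hypothetical.

```typescript
// Hypothetical agents supplied by the caller: one extracts structured data,
// the next turns it into a report.
type ExtractionAgent = (rawText: string) => Promise<unknown>;
type ReportAgent = (data: ExtractedData) => Promise<string>;

interface ExtractedData {
  customerId: string;
  amount: number;
}

// Contract check at the boundary between agents: reject anything that does
// not match the expected shape instead of letting it flow downstream.
function validateExtraction(value: unknown): ExtractedData {
  const v = value as Partial<ExtractedData>;
  if (typeof v?.customerId !== "string" || v.customerId.length === 0) {
    throw new Error("extraction agent returned an invalid customerId");
  }
  if (typeof v?.amount !== "number" || !Number.isFinite(v.amount) || v.amount < 0) {
    throw new Error("extraction agent returned an invalid amount");
  }
  return { customerId: v.customerId, amount: v.amount };
}

async function runPipeline(extract: ExtractionAgent, report: ReportAgent, rawText: string): Promise<string> {
  const raw = await extract(rawText);
  const data = validateExtraction(raw); // fail fast here, not in the report step
  return report(data);
}
```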
DevOps: the paradoxical effect of AI on software delivery
Despite cautious optimism about AI in DevOps, some data points reveal a paradox: under certain conditions, AI adoption can reduce key performance indicators in areas where improvements were expected.
The Google DORA 2024 report, for instance, found that every 25% increase in AI usage was associated with a 7.2% drop in stability and a 1.5% decline in throughput.
These findings were unexpected, as improvements in areas such as documentation, code quality, and review speed have historically been correlated with higher performance. While these local aspects improved with AI (documentation quality rose by 7.5% and code quality by 3.4%), they had no measurable effect on overall delivery stability.
One possible explanation, the authors suggest, is a shift in the nature of releases themselves. Code autogeneration accelerates development but breaks with one of DevOps’s core principles: small, incremental iterations. Large code batches increase the risk of failures and reduce deployment resilience, even before typical AI-related errors are factored in.
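One practical guardrail consistent with the small-batch principle is a CI check that flags oversized change sets before they are deployed. The sketch below assumes the diff statistics have already been collected (for example from git diff --numstat); the 400-line budget is arbitrary.

```typescript
// Hypothetical per-file diff statistics, as a CI script might collect them.
interface FileChange {
  path: string;
  added: number;
  removed: number;
}

// Flag change sets whose total churn exceeds a (deliberately arbitrary) budget,
// nudging the team back toward small, incremental releases.
function checkBatchSize(changes: FileChange[], maxChangedLines = 400): void {
  const total = changes.reduce((sum, change) => sum + change.added + change.removed, 0);
  if (total > maxChangedLines) {
    throw new Error(
      `Change set touches ${total} lines (limit ${maxChangedLines}); consider splitting it.`
    );
  }
  console.log(`Batch size OK: ${total} changed lines.`);
}
```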
It’s worth noting that AI demonstrates a positive impact at both the individual and team levels. According to the report, developer productivity increased by 2.1%, job satisfaction rose by 2.2%, team effectiveness improved by 1.4%, and overall company performance grew by 2.3%. However, product-level metrics, such as usability, latency, and security, did not show a statistically significant improvement.
As a result, the DORA authors conclude that while AI can optimize specific parts of the development process, it doesn’t yet guarantee improved DevOps performance overall, especially when its core principles are compromised. In particular, some developers on Reddit point out that in areas like orchestration, hallucinations and errors still make it hard to choose “1,000 hours of prompt engineering” over “a couple of hours and some coffee with an experienced engineer.”
Still, the report’s authors emphasize that the industry is only at the beginning of its journey with AI in DevOps and that many of its potential benefits (and drawbacks) have yet to fully emerge.
Maxim Leykin, Head of Engineering at Bamboo Agile
“I still have high hopes for AI in DevOps. Why? Because DevOps is highly standardized, it relies on a limited set of tools and technologies that can be assembled into a CI/CD pipeline, much like Lego blocks. Setting all of this up is often a repetitive routine that takes a lot of time: you forget something here, misconfigure something there, link Jira to GitHub incorrectly, and so on. These are exactly the kinds of tasks AI should be good at handling. This is exactly where I see AI making a real impact – and we’re already seeing early signs of it with tools like Copilot, GitHub Actions, and others. Eventually, I expect we’ll reach a point where you simply feed AI the input data and get a configured CI/CD pipeline in return.”
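As a toy version of that “input data in, pipeline out” idea, the sketch below turns a few project parameters into a minimal GitHub Actions workflow. It is a plain template rather than AI, and the options shown are invented for the example.

```typescript
// Hypothetical project description serving as the "input data".
interface PipelineSpec {
  nodeVersion: string;
  runTests: boolean;
  deployCommand?: string;
}

// Emit a minimal GitHub Actions workflow from the spec. A real generator (or an
// AI assistant) would cover far more: caching, secrets, environments, approvals.
function generateWorkflow(spec: PipelineSpec): string {
  const steps = [
    "      - uses: actions/checkout@v4",
    "      - uses: actions/setup-node@v4",
    `        with:\n          node-version: "${spec.nodeVersion}"`,
    "      - run: npm ci",
  ];
  if (spec.runTests) steps.push("      - run: npm test");
  if (spec.deployCommand) steps.push(`      - run: ${spec.deployCommand}`);
  return ["name: ci", "on: [push]", "jobs:", "  build:", "    runs-on: ubuntu-latest", "    steps:", ...steps].join("\n");
}

console.log(generateWorkflow({ nodeVersion: "20", runTests: true }));
```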
Conclusion
While AI tools will undoubtedly continue to unlock broader opportunities for accelerating and optimizing specific stages of development, their more systematic yet poorly thought-out adoption poses serious risks, from data security threats and architectural imbalance to the not yet fully understood impact on DevOps metrics.
In the next part of the article, we’ll broaden the scope and take a closer look at the social, ethical, and legal aspects of AI adoption – from tensions in professional culture and the risk of fragmentation within the tech community to concerns about research freedom and corporate monopolization.
And in the fourth part, we’ll present a practical guide on how to avoid the pitfalls of AI integration and strike the right balance between efficiency, control, and trust in AI tools.
Partner with a team that knows where AI helps – and where it hurts