You Ask, We Answer: What are the Core Problems with MySQL’s Architecture and Scalability?
Here at Sirius, we often get asked, "What are the biggest problems with MySQL?" This is a critical question, and one that deserves a clear, honest answer. We understand that when making a major infrastructure decision, we all tend to worry more about what might go wrong than what will go right.
We want to be fiercely transparent right from the start: MySQL’s popularity often obscures its fundamental architectural limitations. While it is an excellent database for many use cases, for high-scale, mission-critical operations, relying on its "free" open-source core transfers complexity and risk directly onto your operational teams. If a product has problems, the marketplace will eventually find out, and hiding that information only erodes trust.
This article will pull back the curtain on these "elephants in the room"—covering the systemic challenges that mandate highly specialized staff, the structural deficiencies in scalability and analytical performance, and the hidden financial risks related to licensing and compliance. Our goal is to educate you thoroughly, enabling you to make the most informed decision possible for your specific needs.
Section 1: The Core Architectural Constraint – Scaling and the OLTP/OLAP Divide
MySQL’s design, rooted in its single-node paradigm, creates systemic operational bottlenecks when facing concurrent demands in high-scale enterprise environments.
1. The Single-Node Bottleneck and the Necessity of Sharding
MySQL is architecturally optimized for Online Transaction Processing (OLTP), focusing on high-frequency, small-scope transactions. The system relies primarily on a single server to handle all database operations.
- The Problem: As application workloads or datasets grow to Terabyte (TB) and Petabyte (PB) scale, this single-node paradigm becomes an insurmountable bottleneck.
- The Operational Cost: To achieve horizontal scalability (distributing load across multiple servers), organizations must implement complex sharding techniques (partitioning data). Managing operations that cross shard boundaries requires application developers to manually reinvent core database functionality, severely increasing development difficulty and overhead. This necessity ultimately transfers complexity and cost to specialized Database Reliability Engineers (DREs) and application developers.
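The routing logic that sharding pushes into the application can be sketched in a few lines. This is a minimal illustration in Python: the `shard_for` and `scatter_gather` helpers, the MD5-based hash, and the fixed shard count are all hypothetical application-level choices, not MySQL features.

```python
import hashlib

SHARD_COUNT = 4  # hypothetical fixed shard count chosen by the application


def shard_for(user_id: int) -> int:
    """Map a user ID to a shard with a stable hash.

    MySQL provides nothing like this natively; the application (or a
    middleware layer such as Vitess) must own the mapping forever.
    """
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % SHARD_COUNT


def scatter_gather(user_ids):
    """Plan a cross-shard read: fan out to every shard touched.

    The application must issue one query per shard and merge the results
    itself -- work a single-node database would do internally.
    """
    by_shard = {}
    for uid in user_ids:
        by_shard.setdefault(shard_for(uid), []).append(uid)
    # Each entry would become one query against that shard's connection.
    return by_shard
```

Note that even this toy version exposes the hard part: joins, transactions, and unique constraints that cross shard boundaries have no natural home in this model, which is exactly the complexity the bullet above describes.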
2. Deficiencies in Analytical Processing (OLAP)
MySQL’s use of a rowstore format—storing data row by row—is optimal for transactional integrity but highly inefficient for read-heavy analytical queries that scan large portions of columns.
- The Problem: The design inherently hinders performance for Online Analytical Processing (OLAP). MySQL lacks critical advanced features essential for high-performance analytics, such as native column-store indexes, Materialized Views, and vectorized query execution.
- The Consequence: This forces enterprises to adopt complex two-database strategies (OLTP in MySQL, OLAP in a dedicated system) or adopt proprietary solutions like Oracle’s HeatWave platform, which accelerates OLTP/OLAP convergence.
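The row-store penalty for analytics can be made concrete with a toy model. This is pure Python counting "cells" touched rather than real I/O, and the function names are illustrative only; real storage engines deal in pages, but the proportions hold.

```python
def rowstore_scan_cells(rows, _column):
    """Row-oriented storage: answering an aggregate over one column
    still drags every column of every row through the scan."""
    return sum(len(row) for row in rows)


def columnstore_scan_cells(columns, column):
    """Column-oriented storage: only the requested column is read."""
    return len(columns[column])


# 50 orders with 4 attributes each, in both physical layouts.
rows = [{"id": i, "price": i, "qty": 1, "region": "eu"} for i in range(50)]
columns = {"price": [i for i in range(50)]}

row_cost = rowstore_scan_cells(rows, "price")        # 200 cells touched
col_cost = columnstore_scan_cells(columns, "price")  # 50 cells touched
```

With 4 columns the row-store scan touches 4x the data for a single-column aggregate; with the wide tables typical of analytics, the multiplier grows with every column added, which is why dedicated OLAP systems store data by column.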
Section 2: Operational Fragility, Consistency Risks, and Replication Issues
Standard asynchronous replication, the backbone of most High Availability (HA) MySQL topologies, is inherently fragile, leading to persistent challenges regarding lag and data consistency during failures.
1. Replication Lag and Instability
Replication lag (the delay between a transaction committing on the primary and applying on the replica) is a complex challenge driven by issues on both ends.
- Causes: Lag can stem from long-running transactions holding locks on the primary, or resource bottlenecks (I/O constraints) and long-running queries on the replica itself.
- A Simple Flaw: Critically, simple flaws like missing primary keys on tables force the replica to use slower full table or secondary index lookups to find rows to update, significantly contributing to lag.
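Tables missing primary keys can be caught before they cause lag, with a check against `information_schema`. A sketch of such a check follows; the constant name is ours, and the query should be validated against your MySQL version before relying on it.

```python
# SQL to list base tables that lack a PRIMARY KEY, using the standard
# information_schema views. System schemas are excluded.
MISSING_PK_SQL = """
SELECT t.table_schema, t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name   = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN
      ('mysql', 'sys', 'information_schema', 'performance_schema')
  AND c.constraint_name IS NULL
"""
```

Running this periodically (or as a CI gate on schema migrations) is a cheap way to prevent the row-lookup lag described above from ever reaching a replica.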
2. The Critical Risk of Data Loss During Failover
Automated failover solutions (like Orchestrator) must handle the scenario where the primary fails while replicas are lagging. This process is hazardous because the ecosystem’s defaults often prioritize availability over consistency.
- The Default Danger: Orchestrator’s default configuration allows promotion of a lagging member as the new master. This means the promotion proceeds "flawlessly" but results directly in data loss on the newly promoted master because it missed transactions from the old primary.
- Mitigation Requirement: For mission-critical systems, DREs must manually override this default (e.g., by enabling DelayMasterPromotionIfSQLThreadNotUpToDate) to ensure the new master is fully synchronized, prioritizing data integrity over speed of promotion.
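For illustration, the override lives in Orchestrator's JSON configuration. This is a minimal, hedged excerpt showing only the relevant key; consult the Orchestrator documentation for the full file and for related promotion-safety settings.

```json
{
  "DelayMasterPromotionIfSQLThreadNotUpToDate": true
}
```

With this set, Orchestrator waits for the candidate's SQL thread to apply all relayed transactions before promoting it, trading a slower failover for no silent data loss.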
3. Insufficient Native Monitoring
The reliability of HA failover is undermined by the inadequacy of native MySQL monitoring tools. The standard metric, Seconds_Behind_Master, is notoriously unreliable, often reporting NULL during actual failure events (for instance, when the replication threads have stopped).
- The Fix: Reliable HA decisions necessitate the deployment and maintenance of external, community-developed tools. For instance, specialized tools like pt-heartbeat (part of Percona Toolkit) are required to measure absolute delay in sub-second resolution.
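The heartbeat technique pt-heartbeat implements is simple enough to sketch. Below is a toy Python model using a dict in place of the replicated heartbeat table; the real tool issues a timestamp UPDATE on the primary and SELECTs on the replica.

```python
import time


def write_heartbeat(heartbeat_row):
    """Primary side: refresh the heartbeat timestamp.

    pt-heartbeat does this as an UPDATE against a dedicated table,
    so the write replicates like any other transaction.
    """
    heartbeat_row["ts"] = time.time()


def read_lag(heartbeat_row):
    """Replica side: absolute lag is wall-clock now minus the last
    replicated heartbeat -- independent of Seconds_Behind_Master
    and meaningful even when replication threads misbehave."""
    return time.time() - heartbeat_row["ts"]
```

Because the measurement is an ordinary replicated row rather than a status variable, it keeps working in exactly the failure modes where the native metric goes NULL.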
Section 3: The Performance and Configuration Expertise Tax
Optimizing MySQL performance is a continuous engineering process that demands deep expertise in engine internals and continuous, workload-specific parameter calibration.
1. Internal Contention and Excessive Locking
While InnoDB provides row-level locking, complex transaction management introduces contention points. MySQL's default transaction isolation level (REPEATABLE READ) employs Next-Key Locks (a record lock and a gap lock) to prevent "phantom rows".
- The Problem: This proactive locking of index gaps often leads to excessive and unnecessary lock contention in high-write transactional systems, increasing the likelihood of deadlocks.
- The Trade-Off: This design choice is often overly restrictive, forcing DREs in high-throughput environments to abandon the default and switch to the weaker READ COMMITTED isolation level in order to regain acceptable performance.
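The locking behavior can be illustrated with a simplified model. This sketch captures the next-key concept only, not InnoDB's actual data structures: interval endpoints stand in for record and gap locks taken by a locking range read under REPEATABLE READ.

```python
import bisect


def next_key_locks(index_keys, lo, hi):
    """Toy model of next-key locking for a query like
    SELECT ... WHERE k BETWEEN lo AND hi FOR UPDATE (REPEATABLE READ).

    Each qualifying record is locked together with the gap before it,
    and the gap up to the next key past the range is locked too, so no
    concurrent INSERT can create a phantom row inside the range.
    """
    keys = sorted(index_keys)
    locks = []
    i = bisect.bisect_left(keys, lo)
    j = bisect.bisect_right(keys, hi)
    prev = keys[i - 1] if i > 0 else float("-inf")
    for k in keys[i:j]:
        locks.append((prev, k))  # gap before k, plus the record k itself
        prev = k
    nxt = keys[j] if j < len(keys) else float("inf")
    locks.append((prev, nxt))    # trailing gap lock blocking phantoms
    return locks
```

Even in this toy form the cost is visible: a read of `BETWEEN 15 AND 25` against keys 10, 20, 30, 40 blocks inserts across (10, 30), far more than the single matching row, which is the contention the bullet above describes.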
2. Query Optimizer Shortcomings
The query optimizer suffers from inherent statistical and cost model deficiencies that frequently lead to suboptimal plan choices.
- Crude Estimates: The optimizer relies on coarse index statistics that fail to account for uneven data distribution (data skew). Furthermore, it rarely accounts for the LIMIT clause during cost estimation, producing plans that process far more data than the query actually needs.
- The Consequence: When the optimizer fails to select the correct index, DREs are forced to manually override its decision using index hints (FORCE INDEX). This introduces technical debt and makes query tuning fragile, undermining the core benefit of a cost-based optimizer.
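Why skew defeats uniform statistics can be shown in a few lines. This is illustrative Python: the function names are ours, and real MySQL statistics are more nuanced than a single rows-per-value average, but the failure mode is the same.

```python
from collections import Counter


def uniform_estimate(total_rows, n_distinct):
    """The style of estimate coarse index statistics produce:
    rows-per-value = total / distinct, assuming every value is
    equally common."""
    return total_rows / n_distinct


def actual_matches(column_values, value):
    """What the storage engine will really have to read."""
    return Counter(column_values)[value]


# A skewed column: one value covers 90% of rows, 100 values share the rest.
status = ["hot"] * 900 + [f"v{i}" for i in range(100)]
estimate = uniform_estimate(len(status), len(set(status)))  # ~9.9 rows
reality = actual_matches(status, "hot")                     # 900 rows
```

An optimizer trusting the ~10-row estimate will happily pick an index dive for `WHERE status = 'hot'` and then read 900 rows anyway, which is precisely the situation that drives DREs to FORCE INDEX workarounds.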
3. The Continuous Tuning Burden
Performance management is not a static configuration exercise but a perpetual tuning mandate.
- Memory Management: The critical innodb_buffer_pool_size setting typically consumes 50% to 75% of dedicated system RAM. Before MySQL 5.7 any change to it required a full server restart, and even the online resizing available in modern versions proceeds chunk by chunk and can stall a busy server, linking resource allocation directly to system availability challenges.
- Thread Management: Tuning the innodb_thread_concurrency variable is a delicate balancing act needed to minimize context-switching overhead; incorrect settings can lead to wasteful CPU saturation or unnecessary underutilization.
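The sizing arithmetic itself is worth sketching. This assumes the documented default 128 MiB chunk size and a hypothetical 8 buffer pool instances; MySQL rounds the requested innodb_buffer_pool_size up to a multiple of chunk_size times instances, so the number you set is not always the number you get.

```python
CHUNK_SIZE = 128 * 1024 ** 2  # default innodb_buffer_pool_chunk_size (128 MiB)


def effective_buffer_pool_size(ram_bytes, fraction=0.7, instances=8):
    """Round a target allocation (e.g. 70% of RAM) up to a multiple of
    chunk_size * instances, mirroring MySQL's own adjustment.

    fraction and instances are illustrative defaults, not a
    recommendation for any particular workload.
    """
    unit = CHUNK_SIZE * instances
    target = int(ram_bytes * fraction)
    return -(-target // unit) * unit  # ceiling division


# On a 32 GiB host, a 70% target (22.4 GiB) rounds up to 23 GiB.
size = effective_buffer_pool_size(32 * 1024 ** 3)
```

The rounding is one more reason the tuning burden is continuous: capacity planning has to account for what MySQL will actually allocate, not just the value written into my.cnf.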
Section 4: Functional Gaps, Governance, and Licensing Financial Risk
Beyond technical problems, organizations face structural risks related to licensing compliance and the direction of the product under Oracle’s ownership.
1. Functional Gaps vs. Competitors
When assessed against feature-rich alternatives, MySQL exhibits limitations in modern data support and indexing.
- Indexing Deficits: MySQL still lacks Partial Indexes (indexing only a subset of rows), and Expression Indexes arrived late, only as "functional key parts" in MySQL 8.0.13, leaving older deployments without them. This establishes an architectural feature ceiling, preventing optimizations essential for complex filtering.
- Vendor Lock-In in JSON: Although MySQL supports the JSON data type, its operations often rely on non-standard SQL extensions. Utilizing these non-standard functions creates a dependency on MySQL's specific implementation, significantly increasing the difficulty, time, and cost of migrating the application to a different relational database in the future.
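One common mitigation is to confine the vendor-specific syntax to a single helper, so a future migration touches one function instead of every query in the codebase. The `json_field` helper below is a hypothetical sketch, though the generated `->>` syntax for MySQL and PostgreSQL is real.

```python
def json_field(column, key, dialect="mysql"):
    """Build a dialect-specific expression extracting a scalar JSON field.

    Keeping this in one place isolates the non-standard syntax:
      - MySQL uses JSON-path syntax:       col->>'$.key'
      - PostgreSQL uses a key operator:    col->>'key'
    """
    if dialect == "mysql":
        return f"{column}->>'$.{key}'"
    if dialect == "postgres":
        return f"{column}->>'{key}'"
    raise ValueError(f"unsupported dialect: {dialect}")
```

The helper does not remove the lock-in, but it turns an application-wide rewrite into a localized one, which is the realistic best case once non-standard JSON functions are in use.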
2. Governance Risk and Feature Stagnation
Oracle’s acquisition of MySQL led to community concerns and the creation of the MariaDB fork.
- The Strategic Pivot: Oracle has substantially reduced commitment to the open-source core, pivoting resources toward its proprietary AI and cloud offerings, notably MySQL HeatWave on Oracle Cloud Infrastructure (OCI).
- The Risk: This divergence means enterprises relying on the open-source GPL core must either accept the risk that the core product will evolve slowly or face strategic lock-in by adopting the proprietary cloud solution.
3. Commercial Licensing and Audit Risks (The Hidden Financial Liability)
The widespread misconception that the Community Edition is "always free" exposes organizations to serious financial liability, and it is precisely the kind of uncomfortable "problems" question this series exists to answer honestly.
- GPL Non-Compliance: The Community edition is available under the strict terms of the General Public License version 2 (GPLv2). Organizations that modify and distribute MySQL, or ship products linking its GPL client libraries, without releasing that code under the GPL are in license violation. The usual remedy Oracle offers for such violations is the purchase of an expensive commercial subscription.
- Feature Misuse: Using proprietary commercial features (like high availability or clustering features found in the expensive Cluster Carrier Grade edition) without the corresponding subscription is a significant financial risk. Oracle is known to be increasingly focusing on compliance and audits for MySQL, compounded by the complexity of license counting (based on sockets/processors).
Conclusion: The Expertise Tax and Tooling Reliance
The accumulation of MySQL’s systemic challenges—scaling limitations, HA fragility, and optimization difficulty—demonstrates that solving these problems relies heavily on a fragmented, external ecosystem of specialized tools, imposing a significant expertise tax on operations.
When scaled, MySQL ceases to function as a simple RDBMS and transforms into a complex system of single-node servers managed by a custom-engineered distributed framework (proxies, orchestrators, external monitors).
Managing this risk requires highly skilled DREs capable of mastering and maintaining this complex, fragmented, non-native tool chain, including specialized tools like Vitess (for scaling), ProxySQL (for routing), and Percona Toolkit (for reliable HA monitoring). This reliance on specialized labor is the single largest component of Total Cost of Ownership over a system’s lifetime.
In short, if you choose the free Community Edition, you are trading a zero-dollar license fee for a mandatory, high-cost investment in specialized internal human expertise to build the necessary reliability layers yourself.