Muestra las diferencias entre dos versiones de la página.
| Ambos lados, revisión anterior Revisión previa Próxima revisión | Revisión previa | ||
|
db:sql [2010/07/16 16:48] alfred |
db:sql [2020/05/09 09:25] (actual) |
||
|---|---|---|---|
| Línea 197: | Línea 197: | ||
| insert into c (xa, xb) select ca, sum(cb) from b group by ca; | insert into c (xa, xb) select ca, sum(cb) from b group by ca; | ||
| </code> | </code> | ||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| ==== Errores comunes ==== | ==== Errores comunes ==== | ||
| Línea 222: | Línea 227: | ||
| Query time: 0.016s | Query time: 0.016s | ||
| </code> | </code> | ||
| - | + | * **No securizar las consultas**, el usuario podría introducir código malicioso. | |
| - | + | * **No utilizar //prepared statements//**, estos son consultas compiladas en la DB a las que les pasas parámetros (por ejemplo, según la plataforma, utilizar ''SELECT * FROM users WHERE username = ?'' o ''SELECT * FROM users WHERE username = :username'' en vez de ''SELECT * FROM users WHERE username = 'bob'''). Cada vez que una DB moderna recibe una nueva consulta la ha de compilar, si la consulta ya "la ha visto antes" podrá cachear la ejecución además de que puede ser importante a la hora de realizar estadisticas sobre las consultas utilizadas y evita el código malicioso. | |
| - | + | * **No realizar analisis de rendimientos**, como el de las consultas anteriores. | |
| - | + | * **Utilizar condiciones con OR en consultas**, las condiciones AND restingen los datos mientras que los OR los hacen crecer y esto desfavorece al motor de la DB. | |
| - | + | <code> | |
| - | + | Bad: | |
| - | + | ... WHERE a = 2 OR a = 5 OR a = 11 | |
| - | 6. Not simplifying complex queries through views | + | Better: |
| - | + | ... WHERE a IN (2, 5, 11) | |
| - | Not all database vendors support views but for those that do, they can greatly simplify queries if used judiciously. For example, on one project I used a generic Party model for CRM. This is an extremely powerful and flexible modelling technique but can lead to many joins. In this model there were: | + | </code> |
| + | * **No simplificar consultas complejas mediante vistas**, en las DB que soportan vistas es de gran ayuda utilizarlas | ||
| + | <code> | ||
| + | Model: | ||
| * Party: people and organisations; | * Party: people and organisations; | ||
| * Party Role: things those parties did eg Employer and Employer; | * Party Role: things those parties did eg Employer and Employer; | ||
| Línea 238: | Línea 245: | ||
| Example: | Example: | ||
| - | |||
| * Ted is a Person, being a subtype of Party; | * Ted is a Person, being a subtype of Party; | ||
| * Ted has many roles, one of which is Employee; | * Ted has many roles, one of which is Employee; | ||
| Línea 257: | Línea 263: | ||
| And suddenly you have a very simple view of the data you want but on a highly flexible data model. | And suddenly you have a very simple view of the data you want but on a highly flexible data model. | ||
| - | + | </code> | |
| - | 7. Not sanitizing input | + | * **Utilizar //exclusive arcs//**, estos consisten en crear una tabla con dos o más foreign keys de las cuales una pueda ser not-null, esto hace que sea más complejo mantener la integridad referencial. |
| - | + | * **No normalizar suficiente**, la normalización consiste en optimizar el diseño de la DB y organizar correctamente los datos en tablas. | |
| - | This is a huge one. Now I like PHP but if you don't know what you're doing it's really easy to create sites vulnerable to attack. Nothing sums it up better than the story of little Bobby Tables. | + | * **Normalizar demasiado**. Para cuatro conceptos no deberías tener once tablas. |
| - | + | Sacado de [[http://stackoverflow.com/questions/621884/database-development-mistakes-made-by-app-developers/621891#621891|Stack Overflow]] | |
| - | Data provided by the user by way of URLs, form data and cookies should always be treated as hostile and sanitized. Make sure you're getting what you expect. | + | |
| - | + | ||
| - | 8. Not using prepared statements | + | |
| - | + | ||
| - | Prepared statements are when you compile a query minus the data used in inserts, updates and WHERE clauses and then supply that later. For example: | + | |
| - | + | ||
| - | SELECT * FROM users WHERE username = 'bob' | + | |
| - | + | ||
| - | vs | + | |
| - | + | ||
| - | SELECT * FROM users WHERE username = ? | + | |
| - | + | ||
| - | or | + | |
| - | + | ||
| - | SELECT * FROM users WHERE username = :username | + | |
| - | + | ||
| - | depending on your platform. | + | |
| - | + | ||
| - | I've seen databases brought to their knees by doing this. Basically, each time any modern database encounters a new query it has to compile it. If it encounters a query it's seen before, you're giving the database the opportunity to cache the compiled query and the execution plan. By doing the query a lot you're giving the database the opportunity to figure that out and optimize accordingly (eg by pinning the compiled query in memory). | + | |
| - | + | ||
| - | Using prepared statements will also give you meaningful statistics about how often certain queries are used. | + | |
| - | + | ||
| - | Prepared statements will also better protect you against SQL injection attacks. | + | |
| - | + | ||
| - | 9. Not normalizing enough | + | |
| - | + | ||
| - | Database normalization is basically the process of optimizing database design or how you organize your data into tables. | + | |
| - | + | ||
| - | Just this week I ran across some code where someone had imploded an array and inserted it into a single field in a database. Normalizing that would be to treat element of that array as a separate row in a child table (ie a one-to-many relationship). | + | |
| - | + | ||
| - | This also came up in Best Method for Storing a List of User IDs: | + | |
| - | + | ||
| - | I've seen in other systems that the list is stored in a serialized PHP array. | + | |
| - | + | ||
| - | But lack of normalization comes in many forms. | + | |
| - | + | ||
| - | More: | + | |
| - | + | ||
| - | * Normalization: How far is far enough? | + | |
| - | * SQL by Design: Why You Need Database Normalization | + | |
| - | + | ||
| - | 10. Normalizing too much | + | |
| - | + | ||
| - | This may seem like a contradiction to the previous point but normalization, like many things, is a tool. It is a means to an end and not an end in and of itself. I think many developers forget this and start treating a "means" as an "end". Unit testing is a prime example of this. | + | |
| - | + | ||
| - | I once worked on a system that had a huge hierarchy for clients that went something like: | + | |
| - | + | ||
| - | Licensee -> Dealer Group -> Company -> Practice -> ... | + | |
| - | + | ||
| - | such that you had to join about 11 tables together before you could get any meaningful data. It was a good example of normalization taken too far. | + | |
| - | + | ||
| - | More to the point, careful and considered denormalization can have huge performance benefits but you have to be really careful when doing this. | + | |
| - | + | ||
| - | More: | + | |
| - | + | ||
| - | * Why too much Database Normalization can be a Bad Thing | + | |
| - | * How far to Take Normalization in Database Design ? | + | |
| - | * When Not to Normalize your SQL Database | + | |
| - | * Maybe Normalizing Isn't Normal | + | |
| - | * The Mother of All Database Normalization Debates on Coding Horror | + | |
| - | + | ||
| - | 11. Using exclusive arcs | + | |
| - | + | ||
| - | An exclusive arc is a common mistake where a table is created with two or more foreign keys where one and only one of them can be non-null. Big mistake. For one thing it becomes that much harder to maintain data integrity. After all, even with referential integrity, nothing is preventing two or more of these foreign keys from being set (complex check constraints notwithstanding). | + | |
| - | + | ||
| - | From A Practical Guide to Relational Database Design: | + | |
| - | + | ||
| - | We have strongly advised against exclusive arc construction wherever possible, for the good reason that they can be awkward to write code and pose more maintenance difficulties. | + | |
| - | + | ||
| - | 11. Not doing performance analysis on queries at all | + | |
| - | + | ||
| - | Pragmatism reigns supreme, particularly in the database world. if you're sticking to principles to the point that they've become a dogma then you've quite probably made mistakes. Take the example of the aggregate queries from above. The aggregate version might look "nice" but it's performance is woeful. A performance comparison should've ended the debate (but it didn't) but more to the point: spouting such ill-informed views in the first place is ignorant, even dangerous. | + | |
| - | + | ||
| - | 12. Over-reliance on UNION ALL and particularly UNION constructs | + | |
| - | + | ||
| - | A UNION in SQL terms merely concatenates congruent data sets, meaning they have the same type and number of columns. The difference between them is that UNION ALL is a simple concatenation and should be preferred wherever possible whereas a UNION will implicitly do a DISTINCT to remove duplicate tuples. | + | |
| - | + | ||
| - | UNIONs, like DISTINCT, have their place. There are valid applications. But if you find yourself doing a lot of them, particularly in subqueries, then you're probably doing something wrong. That might be a case of poor query construction or a poorly designed data model forcing you to do such things. | + | |
| - | + | ||
| - | UNIONs, particularly when used in joins or dependent subqueries, can cripple a database. Try to avoid them whenever possible. | + | |
| - | + | ||
| - | 13. Using OR conditions in queries | + | |
| - | + | ||
| - | This might seem harmless. After all, ANDs are OK. OR should be OK too right? Wrong. Basically an AND condition restricts the data set whereas an OR condition grows it but not in a way that lends itself to optimisation. Particularly when the different OR conditions might intersect thus forcing the optimizer to effectively to a DISTINCT operation on the result. | + | |
| - | + | ||
| - | Bad: | + | |
| - | + | ||
| - | ... WHERE a = 2 OR a = 5 OR a = 11 | + | |
| - | + | ||
| - | Better: | + | |
| - | + | ||
| - | ... WHERE a IN (2, 5, 11) | + | |
| - | + | ||
| - | Now your SQL optimizer may effectively turn the first query into the second. But it might not. Just don't do it. | + | |
| - | + | ||
| - | 14. Not designing their data model to lend itself to performant solutions | + | |
| - | + | ||
| - | This is a hard point to quantify. It is typically observed by its effect. If you find yourself writing gnarly queries for relatively simple tasks or that queries for finding out relatively straightforward information is not performant, then you probably have a poor data model. | + | |
| - | + | ||
| - | In some ways this point summarizes all the earlier ones but it's more of a cautionary tale that doing things like query optimisation is often done first when it should be done second. First and foremost you should ensure you have a good data model before trying to optimize the performance. As Knuth said: | + | |
| - | + | ||
| - | Premature optimization is the root of all evil | + | |
| - | + | ||