Herramientas de usuario

Herramientas del sitio


db:sql

¡Esta es una revisión vieja del documento!


SQL

  • Comentarios, como en C: /* … */

Edición de bases de datos

Crear

CREATE DATABASE nombre

CREATE DATABASE bookstore;

Edición de tablas

Crear

CREATE TABLE nombre (columna tipo propiedades, columna tipo propiedades…)

CREATE TABLE books (
  rec_id INT, 
  title VARCHAR(50),
  author VARCHAR(50)
);

Modificar

ALTER TABLE nombre modificiación, modificación…
Las modificaciones posibles:

  • ADD COLUMN columna tipo propiedades
  • CHANGE COLUMN columna nuevo_nombre tipo propiedades
ALTER TABLE books
  CHANGE COLUMN rec_id rec_id INT AUTO_INCREMENT PRIMARY KEY,
  CHANGE COLUMN author author_id INT,
  ADD COLUMN description BLOB;

Tipos de datos

  • Enteros: BIGINT → 8bytes, INT → 4bytes, MEDIUMINT → 3bytes, SMALLINT → 2bytes, TINYINT → 1byte
  • Reales: DECIMAL[(precisión)] → precisión + 2bytes, = DEC, DOUBLE[(tamaño de muestra, precisión)] → 8bytes, FLOAT → 4bytes
  • Texto: VARCHAR(tamaño), LONGTEXT, TEXT, CHAR, TINYTEXT,
  • Binarios: BLOB = TEXT BINARY, LONG BLOB = LONGTEXT BINARY
  • Tiempo: DATE → yyyy-mm-dd (3bytes), DATETIME → yyyy-mm-dd hh:mm:ss (8bytes), TIME → hh:mm:ss (3bytes), TIMESTAMP → yyyymmddhhmmss (4bytes), YEAR → yyyy (1byte)
  • Enumeraciones: ENUM, SET

Las enumeraciones se asignan entre parentesis un grupo de valores que aceptan, estos entre comillas simples:

  • ENUM ('yes', 'no')
  • ENUM('novel','poetry','drama')

Si se le asigna un valor que no existe provoca un warning e inserta un valor en blanco. Cada valor insertado recibe un número, siendo el primero 1, el segundo 2… Y el valor en blanco un 0. Por lo que puedes hacer selects preguntando por el valor numérico o el valor dado.
Los ENUM ocupan: 1 byte si tienen menos de 255 elementos o 2 bytes si tienen menos 65535. Los SET, en cambio, ocupan mucho más (1-8 elementos: 1byte, 9-16 elementos: 2 bytes… 33-64 elementos: 8 bytes).

Propiedades

  • AUTO_INCREMENT
  • PRIMARY KEY
  • NOT NULL
  • UNSIGNED (numéricos)
  • ZERO FILL (numéricos)
  • BINARY (texto)

Las foreign keys

La sintaxis para definir una foreign key en MySql es:

[CONSTRAINT símbolo] FOREIGN KEY (nombre_columna, ...)
  REFERENCES nombre_tabla (nombre_columna, ...)
  [ON DELETE {CASCADE | SET NULL | NO ACTION | RESTRICT}]
  [ON UPDATE {CASCADE | SET NULL | NO ACTION | RESTRICT}]
CREATE TABLE publicaciones (isbn VARCHAR(10) PRIMARY KEY, id_editorial INT, FOREIGN KEY (id_editorial) REFERENCES editoriales(id));

Un ejemplo para añadir una foreign key tras haber creado la tabla:

ALTER TABLE venta ADD FOREIGN KEY(id_cliente) REFERENCES cliente(id_cliente);

Eliminar

DROP TABLE nombre

DROP TABLE books;

Opcionalmente puedes añadir IF EXISTS tras TABLE, de esa forma si la tabla no existe el comando no dará error:

DROP TABLE IF EXISTS books;

Ejemplos

  • Creación de una tabla con varias primary keys y varias foreign keys:
CREATE TABLE rel_autores_publicaciones (id_autor INT, isbn VARCHAR(10), PRIMARY KEY (id_autor, isbn), FOREIGN KEY (id_autor) REFERENCES autores(id), FOREIGN KEY (isbn) REFERENCES publicaciones(isbn));
 
CREATE TABLE diccionarios (isbn VARCHAR(10) PRIMARY KEY, id_idioma1 INT NOT NULL, id_idioma2 INT, FOREIGN KEY (isbn) REFERENCES publicaciones(idiomas), FOREIGN KEY (id_idioma1) REFERENCES idiomas (id), FOREIGN KEY (id_idioma2) REFERENCES idiomas(id));

Edición de datos

Insertar

INSERT INTO tabla (campo1, campo2…) VALUES (valorNumerico, 'valorString'…)

INSERT INTO authors (author_last, author_first, country) VALUES('Vernon','Olympia','USA');

Modificar

UPDATE tabla SET campo1=valor1, campo2=valor2, … [WHERE columnaN=valorN]

UPDATE City SET Population=5000000 WHERE Name='JoanBrossa';
UPDATE City SET Population=5000000 WHERE Population<1000000;

Eliminar

Consulta de datos

Selects

SELECT campos FROM tablas [WHERE condicion] [ORDER BY campoOrden tipoOrden] [LIMIT límite]

  • Los campos pueden ser expresados como:
    • Nombres de campos de una tabla separados por comas.
    • tabla.campo si queremos hacer un select de campos de distintas tablas.
    • Usaremos * si queremos decir todos.
    • Si queremos que un campo, cuando se muestre, su cabecera sea distinta a la que tiene realmente haremos: campo AS nombre.
  • Las tablas…
    • Si indicamos que se muestren campos de varias tablas, todas estas tablas deben de aparecer aquí.
    • Si queremos usar un alias para los nombres de tablas para que así al elegir los campos sea más sencillo usaremos tabla AS otro_nombre, luego en los campos tenemos que usar otro_nombre.campo.
  • Las condiciones…
    • Las separaremos por AND (&&), OR (||), NOT (!), XOR.
    • Realizaremos las comprobaciones con:
      • Operadores: =, !=, <>, ⇐, >=, <, >
      • valor BETWEEN valor1 AND valor2
      • valor IN (valor1, valor2…) o con NOT IN
      • valor1 LIKE valor2 o con NOT LIKE
      • [En: MySQL] Expresiones Regulares: REGEXP, RLIKE o NOT REGEXP.
  • Para la ordenación de campos…
    • Se elige el campo por el cual se ordenará la select.
    • El tipoOrden pueden ser: ASC, DESC, si no se indica tipoOrden este será ASC (ascendente).
    • Puedes indicar varias ordenaciones separando estas por comas.
  • El límite es un número que indica el máximo de elementos que han de mostrarse.
    • Podemos indicar dos números, el primero los registros anteriores a los mostrados y el segundo cuantos mostraremos. Por ejemplo, LIMIT 3, 2, mostraría el 4º y el 5º registro.
  • Grupos…
    • Utilizando la clausula GROUP BY nos mostrará los resultados agrupados por el campo indicado tras esta.

Ejemplos

SELECT * FROM books;
SELECT title, description FROM books;
SELECT title AS titulo FROM books;
SELECT rec_id, title, description FROM books WHERE genre = 'novel';
SELECT * FROM books ORDER BY title, pub_year LIMIT 20, 10;
SELECT books.title, a.name FROM books, authors AS a;
SELECT * FROM autores WHERE nombre REGEXP 'r{2}';

Selects avanzados

joins

Imaginemos una BD donde tenemos una relación de películas\director, para sacar películas por director haremos:

SELECT title, name FROM movies m, directors d WHERE m.director = d.id;

Con JOIN sería:

SELECT title, name FROM movies m INNER JOIN directors d ON (m.director = d.id);

Los joins funcionan de la siguiente manera:

  • Teniendo 2 tablas (T1 y T2) con unos campos relacionados vamos a ver lo que devolverían los distintos joins (la parte azul de las tablas).

  • INNER JOIN

  • OUTER JOIN

  • LEFT JOIN

  • LEFT OUTER JOIN

Selects anidados

En el siguiente ejemplo se buscan todos los contactos que puede ver el trabajador 1 (los suyos y los que son públicos (visibilidad = 1)):

SELECT * FROM Contactos 
  WHERE Contactos.id IN (SELECT Contactos.id FROM Contactos, RelContTrab WHERE RelConTrab.idCon = Contactos.id AND RelConTrab.idTrab = 1)
  OR Contactos.id IN (SELECT Contactos.id FROM Contactos WHERE Contactos.Visibility = 1);

También podemos pedir campos que estén o no en una lista:

SELECT * FROM tags WHERE id NOT IN (SELECT id FROM Elements);
SELECT * FROM tags WHERE id IN (3, 4, 5);

Funciones

  • count devuelve el número de registros de una serie de campos en una tabla:
SELECT COUNT(*) FROM users;
  • sum devuelve la suma de los valores de un campo numérico en una consulta:
SELECT ciudad.nombre, SUM(pueblo.hab) FROM pueblo, ciudad WHERE (pueblo.id_ciudad = ciudad.id) GROUP BY (ciudad.nombre);

Notas

Tips

Insertar los datos de un select

Teniendo dos tablas:

CREATE TABLE b (ca INT, cb INT);
CREATE TABLE c (xa INT, xb INT);

Insertamos en la tabla b unos datos cb con un identificador ca (que puede estar repetido), ahora queremos insertar en la tabla c la suma de los valores cb con un mismo identificador:

INSERT INTO c (xa, xb) SELECT ca, SUM(cb) FROM b GROUP BY ca;

Errores comunes

  • No usar correctamente los índices; Deberían de tener índices obligatoriamente las foreign keys, los campos utilizados en WHERE…
  • No apoyarse en la integridad referencial, si la DB lo permite (por ejemplo las foreign keys) esta debería ser usada siempre.
  • Utilizar claves naturales, estas son claves únicas del mundo real (código de la seguridad social, código de productos…) pero no tienen sentido en la DB, se inventaron para identificar el elemento fuera de la DB y es preferible utilizar campos de autoincremento.
  • Escribir consultas que requieran DISTINCT para funcionar, no es que la clausula DISTINCT sea incorrecta es que apunta a que pueden haber datos duplicados en la base de datos.
  • Preferir agregación sobre joins, las operaciones de agregaciones (como GROUP BY) son sumamente lentas comparadas con joins. Un ejemplo:
SELECT userid
FROM userrole
WHERE roleid IN (1, 2, 3)
GROUP by userid
HAVING COUNT(1) = 3

Query time: 0.312s
------------------------------------------------------

SELECT t1.userid
FROM userrole t1
JOIN userrole t2 ON t1.userid = t2.userid AND t2.roleid = 2
JOIN userrole t3 ON t2.userid = t3.userid AND t3.roleid = 3
AND t1.roleid = 1

Query time: 0.016s

6. Not simplifying complex queries through views

Not all database vendors support views but for those that do, they can greatly simplify queries if used judiciously. For example, on one project I used a generic Party model for CRM. This is an extremely powerful and flexible modelling technique but can lead to many joins. In this model there were:

  • Party: people and organisations;
  • Party Role: things those parties did eg Employer and Employer;
  • Party Role Relationship: how those roles related to each other.

Example:

  • Ted is a Person, being a subtype of Party;
  • Ted has many roles, one of which is Employee;
  • Intel is an organisation, being a subtype of a Party;
  • Intel has many roles, one of which is Employer;
  • Intel employs Ted, meaning there is a relationship between their respective roles.

So there are five tables joined to link Ted to his employer. You assume all employees are Persons (not organisations) and provide this helper view:

CREATE VIEW vw_employee AS SELECT p.title, p.given_names, p.surname, p.date_of_birth, p2.party_name employer_name FROM person p JOIN party py ON py.id = p.id JOIN party_role child ON p.id = child.party_id JOIN party_role_relationship prr ON child.id = prr.child_id AND prr.type = 'EMPLOYMENT' JOIN party_role parent ON parent.id = prr.parent_id = parent.id JOIN party p2 ON parent.party_id = p2.id

And suddenly you have a very simple view of the data you want but on a highly flexible data model.

7. Not sanitizing input

This is a huge one. Now I like PHP but if you don't know what you're doing it's really easy to create sites vulnerable to attack. Nothing sums it up better than the story of little Bobby Tables.

Data provided by the user by way of URLs, form data and cookies should always be treated as hostile and sanitized. Make sure you're getting what you expect.

8. Not using prepared statements

Prepared statements are when you compile a query minus the data used in inserts, updates and WHERE clauses and then supply that later. For example:

SELECT * FROM users WHERE username = 'bob'

vs

SELECT * FROM users WHERE username = ?

or

SELECT * FROM users WHERE username = :username

depending on your platform.

I've seen databases brought to their knees by doing this. Basically, each time any modern database encounters a new query it has to compile it. If it encounters a query it's seen before, you're giving the database the opportunity to cache the compiled query and the execution plan. By doing the query a lot you're giving the database the opportunity to figure that out and optimize accordingly (eg by pinning the compiled query in memory).

Using prepared statements will also give you meaningful statistics about how often certain queries are used.

Prepared statements will also better protect you against SQL injection attacks.

9. Not normalizing enough

Database normalization is basically the process of optimizing database design or how you organize your data into tables.

Just this week I ran across some code where someone had imploded an array and inserted it into a single field in a database. Normalizing that would be to treat element of that array as a separate row in a child table (ie a one-to-many relationship).

This also came up in Best Method for Storing a List of User IDs:

  I've seen in other systems that the list is stored in a serialized PHP array.

But lack of normalization comes in many forms.

More:

  • Normalization: How far is far enough?
  • SQL by Design: Why You Need Database Normalization

10. Normalizing too much

This may seem like a contradiction to the previous point but normalization, like many things, is a tool. It is a means to an end and not an end in and of itself. I think many developers forget this and start treating a “means” as an “end”. Unit testing is a prime example of this.

I once worked on a system that had a huge hierarchy for clients that went something like:

Licensee → Dealer Group → Company → Practice → …

such that you had to join about 11 tables together before you could get any meaningful data. It was a good example of normalization taken too far.

More to the point, careful and considered denormalization can have huge performance benefits but you have to be really careful when doing this.

More:

  • Why too much Database Normalization can be a Bad Thing
  • How far to Take Normalization in Database Design ?
  • When Not to Normalize your SQL Database
  • Maybe Normalizing Isn't Normal
  • The Mother of All Database Normalization Debates on Coding Horror

11. Using exclusive arcs

An exclusive arc is a common mistake where a table is created with two or more foreign keys where one and only one of them can be non-null. Big mistake. For one thing it becomes that much harder to maintain data integrity. After all, even with referential integrity, nothing is preventing two or more of these foreign keys from being set (complex check constraints notwithstanding).

From A Practical Guide to Relational Database Design:

  We have strongly advised against exclusive arc construction wherever possible, for the good reason that they can be awkward to write code and pose more maintenance difficulties.

11. Not doing performance analysis on queries at all

Pragmatism reigns supreme, particularly in the database world. if you're sticking to principles to the point that they've become a dogma then you've quite probably made mistakes. Take the example of the aggregate queries from above. The aggregate version might look “nice” but it's performance is woeful. A performance comparison should've ended the debate (but it didn't) but more to the point: spouting such ill-informed views in the first place is ignorant, even dangerous.

12. Over-reliance on UNION ALL and particularly UNION constructs

A UNION in SQL terms merely concatenates congruent data sets, meaning they have the same type and number of columns. The difference between them is that UNION ALL is a simple concatenation and should be preferred wherever possible whereas a UNION will implicitly do a DISTINCT to remove duplicate tuples.

UNIONs, like DISTINCT, have their place. There are valid applications. But if you find yourself doing a lot of them, particularly in subqueries, then you're probably doing something wrong. That might be a case of poor query construction or a poorly designed data model forcing you to do such things.

UNIONs, particularly when used in joins or dependent subqueries, can cripple a database. Try to avoid them whenever possible.

13. Using OR conditions in queries

This might seem harmless. After all, ANDs are OK. OR should be OK too right? Wrong. Basically an AND condition restricts the data set whereas an OR condition grows it but not in a way that lends itself to optimisation. Particularly when the different OR conditions might intersect thus forcing the optimizer to effectively to a DISTINCT operation on the result.

Bad:

… WHERE a = 2 OR a = 5 OR a = 11

Better:

… WHERE a IN (2, 5, 11)

Now your SQL optimizer may effectively turn the first query into the second. But it might not. Just don't do it.

14. Not designing their data model to lend itself to performant solutions

This is a hard point to quantify. It is typically observed by its effect. If you find yourself writing gnarly queries for relatively simple tasks or that queries for finding out relatively straightforward information is not performant, then you probably have a poor data model.

In some ways this point summarizes all the earlier ones but it's more of a cautionary tale that doing things like query optimisation is often done first when it should be done second. First and foremost you should ensure you have a good data model before trying to optimize the performance. As Knuth said:

  Premature optimization is the root of all evil
db/sql.1279298912.txt.gz · Última modificación: 2020/05/09 09:24 (editor externo)