A common pattern I still haven't solved is this: I have table A, and now I want to add table B and give it one row for each row in table A. Then I want to add a foreign key column to table A pointing at the row I just added.
Of course this is easy to do with a loop, but can you do it all in pure SQL? If you have a solution I would love to see it.
Here is my history of attempts:
DROP TABLE IF EXISTS dogs;
DROP TABLE IF EXISTS doghouses;
DROP SEQUENCE IF EXISTS dogs_id_seq;
DROP SEQUENCE IF EXISTS doghouses_id_seq;
BEGIN;
CREATE SEQUENCE dogs_id_seq;
CREATE SEQUENCE doghouses_id_seq;
CREATE TABLE dogs (
id INTEGER PRIMARY KEY DEFAULT nextval('dogs_id_seq'),
name TEXT NOT NULL
);
INSERT INTO dogs
(name)
VALUES
('Sparky'),
('Spot')
;
-- Now we want to give each dog a doghouse:
CREATE TABLE doghouses (
id INTEGER PRIMARY KEY DEFAULT nextval('doghouses_id_seq'),
name TEXT NOT NULL
);
ALTER TABLE dogs ADD COLUMN doghouse_id INTEGER REFERENCES doghouses (id);
/*
-- ERROR: syntax error at or near "INTO"
UPDATE dogs AS d
SET doghouse_id = (
INSERT INTO doghouses
(name) VALUES (d.name)
RETURNING id
)
;
*/
/*
-- ERROR: WITH clause containing a data-modifying statement must be at the top level
UPDATE dogs AS d
SET doghouse_id = (
WITH x AS (
INSERT INTO doghouses
(name) VALUES (d.name)
RETURNING id
) SELECT * FROM x
)
;
*/
/*
-- ERROR: missing FROM-clause entry for table "dogs"
WITH homes AS (
INSERT INTO doghouses
(name)
SELECT name
FROM dogs
RETURNING doghouses.id AS doghouse_id, dogs.id AS dog_id
)
UPDATE dogs
SET doghouse_id = homes.doghouse_id
FROM homes
WHERE dogs.id = homes.dog_id
;
*/
-- ERROR: syntax error at or near "INTO"
UPDATE dogs AS d1
SET doghouse_id = h.id
FROM dogs d2
INNER JOIN LATERAL (
INSERT INTO doghouses
(name) VALUES (d2.name)
RETURNING id
)
ON true
WHERE d1.id = d2.id
;
COMMIT;
This script works in SQL Server - I would think you could do the same in Postgres, but I'm not as familiar as I'd like to be with Postgres.
Note that it does require 2 separate steps rather than 1 as you appear to desire, so may not work for you:
IF OBJECT_ID('dbo.dogs', 'U') IS NOT NULL
DROP TABLE dbo.dogs
GO
IF OBJECT_ID('dbo.doghouses', 'U') IS NOT NULL
DROP TABLE dbo.doghouses
GO
CREATE TABLE dogs (
id INTEGER IDENTITY (1, 1) PRIMARY KEY
,name VARCHAR(MAX) NOT NULL
);
-- Now we want to give each dog a doghouse:
CREATE TABLE doghouses (
id INTEGER IDENTITY (1, 1) PRIMARY KEY
,name VARCHAR(MAX) NOT NULL
);
GO
ALTER TABLE dogs ADD doghouse_id INTEGER REFERENCES doghouses (id);
GO
INSERT INTO dogs (name)
VALUES
('Sparky'),
('Spot')
;
DECLARE @DogsAndHouses TABLE (
doghouse_id INT
,name VARCHAR(MAX)
);
INSERT INTO dbo.doghouses (name)
OUTPUT INSERTED.id, INSERTED.name INTO @DogsAndHouses
SELECT
d.name
FROM
dbo.dogs d;
UPDATE d
SET
doghouse_id = dah.doghouse_id
FROM
dogs d
JOIN @DogsAndHouses dah ON d.name = dah.name;
GO
Yup, the OUTPUT keyword is the key to making this work. Before this was added to TSQL, the only way I knew of to make this work was by using a cursor and handling each record independently.
You could add a temporary column to Table B that stores the primary key of column A during the insert, then a 2nd query can insert the primary key of Table B into the corresponding rows of Table A by matching against the temporary column in B.
For example:
INSERT INTO tableB (whatever, tempIdFromTableA) SELECT whatever, id FROM tableA;
UPDATE tableA SET yourForeignKeyColumn = (SELECT id FROM tableB WHERE tempIdFromTableA=tableA.id)
... or instead of that update you could do it via a JOIN:
UPDATE tableA INNER JOIN tableB ON tableA.id=tableB.tempIdFromTableA SET tableA.yourForeignKeyColumn=tableB.id;
Or you could use a temp table to achieve similar results (if you don't want to have to add and then remove the temp column from table B).
Your examples are confusing and I'm still not sure what you want to achieve.
From the description it sounds like you want to have two tables with 1:1 relationship (every dog has a single doghouse). Although this is different than typical 1:1, because both entries supposed to always match. In that scenario you should actually have a single table with two columns (dog name and doghouse name). It'll also be more performant (no need to use joins to fetch the data). There's no benefit to have two tables, especially when you're duplicating the data (doghouse.name = dog.name).
If every dog has to have dog house, and no dogs can share one, then this information is useless and there's no point for storing it.
If for example there are dogs without a doghouse, you can still use one table and have a column with boolean field to store that information.
If multiple dogs can share same doghouse, then you have a foreign key for dog, kind of the way you set it up, but you can't blindly insert data to two tables, because you still need to know which doghouse is shared by which dogs.
If you have multiple dogs and multiple dog houses and need to match one house with one dog, you use 3 tables (one for dogs, one for doghouses, and the third one containing primary keys for both of them that performs the matching. In that scenario you first need to populate dogs and doghouses tables and then you do the matching, so once again you shouldn't insert all data at the same time. This pattern also allows you to easily change which dogs reside in which doghouses.
If you absolutely need to insert to two tables at the same time (your example doesn't show justification for that) then you can do two inserts within a single transaction. You can wrap it in stored procedure or use triggers.
Predictably you're criticizing the question rather than answering it. That's boring. Dogs and doghouses is supposed to be sufficiently contrived that you don't get hung up on whether it's the correct model.
Although initially I'm creating one doghouse for each dog, the model will let dogs start to share doghouses. It's a one-to-many relationship, but to migrate the old data I need to do something, so I'm starting out by giving each dog its own doghouse.
Of course I can use stored procedures, but I'm asking if there is any way to avoid that.
"you can't blindly insert data to two tables, because you still need to know which doghouse is shared by which dogs." That is indeed the crux of the question. :-)
Of course this is easy to do with a loop, but can you do it all in pure SQL? If you have a solution I would love to see it.
Here is my history of attempts: