Jeffrey Yasskin’s blog

3/28/2007

Why move and copy are distinct, non-primitive revision control operations.

Filed under: General — Jeffrey Yasskin @ 3:47 pm

Perforce implements move in terms of p4 integrate (basically copy) + p4 delete. Darcs doesn’t implement copy at all. Both are wrong.

Implementing move in terms of copy mostly works, so it seems like the obvious choice, but it runs into problems when someone is editing a file that someone else is moving. Two different things go wrong depending on who commits their changes first.

Mover submits first
The editor syncs and discovers that the file has been deleted. Generally there is no indication of where the file moved, so the editor has to look through that file’s history for a likely copy operation to find out where to make their changes. Then they have to manually merge their edits into that new file, since the revision control system (RCS) has no idea that the file was logically moved.
Editor submits first
This is even worse, since the RCS may not even consider a delete on top of an edit a conflict. It’s very easy to lose the last couple of edits to the moved file. Even if the mover notices, it’s painful to manually merge those edits into their copy, and the RCS winds up attributing the changes to the mover instead of the editor.

Implementing copy in terms of move is impossible (just ask a quantum physicist), but who copies code anyway? We branch repositories, but that can be handled as its own operation (and clearly was in Perforce, given the name of the copy command). Maybe we don’t need copy at all. I mainly use copy for splitting a file in half. I want to retain the version control history from the original file in both halves, so I copy one to the other and then delete large chunks from both. The inverse of this, joining two files into one, isn’t supported by any RCS I know of. So instead of implementing copy as a primitive, I think that split and join deserve to be primitives. As a bonus, you can implement both copy and move in terms of split.

What information do you need to do split right? When you split a source file A into target files B and C, written split(A=>B,C), where B or C may be the same file as A, each line from A may go to B, C, or both. Then most of the time, you’ll make some edits to both new files. We need an algorithm to guess the split and edits from the contents of A, B, and C, but I haven’t written it. I’m not sure whether “neither” should be a possible destination for a line from A: clearly you do sometimes delete a line in a split, but the goal here is to make it easy to merge edits of those deleted lines, and to do that, it probably helps to have them assigned to one or both of the new files. Merging an edit with a split looks easy: apply an edit line-wise to the file(s) now containing that line. An edit that crosses a “boundary” (somewhere the split changes targets) should probably be considered a conflict.
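
A minimal Python sketch of that merge rule may make it more concrete. It is only an illustration under some loud assumptions: the split is already known (the guessing algorithm is not attempted here), it is represented as a hypothetical per-line list `dest` of target-name sets, no edits have been made to B or C after the split, and an edit replaces a non-empty range of lines in A. None of the names below come from any real RCS.

```python
from dataclasses import dataclass


class SplitConflict(Exception):
    """Raised when an edit spans lines that the split sent to different targets."""


@dataclass
class Edit:
    start: int       # index of the first replaced line in A (0-based)
    end: int         # one past the last replaced line in A
    new_lines: list  # replacement lines


def apply_split(a_lines, dest):
    """Build each target file from A; dest[i] is the set of targets for line i."""
    targets = {}
    for line, names in zip(a_lines, dest):
        for name in names:
            targets.setdefault(name, []).append(line)
    return targets


def merge_edit_with_split(a_lines, dest, edit):
    """Apply an edit made against A to the file(s) that now hold those lines."""
    touched = dest[edit.start:edit.end]
    if not touched:
        raise ValueError("pure insertions are not handled in this sketch")
    # An edit that crosses a boundary (the targets change mid-edit) is a conflict.
    if any(names != touched[0] for names in touched):
        raise SplitConflict("edit crosses a split boundary")
    # The replacement lines inherit the destination of the lines they replace.
    edited_a = a_lines[:edit.start] + edit.new_lines + a_lines[edit.end:]
    edited_dest = (dest[:edit.start]
                   + [touched[0]] * len(edit.new_lines)
                   + dest[edit.end:])
    return apply_split(edited_a, edited_dest)


a = ["import os", "def f(): pass", "def g(): pass"]
dest = [{"B", "C"}, {"B"}, {"C"}]  # the import goes to both halves
merged = merge_edit_with_split(a, dest, Edit(1, 2, ["def f(): return 1"]))
```

Because the sketch assumes B and C are untouched since the split, it can simply replay the edit on A and redo the split. A real implementation would have to map line positions through any later edits to B and C as well.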

A move is implemented as a split with all lines going to one target, with the empty target/source deleted. A copy is implemented as a split with all lines going to both targets. A nice side-effect of this definition is that an edit is merged with a copy by applying it to both copies, not just the original as Perforce does.
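
Those two definitions can be sketched in a few lines of Python. This assumes the same hypothetical representation of a split as a per-line list of target-name sets; the function names and the file names in the usage lines are made up for illustration.

```python
def split(a_lines, dest, target_names):
    """Send line i of the source to every target named in dest[i]."""
    targets = {name: [] for name in target_names}
    for line, names in zip(a_lines, dest):
        for name in names:
            targets[name].append(line)
    return targets


def move(a_lines, new_name):
    """A move: every line goes to one target; the emptied source is dropped."""
    return split(a_lines, [{new_name}] * len(a_lines), [new_name])


def copy(a_lines, original_name, copy_name):
    """A copy: every line goes to both targets."""
    both = {original_name, copy_name}
    return split(a_lines, [both] * len(a_lines), [original_name, copy_name])


source = ["line one", "line two"]
moved = move(source, "new.txt")
copied = copy(source, "old.txt", "old_copy.txt")
```

Since every line of a copy belongs to both targets, merging an edit line-wise against this split lands the edit in both files, which is the side effect described above.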

I use join(A,B=>C) much less often than split, but I think it’s necessary in order to roll back splits. It needs to assign a destination in C to every line in A and B. Edits to A and B are then applied to the appropriate location in C. Again, the trick will be in the algorithm to derive a join from three files.
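
As a rough illustration of what a join would have to record, here is a Python sketch. It assumes the join is already known and is represented as two hypothetical lists, `a_pos` and `b_pos`, giving the index in C of each line of A and B. It also assumes, as simplifications not stated above, that every line of C comes from exactly one source line and that an edit whose lines do not land contiguously in C is a conflict.

```python
class JoinConflict(Exception):
    """Raised when the edited source lines are not contiguous in C."""


def join(a_lines, b_lines, a_pos, b_pos):
    """Build C by placing each line of A and B at its assigned index."""
    c = [None] * (len(a_lines) + len(b_lines))
    for line, pos in list(zip(a_lines, a_pos)) + list(zip(b_lines, b_pos)):
        c[pos] = line
    return c


def merge_edit_with_join(c_lines, pos, start, end, new_lines):
    """Apply an edit of source lines [start, end) at their location in C.

    pos is the position list (a_pos or b_pos) of the file that was edited.
    """
    touched = pos[start:end]
    if not touched:
        raise ValueError("pure insertions are not handled in this sketch")
    first = touched[0]
    if touched != list(range(first, first + len(touched))):
        raise JoinConflict("edited lines are interleaved with other lines in C")
    return c_lines[:first] + new_lines + c_lines[first + len(touched):]


a, b = ["a1", "a2"], ["b1", "b2"]
a_pos, b_pos = [0, 1], [2, 3]  # C is A followed by B
c = join(a, b, a_pos, b_pos)
c_edited = merge_edit_with_join(c, b_pos, 0, 1, ["B1"])  # edit B's first line
```

Deriving `a_pos` and `b_pos` from the contents of the three files is the hard part and is not attempted here.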
