An essential feature of Arch, and one which is particularly hard for new users to grasp, is the id-tagging system. Note that id tags are different from branch tags. This page is all about the former, but you often have to work out from context which kind of "tag" is being discussed.

Arch uses file id-tags to track files over their lifetime. In the CVS world, there is only one way to identify files: by their name. When a file is added, the command "cvs add" is needed to tell CVS about the change, and when a file is removed, "cvs delete" is needed; true file renaming is not possible. In the Arch world, developers can choose among three different methods of tracking files.

The relevant section in the tutorial is "Inventory Ids for Source".

Summary of id-tagging methods

Files within a source tree can be simply identified by their name: src/hello.c. However, over the life of a project, a file might be renamed or moved: perhaps from ./hello.c into ./src/. Although the name of the file has changed, it is still the same conceptual entity and we would like to track history across that move. Arch does this by giving each file an "id tag", which can remain the same even when the file is moved.

You can think of this as being a little bit like inode numbers of Unix files: they remain the same when the file is renamed, but change if it's deleted/recreated.

By contrast, CVS always identifies files just by their name in a particular revision.

The id-tagging method is how arch associates files with internal identifiers (known as "tags" or "id tags"). By noticing that the filename associated with a particular id-tag has changed (or an id-tag has disappeared, or a new id-tag has appeared), arch can automatically record file renames, deletions, and additions. You can get and set the id-tagging method with "tla id-tagging-method".

The names tagging method is most similar to CVS and Subversion: the id tag of a file is directly determined by its current filename. By definition, renames can never be represented in this method. All the other methods give the file an arbitrary id which remains constant across renames to represent its identity.
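The rename detection described above can be sketched as a comparison of two inventories, each mapping an id tag to a path. This is a minimal Python illustration, not tla's implementation; the inventory dictionaries and the `diff_inventories` and `names_inventory` names are hypothetical. It also shows why the names method cannot represent a rename: when the id is the path itself, a moved file looks like one deletion plus one addition.

```python
def diff_inventories(old, new):
    """Compare two {id_tag: path} inventories.

    Returns (renames, additions, deletions):
      renames   -- {id: (old_path, new_path)} for ids whose path changed
      additions -- {id: path} for ids that only appear in `new`
      deletions -- {id: path} for ids that only appear in `old`
    """
    renames = {i: (old[i], new[i]) for i in old.keys() & new.keys() if old[i] != new[i]}
    additions = {i: new[i] for i in new.keys() - old.keys()}
    deletions = {i: old[i] for i in old.keys() - new.keys()}
    return renames, additions, deletions


def names_inventory(paths):
    """Names method: the id tag of a file is simply its path."""
    return {p: p for p in paths}


# Explicit or tagline ids: the id stays the same when hello.c moves into src/.
before = {"id-1": "hello.c", "id-2": "README"}
after = {"id-1": "src/hello.c", "id-2": "README"}
print(diff_inventories(before, after))
# ({'id-1': ('hello.c', 'src/hello.c')}, {}, {})

# Names method: the same move shows up as a deletion plus an addition.
print(diff_inventories(names_inventory(["hello.c", "README"]),
                       names_inventory(["src/hello.c", "README"])))
# ({}, {'src/hello.c': 'src/hello.c'}, {'hello.c': 'hello.c'})
```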

Changing the id-tagging method can be quite costly -- it may require all files in the branch to be removed and added again (resulting in a huge changeset), and makes all the files appear to be new, which makes merging with old branches pretty much impossible. Happily, this is only true of some kinds of changes. See How to switch id-tagging methods below for details.

Names

In the "names" id-tagging method, files are identified by their name relative to the project tree root. Renaming is not supported (renaming a file results in a changeset that deletes the file under the old name, and adds it under the new name -- just like renaming in CVS).

Explicit

This is the default tagging method for new trees.

Files are identified by explicit id-tags, which are files stored in .arch-ids directories. Explicit id-tags require the use of tla for some operations:

create

the new file must be id-tagged with "tla add-id".

move

the file may be moved with "tla mv"; if it has an explicit id-tag, the id-tag is moved along with it. If you move the file by any other means, you need to move the id with "tla move-id".

delete

the id-tag for the removed file must be deleted with "tla delete-id".

With the tagline tagging-method, files containing a tagline are created, moved and deleted without ever using tla add-id, move-id or delete-id.

Note about directories

Since the id-tag for a directory dir is stored in dir/.arch-ids, no special extra action needs to be done when the directory is moved or deleted. The id-tag is naturally moved or deleted along with the directory.

If you do not use the names id-tagging method, all directories must have explicit id-tags.
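To make the explicit layout concrete, here is a small Python model of where explicit id-tags live. The exact layout it assumes (a file's id in `.arch-ids/<name>.id` next to the file, a directory's id in `<dir>/.arch-ids/=id`) is tla's usual convention as best understood here, not something this page spells out, so check it against your own tree; the function names are hypothetical.

```python
from pathlib import PurePosixPath


def explicit_id_file(path, is_dir):
    """Return where the explicit id-tag for `path` is assumed to be stored.

    Assumed layout (verify against your tla version):
      file       dir/name -> dir/.arch-ids/name.id
      directory  dir      -> dir/.arch-ids/=id
    """
    p = PurePosixPath(path)
    if is_dir:
        return str(p / ".arch-ids" / "=id")
    return str(p.parent / ".arch-ids" / (p.name + ".id"))


def carried_by_move(id_file, moved_path):
    """True if a plain mv of `moved_path` also moves `id_file` (it lives inside it)."""
    return PurePosixPath(moved_path) in PurePosixPath(id_file).parents


print(explicit_id_file("src/hello.c", is_dir=False))  # src/.arch-ids/hello.c.id
print(explicit_id_file("src", is_dir=True))           # src/.arch-ids/=id
```

Under this model, a plain `mv src lib` carries `src/.arch-ids/=id` along, while a plain `mv src/hello.c lib/` leaves `src/.arch-ids/hello.c.id` behind, which is the situation "tla mv" and "tla move-id" exist to handle.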

Tagline

This tagging-method works by putting the tag in the file content itself. It is not the default because explicit is more familiar to users of other version control systems.

A file may be identified by an explicit id-tag or may contain a tagline, that is, a line of the form:

<punctuation> arch-tag: <identifier>

For example, for various programming languages:
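The tagline form can be illustrated with a short Python sketch that finds taglines written behind several comment styles. The regular expression is an assumption based only on the `<punctuation> arch-tag: <identifier>` form above: it accepts optional whitespace and punctuation before `arch-tag:` and takes the rest of the line as the identifier. tla's real rules (for example, which part of a file it scans) may differ, and the UUIDs below are made up.

```python
import re

# Assumed tagline shape: optional whitespace and punctuation, "arch-tag:",
# then the identifier, which runs to the end of the line.
TAGLINE = re.compile(r"^[ \t]*[^\w\s]*[ \t]*arch-tag:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def find_tagline(text):
    """Return the identifier of the first tagline in `text`, or None."""
    m = TAGLINE.search(text)
    return m.group(1) if m else None


examples = {
    "hello.c": "int main(void) { return 0; }\n// arch-tag: 8c1e9a52-5d3b-4f0e-9a77-2b6f4a1d0c33\n",
    "setup.sh": "#!/bin/sh\n# arch-tag: 0f3d2b7e-91a4-4c5e-8e1d-6a2c9b4f7e10\n",
    "init.el": ";;; arch-tag: 5b9e1c44-2f70-4d8a-a3b6-c1e0d7f29a85\n",
}
for name, text in examples.items():
    print(name, find_tagline(text))
# hello.c 8c1e9a52-5d3b-4f0e-9a77-2b6f4a1d0c33
# setup.sh 0f3d2b7e-91a4-4c5e-8e1d-6a2c9b4f7e10
# init.el 5b9e1c44-2f70-4d8a-a3b6-c1e0d7f29a85
```

Under this sketch's assumption, a line where letters come before `arch-tag:` (such as `dnl arch-tag: ...`) is not treated as a tagline.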

Tagline identifiers must be worldwide unique and never change. They should not be in a natural language, look like a CVS substitution, or be anything which one would like to edit. We recommend using the output of uuidgen(1).
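In Python, `uuid.uuid4()` from the standard library produces random UUIDs comparable to those from uuidgen(1), so a fresh tagline could be built as follows. The `new_tagline` helper and its comment-prefix parameter are hypothetical conveniences, not part of tla.

```python
import uuid


def new_tagline(comment_prefix="#"):
    """Build a fresh tagline: '<comment_prefix> arch-tag: <random UUID>'.

    uuid.uuid4() stands in for uuidgen(1): random, unique in practice,
    and not something anyone will be tempted to edit.
    """
    return f"{comment_prefix} arch-tag: {uuid.uuid4()}"


# Prints "// arch-tag: " followed by a random UUID.
print(new_tagline("//"))
```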

Some people also like to put useful information such as the creation date or author in their arch-tag. This may give something like "James Hacker Thu Mar 18 10:27:13 MET 2004 (hello.c)". But this can lead people who later change the file to think that they ought to update the line to reflect their changes. That would be the wrong thing to do, because arch would then lose the connection between the old and new versions of the file. Also, a search and replace could accidentally match the tag and change it (people sometimes change their names). For these reasons, we suggest using something that doesn't look editable, like a UUID.

Files which contain a tagline can be created, moved and deleted without using special tla commands. The unique identifiers allow Arch to figure out renames all by itself. Of course, when you copy a file containing a tagline, you have to give the copy a new tag to avoid duplicate tags!
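Since a copied tagline silently gives two files the same identity, it can help to check a tree for repeated tags. Below is a minimal Python sketch, assuming the tag identifiers have already been extracted into a `{path: identifier}` dictionary; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def duplicate_tags(tags_by_file):
    """Given {path: tag identifier}, return {tag: [paths]} for tags used more than once."""
    seen = defaultdict(list)
    for path, tag in tags_by_file.items():
        seen[tag].append(path)
    return {tag: sorted(paths) for tag, paths in seen.items() if len(paths) > 1}


# hello2.c was made by copying hello.c, and nobody replaced the tagline.
tags = {
    "src/hello.c": "8c1e9a52-5d3b-4f0e-9a77-2b6f4a1d0c33",
    "src/hello2.c": "8c1e9a52-5d3b-4f0e-9a77-2b6f4a1d0c33",
    "setup.sh": "0f3d2b7e-91a4-4c5e-8e1d-6a2c9b4f7e10",
}
print(duplicate_tags(tags))
# {'8c1e9a52-5d3b-4f0e-9a77-2b6f4a1d0c33': ['src/hello.c', 'src/hello2.c']}
```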

Note that you should not be able to use add-id on files containing a tagline, and you cannot use move-id or delete-id on files that you have not previously tagged with add-id. On the other hand, tla mv is safe to use on all files, whether or not they have an explicit id-tag.

Files which do not contain a tagline must have an explicit id-tag and are handled as with the explicit tagging method.

Template files

Some files are commonly used as templates: another text file is generated by substituting some parts of the template and leaving the rest untouched. Common examples are configure.in files used by autoconf.

Sometimes, like in the case of configure.in, the template files do not provide a comment syntax that:

When you want the generated file to be considered source, you will end up with an id-tag conflict, since both the template and the output will contain the same tagline.

A solution to this problem is to use explicit id-tags for both the template and the output. An alternative is to classify the generated file as precious, using =tagging-method or .arch-inventory, since it's not actually a source file. For generated Makefiles, this will prevent specific local compiler settings from being stored as part of the archive.

Which method to choose

Generally, the right tagging-method to use is tagline.

If, for some obscure reason, you decide to use the names tagging method instead, the rest of this section is not relevant to you. Names may be useful if you're using Arch for non-programming storage, as a way of keeping snapshots of a (mostly text) directory.

The main issue is knowing which files must use an explicit id-tag or a tagline.

Administrative files like .arch-inventory and .cvsignore (in case you are also working with CVS and decide to archive these files) are unlikely to be renamed, so there is no benefit in using a tagline. They should use an explicit id-tag.

Plain text files generally have very conventional names, such as ./README, ./INSTALL and ./COPYING, and are very unlikely to be moved. In addition, you would like to avoid cluttering them with Arch goop. So explicit id-tagging is a good choice.
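The advice above can be turned into a simple per-file rule, sketched in Python. The file lists and the `suggested_method` name are hypothetical; the binary-file case reflects the point that files which cannot hold a tagline need an explicit id-tag.

```python
from pathlib import PurePosixPath

# Hypothetical lists distilled from the advice above; adjust for your project.
ADMIN_FILES = {".arch-inventory", ".cvsignore"}
CONVENTIONAL_TEXT = {"README", "INSTALL", "COPYING"}


def suggested_method(path, is_binary=False):
    """Suggest 'explicit' or 'tagline' for one file in a tagline-method tree."""
    name = PurePosixPath(path).name
    if is_binary:
        return "explicit"  # a binary file cannot carry a tagline
    if name in ADMIN_FILES or name in CONVENTIONAL_TEXT:
        return "explicit"  # rarely renamed; keep these free of Arch goop
    return "tagline"       # ordinary source files


print(suggested_method("./README"))     # explicit
print(suggested_method("src/hello.c"))  # tagline
```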

(It would be nice if "tla id-tagging-method -H" showed implicit as deprecated.)

Explicit tags create an additional subdirectory and file per source directory, and an additional file per tagged source file. On filesystems that do not handle small files well, this can use a significant amount of disk space -- sometimes as much as 20% more. (See Filesystem Considerations.)
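The overhead can be estimated by simple counting, sketched in Python below. The counts follow the sentence above (two extra objects per directory, one per tagged file); the one-block-per-object assumption and the 4096-byte default block size are hypothetical and vary by filesystem.

```python
def explicit_overhead(n_dirs, n_files, block_size=4096):
    """Estimate extra filesystem objects and bytes added by explicit id-tags.

    Per source directory: one .arch-ids subdirectory plus one id file.
    Per tagged source file: one id file.
    Assumes each small object occupies exactly one block (an approximation).
    """
    objects = 2 * n_dirs + n_files
    return objects, objects * block_size


# A hypothetical tree with 50 directories and 1000 tagged files.
print(explicit_overhead(50, 1000))  # (1100, 4505600), about 4.3 MiB
```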

How to switch id-tagging methods

Each id-tagging method produces internal IDs in a different namespace. That means that if you switch a file from explicit to tagline id-tagging, the change will be archived as a file deletion under the old ID and a file creation under the new ID. This is annoying because conflicts will occur when applying changesets created from trees that still use the old ID.

For the same reason, if you switch from names to either explicit or tagline id-tagging method, the created revision changeset will effectively delete and recreate all source files.

Similarly, if you switch from tagline to explicit, you will have to add-id every file which previously used a tagline, and the commit will record a changeset which effectively deletes and recreates all these files.

However, you can smoothly switch a tree from explicit to tagline id-tagging method. All existing source files will keep their explicit id-tag and you will be able to use taglines for new files.

Optionally, if you want to switch an existing file from explicit to tagline id-tagging, use "tla delete-id" to remove the old explicit id-tag and edit the file to add the new tagline. Once again, that change will be archived as a file deletion plus a file creation.
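The rules in this section can be condensed into a small lookup, sketched here in Python. The function name and the wording of its results are hypothetical; it only encodes the cases this page describes and deliberately rejects switches the page does not discuss.

```python
def switch_effect(old, new):
    """Describe what committing a switch from `old` to `new` id-tagging records.

    Methods are 'names', 'explicit' or 'tagline'; only the cases
    described on this page are covered.
    """
    if old == new:
        return "nothing changes"
    if old == "names" and new in ("explicit", "tagline"):
        return "all source files are deleted and recreated under new ids"
    if (old, new) == ("tagline", "explicit"):
        return "files that used a tagline need add-id and are deleted and recreated"
    if (old, new) == ("explicit", "tagline"):
        return "smooth: existing explicit ids are kept, new files may use taglines"
    raise ValueError(f"switch not covered here: {old} -> {new}")


print(switch_effect("explicit", "tagline"))
# smooth: existing explicit ids are kept, new files may use taglines
```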

File classification

Arch asks you to classify your files into different categories. This is done by defining regexps in {arch}/=tagging-method (tree-wide) or in .arch-inventory (locally, for one directory).
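As an illustration of regexp-based classification, here is a minimal Python sketch. The rule list, its order, and the patterns (including the `,,` junk and `~` backup conventions) are assumptions chosen for the example, not the actual contents or precedence of any =tagging-method or .arch-inventory file; the first matching rule wins and anything else is reported as unrecognized.

```python
import re

# Hypothetical ordered rules: (category, regexp matched against the file name).
RULES = [
    ("junk", r"^,,"),
    ("backup", r"~$"),
    ("precious", r"^(Makefile|config\.status|config\.log)$"),
    ("source", r"^[A-Za-z0-9_+.-]+$"),
]


def classify(name, rules=RULES):
    """Return the first category whose regexp matches `name`, else 'unrecognized'."""
    for category, pattern in rules:
        if re.search(pattern, name):
            return category
    return "unrecognized"


for name in ["hello.c", "Makefile", "hello.c~", ",,tmp", "weird name"]:
    print(name, classify(name))
# hello.c source
# Makefile precious
# hello.c~ backup
# ,,tmp junk
# weird name unrecognized
```

Because "precious" is checked before "source" here, a generated Makefile stays out of the archive even though its name would also pass the source pattern.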




Why I Went From Loving Tagline To Disliking It, by Colin Walters

When I first heard of "tagline" tagging, I thought it was extremely cool. It is something that no other revision control system I had used offered. The thought of being able to rename files with just "mv" (or e.g. Nautilus), and have the system notice the rename, seemed very appealing.

The thing to realize about tagline, though, is that there is an implicit downside. That downside is that for medium-sized or larger projects, you are likely not going to be able to use tagline for everything, because you'll have binary files which cannot include tags, or weirder files which don't have a comment syntax, etc. So the price you pay for using tagline is that you can only *almost always* rename files using mv; the rest of the time you have to remember to use "tla mv". And it was this fact that made me decide I'd rather use explicit, always. I'd rather always type "tla mv" and have it work 100% of the time than just use "mv" and have it work 95% of the time. The other annoying thing about tagline is that you often have to work to keep it from being included in generated files.

For smaller projects though, tagline could be very nice.

ID-tagging methods (last edited 2008-07-22 13:25:25 by 64)