An essential feature of Arch, and one that is particularly hard for new users to grasp, is the id-tagging system. Note that id tags are different from branch tags. This page is all about the former, but often you have to work out which kind of "tag" is being discussed from the context.
Arch uses file id-tags to track files over their lifetime. In the CVS world, there is only one way to identify files: by their name. When a file is added, the command "cvs add" is needed to tell CVS about the change; when a file is removed, "cvs delete" is needed; true file renaming is not possible. In the Arch world, three different methods of tracking files are available for developers to choose from.
The relevant section in the tutorial is Inventory Ids for Source.
Summary of id-tagging methods
Files within a source tree can be simply identified by their name: src/hello.c. However, over the life of a project, a file might be renamed or moved: perhaps from ./hello.c into ./src/. Although the name of the file has changed, it is still the same conceptual entity and we would like to track history across that move. Arch does this by giving each file an "id tag", which can remain the same even when the file is moved.
You can think of this as being a little bit like inode numbers of Unix files: they remain the same when the file is renamed, but change if it's deleted/recreated.
By contrast, CVS always identifies files just by their name in a particular revision.
The id-tagging method is how arch associates files with internal identifiers (known as "tags" or "id tags"). By noticing that the filename associated with a particular id-tag has changed (or an id-tag has disappeared, or a new id-tag has appeared), arch can automatically record file renames, deletions, and additions. You can get and set the id-tagging method with "tla id-tagging-method".
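The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how tla is implemented: an inventory is modelled as a plain dictionary from id-tag to path, and the id-tags "id-1", "id-2" and "id-3" are made up for the example.

```python
def diff_inventories(old, new):
    """Compare two {id_tag: path} mappings and classify the changes.

    Hypothetical model: real arch inventories carry more information.
    """
    # Same id-tag, different path: the file was renamed or moved.
    renamed = {
        tag: (old[tag], new[tag])
        for tag in old.keys() & new.keys()
        if old[tag] != new[tag]
    }
    # Id-tag present before but not after: the file was deleted.
    deleted = {tag: old[tag] for tag in old.keys() - new.keys()}
    # Id-tag present after but not before: the file was added.
    added = {tag: new[tag] for tag in new.keys() - old.keys()}
    return renamed, deleted, added


old_tree = {"id-1": "hello.c", "id-2": "README"}
new_tree = {"id-1": "src/hello.c", "id-3": "INSTALL"}
renamed, deleted, added = diff_inventories(old_tree, new_tree)
print(renamed, deleted, added)
```

With the names method the "id-tag" is the path itself, so in this model a moved file can only ever show up as one deletion plus one addition.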
The names tagging method is most similar to CVS and Subversion: the id tag of a file is directly determined by its current filename. By definition, renames can never be represented in this method. All the other methods give the file an arbitrary id which remains constant across renames to represent its identity.
Changing the id-tagging method can be quite costly -- it may require all files in the branch to be removed and added again (resulting in a huge changeset), and makes all the files appear to be new, which makes merging with old branches pretty much impossible. Happily, this is only true of some kinds of changes. See How to switch id-tagging methods below for details.
Names
In the "names" id-tagging method, files are identified by their name relative to the project tree root. Renaming is not supported (renaming a file results in a changeset that deletes the file under the old name, and adds it under the new name -- just like renaming in CVS).
Explicit
This is the default tagging method for new trees.
Files are identified by explicit id-tags, which are files stored in .arch-ids directories. Explicit id-tags require the use of tla for some operations:
- create: the new file must be id-tagged with "tla add-id".
- move: the file may be moved with "tla mv"; if it has an explicit id-tag, the id-tag is moved along with it. If you move the file by any other means, you need to move the id-tag using "tla move-id".
- delete: the id-tag for the removed file must be deleted with "tla delete-id".
With the tagline tagging-method, files containing a tagline are created, moved and deleted without ever using tla add-id, move-id or delete-id.
Note about directories
Since the id-tag for a directory dir is stored in dir/.arch-ids, no special extra action needs to be done when the directory is moved or deleted. The id-tag is naturally moved or deleted along with the directory.
If you do not use the names id-tagging method, all directories must have explicit id-tags.
Tagline
This tagging-method works by putting the tag in the file content itself. It is not the default because explicit is more familiar to users of other version control systems.
A file may be identified by an explicit id-tag or may contain a tagline, that is, a line of the form:
<punctuation> arch-tag: <identifier>
For example, for various programming languages:
C
/* arch-tag: <identifier> */
shell
# arch-tag: <identifier>
lisp
;;; arch-tag: <identifier>
Tagline identifiers must be worldwide unique and never change. They should not be in a natural language, look like a CVS substitution, or be anything which one would like to edit. We recommend using the output of uuidgen(1).
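As a small sketch of the recommendation above, the following Python builds a tagline from a freshly generated UUID, much as one might do by hand with uuidgen(1). The helper name make_tagline and the COMMENT_STYLES table are invented for this example; only the three comment styles shown earlier are covered.

```python
import uuid

# Comment prefix and suffix for the three example languages above.
COMMENT_STYLES = {
    "c": ("/* ", " */"),
    "shell": ("# ", ""),
    "lisp": (";;; ", ""),
}


def make_tagline(style, identifier=None):
    """Return a tagline for the given comment style.

    If no identifier is supplied, a random UUID is generated.
    """
    if identifier is None:
        identifier = str(uuid.uuid4())
    prefix, suffix = COMMENT_STYLES[style]
    return f"{prefix}arch-tag: {identifier}{suffix}"


print(make_tagline("c"))
print(make_tagline("shell"))
```

Generate the tagline once, when the file is created, and paste it in; the identifier must never be regenerated for an existing file.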
Some people also like to put useful information such as the creation date or author in their arch-tag. This may give something like "James Hacker Thu Mar 18 10:27:13 MET 2004 (hello.c)". But this can cause people who later change the file to think that they ought to update the line to correspond to their changes. This would be the wrong thing to do, because then arch would lose the connection between the old and new versions of the file. Also, a search and replace could accidentally match the tag and change it -- people sometimes change their name. For these reasons, we suggest using something that doesn't look editable, like a UUID.
Files which contain a tagline can be created, moved and deleted without using special tla commands. The unique identifiers allow Arch to figure out renames all by itself. Of course, you have to make a new tag when you copy a file containing a tagline, to avoid duplicate tags!
Actually, you should not be able to use add-id on files containing a tagline, and you cannot use move-id or delete-id on a file that you have not previously tagged with add-id. On the other hand, tla mv is safe to use on all files, whether or not they are associated with an explicit id-tag.
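The duplicate-tag problem with copied files can be caught with a quick check. The sketch below is a rough approximation and not tla's own parser: it assumes the identifier is simply the rest of the line after "arch-tag:", looks only at the first tagline in each file, and works on an in-memory {path: text} dictionary so that it is self-contained. The file names and identifiers are made up.

```python
import re
from collections import defaultdict

# Simplified pattern: the identifier is taken to be the rest of the line.
TAGLINE_RE = re.compile(r"arch-tag:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def find_duplicate_tags(files):
    """Return {identifier: [paths]} for identifiers used by several files."""
    by_tag = defaultdict(list)
    for path, text in files.items():
        match = TAGLINE_RE.search(text)
        if match:
            by_tag[match.group(1)].append(path)
    return {tag: sorted(paths) for tag, paths in by_tag.items() if len(paths) > 1}


tree = {
    "build.sh": "#!/bin/sh\n# arch-tag: 1111-aaaa\necho build\n",
    "build-copy.sh": "#!/bin/sh\n# arch-tag: 1111-aaaa\necho build\n",
    "clean.sh": "#!/bin/sh\n# arch-tag: 2222-bbbb\necho clean\n",
}
print(find_duplicate_tags(tree))
```

Here build-copy.sh was copied from build.sh without changing the tagline, so both are reported under the same identifier.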
Files which do not contain a tagline must have an explicit id-tag and are handled as with the explicit tagging method.
Template files
Some files are commonly used as templates: another text file is generated by substituting some parts of the template and leaving the rest untouched. Common examples are configure.in files used by autoconf.
Sometimes, as in the case of configure.in, the template files do not provide a comment syntax that both:
- prevents comments from being copied to the output file;
- matches the <punctuation> pattern of tla.
When you want the generated file to be considered source, you will end up with an id-tag conflict, since both the template and the output will contain the same tagline.
A solution to this problem is to use explicit id-tags for both the template and the output. An alternative is to classify the generated file as precious, using =tagging-method or .arch-inventory, since it's not actually a source file. For generated Makefiles, this will prevent specific local compiler settings from being stored as part of the archive.
Which method to choose
Generally, the right tagging-method to use is tagline.
I'm going to have to disagree. I think that tagline is only good in certain cases, and that explicit (or at least tagline where taglines aren't really used) is the right method. For more info, see Colin Walters' note at the end.
If, for some obscure reason, you decide to use the names tagging method instead, the rest of this section is not relevant to you. Names may be useful if you're using Arch for non-programming storage, as a way of keeping snapshots of a (mostly text) directory.
Since tagline allows (indeed, depends on) explicit ids, the explicit tagging-method isn't usually useful.
The implicit tagging-method is deprecated.
The names tagging-method does not handle renaming files.
The main issue is knowing which files must use an explicit id-tag or a tagline:
- directories must use explicit id-tags;
- binary files must use explicit id-tags;
- programming language source files should use taglines.
Administrative files like .arch-inventory and .cvsignore (in case you are working with CVS too and decide to archive these files) are unlikely to be renamed, so there is no benefit in using a tagline. They should use an explicit id-tag.
Printable pure text files generally have very conventional names, like ./README, ./INSTALL, ./COPYING and are very unlikely to be moved. In addition, you would like to avoid cluttering them with Arch goop. So explicit id-tagging is a good choice.
(It would be nice if "tla id-tagging-method -H" showed implicit as deprecated.)
Explicit tags create an additional subdirectory and file per source directory, and an additional file per tagged source file. On filesystems that do not handle small files well, this can use a significant amount of disk space -- sometimes as much as 20% more. (See Filesystem Considerations.)
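A back-of-the-envelope estimate of this overhead can be written down as follows. It is a rough model only: it assumes that every extra directory and every extra id file occupies exactly one filesystem block, and the tree size and the 4096-byte block size are hypothetical numbers chosen for the example.

```python
def explicit_id_overhead(source_dirs, source_files, block_size=4096):
    """Estimate the extra bytes used by explicit id-tags.

    Assumption: each small file or directory takes one whole block.
    Per source directory: one .arch-ids subdirectory plus one id file.
    Per source file: one id file.
    """
    extra_entries = 2 * source_dirs + source_files
    return extra_entries * block_size


# Hypothetical tree: 10 directories holding 200 tagged source files.
print(explicit_id_overhead(source_dirs=10, source_files=200))  # 901120
```

Filesystems that pack small files tightly break the one-block assumption, which is why the overhead varies so much from one filesystem to another.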
How to switch id-tagging methods
Each id-tagging method produces internal IDs in a different namespace. That means that if you switch a file from explicit to tagline id-tagging, the change will be archived as a file deletion under the old ID and a file creation under the new ID. This is annoying because conflicts will occur when applying changesets created from trees using the old ID.
For the same reason, if you switch from names to either explicit or tagline id-tagging method, the created revision changeset will effectively delete and recreate all source files.
Similarly, if you switch from tagline to explicit, you will have to add-id every file which previously used a tagline and commit will record a changeset which effectively deletes and recreates all these files.
However, you can smoothly switch a tree from explicit to tagline id-tagging method. All existing source files will keep their explicit id-tag and you will be able to use taglines for new files.
Optionally, if you want to switch an existing file from explicit to tagline id-tagging, use "tla delete-id" to remove the old explicit id-tag and edit the file to add the new tagline. Once again, that change will be archived as a file deletion plus a file creation.
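The namespace effect described in this section can be illustrated with a toy model. The prefixes used below ("names:", "explicit:", "tagline:") are invented for the example and are not claimed to be the prefixes arch uses internally; the point is only that the same identifier string yields a different internal ID under each method.

```python
# Hypothetical namespace prefixes, one per id-tagging method.
PREFIXES = {"names": "names:", "explicit": "explicit:", "tagline": "tagline:"}


def internal_id(method, value):
    """Build an internal ID by placing the value in the method's namespace."""
    return PREFIXES[method] + value


# The same file, first with an explicit id-tag, then with a tagline
# that happens to carry the very same identifier string.
before = {internal_id("explicit", "abc-123"): "hello.c"}
after = {internal_id("tagline", "abc-123"): "hello.c"}

deleted = before.keys() - after.keys()
added = after.keys() - before.keys()
print(sorted(deleted), sorted(added))
```

Even though neither the path nor the identifier string changed, the old internal ID disappears and a new one appears, so the change is recorded as a deletion plus a creation.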
Files classification
Arch asks you to classify your files into different categories. This is done by defining regexps in {arch}/=tagging-method (tree-wide) or in .arch-inventory (locally, for one directory).
source is the category of files that should be added to the archive. A file matching the source regular expression should have an inventory-id (either explicit or tagline), or tree-lint and commit will complain.
junk is for files you don't care about losing. Note that the current implementation of tla will usually not delete or clobber junk files except for ',,*' files (',*' is hardcoded as junk), but you can't be sure this will still be the case for future versions. A relevant regular expression for junk also allows you to clean up your working directory with tla inventory --junk | xargs rm.
backup is of course well suited for text editor backup files.
precious is for files you don't want to lose, but that won't go in the archive.
unrecognized is a special category for files that should never be in your local tree. You can't commit while one file matching this regexp is in your local tree. One possible use is to make \.rej$ match this regexp, so that you can't commit while you have unresolved conflicts in your tree.
exclude: Files in the exclude category must also belong to another category, different from 'unrecognized'. This just means the file shouldn't be listed by tla inventory, unless the --all option is specified. It is used to hide arch's internal files, but is usually not useful to the user for another purpose. The default regular expression is ^(.arch-ids|\{arch\}|\.arch-inventory)$, to which you must add '.' and '..', which are hardcoded in tla. You can extend it, but not override the hardcoded part.
Category    archive?    copy locally?    never clobbered?
junk        no          no               no
backup      no          no               yes
precious    no          yes              yes
source      yes         yes              yes
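The classification described in this section can be pictured as matching each file name against one regexp per category. The sketch below is a simplified model, not tla's actual algorithm: the regexps are hypothetical examples rather than tla's defaults, the order in which categories are tried is an assumption, and the exclude category is left out.

```python
import re

# Hypothetical regexps, tried in this (assumed) order.
CATEGORY_PATTERNS = [
    ("junk", re.compile(r"^,.*$")),
    ("backup", re.compile(r"^.*(~|\.bak|\.orig)$")),
    ("precious", re.compile(r"^.*\.o$")),
    ("unrecognized", re.compile(r"^.*\.rej$")),
    ("source", re.compile(r"^[_=a-zA-Z0-9].*$")),
]


def classify(name):
    """Return the first category whose regexp matches the file name."""
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.match(name):
            return category
    # Nothing matched: treat the file as unrecognized.
    return "unrecognized"


for name in [",,scratch", "hello.c~", "hello.o", "hello.c.rej", "hello.c"]:
    print(name, "->", classify(name))
```

With these patterns a leftover hello.c.rej lands in unrecognized, which is the trick mentioned above for blocking commits while conflicts are unresolved.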
Why I Went From Loving Tagline To Disliking It, by Colin Walters
When I first heard of "tagline" tagging, I thought it was extremely cool. It is something that no other revision control system I had used offered. The thought of being able to rename files with just "mv" (or e.g. Nautilus), and have the system notice the rename, seemed very appealing.
The thing to realize though about tagline is there is an implicit downside. That downside is that for medium-sized or larger projects, you are likely not going to be able to use tagline for everything, because you'll have binary files which cannot include tags, or weirder files which don't have a comment syntax, etc. So the price you pay for using tagline is that you can *almost always* rename files using mv. Other times you have to remember to use "tla mv". And it was this fact that made me decide I'd rather use explicit, always. I'd rather always type "tla mv" and have it work 100% of the time, than just use "mv" and have it work 95% of the time. The other annoying thing about tagline is that you often have to work to avoid it being included in generated files.
For smaller projects though, tagline could be very nice.
- On the other hand, when it doesn't work, the downside is not very large. When you check your tree before committing, you see that the id has become detached, and you can fix it up. (Or maybe one of the arch helper scripts can do this automatically?) So, as you say, it is a choice between something automatic that almost always works and something that works for everything.
- The downside to mistakes with the tagline method may not be very large but repeated multiple times they could be annoying. If you use the tagline method you have to either (a) memorize which files need to be renamed with "tla mv", (b) just use "tla mv" on everything (which defeats the whole purpose of tagline anyway), or (c) manually fix it up whenever you make a mistake. -a- sounds okay for small projects where almost all files can utilize taglines but it's potentially both error-prone (forcing one to use -c- instead) and time consuming. -c- is what you are arguing for and which sounds fine to me if you have very few files which can't use taglines and which you move around very seldom. However, -b- by far seems the best to me, especially since the benefits of using taglines (i.e. being able to use "mv" instead of "tla mv" in certain cases and not having a file storing the explicit id) seem tiny to me. But maybe that's also because I tend to store many files which aren't programs and thus don't have a "comment" syntax, making both -a- and -c- more painful to me.
- Of course "tla mv" works for tagline files too, so if you include taglines different developers can use what they want.
- The point about allowing other developers who might be working on your project to use what they want is noted, but that's the only reason I can see for enabling taglines. This method would also cause more work as then you have all the extra effort of taglines and you lose all the benefits of using them. (Well, technically there is the side-benefit of not needing the explicit id stored in a separate file but that appears to be of questionable utility to me.)
- The utility of avoiding explicit ids is quite significant on most common filesystems. Adding explicit ids to the Linux kernel source on ext3 takes up 68M. On Reiser3, explicit ids add 4.5M.
I guess it depends on what you feel is significant and how old your hard drive is. Personally, I don't think a source tree that is 68M larger is all that big a deal (yes, it may be in a relative sense since the linux kernel source is 226M, but all the source code combined on my hard disk isn't going to come anywhere close to filling my hard drive up). If the repository were also that much larger then it'd bother me, but all versions/changelogs are stored in the repository as zipped tarballs; so unless I'm mistaken, the size of the repository should not depend much at all on whether explicit ids or taglines are used.
