[C/R v20][PATCH 38/96] c/r: dump open file descriptors

Daniel Lezcano daniel.lezcano at free.fr
Mon Mar 22 01:40:32 PDT 2010


Oren Laadan wrote:
>
>
> Daniel Lezcano wrote:
>> Serge E. Hallyn wrote:
>>> Quoting Jamie Lokier (jamie at shareable.org):
>>>  
>>>> Matt Helsley wrote:
>>>>    
>>>>>> That said, if the intent is to allow the restore to be done on
>>>>>> another node with a "similar" filesystem (e.g. created by rsync/node
>>>>>> image), instead of having a coherent distributed filesystem on all
>>>>>> of the nodes then the filename makes sense.
>>>>>>         
>>>>> Yes, this is the intent.
>>>>>       
>>>> I would worry about programs which are using files which have been
>>>> deleted, renamed, or (very common) renamed-over by another process
>>>> after being opened, as there's a good chance they will successfully
>>>> open the wrong file after c/r, and corrupt state from then on.
>>>>     
>>> Userspace is expected to back up and restore the filesystem, for
>>> instance using a btrfs snapshot or a simple rsync or tar.
>>>
>>>   
>> That does not solve the problem Jamie is talking about.
>> An rsync or a tar will not see a deleted file, and using btrfs just
>> to make C/R work with deleted files is a bit overkill, no?
>
> Let's separate the issues of file system snapshot and deleted files.
>
> 1) File system snapshot:
> ------------------------
> The requirement is to preserve the file system state between the time
> of the checkpoint and the time of the restart, because userspace will
> expect it to remain the same.
>
> The alternatives are:
>
> a) Use a capable file system, like btrfs or (modified) nilfs.
>
> b) Userspace saves the state e.g. w/ tar or rsync (maybe incremental)
>
> c) Assume/expect that the file system isn't modified between checkpoint
> and restart (e.g. if we use c/r to suspend a user's session)
>
> d) Expect userspace to adapt to changes if they occur, e.g. by having
> the application be aware of the possibility, or by providing a wrapper
> that will do some magic prior to restart (by looking at the checkpoint
> image).
>
> Options a, b, c are all transparent to the application, while option
> d requires that applications become aware of c/r. That's ok, but our
> primary goal is to be generic enough to support unmodified applications.
>
> 2) Deleted files:
> -----------------
> The requirement is that at restart we'll be able to restore the file
> pointer in the kernel to a deleted file with the same properties and
> contents as it had at the time of the checkpoint.
>
> The alternatives we considered are:
>
> e) For each deleted file, save the contents of that file as part of
> the checkpoint image;
> At restart - create a new file, populate with the contents, open it
> (to get an active file pointer), and finally unlink it, so it is -
> again - deleted.
>
> f) At checkpoint time, create a file (from scratch) in a dedicated
> area of the file system (userspace configurable?), and copy the
> contents of the deleted file to this file. Only save the file system
> state after this is done.
> At restart, open the alternative file instead, and then immediately
> delete it.
>
> g) At checkpoint time, re-link the file to a dedicated area of the
> file system. This requires support from the underlying file system,
> of course. For instance, it's trivial for ext2,3 but IIRC will need
> help for ext4. Re-linking is essentially attaching a new filename
> to an existing inode that is still referenced but is otherwise not
> reachable - and make it reachable again.
> At restart, open the re-linked file and then immediately delete it.
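
The restart half of option e (create, populate, open, unlink) can be
sketched in userspace C. This is only an illustration under stated
assumptions, not code from the patch set: restore_deleted_file() and the
"cr-deleted-" name prefix are hypothetical, and a real restart would also
have to restore the file offset, flags, ownership and mode.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: recreate a file that was already unlinked at
 * checkpoint time. Returns an open descriptor to a file that has no
 * name left in the file system, or -1 on error.
 */
int restore_deleted_file(const char *dir, const void *data, size_t len)
{
    char path[4096];
    const char *p = data;
    size_t done = 0;
    int fd, n;

    assert(dir != NULL && (data != NULL || len == 0));

    /* mkstemp() needs a template that ends in six 'X' characters. */
    n = snprintf(path, sizeof(path), "%s/cr-deleted-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    fd = mkstemp(path);             /* create a new file, opened read/write */
    if (fd < 0)
        return -1;

    while (done < len) {            /* populate it with the saved contents */
        ssize_t w = write(fd, p + done, len - done);
        if (w < 0) {
            close(fd);
            unlink(path);
            return -1;
        }
        done += (size_t)w;
    }

    /* Drop the name so the file is deleted again, then rewind. */
    if (unlink(path) < 0 || lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Options f and g would end the same way (open, then unlink); they differ
only in where the contents come from at restart.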
>
>> I have another question about the deleted files. How is the case
>> handled when a process has mapped a deleted file but holds no
>> associated file descriptor?
>>
>
> It works the same as with non-deleted files (assuming that we know
> how to handle deleted files in general, e.g. options e, f, g above):
>
> To checkpoint a task's mm we loop through the vma's and checkpoint
> them. For a vma that corresponds to a mapped file, we first save
> the vma->vm_file. In turn, for a file pointer we save the filename,
> properties, credentials. A file pointer is saved as an independent
> object - and is assigned a unique id - objref. The state of the vma
> will indicate this objref.
>
> At restart, we will first see the file pointer object, and will
> open the file to create a corresponding file pointer. Later when
> we restore the vma, we'll locate the (new) file pointer using the
> objref and use it in mmap.
>
> Oren.
>
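
The objref indirection described above can be pictured with a small
userspace analogy in C. Everything here is an assumption made for
illustration: struct obj_table, obj_insert() and obj_lookup() are
hypothetical names, and a plain descriptor stands in for the kernel file
pointer. The actual c/r code is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define OBJ_MAX 64

/* One restored object: the id from the image and the new descriptor. */
struct obj_entry {
    int objref;
    int fd;
};

struct obj_table {
    struct obj_entry e[OBJ_MAX];
    size_t n;
};

/* Return the descriptor registered under objref, or -1 if unknown. */
int obj_lookup(const struct obj_table *t, int objref)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->n; i++)
        if (t->e[i].objref == objref)
            return t->e[i].fd;
    return -1;
}

/*
 * Register a freshly opened file under the objref read from the image.
 * Returns 0 on success, -1 if the table is full or the id is taken.
 */
int obj_insert(struct obj_table *t, int objref, int fd)
{
    assert(t != NULL);
    if (t->n == OBJ_MAX || obj_lookup(t, objref) >= 0)
        return -1;
    t->e[t->n].objref = objref;
    t->e[t->n].fd = fd;
    t->n++;
    return 0;
}
```

In this picture, restart calls obj_insert() when it reads the file object
from the image, and later passes the result of obj_lookup() to mmap() when
it reaches the vma that names the same objref. No file descriptor in the
task's fd table is needed for that.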

Thanks Oren for the detailed answer.

