Azure Functions: backup your OneDrive at rocket speed!

Hello,

OneDrive is a great tool, and I certainly don't need to be convinced of that.

Anyway, as an Office 365 admin, when an end user leaves the company, company policy might require you to keep their data for the long term, sometimes for years.

On-premises this was easy: grab the storage and you were done. But what if you're in Office 365?

Azure Functions to the rescue!

In this post I will teach you how to create high-performance Azure Functions that back up your OneDrive for Business content to Azure Blob Storage.

I’m assuming that you know:

  • How CSOM works
  • What an Azure Storage Account is, and how to create and manage it (blobs and queues)
  • How to configure an Azure Function through the UI

First, we may have to deal with huge OneDrives: lots of files and a lot of capacity. Unfortunately, if we use Azure Functions, we have to deal with execution timeouts:

  • 5 minutes on the auto-scaled (Consumption) plan
  • 7 minutes on the “dedicated” plan

With this solution I was able to back up a OneDrive of 12.15 GB (13,479 files) in roughly 30 minutes.

What to do?

I have split the work into 3 functions and 3 queues:

  • InitiationFunction triggered by a queue
  • DiscoveryFunction triggered by a queue
  • CopyFunction triggered by a queue with an output in a blob storage

It should look like this:

[Diagram: the three queues chaining the InitiationFunction, DiscoveryFunction, and CopyFunction]

Let’s go deeper into the code.

In initiationQueue, we will put the URL of a OneDrive like https://tenant-my.sharepoint.com/personal/first_last_tenant_com. This URL will be used to build our ClientContext object.
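To kick the whole thing off, you just drop that URL into the queue. Here is a minimal sketch using the same Microsoft.WindowsAzure.Storage SDK as the functions below (the queue name "initiationqueue" is my own assumption; use whatever name you bound the trigger to):

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

// Connect to the storage account that backs the function triggers.
CloudStorageAccount account = CloudStorageAccount.Parse("CONNECTIONSTRINGHERE");
CloudQueueClient queueClient = account.CreateCloudQueueClient();

// "initiationqueue" is an assumed name: it must match the queue bound to InitiationFunction.
CloudQueue queue = queueClient.GetQueueReference("initiationqueue");
queue.CreateIfNotExists();

// One message per OneDrive to back up.
queue.AddMessage(new CloudQueueMessage("https://tenant-my.sharepoint.com/personal/first_last_tenant_com"));
```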

Then the InitiationFunction will look like this:


using System;
using Microsoft.Online.SharePoint.TenantAdministration;
using Microsoft.SharePoint.Client;
using System.Security;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static void Run(string myQueueItem, ICollector<string> myOutputQueueItems, TraceWriter log)
{
	log.Info($"C# Queue trigger function processed: {myQueueItem}");
	ListItemCollectionPosition position = null;
	using (ClientContext clientContext = new ClientContext(myQueueItem))
	{
		SecureString ss = new SecureString();
		foreach (char c in "MYPWD") { ss.AppendChar(c); }

		log.Info($"get doc lib info");
		clientContext.Credentials = new SharePointOnlineCredentials("MYACCOUNT", ss);
		List migrateLib = clientContext.Web.Lists.GetByTitle("Documents");
		clientContext.Load(migrateLib);
		clientContext.ExecuteQuery();       

		DateTime d = DateTime.Now;
		string containerName = d.Day + "-" + d.Month + "-"+d.Year + "-"+d.Hour+"-"+d.Minute+"-"+d.Second + "-" + myQueueItem.Replace("https://MYTENANT-my.sharepoint.com/personal/", "").Replace("_","-");
		myOutputQueueItems.Add(containerName+";null;"+myQueueItem);
		while (true)
		{
			log.Info($"retrieve items position");
			//Create the query to retrieve items
			CamlQuery cq = CamlQuery.CreateAllItemsQuery();
			cq.ViewXml = @"
							<View Scope='RecursiveAll'>
								<Query>
								</Query>
							  <RowLimit Paged='TRUE'>4000</RowLimit>
						   </View>";

			//set up the position of the iteration
			if (position != null)
				cq.ListItemCollectionPosition = position;
			ListItemCollection AllItems = migrateLib.GetItems(cq);
			clientContext.Load(AllItems, Items => Items.ListItemCollectionPosition);
			clientContext.ExecuteQuery();

			//update the position of the iteration; a null position means we reached the last page
			position = AllItems.ListItemCollectionPosition;
			if (position == null)
				break;
			myOutputQueueItems.Add(containerName+";"+position.PagingInfo+";"+myQueueItem);
		}
	}
}

As you can see, I’m running a CAML query with a RowLimit of 4000 (to speed things up and avoid the timeout). For each page, I push a message of the form ContainerName;PositionPagingInfo;ODUrl to the next queue, DiscoveryPositionQueue.

We define the container name up front, because all of the OneDrive content has to end up in the same blob storage container, so we need to reuse the same name later on. I have formatted the container name like “day-month-year-hour-minute-second-email”, but it’s up to you to format it as you wish.
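One caveat when picking your own format: blob container names must be 3–63 characters of lowercase letters, digits, and single hyphens. Since the email part of the URL can contain characters that break those rules, a small sanitizer can help; this is a sketch of my own, SanitizeContainerName is not part of the functions above:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: force a candidate name into the blob container naming rules
// (lowercase letters, digits and single hyphens, 3-63 characters).
static string SanitizeContainerName(string candidate)
{
    string name = candidate.ToLowerInvariant();
    name = Regex.Replace(name, "[^a-z0-9-]", "-");      // replace invalid characters
    name = Regex.Replace(name, "-{2,}", "-").Trim('-'); // collapse and trim hyphens
    return name.Length > 63 ? name.Substring(0, 63) : name;
}
```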

The next function (DiscoveryFunction) will be triggered by the message in the queue and will look like this:

using System;
using Microsoft.Online.SharePoint.TenantAdministration;
using Microsoft.SharePoint.Client;
using System.Security;
using Microsoft.WindowsAzure.Storage;

public static void Run(string myQueueItem,  ICollector<string> myOutputQueueItems, TraceWriter log)
{
    log.Info($"C# Queue trigger function processed: {myQueueItem}");

    string[] myQueueItemSplit = myQueueItem.Split(';');
    string containerName = myQueueItemSplit[0];
    string positionPaged = myQueueItemSplit[1];
    string urlOneDrive = myQueueItemSplit[2];

    // The first message carries the literal string "null" as paging info, meaning "first page".
    ListItemCollectionPosition position = null;
    if (positionPaged != "null")
    {
        position = new ListItemCollectionPosition();
        position.PagingInfo = positionPaged;
    }
    using (ClientContext clientContext = new ClientContext(urlOneDrive))
    {
        SecureString ss = new SecureString();
        foreach (char c in "MYPWD") { ss.AppendChar(c); }

        log.Info($"get doc lib info");
        clientContext.Credentials = new SharePointOnlineCredentials("YOURACCOUNT", ss);
        List migrateLib = clientContext.Web.Lists.GetByTitle("Documents");
        clientContext.Load(migrateLib);
        clientContext.ExecuteQuery();                

        log.Info($"retrieve items");
        //Create the query to retrieve items
        CamlQuery cq = CamlQuery.CreateAllItemsQuery();
        cq.ViewXml = @"
                        <View Scope='RecursiveAll'>
                            <Query>
                            </Query>
                            <RowLimit Paged='TRUE'>4000</RowLimit>
                        </View>";

        //set up the position of the iteration
        cq.ListItemCollectionPosition = position;
        ListItemCollection AllItems = migrateLib.GetItems(cq);
        clientContext.Load(AllItems, Items => Items.Include(i => i.Id), Items => Items.ListItemCollectionPosition);
        clientContext.ExecuteQuery();

        foreach (ListItem item in AllItems)
        {
            log.Info($"preparing copy an item...");
            string outString = containerName+";"+item.Id+";"+urlOneDrive;
            log.Info($"Sending to queue item "+item.Id);
            myOutputQueueItems.Add(outString);
        }

    }
}

I’m now taking all the item IDs contained in the page identified by the paging info I have retrieved. It is of course important to use the same RowLimit value as in the InitiationFunction, otherwise the pages won’t line up.

Then this function will push the following type of message to the queue ItemsToCopy:

ContainerName;ItemID;ODUrl

The last function, CopyFunction, is triggered by this message and will look like this:

using System;
using Microsoft.Online.SharePoint.TenantAdministration;
using Microsoft.SharePoint.Client;
using System.Security;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static void Run(string myQueueItem, TraceWriter log)
{
    log.Info($"preparing copy an item...");

    string[] myQueueItemSplit = myQueueItem.Split(';');
    string containerName = myQueueItemSplit[0];
    string itemID = myQueueItemSplit[1];
    string urlOneDrive = myQueueItemSplit[2];

    // Retrieve storage account from connection string.
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse("CONNECTIONSTRINGHERE");

    // Create the blob client.
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    // Retrieve a reference to a container.
    CloudBlobContainer container = blobClient.GetContainerReference(containerName);

    // Create the container if it doesn't already exist.
    container.CreateIfNotExists();

    string fname = null;
    using (ClientContext ctx2 = new ClientContext(urlOneDrive))
    {
        SecureString ss = new SecureString();
        foreach (char c in "YOURPWD") { ss.AppendChar(c); }
        ctx2.Credentials = new SharePointOnlineCredentials("YOURACCOUNT", ss);

        ListItem TargetItem = ctx2.Web.Lists.GetByTitle("Documents").GetItemById(itemID);
        ctx2.Load(ctx2.Web);
        ctx2.ExecuteQuery();
        ctx2.Load(TargetItem);
        ctx2.Load(TargetItem.File);
        ctx2.Load(TargetItem.ContentType);
        ctx2.ExecuteQuery();                      

		//Check this is not a folder (we save only files)
        if (TargetItem.ContentType.Name != "Folder")
        {
            log.Info($"Not a folder");

            fname = string.Concat(
                TargetItem["FileDirRef"].ToString().ToLower().Replace(urlOneDrive.Remove(0, urlOneDrive.IndexOf("/personal")), ""), "/",
            TargetItem.File.Name.ToLower());

            log.Info($"copy the item...");

			//Take the content from OneDrive and dispose of the stream once it has been uploaded
            using (var content = Microsoft.SharePoint.Client.File.OpenBinaryDirect(ctx2, TargetItem.File.ServerRelativeUrl))
            {
                //Upload the content to the blob
                CloudBlockBlob blockBlob = container.GetBlockBlobReference(fname);
                blockBlob.UploadFromStream(content.Stream);
            }
        }
    }
}

This function takes the item, checks based on its content type whether it is a folder, and, if it is a file, copies its content to blob storage.

I didn’t use the Azure Function’s blob output binding, because I wanted to use the very handy UploadFromStream method. This is why the function has to build its own connection to Azure Storage.
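Rather than hard-coding the connection string as in the snippet above, a cleaner option is to read it from the Function App’s application settings. A small sketch, assuming the storage connection lives in the standard AzureWebJobsStorage setting (any setting name works):

```csharp
using System;
using Microsoft.WindowsAzure.Storage;

// Read the connection string from an app setting instead of embedding it in code.
string connectionString = Environment.GetEnvironmentVariable("AzureWebJobsStorage");
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);
```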

The OneDrive is now backed up, and we have dealt with Azure Functions’ timeouts.
