Hello,
OneDrive is a great tool, and I’m certainly not someone who needs convincing of that.
Still, as an Office 365 admin, when one of your end users leaves the company, company policy may require you to keep their data for the long term, sometimes for years…
On-premises this was easy: grab the storage and you’re done. But what if you’re in Office 365?
Azure Functions to the rescue !
In this post I will show you how to create performant Azure Functions that back up your OneDrive for Business content to Azure Blob Storage.
I’m assuming that you know:
- How CSOM works
- What an Azure Storage Account is, and how to create and manage it (blobs & queues)
- How to configure an Azure Function through the UI
First, we may have to deal with huge OneDrives: lots of files and a large capacity. Unfortunately, with Azure Functions we also have to deal with execution timeouts:
- 5 minutes on the auto-scaled (Consumption) plan
- 7 minutes on a “dedicated” (App Service) plan
With this solution I was able to back up a OneDrive of 12.15 GB (13,479 files) in roughly 30 minutes.
So what can we do?
I split the work into three functions and three queues:
- InitiationFunction triggered by a queue
- DiscoveryFunction triggered by a queue
- CopyFunction triggered by a queue with an output in a blob storage
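To make the wiring concrete, the bindings of the first function could look like the function.json below. This is a hedged sketch for the classic Functions v1 model: the queue names initiationqueue and discoverypositionqueue and the AzureWebJobsStorage connection setting are illustrative assumptions, not values from this post.

```json
{
  "bindings": [
    {
      "type": "queueTrigger",
      "name": "myQueueItem",
      "direction": "in",
      "queueName": "initiationqueue",
      "connection": "AzureWebJobsStorage"
    },
    {
      "type": "queue",
      "name": "myOutputQueueItems",
      "direction": "out",
      "queueName": "discoverypositionqueue",
      "connection": "AzureWebJobsStorage"
    }
  ],
  "disabled": false
}
```

The other two functions are wired the same way, each listening on the queue the previous function writes to.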
It should look like this:
Let’s dive deeper into the code.
In initiationQueue, we put the URL of a OneDrive, such as https://tenant-my.sharepoint.com/personal/first_last_tenant_com. This URL will be used to build our ClientContext object.
The InitiationFunction then looks like this:
```csharp
using System;
using Microsoft.SharePoint.Client;
using System.Security;

public static void Run(string myQueueItem, ICollector<string> myOutputQueueItems, TraceWriter log)
{
    log.Info($"C# Queue trigger function processed: {myQueueItem}");
    ListItemCollectionPosition position = null;

    using (ClientContext clientContext = new ClientContext(myQueueItem))
    {
        SecureString ss = new SecureString();
        foreach (char c in "MYPWD")
        {
            ss.AppendChar(c);
        }

        log.Info($"get doc lib info");
        clientContext.Credentials = new SharePointOnlineCredentials("MYACCOUNT", ss);
        List migrateLib = clientContext.Web.Lists.GetByTitle("Documents");
        clientContext.Load(migrateLib);
        clientContext.ExecuteQuery();

        // Build a unique container name: day-month-year-hour-minute-second-email
        DateTime d = DateTime.Now;
        string containerName = d.Day + "-" + d.Month + "-" + d.Year + "-" + d.Hour + "-" + d.Minute + "-" + d.Second
            + "-" + myQueueItem.Replace("https://MYTENANT-my.sharepoint.com/personal/", "").Replace("_", "-");

        // First message covers the first page (no paging position yet)
        myOutputQueueItems.Add(containerName + ";null;" + myQueueItem);

        while (true)
        {
            log.Info($"retrieve items position");

            // Query all items recursively, 4000 rows per page
            CamlQuery cq = CamlQuery.CreateAllItemsQuery();
            cq.ViewXml = @"
                <View Scope='RecursiveAll'>
                    <Query></Query>
                    <RowLimit Paged='TRUE'>4000</RowLimit>
                </View>";

            // Resume from the previous page position, if any
            if (position != null)
                cq.ListItemCollectionPosition = position;

            ListItemCollection allItems = migrateLib.GetItems(cq);
            clientContext.Load(allItems, items => items.ListItemCollectionPosition);
            clientContext.ExecuteQuery();

            // Move the cursor forward and enqueue one message per remaining page
            position = allItems.ListItemCollectionPosition;
            if (position == null)
                break;

            myOutputQueueItems.Add(containerName + ";" + position.PagingInfo + ";" + myQueueItem);
        }
    }
}
```
As you can see, I’m performing a CamlQuery with a RowLimit of 4000 (to speed things up and stay under the timeout). The function then pushes messages of the following form to the next queue, DiscoveryPositionQueue: ContainerName;PositionPagingInfo;ODUrl
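If it helps to see the fan-out pattern stripped of CSOM, here is a minimal, hypothetical sketch in Python: a fake paged source plays the role of the document library, and a simple integer index stands in for PagingInfo. The function name, the URL, and the container name below are illustrative only.

```python
def enumerate_page_messages(total_items, page_size, container_name, od_url):
    """Emit one queue message per page of a paged source.

    The first message carries the cursor 'null' (no paging position yet);
    each later message carries the cursor where the previous page ended,
    mirroring how ListItemCollectionPosition.PagingInfo is forwarded.
    """
    messages = [f"{container_name};null;{od_url}"]
    cursor = page_size  # pretend PagingInfo is just the next start index
    while cursor < total_items:
        messages.append(f"{container_name};{cursor};{od_url}")
        cursor += page_size
    return messages

msgs = enumerate_page_messages(
    total_items=13479, page_size=4000,
    container_name="1-1-2017-10-0-0-first-last-tenant-com",
    od_url="https://tenant-my.sharepoint.com/personal/first_last_tenant_com")
# 13479 items at 4000 per page -> 4 pages -> 4 messages
```

Each message is independent, so the queue-triggered DiscoveryFunction instances can process the pages in parallel.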
We define the container name already at this stage because all of the OneDrive content has to end up in the same blob storage container, so we will have to reuse the name later on. I formatted the container name as “day-month-year-hour-minute-second-email” (keep in mind that Azure container names only allow lowercase letters, digits, and hyphens), but feel free to format it however you wish.
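The naming scheme can be sketched like this (an illustrative Python version, not code from the post; the tenant URL prefix is a placeholder):

```python
from datetime import datetime

def build_container_name(od_url, now=None):
    """Build 'day-month-year-hour-minute-second-email' from a OneDrive URL.

    Azure blob container names must consist of lowercase letters, digits,
    and hyphens, which is why '_' and '.' in the personal-site path are
    replaced and everything is lowercased.
    """
    now = now or datetime.now()
    email_part = (od_url
                  .replace("https://tenant-my.sharepoint.com/personal/", "")
                  .replace("_", "-").replace(".", "-").lower())
    stamp = f"{now.day}-{now.month}-{now.year}-{now.hour}-{now.minute}-{now.second}"
    return f"{stamp}-{email_part}"

name = build_container_name(
    "https://tenant-my.sharepoint.com/personal/first_last_tenant_com",
    now=datetime(2017, 5, 3, 14, 7, 9))
# -> "3-5-2017-14-7-9-first-last-tenant-com"
```

The timestamp prefix also means that running the backup twice for the same user produces two distinct containers.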
The next function, DiscoveryFunction, is triggered by that queue message and looks like this:
```csharp
using System;
using Microsoft.SharePoint.Client;
using System.Security;

public static void Run(string myQueueItem, ICollector<string> myOutputQueueItems, TraceWriter log)
{
    log.Info($"C# Queue trigger function processed: {myQueueItem}");

    // Message format: ContainerName;PositionPagingInfo;ODUrl
    string[] myQueueItemSplit = myQueueItem.Split(';');
    string containerName = myQueueItemSplit[0];
    string positionPaged = myQueueItemSplit[1];
    string urlOneDrive = myQueueItemSplit[2];

    using (ClientContext clientContext = new ClientContext(urlOneDrive))
    {
        SecureString ss = new SecureString();
        foreach (char c in "MYPWD")
        {
            ss.AppendChar(c);
        }

        log.Info($"get doc lib info");
        clientContext.Credentials = new SharePointOnlineCredentials("YOURACCOUNT", ss);
        List migrateLib = clientContext.Web.Lists.GetByTitle("Documents");
        clientContext.Load(migrateLib);
        clientContext.ExecuteQuery();

        log.Info($"retrieve items");

        // Same query and RowLimit as in the InitiationFunction, so the pages line up
        CamlQuery cq = CamlQuery.CreateAllItemsQuery();
        cq.ViewXml = @"
            <View Scope='RecursiveAll'>
                <Query></Query>
                <RowLimit Paged='TRUE'>4000</RowLimit>
            </View>";

        // "null" marks the first page; otherwise resume from the received position
        if (positionPaged != "null")
        {
            ListItemCollectionPosition position = new ListItemCollectionPosition();
            position.PagingInfo = positionPaged;
            cq.ListItemCollectionPosition = position;
        }

        ListItemCollection allItems = migrateLib.GetItems(cq);
        clientContext.Load(allItems, items => items.Include(i => i.Id));
        clientContext.ExecuteQuery();

        // Fan out: one queue message per item to copy
        foreach (ListItem item in allItems)
        {
            string outString = containerName + ";" + item.Id + ";" + urlOneDrive;
            log.Info($"Sending to queue item " + item.Id);
            myOutputQueueItems.Add(outString);
        }
    }
}
```
I’m now taking all the item IDs contained in the page identified by the paging info I received. It is of course important to use the same RowLimit value in both functions, otherwise the pages won’t match.
This function then pushes the following type of message to the ItemsToCopy queue:
ContainerName;ItemID;ODUrl
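Parsing these semicolon-delimited messages is straightforward; as a small illustration (in Python, with a hypothetical helper name), limiting the split keeps the URL intact even if it ever contained a semicolon:

```python
def parse_copy_message(msg):
    """Split a 'ContainerName;ItemID;ODUrl' queue message into its parts.

    maxsplit=2 means only the first two ';' act as delimiters, so the
    trailing URL survives unchanged.
    """
    container, item_id, od_url = msg.split(";", 2)
    return container, int(item_id), od_url

parts = parse_copy_message(
    "3-5-2017-14-7-9-first-last-tenant-com;42;"
    "https://tenant-my.sharepoint.com/personal/first_last_tenant_com")
```

For anything more elaborate than three fields, a JSON payload would be a safer message format.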
The last function, CopyFunction, is triggered by this message and looks like this:
```csharp
using System;
using Microsoft.SharePoint.Client;
using System.Security;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static void Run(string myQueueItem, TraceWriter log)
{
    log.Info($"preparing copy an item...");

    // Message format: ContainerName;ItemID;ODUrl
    string[] myQueueItemSplit = myQueueItem.Split(';');
    string containerName = myQueueItemSplit[0];
    string itemID = myQueueItemSplit[1];
    string urlOneDrive = myQueueItemSplit[2];

    // Retrieve the storage account from the connection string.
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse("CONNECTIONSTRINGHERE");

    // Create the blob client.
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    // Retrieve a reference to the container and create it if it doesn't already exist.
    CloudBlobContainer container = blobClient.GetContainerReference(containerName);
    container.CreateIfNotExists();

    using (ClientContext ctx2 = new ClientContext(urlOneDrive))
    {
        SecureString ss = new SecureString();
        foreach (char c in "YOURPWD")
        {
            ss.AppendChar(c);
        }
        ctx2.Credentials = new SharePointOnlineCredentials("YOURACCOUNT", ss);

        ListItem targetItem = ctx2.Web.Lists.GetByTitle("Documents").GetItemById(int.Parse(itemID));
        ctx2.Load(ctx2.Web);
        ctx2.ExecuteQuery();
        ctx2.Load(targetItem);
        ctx2.Load(targetItem.File);
        ctx2.Load(targetItem.ContentType);
        ctx2.ExecuteQuery();

        // Check this is not a folder (we save only files; folder structure
        // is recreated implicitly by the blob names)
        if (targetItem.ContentType.Name != "Folder")
        {
            log.Info($"Not a folder");

            // Rebuild the file's path relative to the OneDrive root and use it as the blob name
            string fname = string.Concat(
                targetItem["FileDirRef"].ToString().ToLower()
                    .Replace(urlOneDrive.Remove(0, urlOneDrive.IndexOf("/personal")), ""),
                "/",
                targetItem.File.Name.ToLower());

            log.Info($"copy the item...");

            // Stream the content from OneDrive straight into the blob
            var content = Microsoft.SharePoint.Client.File.OpenBinaryDirect(ctx2, targetItem.File.ServerRelativeUrl);
            CloudBlockBlob blockBlob = container.GetBlockBlobReference(fname);
            blockBlob.UploadFromStream(content.Stream);
        }
    }
}
```
This function takes the item, checks whether it is a folder based on its content type, and copies its content to blob storage.
I didn’t use the Azure Function’s blob output binding, because I wanted to use the very handy UploadFromStream method. That’s why we have to set up the Azure Storage references ourselves inside the function.
The OneDrive is now backed up, and we’ve dealt with the Azure Functions timeouts.