Download a Sharepoint File with GraphServiceClient (Microsoft Graph API)


Demons protect a PowerPoint presentation from developers trying to access it.

There is a stunningly simple way to get a file out of SharePoint and I'll get to it soon (or just skip to the very end of the post).

I have been automating the shit out of a lot of routine work in Microsoft Teams recently. Teams is the result of Skype and SharePoint having too much to drink at the Microsoft holiday party. It often shows. One annoyance is that channel threads are ordered by the time someone last responded. Useful for quickly seeing the latest gossip, but a pain when you need to keep an eye on each individual thread. After listlessly scrolling around trying to keep up with the flow I came up with a dumb solution - I sync the channel to Obsidian (my choice of note app, could be anything) and then I can just check there for new threads. It's a small convenience but has meaningfully improved my life.

Unfortunately I got greedy. These messages usually have a PowerPoint presentation attached, so why not have an LLM summarize it while updating my notes?

It doesn't look like Copilot has a useful API yet. You can build plug-ins, but I don't want to talk to Copilot about presentations, I just want it to do the heavy lifting while I sleep so I can read the summary in the morning. Hopefully in the future there will be a simple way to say hey, Copilot, summarize this PPTX. Not yet.

So the outline of a solution here is download the presentation, send it to ChatGPT, generate a summary and stick that in Obsidian. This felt like a half hour type of project. And it should have been - getting GPT-4 Turbo to summarize a PPTX file took about ten minutes. Downloading the file has taken days and sent my self-esteem back to primary school.

You would think that downloading a file would be the Graph API's bread and butter. Especially as I have a ChatMessage from the channel that includes attachments and links. The link is for a logged in human, but it must be easy to translate from this to an API call, right?

It turns out that all you need is the site ID, the drive ID and the item ID.

These IDs are not in the attachment URL or the ChatMessageAttachment. It would be pretty RESTful to include the obvious next resource I'm going to need in that return type. No dice though.

I tried ChatGPT which helpfully suggested API calls that looked really plausible and helpful but that did not in fact exist. So I then read probably hundreds of blogs and forum posts from equally confused and desperate developers. Here is a typical example:

"Now how can I upload and download files to this library with the help of Graph API (GraphServiceClient)."

To which Microsoft, terrifyingly, reply:

"We are currently looking into this issue and will give you an update as soon as possible."

Before eventually suggesting:

"await graphClient.Drives["{drive-id}"].Items["{driveItem-id}"].Content.GetAsync();"

Ignoring the SharePoint part and glossing over where that drive ID comes from. Other documentation suggests that you can look up your site by its URL and then download a list of drives to go hunting for the right one. Well, the first page of a paginated drive collection anyway, implying that just finding the ID might get you a call from the quota police.
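For the record, the long way round chains three REST calls. A sketch of just the URL plumbing (the hostname and paths below are made up, and every GET needs a bearer token):

```python
GRAPH = "https://graph.microsoft.com/v1.0"

# 1. Resolve the site from its URL:  GET /sites/{hostname}:/{site-path}
# 2. List the site's drives:         GET /sites/{site-id}/drives
# 3. Finally fetch the file:         GET /drives/{drive-id}/items/{item-id}/content

def site_by_url(hostname: str, site_path: str) -> str:
    return f"{GRAPH}/sites/{hostname}:/{site_path}"

def drives_for_site(site_id: str) -> str:
    return f"{GRAPH}/sites/{site_id}/drives"

def item_content(drive_id: str, item_id: str) -> str:
    return f"{GRAPH}/drives/{drive_id}/items/{item_id}/content"
```

Three round trips, plus whatever paging you hit in step 2, just to learn the IDs you needed for step 3.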

I know Microsoft is looking after a lot of files for a lot of organizations, but how can it be this hard?

It isn't. It's just hidden. I eventually found this post from Alex Terentiev that points out that you just need to base64 encode the sharing URL, swap some characters around and then call:

"GET https://graph.microsoft.com/v1.0/shares/{sharing-url}/driveItem"

If Google was doing its job right this would be the top result. I should be grateful they're still serving results at all and not just telling me that my pastimes are all harmful.

The documentation is here and Microsoft should link to it on every page that discusses drives and DriveItems. For GraphServiceClient the call to get to an actual stream is:

"graphClient.Shares[encodedUrl].DriveItem.Content.GetAsync()"
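Putting the pieces together, here is a minimal Python sketch of the encoding (base64, then unpadded base64url with a "u!" prefix, per the Graph shares documentation) and the resulting request URL. The function names are mine, and you still need a bearer token to actually issue the GET:

```python
import base64

def encode_sharing_url(sharing_url: str) -> str:
    """Turn a SharePoint sharing link into a token for Graph's /shares endpoint."""
    b64 = base64.b64encode(sharing_url.encode("utf-8")).decode("ascii")
    # Drop the "=" padding and swap "/" and "+" for "_" and "-",
    # then prefix with "u!" to mark it as an encoded URL.
    return "u!" + b64.rstrip("=").replace("/", "_").replace("+", "-")

def drive_item_url(sharing_url: str) -> str:
    # GET this for the driveItem metadata; append /content to stream the file.
    return ("https://graph.microsoft.com/v1.0/shares/"
            + encode_sharing_url(sharing_url) + "/driveItem")
```

One call, no site IDs, no drive IDs, no pagination.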

Winter Solstice 2023

Winter Solstice 2023 in Catfood Earth

Winter starts right now for those of us at the top of the planet. It's summer time down under. Winter Solstice 2023 rendered in Catfood Earth (03:28 on December 22, 2023 UTC).

Catfood Earth 4.40


Most of the layers enabled in Catfood Earth 4.40.

Catfood Earth 4.40 is now available to download.

With this release Catfood Earth is 20 years old! This update includes version 2023c of the Time Zone Database and the following bug fixes.

The National Weather Service changed one letter in the URL of their one hour precipitation weather radar product. It needs to be BOHA instead of BOHP. Presumably just checking that data consumers are paying attention? Weather radar is working again.

Not to be left out, the Smithsonian Institution Global Volcanism Program has decided to drop the www from their web site. The convention here is to redirect but they're content with just being unavailable at the former address. Recent volcanoes are working again as well.

The final fix is to the locations layer. Editing a location was crashing. This was due to a new format in the zoneinfo database that was not contemplated by the library I use, which as far as I can tell hasn't been maintained since the death of CodePlex. While working on this update I started using GitHub Copilot, their AI assistant based on GPT 3.5. I was amazed at how helpful it was figuring out and then fixing this rather fiddly bug. The locations layer is back to normal, and I have regenerated all the time zone mapping as well.

Rob 2.0


A robot head

If I'm going to be replaced with AI then I may as well be the person to do it. I need an AI Rob that I can be proud of and that's going to take some work.

My approach so far is to generate some training data. I've answered lots of questions in a spreadsheet. This is an ongoing project and there will be dot releases as I work towards a usable product (one that I can just plug into email or Teams). Probably this is going to require a mix of fine tuning and retrieval augmented generation (RAG). To start with I'm just fine tuning GPT 3.5 Turbo from OpenAI.

Fine tuning was painless. As usual the difficult part was randomly trying different versions of Python to find one that would coexist with some stubborn dependency (tiktoken in this case, which will live with Python 3.11 but is very unhappy with Python 3.12).

You can try this below - just leave a comment and Rob 2.0 will reply. Anything you post goes through the regular moderation system; that's just to stop spam. Any legitimate questions are fair game (and likely to make it into the training corpus if the answer is no good!).

Due to safety systems it doesn't swear like the real thing. That might require a different model / corporate host at some point in the future. I'll update this post as I make progress.

Updated 2023-12-20 00:46:

I had most of a day spare today and so decided to get a little closer to my own personal singularity. Rob 2.1 is live and answering your questions in the comments below.

The first thing I did was add a few hundred more questions and answers to my training data set. I then fine tuned GPT 3.5 on the new data.

I wanted to get the LLM trinity - prompt, retrieval augmented generation (RAG) and fine tuning. Initially I thought that I could just use the OpenAI assistant API to get there, and I got as far as coding the whole thing up before stubbing my toe on a harsh reality. It only supports retrieval for gpt-3.5-turbo-1106 and gpt-4-1106-preview. Hopefully this changes at some point, but there's no way to get everything I need from assistants yet.

Not a big deal - I rolled up my sleeves (and also GitHub Copilot's sleeves) and added my own RAG based on the Q&A training data and refined my prompt to include the most relevant answer as well as some more specific instructions. It's pretty basic - whatever you ask is compared to the existing question library using cosine distance of OpenAI embeddings. Maybe I'll add a vector database if I have the patience to answer enough questions about myself, but a brute force in memory search works fine for now.
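The brute force search really is only a few lines. A sketch (the names are mine; it assumes each library entry already carries an embedding fetched once from OpenAI's embeddings endpoint):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_relevant(query_embedding, qa_library):
    # qa_library: list of (question, answer, embedding) tuples with the
    # embeddings precomputed for the whole training set.
    return max(qa_library, key=lambda qa: cosine_similarity(query_embedding, qa[2]))
```

OpenAI embeddings come back unit length, so a plain dot product would do the same job, but cosine keeps it honest if that ever changes. The winning answer gets pasted into the prompt as extra context.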

Updated 2026-01-17 23:20:

Rob 2.2 is live with some small updates. I've dropped fine tuning, so there is a big leap forward from GPT 3.5 Turbo to 5.2. It has become a lot more verbose and, while not emdashy, it isn't quite nailing the right style. I have also been working on a version to send to meetings. A combination of Attendee and Tavus is frighteningly effective.

Catfood Earth for Android 4.30


Catfood Earth for Android 4.30

Catfood Earth for Android now supports random locations. The slice of Earth displayed will change periodically throughout the day. You can still set a manual location or have Catfood Earth use your current location. Install from Google Play, existing users will get this update over the next few days.

Summer Solstice 2023

Summer Solstice 2023

Summer Solstice 2023 is at 14:58 UTC on June 21. The image above shows the exact moment of the Solstice as rendered in Catfood Earth. It's the official if not sartorial start of Summer in the Northern Hemisphere and Winter if you find yourself on the other side of the Equator.

Shipping a website in a day with Generative AI


Can you tell me a story about a shop?

It usually takes me a few weeks to get a new website up and running. Last weekend I tried an experiment with Cloudflare Pages and generative AI.

I have wanted to find an excuse to test Pages for a while. It's a pretty awesome product. I'm not doing anything too fancy with it - I have a local generator app that creates the pages for my site. Committing to the right branch in git automatically deploys to Cloudflare's edge network. It seems to do the right thing with all the file types I've thrown at it so far. My only complaint at this point is that it doesn't handle subdirectories. Everything needs to hang off the root unless you want to write some code. I think this is possible with Cloudflare Workers but that's for another day.

The generative piece is automatically writing content for review and publication. For each generated page I'm creating a prompt to write the post, and then another prompt to summarize it for meta descriptions and referencing it from other pages. I also create an embedding to use for interlinking related posts. Finally I create a third prompt to gin up an appropriate image. The site generator stitches these together into HTML and as soon as I commit, the updates are live.

The site is not yet a work of art, and there is plenty to optimize and add, but the basic thing was working in a few hours. It's all ridiculously cheap as well. I'm more than a little frightened for Google given how much of this must be going on right now. And then the next generation of LLMs will be trained on the garbage produced by the current crop.

My super rapid site is called Shop Stories, collecting / dreaming up tales of ecommerce heroics. I'll report back if anyone goes there.