Scanning multiple pages into a PDF file

Updated on Thursday, November 12, 2015

PdfScan is a simple tool for scanning pages into a PDF file. You can scan single pages from a flatbed scanner or several pages from a document feeder. The page size applies to both the scan and the page(s) added to the PDF.
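To show how one page size can drive both sides, here is a small Python sketch. It is not PdfScan's code, which is C#. The scanner extent is measured in pixels at the chosen resolution, while PDF page sizes are in points (1/72 inch). The `PAGE_SIZES_MM` table and both function names are illustrative assumptions.

```python
# Illustrative page sizes in millimetres (assumed table, not from PdfScan).
PAGE_SIZES_MM = {
    "Letter": (215.9, 279.4),  # 8.5 x 11 inches
    "A4": (210.0, 297.0),
}

MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # PDF user-space unit is 1/72 inch


def scan_extent_pixels(page: str, dpi: int) -> tuple[int, int]:
    """Width and height in pixels to request from the scanner at `dpi`."""
    w_mm, h_mm = PAGE_SIZES_MM[page]
    return (round(w_mm / MM_PER_INCH * dpi), round(h_mm / MM_PER_INCH * dpi))


def pdf_page_points(page: str) -> tuple[int, int]:
    """Width and height of the matching PDF page in points."""
    w_mm, h_mm = PAGE_SIZES_MM[page]
    return (round(w_mm / MM_PER_INCH * POINTS_PER_INCH),
            round(h_mm / MM_PER_INCH * POINTS_PER_INCH))
```

For example, a Letter page scanned at 300 dpi gives a 2550 x 3300 pixel image, which is then drawn onto a 612 x 792 point PDF page.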

I wrote PdfScan because I know I'm going to be scanning a lot of documents over the next couple of weeks. Previously I used a tool called ScanToPDF from O Imaging, but their licensing pissed me off so much that I'd rather waste time reinventing the wheel than pay them for another copy.

PdfScan - Scan pages to a PDF

This is a beta: it works with my scanner and my documents. There's no installer, so extract the ZIP file and run the EXE. PdfScan requires the .NET Framework 4.0; if you get an error when you run PdfScan.exe, install .NET 4 and then run it again.

If enough people use this I'll make it a bit more friendly, add an installer and release it through Catfood. If you like it, leave a comment below. If it doesn't work for you, leave a comment or email me and I'll try to help.

(Update September 12, 2010: I've tidied PdfScan up and released it through Catfood Software. Download it from Catfood PdfScan.)

PdfScan uses PDFsharp from empira Software. Thanks chaps!

Scanning from the ADF using WIA in C#

Updated on Sunday, September 30, 2018

Scanner ready for WIA image acquisition

I've been going nuts trying to scan from the document feeder on my Canon imageClass MF4150. Everything worked as expected from the flatbed, but I had no luck persuading the ADF to kick in. I found some sample code, but it was aimed at devices that can detect when a document is available in the feeder. Evidently my Canon doesn't expose this, so it needs to be told which source to use.

The way to do this is to set the WIA_DPS_DOCUMENT_HANDLING_SELECT property to FEEDER. You then read WIA_DPS_DOCUMENT_HANDLING_STATUS to check that it's in the right mode and initiate the scan. This did not work for toffee.

After much experimentation I discovered a solution. I had been setting device properties and then setting item properties before requesting the scan. Switching the order - item then device - made everything work.
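Here is a minimal Python sketch of that configuration. It is expressed as an ordered list of property writes so that the item-then-device order is explicit, and it includes the feeder status check. The property IDs and flag values are the ones I believe WiaDef.h defines, so verify them against your SDK headers. `plan_scan_properties` and `feeder_ready` are hypothetical helpers. A real C# version would apply these writes through WIA COM interop.

```python
# Property IDs as defined in WiaDef.h (verify against your SDK copy).
WIA_DPS_DOCUMENT_HANDLING_STATUS = 3087
WIA_DPS_DOCUMENT_HANDLING_SELECT = 3088
WIA_IPS_XRES = 6147
WIA_IPS_YRES = 6148

# Values for WIA_DPS_DOCUMENT_HANDLING_SELECT.
FEEDER = 0x001
FLATBED = 0x002

# Bit reported in WIA_DPS_DOCUMENT_HANDLING_STATUS when paper is in the feeder.
FEED_READY = 0x001


def plan_scan_properties(dpi: int, use_feeder: bool) -> list[tuple[str, int, int]]:
    """Return (target, property_id, value) writes in the order that worked:
    item properties first, then the device property that selects the source."""
    writes = [
        ("item", WIA_IPS_XRES, dpi),
        ("item", WIA_IPS_YRES, dpi),
    ]
    source = FEEDER if use_feeder else FLATBED
    writes.append(("device", WIA_DPS_DOCUMENT_HANDLING_SELECT, source))
    return writes


def feeder_ready(handling_status: int) -> bool:
    """True if the value read from WIA_DPS_DOCUMENT_HANDLING_STATUS says
    the document feeder has paper ready."""
    return bool(handling_status & FEED_READY)
```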

Here's the function to scan one page:

A few notes: XImage is a type from PDFsharp. I wrote this as part of a PDF scanner that I'll post next, so the scanned images are saved and then loaded into an XImage for rendering to the PDF document. The magic numbers come from WiaDef.h in the Platform SDK. If the ADF is out of pages, this method sets the returned image to null and eats the exception. This is because, when _adf is true, the function is called repeatedly to scan pages until the ADF is empty (otherwise it grabs one image from the flatbed).
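The calling pattern described above could look like the following Python sketch: keep scanning until the feeder runs dry, or scan once from the flatbed. `WiaScanError`, `acquire_page`, `scan_one_page` and `scan_document` are hypothetical stand-ins for the C# COM call and its wrapper. 0x80210003 is, as far as I know, the WIA_ERROR_PAPER_EMPTY HRESULT, so check it against your SDK. One difference from the description: this version swallows only the paper-empty error and lets other failures propagate.

```python
from typing import Callable, Optional

# HRESULT for "no paper in the document feeder" (WIA_ERROR_PAPER_EMPTY).
WIA_ERROR_PAPER_EMPTY = 0x80210003


class WiaScanError(Exception):
    """Hypothetical stand-in for the COM exception thrown by a WIA transfer."""

    def __init__(self, hresult: int):
        super().__init__(f"WIA error 0x{hresult:08X}")
        self.hresult = hresult


def scan_one_page(acquire_page: Callable[[], bytes]) -> Optional[bytes]:
    """Scan a single page; return None when the feeder is out of paper."""
    try:
        return acquire_page()
    except WiaScanError as err:
        if err.hresult == WIA_ERROR_PAPER_EMPTY:
            return None  # feeder empty: tell the caller to stop
        raise  # jams and other errors are real failures


def scan_document(acquire_page: Callable[[], bytes], adf: bool) -> list[bytes]:
    """Pull pages until the ADF is empty, or take exactly one flatbed page."""
    pages = []
    while True:
        page = scan_one_page(acquire_page)
        if page is None:
            break
        pages.append(page)
        if not adf:
            break  # flatbed: one page per scan
    return pages
```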

If you've been banging your head against a wall trying to get WIA to work with a document feeder I hope this helps.

Updated 2015-05-20: Full source code at https://github.com/abfo/pdfscan

Use WPF Dispatcher to invoke event handler only when needed

Updated on Saturday, September 29, 2018

After floundering a bit with the WPF Dispatcher I've come up with a simple way to make sure an event handler executes on the UI thread without paying the overhead of always invoking a delegate.

This has the benefit (for me at least) of being very easy to remember. Hook up the event handler and then if there's a chance it could be called from a different thread wrap it using the pattern above. It's easier to read than an anonymous delegate and much faster than defining a specific delegate for the event in question.
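The same check-then-invoke idea can be sketched in Python with a toy dispatcher. If the handler is already on the owning thread it runs directly; otherwise it re-invokes itself through the dispatcher and waits. `ToyDispatcher` is a hypothetical stand-in for WPF's `Dispatcher`, whose relevant members are `CheckAccess()` and `Invoke()`. It is not a port of it.

```python
import queue
import threading
from typing import Any, Callable


class ToyDispatcher:
    """Hypothetical stand-in for WPF's Dispatcher: one owner thread
    (the "UI thread") runs work items posted to a queue."""

    def __init__(self) -> None:
        self._owner = threading.get_ident()  # creating thread owns it
        self._work = queue.Queue()

    def check_access(self) -> bool:
        """True if the caller is on the owner thread (like CheckAccess())."""
        return threading.get_ident() == self._owner

    def invoke(self, fn: Callable[..., Any], *args: Any) -> Any:
        """Run fn on the owner thread and block until it finishes.
        Must be called from another thread, or it would wait forever."""
        done = threading.Event()
        outcome: dict = {}

        def job() -> None:
            try:
                outcome["value"] = fn(*args)
            except BaseException as exc:  # hand errors back to the caller
                outcome["error"] = exc
            finally:
                done.set()

        self._work.put(job)
        done.wait()
        if "error" in outcome:
            raise outcome["error"]
        return outcome.get("value")

    def run_one(self, timeout: float = 5.0) -> None:
        """Owner thread: execute the next posted work item."""
        self._work.get(timeout=timeout)()


def on_progress(dispatcher: ToyDispatcher, log: list, value: int) -> None:
    """Event handler that may only touch 'UI' state on the owner thread."""
    if not dispatcher.check_access():
        # Wrong thread: marshal this same handler onto the owner thread.
        dispatcher.invoke(on_progress, dispatcher, log, value)
        return
    log.append((threading.get_ident(), value))
```

On the owner thread the handler skips the queue entirely, which is the overhead the pattern is meant to avoid.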

I haven't tested the various methods to see which is the fastest yet… will get round to this at some point.

Top 5 reasons to hate the Facebook like button

Updated on Thursday, November 12, 2015

5. Validation

The metadata required to use the like button looks like this:

[code:html]





[/code]

But the property attribute isn't valid HTML or XHTML. The “Open” Graph Protocol says that it's inspired by Dublin Core. DC manages to get by using the name attribute like any other meta tag, so why can't Open Graph? It's not the worst problem, but it just seems needlessly irksome. Facebook has published a presentation describing their design decisions. This would be great, but it's in that Lessig one-word-per-slide style, so it's attractive but completely useless without the presenter.

4. Fragility

Facebook's documentation is frustratingly sparse. For example you need to specify the owner of the page using a Facebook ID, and once you've chosen a name for your profile this is hard to find. The information vacuum has been filled with many erroneous blog posts saying to use the name, or some number from a shared photo (the best source is http://graph.facebook.com/robert.ellison, substituting your own username). Once you've got the admin ID wrong, you can't correct it - the first admin specified is fixed forever. What happens if a site is hacked and a bad actor sets themselves up as the admin? Surely something like the Google Webmaster Tools authentication scheme could have been used instead?

3. Pages with more than one object

Describing the object being liked in the head element limits you to one object per page. For some sites this is perfect, but what about a blog where you have many posts on the home page? It would be useful to have a like button per post, pointing at the permalink for the post in question. I've worked around this by having a like button for the blog on the home page, and a like button for each post on the post pages. Not ideal. I'm using the iframe version of the gadget; possibly there's more flexibility with the XFBML variant.

2. Duplicating existing pages

Let's say you've spent the past couple of years building up a Facebook page for your site/band/blog/movie and have thousands of fans. When you click your new like button for the first time you create a whole new page. There's no way to tell the like button about the existing page or the existing page about the like button. You now have at least two pages to worry about managing and potentially many, many more. You're also starting from scratch on the ‘like’ count, so even if your brand is already popular on Facebook it's back to Billy no-mates for you.

I can't believe this won't be fixed at some point. As with admin authentication above there must be a better way to establish ownership of various objects in the social graph.

1. Vocabulary

Doctors defend genital nick for girls

For better or worse, Facebook has the inexorable pull needed to start making the semantic web a reality. Given this, and that there are something like twenty-four thousand verbs in the English language, it's time for more expressiveness than ‘like’. You also can't comment on the ‘liked’ item in your stream (yet), so no clarification or discussion is possible.

--

Having said all that, if you enjoyed this post please click the ‘like’ button above ;)

Reviews and links for April 2010

Updated on Friday, May 22, 2020

The Spire by Richard North Patterson

3/5

A good-enough holiday read, and nice to see Patterson return to a straight psychological thriller rather than the last few op-eds loosely wrapped with some plot.

 

Advanced .NET Debugging (Addison-Wesley Microsoft Technology Series) by Mario Hewardt

5/5

Comprehensive introduction to low-level .NET debugging. When you need to fire up WinDbg to check out the state of the managed heap, or debug a crash dump from the field, you'll find this book invaluable. I wish it had been available when I started figuring out how to use SOS.

 

The Complete Stories of J. G. Ballard by J.G. Ballard

5/5

Wonderful collection of all of Ballard's short stories. It's a huge book with surprisingly few duds. My favorites include The Illuminated Man, clearly the inspiration for The Crystal World, which includes meaning bombs like "It's almost as if a sequence of displaced but identical images were being produced by refraction through a prism, but with the element of time replacing the role of light." and The Ultimate City (which isn't using ultimate in the sense of being good...). I've read most of Ballard's novels but not many of the short stories before. They're well worth the time.

 

Links

- Microsoft Agrees With Apple And Google: “The Future Of The Web Is HTML5” from TechCrunch (Which makes it all the more tragic that a huge number of clients will still be running IE6 :().

- Comedian criticises BBC 'rebuke' from BBC News | News Front Page | World Edition (The problem isn't that it was anti-Semitic, it's that it wasn't funny.).

- UK 'has a high early death rate' from BBC News | News Front Page | World Edition (That'll be the deep fried mars bars and chips.).

- Oklahoma, where women's rights are swept away from All Salon (Competing with AZ to be the most fucked up state? Sigh :().

- Cameras capture 'Highland tiger' from BBC News | News Front Page | World Edition (Tabbs was bigger than that (a house cat)).

- MI5 dumps staff lacking IT skills from BBC News | News Front Page | World Edition (MI5 has staff without computer skills?).

- The Internet Provides. from jwz (Disturbing).

- Who Really Spends The Most On Their Military? from Information Is Beautiful (Click through to the Guardian blog post, interesting reading.).

Loose Lips...

Updated on Thursday, November 12, 2015

LooseLips

XamlParseException and 256x256 icons

Updated on Saturday, September 29, 2018

When testing out a WPF app on XP I got an unhelpful XamlParseException error report. 

I was a little puzzled because I was hooking up error reporting in App.xaml.cs:

My error handler was attempting to create a XAML window to report the error, and evidently this was bombing out as well, triggering the good doctor Watson. I added a MessageBox call instead and discovered that the XamlParseException was wrapping a FileFormatException, and the stack trace indicated that the problem was with setting the icon for the window. After removing the icon the app started up fine. Weird.
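Finding the real cause meant looking past the outer exception. In .NET you do that by following `InnerException`. The Python sketch below does the equivalent walk over `__cause__`/`__context__` and builds a text-only report, in the same spirit as swapping the XAML error window for a MessageBox. The exception classes and the icon loader are hypothetical stand-ins for XamlParseException, FileFormatException and WPF's icon loading.

```python
from typing import Optional


class FileFormatError(Exception):
    """Hypothetical stand-in for System.IO.FileFormatException."""


class XamlParseError(Exception):
    """Hypothetical stand-in for System.Windows.Markup.XamlParseException."""


def load_window_icon() -> None:
    # Simulates the icon decoder rejecting a compressed 256x256 image.
    raise FileFormatError("could not decode icon image")


def create_window() -> None:
    try:
        load_window_icon()
    except FileFormatError as inner:
        # Wrap the real problem, as the XAML loader did.
        raise XamlParseError("failed to set Window.Icon") from inner


def exception_chain(exc: BaseException) -> list:
    """Outer-to-inner list of an exception and everything it wraps."""
    chain = []
    current: Optional[BaseException] = exc
    while current is not None and current not in chain:
        chain.append(current)
        current = current.__cause__ or current.__context__
    return chain


def plain_report(exc: BaseException) -> str:
    """Text-only report, safe to show even when creating UI is what failed."""
    return " -> ".join(f"{type(e).__name__}: {e}" for e in exception_chain(exc))
```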

It turns out that WPF chokes on a compressed 256x256 icon on XP and Vista (Windows 7 seems to cope fine). Saving the icon without compression fixes the problem. I use IcoFX and you can set this at Options -> Preferences -> Options -> Compress 256x256 images for Windows Vista. Of course the consequence is that the icon is a couple of hundred kilobytes larger.
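As far as I know, the "compressed" 256x256 images IcoFX writes are PNG data stored inside the .ico container, which is the Vista icon format. To check an icon before shipping it, a small Python sketch can walk the ICO directory and flag entries whose image data starts with the PNG signature. The function name and its return format are my own.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_entries(ico: bytes) -> list[tuple[int, int]]:
    """Return (width, height) for every image in an .ico file that is stored
    as PNG ("compressed") rather than as an uncompressed bitmap."""
    reserved, kind, count = struct.unpack_from("<HHH", ico, 0)
    if reserved != 0 or kind != 1:
        raise ValueError("not an .ico file")
    found = []
    for i in range(count):
        # Each 16-byte directory entry follows the 6-byte header.
        w, h, _colors, _res, _planes, _bpp, _size, offset = struct.unpack_from(
            "<BBBBHHII", ico, 6 + 16 * i)
        width, height = w or 256, h or 256  # a stored 0 means 256
        if ico[offset:offset + len(PNG_SIGNATURE)] == PNG_SIGNATURE:
            found.append((width, height))
    return found
```

To use it, pass in the bytes of your icon file (read with `open(path, "rb")`). An empty result means every image is stored uncompressed.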

Space and multibyte character encoding for posting to Twitter using OAuth

Updated on Saturday, November 2, 2019

I've spent the last day learning how to use OAuth and XAuth to post to Twitter. There are rumblings that Twitter will start to phase out basic authentication later this year, and more importantly you can only get the nice “via...” attribution if you use OAuth (for new apps, old ones are grandfathered in).

I coded up my own OAuth implementation, referring to Twitter Development: The OAuth Specification on Wrox and the OAuthBase.cs class from the oauth project on Google Code. Both are great references, but both fail with multibyte characters. The problem is that each byte of a character's UTF-8 encoding needs to be escaped separately. OAuthBase.cs encodes characters as ints rather than breaking out the bytes, and the Wrox article incorrectly suggests using Uri.EscapeDataString().

Here's a method to correctly encode parameters for OAuth:

NoEncode chars is a list of the permitted characters:
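Here is a minimal Python sketch of the same idea. It is not the C# method itself. It percent-encodes every UTF-8 byte of the value and leaves alone only the unreserved characters defined by the OAuth 1.0 spec (RFC 5849). The signature base string function shows why getting this right matters: each value is encoded once as a parameter, then encoded again as part of the base string. The function names and example URL are illustrative.

```python
# Characters OAuth 1.0 (RFC 5849, section 3.6) leaves unencoded.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def oauth_encode(value: str) -> str:
    """Percent-encode each UTF-8 byte of value, except unreserved ASCII."""
    out = []
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")  # one uppercase escape per byte
    return "".join(out)


def signature_base_string(method: str, url: str, params: dict[str, str]) -> str:
    """METHOD & enc(url) & enc(sorted, encoded name=value pairs).
    Assumes url is already normalized (scheme, host, path; no query)."""
    pairs = sorted((oauth_encode(k), oauth_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), oauth_encode(url), oauth_encode(param_string)])
```

With this encoder, "café" becomes `caf%C3%A9` (two escapes for the two-byte é), and a space becomes `%20` rather than `+`.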

One consequence of this encoding is that spaces must be encoded as %20 rather than a plus sign. I was worried that each space would end up counting as three characters towards the 140-character limit. I tested this and it isn't true, so use HttpUtility.UrlEncode() to calculate the number of characters in the post, and OAuthUrlEncode() or similar to actually encode the post parameter.

Reviews and links for March 2010

Updated on Friday, February 24, 2017

Juliet, Naked by Nick Hornby

3/5

Classic Hornby. It's fairly close to High Fidelity with its themes of love and music obsession-ism, and so feels slightly too comfortable, but it's certainly worth a read if you're a fan. 3/24/2010 2:00:00 AM

 

The Girl with the Dragon Tattoo (Millennium, #1) by Stieg Larsson

3/5

Slow, but highly atmospheric mystery. The first half of the book is dedicated to setting the scene and then the pieces start to fit into place like a glacier melting. The pace makes the occasional punctuation of extreme sexual violence all the more shocking. Fun enough, so I'll probably read the rest of the trilogy and try to catch the film (which has to be a profoundly truncated version). 3/22/2010 2:00:00 AM

 

Practical WPF Charts and Graphics by Jack Xu

4/5

Be aware that this book is 90% code, 5% mathematics and 5% explanation. This isn't a criticism; Dr. Xu builds up a complete charting library that includes 2D, WPF 3D and manual 3D methods. The mathematics covers the theory and practice of 2D and 3D transforms as well as techniques for smoothing, interpolating and trending data. It's a fast read to get a sense of the content and then a great reference work to dip back into as needed. 3/14/2010 3:00:00 AM

 

C# Design and Development: Expert One on One by John Paul Mueller

1/5

This book is just atrocious. Each section sells itself as providing all the information you need about a certain topic, then provides trivial and often incorrect or at least highly subjective details. A couple of examples:

The chapter on error handling makes the point that you should catch the most specific Exception possible, but then goes on to demonstrate catching a FormatException, a DivideByZeroException and then just System.Exception. The whole point is to avoid catching Exceptions that you can't handle. There's a legitimate debate here between trying to plaster up the cracks with general catches and letting the application die with a useful stack, but this book doesn't discuss it. There's also very brief coverage of creating your own derived Exception, but it doesn't touch on serialization.
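Here is a Python sketch of the point the book misses: catch only the exceptions you can actually handle, and let everything else propagate with its stack trace intact. `parse_ratio` is a made-up example. Python's ValueError and ZeroDivisionError stand in for .NET's FormatException and DivideByZeroException, and there is deliberately no catch-all.

```python
from typing import Optional


def parse_ratio(numerator: str, denominator: str) -> Optional[float]:
    """Parse two user-supplied numbers and divide them.

    Bad input is an expected, recoverable condition, so it is handled here.
    Anything else (e.g. a TypeError from a programming mistake) is not
    caught and propagates with its original stack trace.
    """
    try:
        return int(numerator) / int(denominator)
    except ValueError:
        return None  # not a number: the caller can re-prompt
    except ZeroDivisionError:
        return None  # zero denominator: also a user error
```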

Serializing an XML file is somehow included in the section on "Special Coding Methodologies", and the book labors over calling both .Flush() and .Close() on a StreamWriter. In fact you only need to call Close(), and since StreamWriter is IDisposable, a using statement is really the way forward for this example.
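The Python counterpart of a C# `using` block is a `with` statement. The file is flushed and closed when the block exits, even if an exception is thrown, so no explicit Flush/Close calls are needed. Here is a small sketch that serializes a hypothetical settings dictionary to XML and reads it back.

```python
import xml.etree.ElementTree as ET


def save_settings(path: str, settings: dict[str, str]) -> None:
    """Write settings as <settings><item key=".." value=".."/>...</settings>."""
    root = ET.Element("settings")
    for key, value in settings.items():
        ET.SubElement(root, "item", key=key, value=value)
    # Leaving the with-block closes (and therefore flushes) the file,
    # much as disposing a StreamWriter does at the end of a using block.
    with open(path, "w", encoding="utf-8") as f:
        ET.ElementTree(root).write(f, encoding="unicode")


def load_settings(path: str) -> dict[str, str]:
    """Read the settings file written by save_settings."""
    root = ET.parse(path).getroot()
    return {item.get("key"): item.get("value") for item in root.iter("item")}
```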

I could go on, but won't. Avoid. 3/8/2010 2:00:00 AM

 

Links

- Dorothy Erskine Park Exists from Spots Unknown (Must go find this park.).

- Casttoo from jwz (I want to break my arm again...).

- Woman murdered over Facebook photo from BBC News | News Front Page | World Edition (Somehow I don't think the photo being on Facebook was the important part of the story...).

- Petition against Pope's UK visit from BBC News | News Front Page | World Edition (A better petition would be to get the Pope and Dawkins together on Question Time.).

- 'Heart risk' at football stadiums from BBC News | News Front Page | World Edition (Surprisingly few are equipped to remove gall stones as well.).

- Postal Service's emerging model: Never on Saturday from SFGate: Top News Stories (How about once a week. While you're at it recycle the junk at the post office and don't bother hauling it out for delivery.).

Email marketing - don't shoot yourself in the foot

Updated on Friday, February 24, 2017

If you send email to customers, it's important that you let them know where the email will come from and then use this address consistently. Using different email addresses is a recipe for getting trapped in spam filters. This is equally important for marketing and other messaging like bills and canceled flights.

I bring up flights because I'm flying to the UK later today and was planning to return on Sunday. British Airways' cabin crew are going on strike this weekend and my return flight has been canceled. Instead of sending a text message, BA tried to notify me by email. This would have been fine if they'd used the address they've used for years, but instead they used a new address and a new domain. In fact, in the process of canceling and re-booking I (eventually) got email from britishairways.com, my.ba.com, email.ba.com and pop3.amadeus.net.

Since I've had the same email address for twelve years now I get a fair amount of spam. I use SpamArrest to keep myself sane:

94.9% of my email is spam. Since I started using the service, SpamArrest has eaten 482,494 messages for me. I'm far from alone in using whitelist-based email filtering, so if you want your message to get through, transparency and consistency are the way to go.
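To see why a new sending domain hurts, here is a toy whitelist filter in Python. It is an assumed model of how a service like SpamArrest behaves (mail from unknown senders is held back), not its actual logic. The domains come from the story above, but the sender addresses are hypothetical.

```python
from email.utils import parseaddr


def is_whitelisted(from_header: str, approved: set[str]) -> bool:
    """True if the sender's address or its exact domain was approved."""
    _, address = parseaddr(from_header)
    address = address.lower()
    domain = address.rpartition("@")[2]
    return address in approved or domain in approved


def triage(from_headers: list[str], approved: set[str]) -> dict[str, list[str]]:
    """Split incoming mail into delivered vs held-for-verification."""
    result: dict[str, list[str]] = {"delivered": [], "held": []}
    for header in from_headers:
        key = "delivered" if is_whitelisted(header, approved) else "held"
        result[key].append(header)
    return result
```

A sender who was approved as britishairways.com gets through. The same company mailing from email.ba.com lands in the held pile until someone notices it.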