Thursday, May 29, 2008

Icon formats for Windows Mobile

About 6 months ago, I created a custom icon for my Windows Mobile application, FeedFly. I think I used one of those "We'll make your icon for free from an image! It's totally FREE!!!" web sites. I chased that down with a trial edition of an icon editor (it might have been Microangelo) to make a couple of modifications.

My icon turned out great, or at least I thought it did until I deployed it to my device. The transparent background wasn't transparent. It was either white (as shown on the start menu graphic below) or this weird smoky gray color (shown on HTC's Home Today Screen plug-in next to the fixed one). I tried opening the icon in several different applications (Visual Studio, IrfanView, and even old MS Paint). Each one told me, in so many words, that I actually had a transparent background.

FeedFly-SmallIconComparison FeedFly-LargeIconComparison

Since I'm working on a new release for FeedFly, I thought I'd once again look for a solution to my icon transparency woes. A couple days ago, I stumbled upon IcoFX and installed it to see if it could be of any help. Well, it was.

It turns out that the handy web site I used to generate the icon gave me a 32-bit color icon that kept the alpha transparency from the original image. Using IcoFX, I converted my icon to 256 colors (8 bit) and removed the alpha transparency. After I deployed it to my device, the icon displayed correctly.

On a related note, let me start by saying I realize that the target audience for a tool such as an icon editor is a small slice of the general user population. However, for an operating system that uses icons so extensively, I still can't believe an icon editor is not included in Windows. I'm on a budget, and I can't justify purchasing something like Photoshop just to edit an icon. I'm nowhere near an artist, so I definitely won't make icon editing a habit (I get pretty frustrated as it is), but I do end up fiddling with an icon once every other year it seems (mostly favicons lately).

Finally, this post isn't meant to imply that all the high bit counts won't work for icons (read this for more information), but to hopefully help you avoid some frustration trying to track this issue down. Just make sure your icon is not 32 bit with alpha.
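If you'd rather check an icon's bit depth before deploying it, the following is a minimal C# sketch that reads the value recorded for each image in an .ico file. It assumes the standard ICO layout (a 6-byte header followed by one 16-byte directory entry per image). The class and method names are hypothetical, and some editors write 0 into the bit count field, so treat the result as a hint rather than a verdict.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class IconInspector
{
    // Returns the bits-per-pixel value recorded in each directory entry
    // of an .ico stream (one entry per image stored in the icon).
    public static List<int> ReadBitDepths(Stream icoStream)
    {
        var depths = new List<int>();
        using (var reader = new BinaryReader(icoStream))
        {
            ushort reserved = reader.ReadUInt16(); // always 0 in an .ico
            ushort type = reader.ReadUInt16();     // 1 = icon
            ushort count = reader.ReadUInt16();    // number of images
            if (reserved != 0 || type != 1)
                throw new InvalidDataException("Not an .ico file");

            for (int i = 0; i < count; i++)
            {
                reader.ReadBytes(6);             // width, height, color count, reserved, planes
                depths.Add(reader.ReadUInt16()); // bits per pixel
                reader.ReadBytes(8);             // image data size and offset
            }
        }
        return depths;
    }
}
```

A value of 32 for the image your device picks up would point toward the alpha problem described in this post.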


Monday, May 26, 2008

Importing files into a SharePoint document library using regular expressions and WebDAV

I just finished writing a utility to export a folder hierarchy of files from my existing custom extranet to a SharePoint document library. The custom extranet was database-driven and allowed the user to name a file or folder whatever he or she wished up to a maximum of 500 characters. When I wrote this extranet 6 years ago in classic ASP, I'd just HTML encode whatever name the user wished and store it in the database. Whenever a folder or file was retrieved, it was always by using the ever-so-not-user-friendly URL parameter "id=".

I already knew I would need to remove the restricted characters that SharePoint does not allow from my folder and file names. Furthermore, SharePoint's document libraries actually display the full folder path in the URL, which means I'll need to be concerned about the total path length.

My migration plan was to build a physical folder hierarchy for staging the files, then use WebDAV (SharePoint's explorer view for document libraries) for importing the hierarchy into SharePoint within Windows. This approach keeps the utility focused on a simpler task than importing the files into SharePoint itself, and it means I don't have to worry about server timeouts.

Naming restrictions

SharePoint has naming restrictions for sites, groups, folders and files. Since I'm only interested in folders and files, only the following restrictions will be considered.

  • Invalid characters: \ / : * ? " ' < > | # { } % ~ &
  • You cannot use the period character consecutively in the middle of the name
  • You cannot use the period character as the first or the last character

Someone already familiar with this topic will notice that I added the apostrophe to the official restricted character list. During my own testing, SharePoint complained when I uploaded a file with an apostrophe, so I added it to the list.

Length restrictions

Besides naming restrictions, SharePoint also has the following length restrictions (from KB 894630).

  • A file or a folder name cannot be longer than 128 characters.
  • The total URL length cannot be longer than 260 characters.

128 character limit for folders and files

Regarding the 128 character limit, you can't use SharePoint's UI to get to this limit. The text box's maxlength property is set to 123 for both folders and files. I don't have any inside sources, but my guess is that the SharePoint team did this to make sure the total file name would not exceed 128 characters if the extension was 4 characters (as is the case with Office 2007 file formats like docx and xlsx). The odd thing is that the folder text box is limited to 123 characters as well. However, if you put the document library into Explorer view, you can rename a folder to allow the full 128 characters. I bet there's some reuse going on between the data entry screens for the file and the folder in this case (also something a programmer on the SharePoint team might want to do).

260 character limit for URLs

I've done some WebDAV importing to this particular SharePoint farm in the past, and I'm pretty sure I ran into paths close to the 260 character limit, so I investigated this. I found several instances where the total URL exceeded 260 characters.

KB 894630 mentioned above also says:

To determine the length of a URL, .... convert the string in the URL to Universal Character Set (UCS) Transformation Format 16 (UTF-16) format, and then count the number of 16-bit characters in the string.

However, it should probably say something like "decode the URL first, then count the characters" to make it easier to understand. I created a folder hierarchy to test out the 260 character limit. Following is a URL (notice the %20 space codes) to a test file copied from the address bar of the browser. When the URL is encoded, it contains 346 characters.

http://intranet.xyzco.com/sites/Testing/Documents/A%20longer%20than%20usual%20folder%20name%20for%20testing/Subfolder%201%20also%20has%20a%20long%20name/3rd%20level%20subfolder%20about%20related%20documents/4th%20level%20subfolder%20about%20more%20specific%20documents/5th%20level%20subfolders%20are%20possible%20in%20this%20hierarchy/1234567.txt

The decoded URL is:

http://intranet.xyzco.com/sites/Testing/Documents/A longer than usual folder name for testing/Subfolder 1 also has a long name/3rd level subfolder about related documents/4th level subfolder about more specific documents/5th level subfolders are possible in this hierarchy/1234567.txt

Counting the characters in the URL gave me 284. To get closer to 260, I subtracted the 25 characters for the web application:

284 – 25 (length of http://intranet.xyzco.com) = 259 characters

I didn't get a perfect 260, but it's close enough for me to believe that the web application host header name is not included in the limit. This is just a guess on my part, though.
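The "decode first, then count" idea can be sketched in C# as follows. This is only an illustration of the arithmetic above, not anything SharePoint itself runs; the class and method names are hypothetical, and leaving the scheme and host out of the count reflects my guess rather than documented behavior.

```csharp
using System;

public static class UrlLengthCheck
{
    // Length of the URL after percent-decoding (each %20 becomes one space).
    public static int DecodedLength(string url)
    {
        return Uri.UnescapeDataString(url).Length;
    }

    // Decoded length minus the scheme and host part (http://server).
    public static int DecodedLengthWithoutHost(string url)
    {
        string hostPart = new Uri(url).GetLeftPart(UriPartial.Authority);
        return DecodedLength(url) - hostPart.Length;
    }
}
```

Given the encoded test URL from this post, DecodedLength returns 284 and DecodedLengthWithoutHost returns 259.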

Why the 260 character limit?

A 260 character limit on the URL is interesting, considering both Windows and most internet browsers support paths much longer. It's not merely a coincidence that 260 also just so happens to be the value of the infamous MAX_PATH constant from the Windows API. .NET uses MAX_PATH because .NET relies on the Windows API behind the scenes. There are API workarounds, as discussed on the BCL team blog, but I think it's safe to assume that this limit is imposed on SharePoint by .NET in some way.

Removing invalid characters and patterns using a regular expression

The String object's Replace method doesn't contain an overload for replacing an array of strings, so I looked into using a regular expression to clean folder and file names.

Regular expressions have their own special characters that must be escaped if used for searching:

[ \ ^ $ . | ? * + ( )

Of these, four are also on SharePoint's invalid character list: * ? | \ (these are the characters that will need to be escaped in our regular expression).

After a bit of fiddling, I came up with the following 4 expressions:

  1. [\*\?\|\\/:"'<>#{}%~&] for removing invalid characters
  2. \.{2,} for replacement of consecutive periods
  3. ^[\. ]|[\. ]$ for removing spaces and periods from the beginning and end of a folder or file name
  4. " {2,}" for replacement of consecutive spaces (enclosed by quotation marks so you can see the space)

I added a couple of rules to these expressions because of my migration strategy. Since I'm using WebDAV and building a physical folder hierarchy in Windows, I also need to be concerned about any additional restrictions imposed by the OS (a folder or file name can't end with a space). Also, I'm replacing consecutive spaces with a single space.

All four expressions are applied with Regex.Replace(). Matches of expressions 1 and 3 are replaced with String.Empty; matches of expressions 2 and 4 are replaced with a single period and a single space, respectively. The order of the replacements matters: the invalid character replacement must be applied first. Combining the expressions and replacing everything in one pass can leave a name that breaks a rule once the invalid characters are gone. For example, the name %.afile.txt would become .afile.txt if done all at once, violating the rule that a period cannot be the first character.
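To see the ordering problem in action, here is a small C# sketch. The combined pattern is hypothetical and exists only to demonstrate the failure; the other two patterns are expressions 1 and 3 from the list above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplacementOrderDemo
{
    private static readonly Regex InvalidChars =
        new Regex(@"[\*\?\|\\/:""'<>#{}%~&]");
    private static readonly Regex StartEnd =
        new Regex(@"^[\. ]|[\. ]$");

    // Hypothetical combined pattern, only here to show the problem.
    private static readonly Regex Combined =
        new Regex(@"[\*\?\|\\/:""'<>#{}%~&]|^[\. ]|[\. ]$");

    // Everything in a single pass.
    public static string AllAtOnce(string name)
    {
        return Combined.Replace(name, String.Empty);
    }

    // Invalid characters first, then the start/end rule.
    public static string InOrder(string name)
    {
        return StartEnd.Replace(
            InvalidChars.Replace(name, String.Empty), String.Empty);
    }
}
```

AllAtOnce("%.afile.txt") returns ".afile.txt", because the period is not at the start of the string when the pattern is evaluated. InOrder("%.afile.txt") returns "afile.txt".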

After all replacements have been made, it's still possible to have one of the rules violated. For example, a folder named "Folder one . and . " (ends with space, period, space) would still be invalid after 1 pass of expression 3. It would still be invalid after a 2nd pass. Because of this, the beginning and end rule should be used in a loop until no matches are found. This doesn't help performance, but I was willing to compromise since my largest extranet (9000 files and hundreds of folders) was processed within a minute. Plus, I know the minute I post this someone's going to read it and say, "What was he thinking? It's so much faster to do it this way...".
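In the spirit of "it's so much faster to do it this way", here is one possible alternative, sketched in C#. It is an assumption on my part, not what the utility uses: adding a + quantifier lets the expression consume a whole run of periods and spaces, so a single pass is enough. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StartEndCleanup
{
    // Expression 3 as listed above: removes one character per end per pass.
    private static readonly Regex StartEnd = new Regex(@"^[\. ]|[\. ]$");

    // Assumed variant: "+" matches a whole run of periods and spaces.
    private static readonly Regex StartEndRuns = new Regex(@"^[\. ]+|[\. ]+$");

    // The loop described above; also reports how many passes it needed.
    public static string WithLoop(string name, out int passes)
    {
        passes = 0;
        while (StartEnd.IsMatch(name))
        {
            name = StartEnd.Replace(name, String.Empty);
            passes++;
        }
        return name;
    }

    // Same cleanup in one Replace call.
    public static string SinglePass(string name)
    {
        return StartEndRuns.Replace(name, String.Empty);
    }
}
```

For "Folder one . and . " the loop needs 3 passes to reach "Folder one . and", while SinglePass gets there in one call. For this particular rule, name.Trim('.', ' ') gives the same result without a regular expression.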

Fixing length restrictions

To make sure you include as many characters from the original folder or file name as possible, the naming restrictions should be enforced before the length restrictions.

To know how long a file name can be, it's important to know how close we are to the maximum allowed path length. Since I'm using a physical file hierarchy to stage the files, I can simply check the current folder's path length. Instead of going into too much detail about this, take a look at the maxLength integer in the following code listing. maxLength is what I used to determine how long a folder or file could be given the current path length.

An example method in C#

Following is the method I ended up with, along with some field initializations. You'll notice I added the tab character to the invalid characters list. During an export, I found a file name with embedded tab characters, so it was added to the list as well.


// requires: using System; using System.Text.RegularExpressions;
private const int MAXFOLDERLENGTH = 128, MAXFILELENGTH = 123;
private int MAXURLLENGTH = 259;

private Regex invalidCharsRegex =
    new Regex(@"[\*\?\|\\\t/:""'<>#{}%~&]", RegexOptions.Compiled);

private Regex invalidRulesRegex = 
    new Regex(@"\.{2,}", RegexOptions.Compiled);

private Regex startEndRegex = 
    new Regex(@"^[\. ]|[\. ]$", RegexOptions.Compiled);

private Regex extraSpacesRegex = 
    new Regex(" {2,}", RegexOptions.Compiled);

/// <summary>
/// Returns a folder or file name that 
/// conforms to SharePoint's naming restrictions
/// </summary>
/// <param name="original">
/// The original file or folder name.  
/// For files, this should be the file name without the extension. 
/// </param>
/// <param name="currentPathLength">
/// The current folder's path length
/// </param>
/// <param name="maxItemLength">
/// The maximum allowed number of characters for this file or folder.
/// For a file, it will be MAXFILELENGTH.
/// For a folder, it will be MAXFOLDERLENGTH.
/// </param>
private string GetSharePointFriendlyName(string original
    , int currentPathLength, int maxItemLength)
{
    // remove invalid characters and some initial replacements
    string friendlyName = extraSpacesRegex.Replace(
        invalidRulesRegex.Replace(
            invalidCharsRegex.Replace(
                original, String.Empty).Trim()
            , ".")
        , " ");

    // assign maximum item length
    int maxLength = (currentPathLength + maxItemLength > MAXURLLENGTH)
        ? MAXURLLENGTH - currentPathLength
        : maxItemLength;

    if (maxLength <= 0)
        throw new ApplicationException(
            "Current path is too long for importing into SharePoint");

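    // truncate the name if its length exceeds the maximum;
    // note that Substring(0, maxLength - 1) keeps maxLength - 1 characters,
    // one fewer than the limit
    if (friendlyName.Length > maxLength)
        friendlyName = friendlyName.Substring(0, maxLength - 1).Trim();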

    // finally, check beginning and end for periods and spaces
    while (startEndRegex.IsMatch(friendlyName))
        friendlyName = startEndRegex.Replace(
            friendlyName, String.Empty);

    return friendlyName;
}

A typical call to this method would look similar to the following. In this listing, parent is a DirectoryInfo object pointing to the current folder.

fileName = GetSharePointFriendlyName(fileName
    , parent.FullName.Length + 1, MAXFILELENGTH);
folderName = GetSharePointFriendlyName(folderName
    , parent.FullName.Length + 1, MAXFOLDERLENGTH);

Testing the import to SharePoint using empty files

The best test would be to actually upload the files via WebDAV to a staging environment. However, if you receive an error message because of name restrictions or path length during the process, it's difficult to pick back up where the error occurred.

To quickly preview an upload, I modified my export utility to create empty files instead of building the folder hierarchy with the actual files. You can use these for a mock import in WebDAV even though SharePoint's UI will not allow you to upload an empty file. The following line was used to create the files.

using (StreamWriter sw = File.CreateText(fileName.ToString())) { };

The using statement makes sure the StreamWriter is closed after the file is created. I learned this the hard way when the OS threw an exception about a file being locked.
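As a variation, here is a minimal C# sketch of a helper that creates a zero-byte placeholder and any missing parent folders in one call. The class and method names are hypothetical, and File.Create is swapped in for File.CreateText because no text is written.

```csharp
using System;
using System.IO;

public static class PlaceholderFiles
{
    // Creates a zero-byte file, including any missing parent folders,
    // and releases the file handle right away.
    public static void CreateEmpty(string path)
    {
        string folder = Path.GetDirectoryName(path);
        if (!String.IsNullOrEmpty(folder))
            Directory.CreateDirectory(folder);

        // Disposing the returned FileStream closes the handle,
        // which avoids the locked-file exception mentioned above.
        using (File.Create(path)) { }
    }
}
```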

Another benefit of using empty files is to preview the migration for your users. They can browse the document library and offer their approval. Since we've had to remove some characters and possibly truncate names, this could be very important to the success of the migration.

Export Utility

Just to offer some eye candy for this post, I ended up with something that looked like this:

Export utility screenshot

Wednesday, May 21, 2008

Creating a batch of Active Directory accounts for SharePoint with the help of Excel

I recently deployed an extranet using SharePoint and Active Directory (AD). At my company, when a new extranet is requested, the request is typically accompanied with a list of new user accounts that should be created as well. Sometimes this list can contain over 20 accounts.

Using the Active Directory Users and Computers applet is not my favorite interface for creating accounts, and especially not for a large batch of them. I found myself having to edit an account 3 times to get it to appear the way I wanted in AD and in SharePoint. I looked into my scripting options, and ran across dsadd. This command has everything I need in order to create AD accounts in one step. However, it's very tedious to have to type the command given the customization I wanted and to remember the syntax rules.

Using Excel to create dsadd commands

To make this process easier, I created an Excel spreadsheet to generate the dsadd command based on a collection of cells. I've used Excel several times in the past to generate batches of commands or SQL statements. Even if you don't need a dsadd command, this spreadsheet is a useful reference for building other batches of commands.

Screenshot of dsadd spreadsheet - Click to download

After downloading the spreadsheet, you'll need to fill in the data for your company's Active Directory and the OU you'll be creating the accounts in. You'll then probably want to extend the cell formulas beyond the single row that's included. After you've filled in everything, copy the cells in the script column and paste them into a text editor like Notepad. Save it as a cmd file, and you can double click it to execute all the scripts.

You can add or remove more columns to accommodate the other dsadd parameters. Regarding additional information fields, I was only interested in company and department.
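If you'd rather generate the commands in code than in Excel, the same concatenation can be sketched in C#. This is not the spreadsheet's formula (which isn't reproduced in this post); the class name, parameter names, and sample values are hypothetical. The switches used (-samid, -upn, -fn, -ln, -display, -pwd, -company, -dept) are standard dsadd user parameters.

```csharp
using System;

public static class DsaddCommandBuilder
{
    // Builds one "dsadd user" command line from the same kind of values
    // the spreadsheet columns would hold. No escaping is attempted.
    public static string Build(string firstName, string lastName,
        string accountName, string password, string company,
        string department, string ouPath, string upnSuffix)
    {
        string displayName = firstName + " " + lastName;
        return String.Format(
            "dsadd user \"CN={0},{1}\" -samid {2} -upn {2}@{3} " +
            "-fn \"{4}\" -ln \"{5}\" -display \"{0}\" -pwd \"{6}\" " +
            "-company \"{7}\" -dept \"{8}\"",
            displayName, ouPath, accountName, upnSuffix,
            firstName, lastName, password, company, department);
    }
}
```

Values that contain quotation marks or percent signs would need extra escaping before they go into a cmd file, which this sketch leaves out.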

Tip: If you'd like to save the results of the commands, add an append redirection operator to the call to the script file from a console screen. This would be helpful to find out what went wrong if some of your commands failed.

For example,

createaccounts.cmd >> c:\dsadd.log 2>&1

will send all the commands and any errors to the dsadd.log file instead of the console. It's important to remember to add 2>&1 at the end, since dsadd sends errors to stderr, not stdout.

Adding a batch of user accounts to a SharePoint site collection

After the accounts have been added to AD, you can reuse the spreadsheet to add multiple accounts to SharePoint. Following are the steps I've used to translate a column of user account names into a semicolon-delimited list for the add user screen in SharePoint.

  1. Copy the column of cells that have the account name (Domain\username) and Paste Special into Word to avoid it creating a table. You only want unformatted text.
  2. Replace each paragraph mark (type ^p into the find box) with a semicolon.
  3. Copy and paste into the SharePoint add user screen.
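The Word find-and-replace step above can also be sketched in a few lines of C#, if you happen to have the account names as text already. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class AccountListFormatter
{
    // Turns pasted lines (one account per line) into "a;b;c".
    public static string ToSemicolonList(string pastedColumn)
    {
        var accounts = new List<string>();
        foreach (string line in pastedColumn.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries))
        {
            string account = line.Trim();
            if (account.Length > 0)
                accounts.Add(account);
        }
        return String.Join(";", accounts.ToArray());
    }
}
```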

Sunday, May 11, 2008

A Windows Mobile feed reader: FeedFly

FeedFly logo - Windows Mobile feed reader

Yesterday I published my first open source project, FeedFly. FeedFly was the focus of my senior project course last semester. This class is required to graduate with a bachelor's degree in Computer Science at my university.

We were given the choice to write any kind of software application, but it had to get enough votes in order to form a team. I was lucky since my idea got a couple votes. About 300 man hours later, we gave our final presentation and I nervously gave a demo. Besides ActiveSync messing with my device, the demo was fine. I had to leave it on, since I was relying on it for my data connection. The room we were presenting in was like a steel box with no windows.

Including myself, we were a team of 3 programmers. I attempted to manage the timeline of the project using a modified version of Scrum. It was as effective as it could be given we couldn't do daily stand-ups. Plus, it made the Gantt chart something very easy to look at:

Project Plan Schedule

This chart shows our 1-week architecture sprint at the beginning, three 2-week development sprints, and a 1-week documentation and presentation sprint. However, generating the schedule wasn't easy. I had to enter our weird college work hours and account for our school holidays for Microsoft Project to get the end date just right.

I'm very happy with how FeedFly turned out. If I wasn't, it wouldn't have been published as an open source project. I learned a lot about the .NET Compact Framework during development, and I think the project serves as a good example application that implements some best practices. If you're a compact framework developer, I'd love to get your feedback about it.

Oh, and I got an A on the project by the way.

Friday, May 2, 2008

RSS can't catch up to email yet

When I first learned about XML, I thought it sounded like a great idea. I had a hard time coming up with an excuse to use it in my earlier programming days (would have involved a rewrite, or couldn't cost-justify the implementation of that cool, new self-describing configuration system), but .NET changed all of that (for me, being a Microsoft-experienced dev) and now XML is extremely easy to work with. Even Microsoft is using it in Office 2007 for all their new file formats. For example, did you know you could rename a docx file to zip then unpack and inspect it? What you'll see is a folder hierarchy of XML files, which could be edited in Notepad if you're so inclined. If you eat XML for breakfast, you don't even need Office to create Office documents. I think we would all agree that XML has definitely arrived and is here to stay.

I mention RSS in the title because this particular XML web feed standard has become so popular it has become synonymous with "web feed" itself. The following video pretty much sums up the goals of using web feeds instead of the typical models of gathering your information from the internet.

In short, web feeds are a perfect application of the theory of XML. They make personal blogs just as "subscribe-able" as our magazines and radio shows available at any time via podcasts. I should probably spend an entire post praising podcasts since I'm always listening to them now. Ever since my phone became my mp3 player, my car stereo hasn't been turned on (I have one of those after-market ones that plays mp3 disks, too. It always reads "Standby"). I fear this introduction was a bit lengthy, but I'm trying to make everyone in my audience happy (and may it never change).

Why the negative title about RSS? My point is that even though web feeds are more popular now than ever before, the barrier of entry is still too high to participate in all of its glory. I'm sure you know plenty of non-technical computer users out there. Are they subscribing to feeds? Probably not. Last semester, I graduated with my Computer Science bachelor's degree (finally - it took me 7 years with my day job). My senior project was a Windows Mobile feed reader named FeedFly (In process of creating an open source project for this - stay tuned). During our final presentation, in a room full of technical experts and industry advisors, we asked how many of them actually subscribed to blogs (indicating they regularly use a feed reader). I would say the response was less than 10 percent, and I was one of those with a raised hand.

I created a training session on blogging for my company 3 years ago. I was able to convince one of the teams to replace their email newsletter with a blog, and boy did it take off. It's still on the first page of a Google search without having to pay any SEO "specialist" vendors (I don't like most of these - another post). In this training, I predicted that web feeds would take off once Internet Explorer 7 and Outlook 2007 started shipping with built-in feed readers. I was wrong. Once I used the Outlook 2007 RSS client, I understood why. It's nowhere near as integrated as it needs to be in order to get everyone to use it. Plus, it's not easy to work with once you have a lot of subscriptions. If you're curious what reader I use, it's Snarfer with the Bloglines synchronization feature. Oh, and Doppler for the podcasts.

What's the solution for the lack of adoption so far? The best solution would be to purchase your own domain name for hosting your blog, so you have the freedom of moving it to different hosts. Then, offer an email subscription feature for those that don't use readers. I'm sure future versions of feed readers will be much more user-friendly, and I'm still convinced it will take off. FeedBurner is one of several free services that offers email distribution. With email distribution, your subscribers will receive an email with your new posts to the feed. Now, you won't become frustrated when people get tired of opening their feed readers you set up for them so they can check if your blog has a new post.

Even though FeedBurner is quite useful for email, I see its primary purpose as an abstraction of a feed, so the feed source can be changed independent of all subscriptions. If you didn't have this abstraction layer set up and you moved your feed to a new hosting platform, you'd have to post a request on your old feed informing your subscribers to change their subscription to the new feed address. I've seen these posts several times in the feeds I've subscribed to over the years.

Let's be honest. If you don't have more than a couple of feeds subscribed, it's not worth setting up a blog reader at all. The simple action of opening a reader is step 1. Step 1 could have been to just visit the web site instead. If you have 2 feeds subscribed, you save yourself a whole step. That's a lot of software installation or configuration just to save a step. Maybe web feeds will never take off because of this. They only offer a significant advantage to someone that is spending too much time looking around for updated information.