Finding An Internet Presence Provider

What type of server?
Which web server software?
What's the monthly charge?
What's the charge for domain registration/transfer?
How much space do I get?
Bandwidth restrictions
Access statistics
What kind of bandwidth does the provider have?
Common Gateway Interface (CGI)
Security
Downtime
Minor "nice" features
E-mail
Support
Developer resources
DBMS access
Warning shots
A key point
Search
References
Other
Conclusion

This paper is designed to help you select an appropriate Internet Presence Provider for your Web site. Recently, there has been an explosion in the number of facilities offering to host Web sites. Rates and requirements vary wildly, so choosing a host company is not a simple task.

I would preface this list with a simple thought: know what your goal is before you try to select a host for your Web site. The provider that is excellent for an experienced Web site designer and/or reseller is often not very good for a neophyte who is just learning how to do things. Likewise, the provider offering higher prices, 24-hour customer support, and 1 MB Web sites is probably not appropriate for your large, well-designed corporate Web site. If you know your target (electronic commerce, information dissemination, "just being there," etc.), you are in a much better position to choose providers correctly.

What type of server?

Does it matter what type of hardware the server is? Materially, no. However, you may have business or personal reasons for a preference, and it's always a good idea to know the provider's hardware choices. UNIX servers are the most common. UNIX comes in many flavors, including a couple of free UNIX-like operating systems, FreeBSD and Linux. UNIX can run on a PC server as well as on UNIX-specific hardware. The other common option is a Windows NT-based server, which is almost always an Intel-based PC server of some type (in a few cases, the server may be a Digital Alpha system).
It should be noted that, at this point in time, UNIX has an enviable record of reliability. In fact, there are UNIX-based presence providers who will guarantee 99%+ uptime. While NT is also very reliable, the uptime numbers generally quoted are 98%-99%, not the tiny downtime numbers of UNIX.

The primary reason to opt for NT is still database access. If you want or need database capabilities, and you design your Web site on a PC, it is vastly simpler to implement those capabilities in an NT environment. Why? The major NT Web servers, Microsoft IIS and O'Reilly WebSite Pro, support ODBC-based data access, so a Microsoft Access DBMS can be used on these servers. The advantage is that a Web designer can create, design, load, and maintain a local Access or SQL Server DBMS on both the Web design desktop and the Web server platform. A UNIX system will require some type of migration, unless the design phase occurs on a UNIX desktop system. If your Web design and maintenance is performed on a UNIX system, you can easily implement a UNIX DBMS on both systems.

The important thing is to determine what the prospective hosting site is running. It's a sort of FYI issue, but if the provider hedges on this question, find someone else.

Which Web server software?

This is more important than the type of server hardware, because each Web server software package has different strengths and features. At the base level of retrieving and displaying pages, transmitting data, and finding information, all the Web servers are similar. The top five Web servers (in terms of market share) are Apache (44%), Microsoft (21%), Netscape (12%), NCSA (6%), and O'Reilly (3%). Microsoft and O'Reilly dominate the NT platform; Apache dominates UNIX, although an NT version of Apache will shortly be available. Netscape runs on both UNIX and NT, but is heavily concentrated in UNIX. NCSA is also a UNIX product.
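Returning briefly to the uptime figures quoted for UNIX and NT providers: the gap between 98% and 99%+ looks small, but it is easier to judge in hours. A minimal sketch, assuming a 30-day (720-hour) month; the function name is hypothetical, and a real guarantee may be measured over a different period.

```python
# Assumption: a 30-day month; providers may define the period differently.
HOURS_PER_MONTH = 30 * 24

def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_MONTH):
    """Hours of outage that a quoted uptime percentage still permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100

for pct in (98.0, 99.0):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% uptime allows {hours:.1f} hours of downtime per month")
```

By this measure, 99% uptime still permits about 7.2 hours of downtime in a 30-day month, and 98% about 14.4 hours.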
There are differences that seem small, but can be major issues depending on your needs and requirements. For example:

Apache - Extremely reliable. Available almost everywhere. Free.
Microsoft IIS - Runs natively with FrontPage extensions and ASP. Often offered with IIS is a third-party package called IIS Assistant, an on-line access statistics database that makes it extremely easy to keep track of access stats. IIS does NOT support standard HTTPd password protection.
O'Reilly WebSite - A combination of Web server standards support (CGI, Win-CGI, HTTPd passwords, etc.) and Microsoft specifics (FrontPage extensions, ASP, etc.). WebSite also includes a native indexing and searching facility (WebIndex/WebFind); iHTML, an in-line, page-based programming language that greatly simplifies Web-database integration; and iHTML Merchant, an on-line commerce system.
Netscape - Great standards support. Excellent speed and technology. Wonderful third-party support.

Many non-NT based providers feature "FrontPage extensions" that enable a non-NT Web server to support some of the FrontPage NT-specific features. However, some FrontPage features are supported by these extensions and some are not. This is especially true of the FrontPage Bots. If you are making extensive use of the Bots, either find an IIS-based host or ask a prospective host which Bots are and aren't supported. Otherwise, you may get a nasty surprise when a feature isn't supported by the "FrontPage extensions" on the host system.

What's the monthly charge?

Prices are all over the map. The consensus range is $19.95-$35.00/month for a virtual Web site (www.yourdomain.com) with around 20 MB of storage, 500-1,000 MB of monthly data transfer (bandwidth), and E-mail. Features vary. Ask.

What's the charge for domain registration/transfer?

InterNIC is the registration office for Internet IP addresses and domain names. The current charge for registering a domain name is $50/year. There is a two-year minimum to start, so the first charge is $100.
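Putting the monthly charge and the up-front registration together gives a rough first-year budget. This is a minimal sketch under stated assumptions: twelve months of hosting plus InterNIC's two-year minimum paid at the start, with `provider_fee` standing for any separate filing charge a provider may add. The function and its parameter names are hypothetical, not any provider's billing formula.

```python
def first_year_cost(monthly_fee, registration_per_year=50.0,
                    minimum_years=2, provider_fee=0.0):
    """Rough first-year outlay for a virtual Web site, in dollars.

    Assumes 12 months of hosting plus InterNIC registration billed up
    front for the minimum term; provider_fee is any extra filing charge.
    """
    hosting = 12 * monthly_fee
    registration = registration_per_year * minimum_years
    return round(hosting + registration + provider_fee, 2)

# The low and high ends of the monthly range quoted above.
for fee in (19.95, 35.00):
    print(f"${fee:.2f}/month -> ${first_year_cost(fee):.2f} in the first year")
```

At the quoted monthly range, the first year comes to roughly $339.40 to $520.00 before any provider filing fee.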
Generally, InterNIC will bill you directly. The service provider may charge as much as $100 for their time in registering your domain (filling out and filing the multi-page form). This is in addition to the $100 you are billed by InterNIC.

Make sure that you are the Administrative contact on the InterNIC application. If you ever want to change service providers, your domain name can go with you, but it is much easier if you are listed as the Administrative contact. In that case, when you want to move the domain, you simply contact the new service provider, who in turn contacts InterNIC. InterNIC then E-mails you (the Administrative contact) a form to acknowledge and approve the transfer. Transferring a domain incurs no charge from InterNIC.

Just be aware of the charges. Most providers are fair, but a few treat domain registration and transfer like car dealers with "dealer-prep" charges.

How much space do I get?

Web providers have targeted service packages with certain limits, including space for your Web pages. Remember that Web pages themselves are very small, while multimedia (graphics, sound, animation, movies, etc.) is generally quite space-intensive. As an example, I have over 500 HTML files that total under 1 MB of disk space. On the other hand, a movie segment of a couple of minutes can easily run upwards of 10 MB, and a 20-second sound bite is often in the range of 200 KB. Unless you go crazy with multimedia, the general size limits of 20-30 MB will not be a problem. If you know that you are going to incorporate hundreds of pages with massive volumes of multimedia elements, look for higher storage limits.

If you are using FrontPage extensions at a non-NT hosting site, ask about the size of the extensions and whether they count toward your limit. Obviously, if the FrontPage extensions require 10 MB and your disk limit is 12 MB, it's not a great deal.

E-mail may also be an issue.
Some providers count E-mail storage and bandwidth against your monthly limit. This may be a problem if you plan to send or receive large volumes of E-mail. Ask.

Lastly, what happens if worse comes to worst and you exceed your storage limits? What charges are incurred by going over the limits? Here again, rates vary widely and wildly; charges generally range from $0.04/MB to $1.00/MB over the allowance. Ask.

Please note that these per-MB overage charges are not universal. Some providers have no extra space charges. Instead, if your space utilization goes over the limits of your plan, you are automatically bumped to the next higher plan. This can be a nasty surprise. For instance, if you sign up for a $19.95/month plan, and the next plan offered is $50/month, getting bumped up costs you about $30/month more. Check the package requirements carefully.

Bandwidth restrictions

The second class of restrictions is generally called bandwidth or "data transfer" limits. This is the volume of information that moves to and from your Web site each month. Limits vary from 200 MB/month to unlimited transfer. If you know your goal and method of operation, these limits are not a problem.

If you plan to have some downloadable files available for surfers, those downloads will count against your bandwidth limit. If you plan to have lots of downloads available, allow for it.

If you have a site that is mainly for browsing, the number of hits you get will drive your data transfer. Every time a browser hits a Web page, the page is downloaded to the browser, so you can calculate the transfer very easily. If a Web page is 2 KB and contains graphic elements totaling 50 KB, each time a browser encounters the page, you will transfer 52 KB. This is mitigated by the fact that most browsers perform what is called "caching." That is, they keep a local copy of the Web page after browsing it.
Consequently, if a user returns to your Web page four times in one browsing session, only the first "hit" will create a data transfer. To complicate the picture further, each user can enable or disable caching. As a working assumption, figure that 75% of browser hits will be served from cache. Your access log will tell you how many "hits" are new and how many are cached.

E-mail may or may not count toward bandwidth limits. If you are sending a 50 KB file to 200 people, that is 10 MB of bandwidth. E-mail can add up.

If a provider has no bandwidth limits, it is perfectly acceptable to ask them why. Allowing hundreds of sites to transfer data at will indicates one or more of these possibilities:

Huge servers
Huge bandwidth
Few customers
Great management (we know what our customers are doing, and we have the extra bandwidth to support this offer)
Poor management (let everyone do whatever they want until we hit the wall)
Averaging (the average site uses x amount of bandwidth; I have y sites; therefore, available bandwidth = total bandwidth - (x * y))

Determine the cost for extra bandwidth. Like everything else here, the cost is usually reasonable, but you should ask.

Note that a new practice has emerged at some providers, very similar to the disk space overage situation. Some providers sell bandwidth in 1 GB blocks: if your utilization goes over your limit, you are charged for an entire 1 GB block of extra bandwidth, sometimes as much as $25. Another method is a high per-MB charge for exceeding bandwidth limits. A charge of $0.12/MB doesn't sound bad if you're thinking in terms of disk space, but in terms of bandwidth, that $0.12 charge really amounts to $120/GB. Yet another overage situation is the bump to a higher-level server package, just like the disk space situation. Remember that going over the disk space limit can easily be avoided, but going over the bandwidth limit can be the result of a sudden spike in traffic, which you may not be able to control.
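The transfer estimate and the three overage schemes described in this section can be compared with a few small functions. This is a minimal sketch under stated assumptions: 1 MB = 1,000 KB and 1 GB = 1,000 MB (matching the $0.12/MB = $120/GB arithmetic above), and `cached_fraction` is read as the share of page views served from the browser's cache. All function names are hypothetical. The per-MB and plan-bump functions apply equally to disk-space overages.

```python
import math

def monthly_transfer_mb(page_kb, views_per_month, cached_fraction=0.75):
    """Estimate monthly transfer for one page, in MB (1 MB = 1,000 KB).

    page_kb is the page plus its graphics (2 KB + 50 KB = 52 KB above);
    cached_fraction is the share of views answered from browser caches.
    """
    transferred_views = views_per_month * (1 - cached_fraction)
    return page_kb * transferred_views / 1000

def overage_per_mb(used_mb, limit_mb, rate_per_mb):
    """Scheme 1: a flat charge for each MB above the allowance."""
    return max(0.0, used_mb - limit_mb) * rate_per_mb

def overage_in_blocks(used_mb, limit_mb, block_mb=1000, block_price=25.0):
    """Scheme 2: any overage is billed as whole blocks (e.g. 1 GB at $25)."""
    excess = max(0.0, used_mb - limit_mb)
    return math.ceil(excess / block_mb) * block_price

def overage_by_plan_bump(used_mb, limit_mb, current_price, next_price):
    """Scheme 3: exceeding the limit moves the account to the next plan."""
    return next_price - current_price if used_mb > limit_mb else 0.0

print(monthly_transfer_mb(52, 20_000))      # 52 KB page, 20,000 views
print(overage_per_mb(1001, 1000, 0.12))     # 1 MB over at $0.12/MB
print(overage_in_blocks(1001, 1000))        # 1 MB over, 1 GB blocks
```

With these assumptions, a 52 KB page viewed 20,000 times a month moves about 260 MB, and going just 1 MB over the limit costs $0.12 under a per-MB plan but a full $25 under the block scheme, which is why it pays to ask how overages are billed.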
Access statistics

Access statistics tell you how many people are looking at your Web pages, how often they are hitting, what they look at, how long they browse, and a host of other pieces of information that often determine the direction of your Web site. These statistics are generated from records the Web server creates for each action of a user browsing your site. These records are stored in an access log. Access logs come in several different formats, the most common being the Common Log Format and Microsoft's IIS format.

There are a large number of log analysis tools. These tools take your access log as an input file and create statistics from it: usage, information on users, hit rates, data volumes, page popularity, etc. Analysis tools range from freebies that are generally not easy to use, to simple, low-cost tools like NetIntellect, WebTrends, and net.Analysis, to high-end server-based analysis tools.

A typical raw log entry looks like this:

194.201.183.18 - - [02/Jun/1997:11:18:46 -0400] "GET /images/squiggle_bpm.gif HTTP/1.0" 200

This says that a user with IP address 194.201.183.18, on 2 June 1997 at 11:18:46, requested the file /images/squiggle_bpm.gif from my Web site. The "200" status code indicates the retrieval worked. Every item retrieved creates a record like this, so one Web page with 10 images will generate 11 log records (or "hits") each time the page is accessed. Consequently, these log files get very large very fast.

mrdopey.accu-info.com - - [02/Jun/1997:11:18:46 -0400] "GET /images/squiggle_bpm.gif HTTP/1.0" 200

This is the same log entry as above, except that the IP address has been replaced by an actual host name: a machine named mrdopey in the accu-info.com domain. I can now use my browser to locate www.accu-info.com and determine which company was looking at my site, and what they saw and did. This procedure of looking up the host name and substituting it for the IP address is called "reverse DNS."
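Because these entries follow the Common Log Format, a few lines of code can split them into fields, count hits per file, and optionally attempt the reverse DNS lookup. This is a minimal sketch assuming well-formed CLF lines. The trailing byte-count field is treated as optional because the sample entries above stop after the status code. The function names are hypothetical, and `reverse_dns` simply falls back to the IP address when no host name is registered.

```python
import re
import socket
from collections import Counter
from functools import lru_cache

# host ident authuser [timestamp] "request" status [bytes]
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3})(?: (?P<bytes>\d+|-))?'
)

def parse_clf(line):
    """Split one Common Log Format line into fields; None if it doesn't match."""
    m = CLF.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    parts = entry["request"].split()  # e.g. ["GET", "/images/x.gif", "HTTP/1.0"]
    entry["path"] = parts[1] if len(parts) >= 2 else None
    return entry

@lru_cache(maxsize=None)
def reverse_dns(ip):
    """Return the host name for an IP address, or the IP itself if none exists."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror / socket.gaierror: no name, bad address
        return ip

def hits_per_path(lines):
    """Count successful (2xx) requests per file."""
    counts = Counter()
    for line in lines:
        entry = parse_clf(line)
        if entry and entry["path"] and entry["status"].startswith("2"):
            counts[entry["path"]] += 1
    return counts
```

Caching `reverse_dns` matters because one visitor typically produces many log records from the same address, and each uncached lookup is a network round trip.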
Reverse DNS makes your access logs much more useful. Some providers use this feature to provide more informative Web logs to customers, but most do not, even though Web servers have the capability to perform reverse DNS. You can also buy a product that performs reverse DNS on a log file; a number of these are available from www.tucows.com. Note that no matter what you do, at least 25%-50% of users will show up only as an IP address. This is a result of the way IP addresses are assigned, not a failing of the reverse DNS products.

You will want to know a couple of things about access logs. First, how often does the provider furnish log statistics? Most do some kind of statistical generation, but is it daily, weekly, or monthly? I have been in a position where my hit rate was plunging, and my provider gave me only weekly logs. It is very frustrating.

More importantly, do you have access to the raw access logs? You should get on-line access to the raw log files, so that you can run them through a statistical generation package whenever you feel like it. If you don't have direct access, you can only use what the provider gives you.

"Symbolic" log access is a widely implemented feature. It lets an authorized user view the access log through a browser. Yes, you can save the log to a text file via the browser, and yes, it saves disk space as well. The problem with symbolic log access is size: as the log file grows, the time to display it gets longer and longer. If you expect substantial hit rates, symbolic access is probably not what you want; by month-end, you will be trying to view log files that are multiple megabytes.

Just as important is a different type of log called a "referrer log." This is similar to the access log, but it shows where users were before coming to your Web site. This helps to determine your Web strategy, because it shows which sites are really increasing your hit rates.
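Tallying a referrer log by referring site takes only a short loop. This sketch assumes the one-pair-per-line layout used by NCSA-style referrer logs (`referring-URL -> local-path`). If your provider's referrer log looks different, the split would need adjusting. The function name and sample sites are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

def top_referring_sites(lines, n=10):
    """Count referrals per referring host from 'referrer -> path' lines."""
    counts = Counter()
    for line in lines:
        referrer, arrow, _path = line.strip().partition(" -> ")
        if not arrow or referrer in ("", "-"):
            continue  # malformed line, or no referrer (typed-in URL, bookmark)
        host = urlsplit(referrer).netloc.lower() or referrer
        counts[host] += 1
    return counts.most_common(n)

sample = [
    "http://www.example.com/links.html -> /index.html",
    "http://www.example.com/cool.html -> /research/",
    "http://search.example.net/?q=hosting -> /index.html",
]
print(top_referring_sites(sample))
```

Grouping by host rather than by full URL shows which sites, not which individual pages, are sending you visitors.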
Without referrer information, it is very difficult to determine a successful strategy for marketing your Web site, and not every service provider offers referrer logs.

What kind of bandwidth does the provider have?

Providers will advertise various types of communications lines: analog, ISDN, T1, T3, OC-3. While it is nice to know the capabilities of such lines, I want to emphasize that management practices, not raw bandwidth, determine the performance you will get from the provider. For instance, I could get a dedicated T3 line. That sounds great! But if I connect it to a 486-33 machine as the server, the bandwidth won't help much. Likewise, I could get a gigantic server but load it with a few thousand very interactive Web sites, and performance would suffer. Performance results from a blend of available bandwidth, server management, and Web site design characteristics.

Nonetheless, here are the typical line designations and what they mean:

28.8/33.6 Kbps - standard analog phone line (modem) transmission rates. Fine for a single user. A dedicated 28.8 line will generally support several concurrent users if the Web site itself is not highly interactive.
ISDN - a 128 Kbps digital line. With more than four times the capacity of a 28.8 line, a dedicated ISDN line can handle 10-20 concurrent users on an average site (no heavy concentration of graphics) with little problem.
T1 - 1.544 Mbps (1 Mbps = 1,000 Kbps), about 50 times the capacity of a standard 28.8 analog phone line. Each T1 has 24 "channels" (i.e., voice circuits or lines), each carrying 64 Kbps.
T3 - 44.736 Mbps (the equivalent of 28 T1s). Each T3 has 672 "channels."
OC-3 - 155 Mbps (about 100 times the capacity of a T1, and more than 3 times the capacity of a T3).

Common Gateway Interface (CGI)

CGI is another way to execute specific functions on your Web site. CGI allows your Web pages to pass data to the server, where it can be processed.
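Under CGI, the Web server hands form data to a program through environment variables (`REQUEST_METHOD`, `QUERY_STRING`, `CONTENT_LENGTH`) and standard input, then returns whatever the program prints, header lines first. This minimal Python sketch shows that round trip. The `name` form field and the thank-you page are hypothetical. A real form-to-E-mail routine would also need the host's mail setup, and the script must sit wherever the provider allows CGI programs to run.

```python
import html
import os
import sys
from urllib.parse import parse_qs

def read_form(environ, read_body):
    """Collect form fields the way CGI delivers them.

    GET requests carry the data in QUERY_STRING; POST requests send
    CONTENT_LENGTH characters on standard input (read via read_body).
    """
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = read_body(length)
    else:
        raw = environ.get("QUERY_STRING", "")
    # parse_qs returns a list per field; keep the first value of each
    return {name: values[0] for name, values in parse_qs(raw).items()}

def build_response(fields):
    """Return a CGI response: a header line, a blank line, then the page."""
    name = html.escape(fields.get("name", "visitor"))
    body = f"<html><body><p>Thank you, {name}.</p></body></html>"
    return "Content-Type: text/html\n\n" + body

if __name__ == "__main__":
    fields = read_form(os.environ, sys.stdin.read)
    sys.stdout.write(build_response(fields))
```

Escaping the submitted value before echoing it keeps a visitor from injecting markup into the reply page.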
CGI programs are commonly used to provide functions that a Web site requires. For instance, if you want to present a form that a user can fill out, and have that form E-mailed to you, there are CGI routines that will support this. Other CGI applications create and maintain hit counters, send users to other pages (redirection), handle security, etc. There are abundant free CGI routines available on the Web.

Service providers differ on allowing you to have a cgi-bin directory to contain the CGI routines that you use. Some will let you use your own CGI routines, while some will not. Some have a large library of CGI routines that you can use; some allow no CGI other than the ones they provide. This is more significant if you are moving an existing Web site to a new provider. If you have used certain CGI routines that you want to take to the new site, it would be wise to inquire about CGI policies.

Security

I'm not going to get into the details of security, just the overview. There are many valid reasons for needing secure portions of a Web site. The standard situations are:

Most of the Web site is free, but "subscribers" can get to certain special sections.
Transactions are allowed, but must be handled separately from the rest of the Web site.

Two relevant questions apply to security.

Does the provider offer a secure server environment? If you are considering on-line transactions, you should offer that capability via a secure server. Secure servers greatly reduce consumer and seller fears of "stolen" information.

Can you use password security on a portion of your Web site? Suppose you have a Web site that focuses on an industry (transportation, energy, etc.). On most of the Web site, you offer general information, but paying subscribers can get into your "insider" analytical research. That means that the research directory needs to be protected (otherwise the subscription is worthless). So you need to issue a userid and password to each subscriber.
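Standard HTTPd password protection relies on HTTP Basic authentication. The browser sends the userid and password, base64-encoded, in an `Authorization` header, and the server checks them against its password file. This Python sketch shows that check. The in-memory user table and its PBKDF2 hashing are assumptions made for the example, not the password-file format of any particular server. Basic authentication only encodes the password rather than encrypting it, so it is best paired with a secure server.

```python
import base64
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumption: PBKDF2 work factor chosen for this sketch

def hash_password(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def make_user_table(credentials):
    """Map each subscriber's userid to (salt, password hash)."""
    table = {}
    for user, password in credentials.items():
        salt = os.urandom(16)
        table[user] = (salt, hash_password(password, salt))
    return table

def check_basic_auth(header, table):
    """Return the userid if an 'Authorization: Basic ...' value is valid, else None."""
    scheme, _, encoded = (header or "").partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    user, colon, password = decoded.partition(":")
    if not colon or user not in table:
        return None
    salt, expected = table[user]
    # constant-time comparison avoids leaking how much of the hash matched
    if hmac.compare_digest(hash_password(password, salt), expected):
        return user
    return None
```

Storing only salted hashes means that a leaked password table does not directly reveal subscribers' passwords.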
Issuing those credentials lets real subscribers into that section of the Web site and keeps non-subscribers out. To do this, you need to be able to use password security. While password security is standard with most Web servers, some providers disable it, some don't allow customers to use it at all, and some have very low limits (e.g., "two directories with two passwords each"). If this is an issue for your site, ask the provider about its password policy.

Downtime

No provider enjoys downtime. Sometimes it is accidental; sometimes it is caused by required procedures. Nonetheless, it is helpful to know at least the scheduled downtime for maintenance.

Minor "nice" features

Almost all Web sites use simple features like clocks, hit counters, and forms. It is nice if these are supplied by your Web host. You can find them on your own, but many presence providers have these basic routines built into their systems. This is a bigger issue with UNIX systems. On NT, Microsoft IIS supports all the FrontPage Bots, and O'Reilly WebSite has built-in CGI routines. A simple way to check is to browse the host's Web site to see what kind of documentation is available for these types of procedures. Good instructions indicate a good customer orientation. On UNIX systems, a large CGI library can simplify your development and management chores.

E-mail

E-mail capabilities are different at each provider. Almost all send and deliver mail immediately. However, some hold E-mail until specific times when they send and deliver. This could create surprises if you are unaware of the policy. Ask.

A mailbox is typically assigned as a place where your mail is sent and where you can read it. In some cases, the provider may include your own POP3 mailbox. This will allow you to set up aliases (e.g., you will receive any mail addressed to yourdomain.com, regardless of the name prefix: webmaster@yourdomain.com, yourname@yourdomain.com, etc. all go to the same mailbox).
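The catch-all arrangement described above amounts to a simple lookup. Mail for a known name goes to its assigned mailbox, and anything else addressed to your domain falls through to a default. This minimal sketch makes several assumptions: the function name, the mailbox names, and the use of a lower-case `aliases` dictionary are all hypothetical. Real mail servers configure this in their own alias files.

```python
def route(address, domain, aliases, catch_all=None):
    """Pick the mailbox for an incoming address.

    aliases maps lower-case local names (the part before the @) to
    mailboxes; catch_all, if given, receives any other name at the domain.
    Returns None when the message should not be delivered here.
    """
    local, at, addr_domain = address.strip().lower().rpartition("@")
    if not at or addr_domain != domain.lower():
        return None  # not addressed to this domain
    return aliases.get(local, catch_all)

aliases = {"webmaster": "owner", "sales": "sales-box"}
print(route("yourname@yourdomain.com", "yourdomain.com", aliases, catch_all="owner"))
```

Without a `catch_all`, mail to any name that has no alias of its own is simply not delivered.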
You can set up the mailbox to react to specified addresses, aliases, message titles, etc. Even without your own POP3 mailbox, there are E-mail options that can be implemented via UNIX functions, and these may also be available.

More important is the limit (or lack thereof) on your E-mail volume. Some providers include E-mail volume in the monthly bandwidth quota. If you plan to send and/or receive a lot of E-mail, this could affect your provider decision. Some providers have no E-mail limits. Some include E-mail in the disk quota as well. All these are variables. It pays to ask.

Support

Support comes in all flavors: E-mail only, phone support, newsgroups, chat groups, etc. The best support is a combination of all of these. If you have never developed a Web site before, make sure you have good phone support. Check it out. It's easy to "offer" support; it's hard to maintain good support. Everyone has a phone number, but the support number may just be hooked to an answering machine. You can send E-mail, but how fast do you get a response? Are support people actively participating in the chat groups and newsgroups? If phone support is offered, what are the hours? Do they fit your hours of availability? Are the support people knowledgeable? Do they have an interest in the success of the provider, or are they just phone answerers?

Support is more important if you are just starting out, but support is important to everyone. The Web still holds frustrations, and good support people can keep those to a minimum. Further, new technologies are coming to market and being tried. Sometimes being the first is a big plus. If you like being on the leading edge, support is even more critical for you.

Developer resources

An adjunct to support is a developer resources area. This could be a set of Web pages, a newsgroup, on-line conferences, etc. Its main function is to provide Web site developers with timely, accurate information on technologies and services offered by the provider.
Providers may also offer very helpful services, like access to the server's error log, so that if your program bombs, you can view the error log. Everything that puts the developer closer to the system is helpful. At the very least, there should be good tutorials on the tools offered by the provider (forms, CGI routines, E-mail options, etc.).

DBMS access

Many sites will need a DBMS interface for data storage and retrieval. Most providers offer some type of database access; you do, however, need to see what type is offered.

For Windows NT servers, Microsoft Access and Microsoft SQL Server are almost always available. Sometimes Microsoft Access is available without an initial setup charge. You can generally use the Microsoft database access extensions (or iHTML in O'Reilly WebSite). However, this type of DBMS activity requires setting up a DSN (ODBC data source name) on the server side. Check for charges; offerings vary widely. A hosting package may include only one DSN. There may be an installation charge for each DSN, or an installation charge and a monthly charge for each DSN. Note, however, that this is something you cannot "sneak." The server administrator has to set up the DSN.

UNIX servers are a little different. They may offer access to a "name brand" DBMS (Sybase, Oracle, etc.), or to one of the lesser-known "free" products (e.g., mSQL). A second issue here is the database interface layer, typically something like Allaire's Cold Fusion. While that genre of product is very good, you will need to check its capabilities to see if it will support what you want to do. The bigger question is whether you have the time and energy to learn a set of new products.

Microsoft tends to build these extended capabilities into its base products. For example, if you use FrontPage to create Web pages, and your server is NT/IIS, it will be relatively simple to add capabilities that connect the Web pages to an Access or SQL Server database.
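On either platform, the core of Web-database integration is the same: run a query and turn the rows into HTML. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for Access, SQL Server, or a UNIX DBMS. On an NT host, the connection would instead go through the DSN that the administrator sets up. The `products` table and its columns are hypothetical.

```python
import html
import sqlite3

def render_price_list(conn):
    """Build an HTML list from a (hypothetical) products table."""
    rows = conn.execute(
        "SELECT name, price FROM products ORDER BY name"
    ).fetchall()
    items = "\n".join(
        f"<li>{html.escape(name)}: ${price:.2f}</li>" for name, price in rows
    )
    return f"<ul>\n{items}\n</ul>"

if __name__ == "__main__":
    # An in-memory database stands in for the server-side DBMS here.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [("Widget", 19.95), ("Gadget & Co", 35.0)],
    )
    print(render_price_list(conn))
```

`html.escape` keeps characters such as `&` and `<` in the stored data from breaking the generated page.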
On the UNIX side, the capabilities are often broader, but you must consider that learning a new DBMS and a new interface, then coding and testing, is a time-consuming process. Do you have the time?

On both UNIX and NT, the setup fee for using a DBMS should be $25 or less. Exorbitant charges for simple database access should be unacceptable. If you know what you need to do, these are not difficult issues. If you aren't sure what you want to do with a database, you can spend a lot of time trying to make choices that will be based on a guess.

Warning Shots

Lastly, here are some general things to watch for. They aren't always indicative of trouble, but they should certainly make you think hard before committing.

There are a few presence providers who offer greatly discounted rates on the condition that you sign a twelve-month contract. In a fast-moving technology area, twelve months is a long time. Often, the discounted rate is no better than the regular rate of any number of providers who have no such 12-month minimum. The minimum may be fine, or it may be an indication that the provider needs money "up-front" to keep going. Ask yourself how you would feel about committing to a one-year contract on any service without first trying it. This type of commitment is worse, because wholesale changes in the technology infrastructure can occur within a matter of months.

Silly things like "we charge extra for hyperlinks" should send up a red flag.

Strange pricing plans are proliferating. Unbundled pricing (a charge for the site itself, a charge for E-mail, a charge for newsgroups, etc.) is generally a method for appearing to offer very low prices. For instance, if I unbundle the pricing, I can offer Web sites for $15/month. Then if you want a mailbox, that's $5/month. Newsgroup access is $5/month, and database access is $10/month. I've added three basic capabilities, and the price is now $35/month. On the other hand, unbundled pricing may help you.
If your E-mail and newsgroup access is being handled elsewhere, and you don't need DBMS access, unbundled pricing really can save you money.

A Key Point

There is no requirement that one service provider handle your connection, E-mail, newsgroup access, and Web site. Geography doesn't matter here: you can have a local connectivity provider in New York and a Web presence provider in California. In fact, this may make sense for support hours. You are not limited to one service provider. I would recommend disentangling these functions. Find a good local connectivity supplier (ISP), and then look for a presence provider who specializes in Web presence sites. You generally get more capabilities for less money.

Search

If you plan to have a Web site with heavy content and/or numerous files, it will be a benefit to have a search utility. If you use FrontPage, and your provider has acceptable FrontPage extensions, you can use the built-in search Bot in FrontPage. O'Reilly WebSite also has a built-in search facility (WebIndex/WebFind). Other search engines are also available, from relatively new ones like Harvest and SWISH to the tried and true WAIS. Another option is to find a presence provider that offers Excite for Web Servers (EWS). EWS is simple to set up, works well, and provides for selective indexing; that is, you can decide which directories and/or files you want in the searchable index. O'Reilly WebSite's search also lets you pick your targets. Such has not been the case with Microsoft Index Server, which indexes your entire site, with no choices. The Index Server control mechanism is the IDQ files, which determine limits on what can be displayed as an answer to the search query. IIS 4.0 offers user control of the IDQ files, if the service provider enables that feature. This is one of those "minor issues" which could become a major issue, depending on the content and design of your Web site. In my case, I have over 500 files and another 3,000 links to maintain.
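Selective indexing simply means that the indexer walks only the directories you name. This minimal Python sketch illustrates the idea with a toy inverted index. It does not reflect how EWS, WebIndex, or Index Server are actually implemented. The page paths and the `documents` mapping are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

WORD = re.compile(r"[a-z0-9]+")

def build_index(documents, include_dirs):
    """Index only pages that live under one of include_dirs.

    documents maps a site path (e.g. "/research/energy.html") to page text.
    Returns an inverted index: word -> set of paths containing it.
    """
    allowed = [PurePosixPath(d) for d in include_dirs]
    index = defaultdict(set)
    for path, text in documents.items():
        parents = PurePosixPath(path).parents
        if not any(d in parents for d in allowed):
            continue  # directory not selected for indexing
        for word in WORD.findall(text.lower()):
            index[word].add(path)
    return index

def search(index, query):
    """Return the paths containing every word of the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits
```

Pages outside the selected directories never enter the index, so they cannot appear as search answers.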
Selective indexing is a must for a site like mine, so Excite was a major factor in my IPP selection.

FYI: if you are told, "Sure you can do a search, we have the ___ search engine available," use AltaVista or HotBot or some other Web search tool, and look for instructions on installing that search utility. What seems simple to technical support people may not be so simple for you. This is especially true of UNIX search utilities and non-commercial search engines. Things like Harvest, WAIS, and SWISH are excellent search mechanisms, but their ease of use is another issue. Look around the Net and find some documents with installation and setup instructions. If it looks too difficult, it probably is.

References

Don't be afraid to ask for references. That Web site is your business. Your customers ask you for references, so you may be more comfortable getting references as well. Just be sure to find references that have something in common with your situation. If you are just starting, it does you little good to talk to a happy customer named "UNIX Aces." Likewise, if you are an experienced Web developer, you may have little in common with "Ken and Karen's Wonderful Web Page."

Other

The big and broad "Other" category could include just about anything.

Is Java supported on the Web server? Java is already becoming a requirement for an advanced Web site. Most servers have no problem with Java, but, again, you may want to double-check.

Does the provider allow adult-oriented sites? This is important for two reasons. If you personally disapprove of adult-oriented material, you may not want to do business with a provider who offers such services. On the operations side, adult sites can have enormous hit rates that can easily consume a server. Many providers segregate their adult sites on different machines from the regular sites. That is better, but it can still create a bandwidth problem, because the adult sites can generate high volumes of network traffic.
The best providers handle adult Web sites in a totally separate environment. Many providers simply don't allow adult Web sites.

Conclusion

Treat this like a business deal, not a toss of the coin. Your Web site is your image to the wired world. What seem like trivial issues may be enough to dissuade visitors from returning. Do things right and, like any business, you will prosper. However, unlike most other businesses, in Web hosting the quality of service may be totally unrelated to the price. Shop carefully.