Friday, December 5, 2008

Can you find what you're looking for?

Most archiving solutions do a great job of storing information efficiently for a long period of time. They vary in the ways they collect, store, and single instance the data. Some buyers select an archiving solution for the way it collects and stores data. Others select one for its user interface or for how easy it is to use and deploy. An area often overlooked is the technology behind the search that is used to find data within the archive.

Finding what you need in a timely manner within the archive is vital when that data is requested by the courts or other entities. When purchasing an archiving solution, most will use the search engine included with the product without even thinking about using another solution. Enterprise search solutions use advanced technology to analyze the language, meaning, and concepts in your data, and use that analysis to produce more relevant results and rank them better.

Most would think that the best place to look for a good enterprise search solution is with the search leaders: start with Google and Yahoo, then maybe Microsoft. Although these players all offer excellent Internet search engines, they do not offer the same level of advanced search required for enterprise search needs.

A recent article posted on the ITPRO website in the UK talks about the differences between Internet search and enterprise search. For example, the relevancy rating in Google considers how many other pages link to a particular page. This does not apply when searching enterprise data stored on file servers, SharePoint sites, and in email archives. There, relevancy should be based on the meaning of the content along with the metadata stored with the document.
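To make the contrast with link counting concrete, here is a toy sketch of scoring a document from its own content and metadata. It is my own illustration, not how any product in the article works: the `relevance` function, the `metadata_weight` value, and the sample documents are all assumptions, and real engines analyze meaning rather than just counting words.

```python
import re

def words(text):
    """Lowercase a string and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def relevance(document, query_terms, metadata_weight=2.0):
    """Toy score: hits in the body, plus weighted hits in the metadata.

    No inbound links are involved; only the document itself is examined.
    metadata_weight is an arbitrary, assumed boost for metadata matches.
    """
    body_words = words(document["body"])
    meta_words = words(" ".join(document["metadata"].values()))
    score = 0.0
    for term in query_terms:
        term = term.lower()
        score += body_words.count(term)
        score += metadata_weight * meta_words.count(term)
    return score

# Hypothetical documents, invented for this example
docs = [
    {"name": "b", "body": "Lunch contract for the cafeteria",
     "metadata": {"title": "Cafeteria menu"}},
    {"name": "a", "body": "The merger contract was signed and the contract is final",
     "metadata": {"title": "Merger contract"}},
]

ranked = sorted(docs, key=lambda d: relevance(d, ["contract"]), reverse=True)
```

With these sample documents, the one that mentions the term in both its body and its title ranks first.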

Each solution described in the article uses a different technology to determine the meaning of the content and to base the relevancy of results on it. For example, Recommind uses statistical models built from the semantic analysis of existing data to produce relevant results, and Autonomy uses Bayesian statistical models that use inferences from past searches to influence the relevancy of future results.

The article is a very interesting read if you are looking into search technology for your enterprise or to support your existing archive solution.

Here is the link: http://www.itpro.co.uk/608925/why-enterprise-search-is-not-internet-search


Monday, October 27, 2008

No attacks on Microsoft Planned...

I am not sure which is most concerning. The fact that a corporation can buy a fighter jet or the fact that Larry Ellison already owns one.

Don't worry Microsoft, it's only a light attack jet. :)

http://bits.blogs.nytimes.com/2008/10/23/a-new-fighter-jet-for-googles-founders/

Thursday, October 23, 2008

Archive or Backup?

Recent research by CMS Watch shows that some IT managers still do not understand the differences between e-mail archiving and backup.

Backup addresses the short term risk of a disaster that could prevent users from accessing their data. Backups should be readily available and provide a quick restore to the state before the disaster occurred.

An archive is used for the long term efficient storage of data. This includes features such as single instancing (de-duplication). Storing data long term in an archive preserves it and ensures that it is not tampered with. Ensuring the integrity of the data is vital in a legal discovery request. Archives also allow quick search of and access to that data without a restore to an email server.

Article link: http://www.cmswire.com/cms/enterprise-cms/cant-tell-an-email-archive-from-a-backup-solution-003350.php


Tuesday, October 21, 2008

Searching data is a burden for Google.

I read an article from eWeek today that I found interesting. Apparently Google was recently served with a subpoena for documents related to a court case it was involved in as an investor. They argued that it would be an unreasonable burden for them to search their own files.

I am not sure whether or not the judge believed them but it seems that a company that corners the market on search technology should not be able to get away with that argument. Although, Google isn't really considered a legal discovery search tool.

A legal discovery search tool adds something extra to its search technology. In the article they call it "concept search". This is the idea of analyzing how the search terms are used in the data that is returned. Knowing the meaning of words and the way they are used in the data set allows the search to include relevant documents that a plain keyword search would miss (false negatives) and to eliminate irrelevant matches (false positives) from the results. A more accurate search means less review time before presentation, saving companies money and lessening their burden to produce data.

I believe that as legal search technology improves the argument of unreasonable burden will be less successful.

Article Link: http://www.eweek.com/c/a/Security/Google-Cant-Search-Their-Own-Documents/


Monday, August 25, 2008

George needs an archive!

I thought that the following article from the AP last week was an interesting insight into the fact that even our government still does not understand the importance of archiving:

http://hosted.ap.org/dynamic/stories/W/WHITE_HOUSE_E_MAIL?SITE=AZMES&SECTION=HOME&TEMPLATE=DEFAULT

The White House is missing 225 days of email. They are accepting bids for a recovery process that will include 35,000 disaster recovery tapes. 35,000!!! As a former Exchange administrator (as some of you are as well), I can appreciate the time it takes to restore 10 tapes, let alone 35,000. I am sure that the bids for this project will be quite large as the labor involved will be high.

This should be a lesson to all those out there still using backup tapes for recovery or as an archiving solution. If an archiving solution had been in place at the White House, there would be no need to restore backup tapes in an effort to find the emails that are being requested. The information could be easily searched for in the archive and exported in a readable format or even recovered as individual messages.

I think that the tape backup era is coming to a close. With so many different solutions on the market that now offer better performance than tape, I can’t imagine anyone sticking with tape as a medium to store data for disaster recovery. Take Microsoft’s DPM for example. Although it has the option to store data on tape, it also has the option to store online snapshots of the Exchange email system for quick disk-based recovery in the event of a disaster. There are also a number of other storage vendors on the market using snapshot technology to back up Exchange and other email systems. As the cost of storage decreases, the popularity of these disk-based backup systems is increasing.

Using a disk-based backup system in combination with an archive would mean that organizations no longer need to rely on tape backup for their email systems. The disk-based backup can be used for near term recovery (under 1 year) and the archive for longer term recovery. Information over 1 year old is typically only requested for discovery purposes. Most users would not request a restore of email that they have not accessed in a year.

It will be interesting (and painful) to see how much this process might cost our government. As you and I both know it is not the government that will be paying. It will be us with our tax dollars.

George – Please buy an archiving system!


Friday, August 15, 2008

Bigfoot exists, the world is flat, and executives are still concerned about eDiscovery!

Only one of these three is confirmed. You decide which…

An article from the National Law Journal states that 2 out of 5 executives are still bothered by electronic discovery. The article references a survey by Deloitte Financial Advisory Services of more than 520 executives from banking, securities, financial services, and technology industries. Here are some interesting data points from the survey:



  • 17.5% of respondents said they were not ready to handle eDiscovery requests
  • 11.8% reported no specific policies for data retention or destruction
  • 47.5% worried about the cost of discovering data in their organization
  • 16.3% feared court sanctions from failure to respond to discovery requests
  • 12.9% worried about meeting court deadlines


This survey tells me that executives, whether through fear of the unknown or lack of budget, are still not adequately prepared for an eDiscovery request. Being proactive about eDiscovery by installing an archiving system, mapping the data in your environment, and testing the plan you have in place will greatly reduce the anxiety these executives are feeling.

If they are not worried about eDiscovery then they must certainly be worried about Bigfoot. This week Fox News reported that a couple of guys from Georgia found the body of a Bigfoot. I used to never worry about a Bigfoot coming out from the woods and assaulting me but now I am not so sure. You will have to look for yourself and decide (there are pictures here). I myself can’t wait for the press conference this morning.

I also used to never worry about falling off the face of the earth while circumnavigating the globe but apparently that is still called into question as well. I came across a website this week for a group (they call themselves the Flat Earth Society) that believes the earth is flat. They even have theories about why the earth is flat, why most don’t know it and have been lied to, and how travel can happen with a flat earth. It is interesting to look at if you have some extra time on your hands.

I am off to the store now to stock up on supplies to protect myself from the Bigfoot. I know that garlic is used for vampires but what keeps a sasquatch away? :)





Friday, August 8, 2008

It hurts to say it... My wife was right!

I can’t tell you how many times I have heard my wife say “pull over and ask for directions” or “it wouldn’t kill you to look at a map once in a while”. I hate to say it but she is right (don’t tell my wife I said that). Or at least she is right when it comes to data discovery.

Correctly mapping the data that exists in your organization is a great step towards a proactive eDiscovery plan. You can’t know how to get the data in a timely manner if you don’t know where it exists, and you can’t stop and ask anyone for directions once that legal request comes in. Proactively planning by creating a map that shows where the data exists, what type of data it is, and how you can access that data will go a long way in shortening the time to your final destination of providing the data requested.

According to an article in the August 2008 issue of Inside Counsel, creating an effective data map starts with a meeting with IT. They might already have a map for storage purposes and it can be expanded to contain the additional information needed for eDiscovery. It is also recommended to meet with the owners of the data. They will be able to tell you what type of data is contained in their system and of what value it is to the organization.

Creating a data map can be a big task for a large organization. One solution is to reduce the number of places that you might need to go to retrieve data. One way to do that is through an effective archiving solution. By storing multiple data types (email, files, SharePoint, IM, etc) in a single repository you reduce the number of data stores you need to visit to discover data. This greatly simplifies the legal hold process and reduces the time to fulfill requests.

After you have an effective data map in place it is also important to practice. Create a legal request for yourself and see how long it takes to produce results. It will ensure that the data map that you have created is useful and accurate. It will also keep your wife from saying “can’t you find out where you are going before you start driving?” or “Two words: Google Maps”.




Thursday, July 31, 2008

This Cloud has a Silver Lining!

A while ago I did a webcast with an analyst firm on storage in the context of archiving. One of the topics of that webcast was how the standards and technology for storage change over time.

If you frequently take photos with a digital camera you have no doubt thought about this topic. Think about it, you have probably either filled hard drives or burned CDs or DVDs to store the hundreds of photos that you are taking that you want to preserve forever. How do you know that the storage mechanism you use for those photos will be around for your lifetime?

In my lifetime I can remember several different types of storage mechanisms. When I was much younger my father had a mainframe computer in the basement of our house. I would spend a lot of time watching the reel to reel tapes spin to load data to and from the machine. When we were old enough to use the TRS-80 that he had we would load programs into memory from a cassette tape (you had to listen for breaks in the noise on the tape to find the programs). The 8 inch floppy disks we moved to next made it much easier and faster. I also remember our first 20MB hard drive. Wow, 20MB, how could we ever fill that?

I think you see my point. I can’t go down to the local Best Buy and get a reel to reel reader for a computer tape. With the changes in interfaces I am not even sure that old 20MB hard drive could be attached to a modern PC, let alone that the OS could read the cluster size on the drive. As we store more and more data on removable media this issue is something that needs to be kept in mind.

That brings me to archiving. How can you be sure that the storage mechanism you use today will be in use 5 to 10 years from now? I noticed this week that an archiving vendor announced support for Blu-ray disc storage. How do we even know that Blu-ray discs will last 10 years? I know that they must perform some type of accelerated aging on the media to test its longevity, but the technology has only been around a couple of years for real world tests.

I believe that one answer to this problem will be “cloud storage” (I was hoping to use this buzz word soon!). Storing data in “the cloud” will make keeping up with evolving technology the problem of the company providing the storage. If companies providing “cloud storage” want to compete for archiving storage they will need to address this problem and state it in their messaging. “Store data with us because we will ensure it can be accessed when you need it regardless of the changes to technology over the years.” Archiving vendors will also need to plan to partner or integrate with these solutions.




Lemmatization and Stemming

A colleague brought up the term Lemmatization to me today and I thought it would make a great blog topic. After realizing that he wasn’t insulting me I looked further into the meaning of the word.

My first stop was the online dictionary. Lemmatization apparently means “the act or process of lemmatizing” (thanks a lot online dictionary). Not being happy with that definition I looked at Wikipedia, the most truthful and valuable source of information on the web (sarcasm intended). They offered additional assistance.

Lemmatization is the process by which a word is reduced to its lemma, or canonical form. For example, the lemma or base form of the word “walking” is “walk”, and the lemma of “better” is “good”. Lemmatization takes into account the context and meaning of the word when determining its base form.

Stemming is the process of reducing the word to its root form without necessarily taking into account the context or meaning of the word. For example, the words “walking”, “walks”, and “walker” would all have the root of “walk”, but the word “ran” would not have a root of “run” when using a suffix stripping stemming algorithm. Lemmatization algorithms are more accurate in that the meaning and context of the word are considered.
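For those who like to see it in code, here is a toy sketch of the difference. It is only an illustration under my own assumptions: the suffix list and the small `LEMMAS` table are made up for this example, and the lookup table merely stands in for the context and meaning analysis that a real lemmatizer performs.

```python
# Assumed suffix list for the toy stemmer, checked in this order
SUFFIXES = ("ing", "er", "s")

# Hypothetical lemma table; a real lemmatizer uses a full dictionary
# plus the context of the word, not a hand-written lookup like this
LEMMAS = {
    "walking": "walk",
    "walks": "walk",
    "ran": "run",
    "better": "good",
}

def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def lemmatize(word):
    """Look the word up in the lemma table; unknown words pass through."""
    return LEMMAS.get(word, word)

for word in ("walking", "walks", "walker", "ran", "better"):
    print(word, "stem:", stem(word), "lemma:", lemmatize(word))
```

The suffix stripper handles “walking”, “walks”, and “walker” but leaves “ran” alone and mangles “better”, while the lookup maps “ran” to “run” and “better” to “good”.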

Why is this important? Using lemmatization in searching for data in your archive during a discovery request provides more accuracy in that you receive results based on meaning and context which is much more valuable than a straight keyword search.

The next time you are at a major social function please remember this blog. I am sure explaining the meaning of the word lemmatization will not only impress your many friends but also make you the life of the party (or you might be assaulted, it could go either way…). :)

Wikipedia source: http://en.wikipedia.org/wiki/Stemming and http://en.wikipedia.org/wiki/Lemmatisation



Wednesday, July 30, 2008

Archiving IMs

Ben Worthen posted a blog entry at The Wall Street Journal yesterday on a problem with Wall Street traders using instant messaging. They were going to have to cease using their Thomson Reuters IM service without a compliance system in place.

The SEC requires that IM messages be monitored and retained in order to maintain compliance. Apparently the software that Thomson Reuters was using could no longer be used due to a dispute over the licensing. A new system is in place now and the crisis was averted.

It is interesting to think about the branching out of compliance for other systems outside of email. Instant Messaging used to be considered a very risky tool and many have delayed or avoided deploying it in their organization. Since most public instant messaging systems are not controlled or governed it is easier for insider information to be exposed to the public without the company’s knowledge.

Instant Messaging system adoption is increasing rapidly. College students today rely on texting and IM technologies for things like communicating with their friends and collaborating on school projects. As they move into the corporate world they want to continue to use the communication tools that are familiar to them. It only seems natural to them to use instant messaging instead of the phone to collaborate on a project they have to complete as a team.

Microsoft is also pushing the adoption of IM. With Microsoft’s Office Communication Server, Instant Messaging is becoming more integrated into phone and email communication. More companies are seeing corporate instant messaging as a business tool rather than a way to communicate with your wife that you will be late coming home from work (Sorry honey, I will be late tonight). As this adoption continues companies struggle with what that means for compliance.

The SEC is clear on what it means for financial institutions, as Ben states in his blog. Financial companies need to monitor and save instant messaging discussions. This is where I believe that archiving can help. Instant messaging archiving is a growing need for companies deploying these instant messaging solutions. An effective archiving solution should store these discussions and index them so they can be monitored and discovery requests can be fulfilled. As the demand increases, archiving vendors will be called upon to meet this need.

Ben’s blog post: http://blogs.wsj.com/biztech/2008/07/29/did-traders-almost-have-to-stop-sending-instant-messages/




Archiving Provides Efficient DLM

SearchCIO.com published a well written article on Data Lifecycle Management today. The article included archiving as a means to a successful DLM plan. HSM (Hierarchical Storage Management) is only part of the picture in DLM. Archiving must play a key role.

Archiving not only provides the management of the data but also indexes and stores the data in an efficient manner that HSM systems can benefit from. Typically archiving systems can store a single instance of data that is duplicated across email systems, file systems, SharePoint, and other systems from which data is archived. By storing only a single copy of the document, the storage needs for the HSM system are greatly reduced.
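Single instancing is easy to picture with a small sketch. This is my own simplified illustration, not any vendor's implementation: the `SingleInstanceStore` class and the sample items are hypothetical, and I am assuming duplicates are detected by hashing the full content with SHA-256.

```python
import hashlib

class SingleInstanceStore:
    """Toy archive that keeps one copy of each distinct piece of content."""

    def __init__(self):
        self.blobs = {}       # content digest -> the single stored copy
        self.references = []  # (source system, item name, digest)

    def archive(self, source, name, content):
        """Record the item; store its bytes only if not seen before."""
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)
        self.references.append((source, name, digest))
        return digest

store = SingleInstanceStore()
report = b"Q3 budget report"  # same document found in three systems
store.archive("email", "attachment.doc", report)
store.archive("file server", "budget.doc", report)
store.archive("SharePoint", "Q3.doc", report)
store.archive("email", "memo.txt", b"Lunch at noon")

print(len(store.references), "items archived,", len(store.blobs), "copies stored")
```

Four items are archived here but only two copies are stored, because the report from email, the file server, and SharePoint is the same content.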

How does this fit into DLM? Storing the data efficiently is only half of the story. Managing the data by applying retention uniformly across all data sets ensures that data at the end of its lifecycle is removed. It also ensures that data that needs to be kept for legal discovery purposes gets kept.
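The two rules in that paragraph (remove what has reached the end of its lifecycle, keep what is needed for discovery) can be sketched in a few lines. This is a simplified illustration under my own assumptions: the `items_to_delete` function, the item fields, and the one year retention period are all made up for the example.

```python
from datetime import date, timedelta

def items_to_delete(items, retention_days, today):
    """Return ids of items past retention that are not on legal hold."""
    cutoff = today - timedelta(days=retention_days)
    return [
        item["id"]
        for item in items
        if item["archived_on"] < cutoff and not item["legal_hold"]
    ]

# Hypothetical archive contents drawn from different source systems
items = [
    {"id": 1, "source": "email", "archived_on": date(2006, 1, 15), "legal_hold": False},
    {"id": 2, "source": "file server", "archived_on": date(2006, 3, 1), "legal_hold": True},
    {"id": 3, "source": "SharePoint", "archived_on": date(2008, 5, 1), "legal_hold": False},
]

expired = items_to_delete(items, retention_days=365, today=date(2008, 7, 30))
```

The same rule is applied to every item regardless of which system it came from. Only item 1 is selected: item 2 is just as old but is on legal hold, and item 3 is still within its retention period.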

The article is a great read and includes information about the cost savings realized from storing data more efficiently thanks in part to archiving solutions.

Article Source: http://searchcio.techtarget.com/news/article/0,289142,sid182_gci1323186,00.html#


Thursday, July 24, 2008

Need a new career?

I recently came across a career profile on About.com for an e-discovery professional. In the profile they state how the industry has grown to be a $2 billion industry and will be $21.8 billion by 2011. Since the industry is growing so rapidly the need for companies to hire professionals that specialize in e-discovery is also growing.

The profile of an e-discovery professional starts with a person who has a background in law or in information technology. Usually paralegals are interested but with rising salaries more lawyers are getting into the mix. It is interesting to note that some come from backgrounds in information technology. Since half of the problem of e-discovery is the gathering and storing of electronic data it is not surprising that an IT professional might choose this as a career path. I would suspect that legal professionals with a background in IT (rare) would be the best candidates.

So what are the responsibilities for your new found career in e-discovery? The profile states the following responsibilities:

  • Assessing a client’s ESI.
  • Helping to create ESI preservation policies.
  • Serving on e-discovery teams.
  • Ensuring compliance with the new federal rules regarding ESI.
  • Educating clients on e-discovery policies.
  • Drafting and communicating litigation hold procedures.
  • Using technology to facilitate discovery.
  • Assisting in the collection, processing, review, analysis and production of ESI.
  • Serving as a liaison between the legal team, IT personnel, vendors and records management personnel.

I know all this sounds glamorous but what is the pay like? I am sure that this industry is so exciting that most will want to do the job just for the prestige and fame it brings. Although if you do expect to get paid then you can look for a salary in the range of $125,000 to $250,000 annually.

So it's time to brush up that resume, get a new tie, and apply at your favorite corporate legal department. Your new career in e-discovery awaits. Don't be late for the interview!

(Oh, and don't wear the tie into the office of your current job after interviewing. Everyone will know what you are up to.)

Source: http://legalcareers.about.com/od/careerprofiles/p/e-discovery.htm




Asian CIOs get it.

Recent survey results posted by ZDnet Asia showed that regulatory compliance sits high on the priority list for Asian CIOs. The survey also stated that their C-level peers did not share the concern. Although I believe it is typical for CIOs to see what is happening in the IT market before other C-level executives, it is interesting that other executives in Asia have not caught on yet.

It seems to me that regulations regarding email compliance are more defined in the United States since the Federal Rules of Civil Procedure were put in place. When I talk to companies overseas the regulations are less clear. Privacy concerns and contractual obligations seem to be the main driver. For example, in some regions of Europe making sure that administrators can’t read the personal email of their employees is a key requirement. When the archiving system exports email from the messaging server there needs to be a means to allow employees to keep their personal emails from being archived. Implementing something like this for US employees might expose companies to risk.

It’s nice to see that regulatory compliance is finally becoming an important need in Asia. Hopefully they will be motivated to define procedures without a case like Enron.

Article Source: http://www.zdnetasia.com/news/business/0,39044229,62044095,00.htm?scid=rss_z_nw




Tuesday, July 22, 2008

E-Discovery survey confirms archiving investment.

A survey sponsored by FIOS that interviewed legal professionals from Fortune 500 companies confirms an investment in email archiving tools. The survey respondents consisted of 26 lawyers and 2 non-lawyers responsible for legal discovery.

79% of the respondents stated that they had made an investment in technology in the last 12 months to assist in legal discovery. 39% of those invested in email archiving, second only to legal hold management (50%). I believe that this is a trend that will continue as companies realize the best way to protect themselves or assist themselves with e-discovery requests is to proactively implement an archiving solution.

Other interesting technology trends from the survey:

Automate legal hold – Over 30% purchased litigation hold solutions
Centralize email – Email archiving systems are being implemented to give companies a single location for search and management.
Increase insourcing – 79% of respondents invested in e-discovery for in-house solutions.

When I read this article it was clear to me that a good email archiving solution can effectively address the technology trends that resulted from the survey. Legal hold can be implemented more easily by having all email within an archive. Email archive solutions can centralize email from multiple messaging platforms. Email archives also provide an efficient storage mechanism for keeping email data in-house.


Article Source: http://www.law.com/jsp/legaltechnology/pubArticleLT.jsp?id=1202423099886




Saturday, July 19, 2008

Do social networking sites expose companies to risk?

I was sitting in my kitchen this morning going through my normal Saturday morning routine: a homemade Amish doughnut made by the Amish restaurant in town and a large cup of coffee. (I know the doughnuts are not made by real Amish people but just by the people they hire. They are still good though.) As I was reading the paper (also a part of the routine) I came across an article that got me thinking…

The story was about how web profiles on social networking sites such as Facebook and MySpace can negatively affect clients defending themselves in cases such as drunken driving accidents. In a couple of cases where the driver was found to be drunk and had done bodily harm to another person, their social networking page hurt their case. The photos stored on their site showed them at parties with alcohol, drinking alcohol, or wearing clothes that advertised alcohol. Showing a defendant on a drunken driving charge having a good time with drink in hand right before sentencing damaged their case, and they received stiffer penalties.

This does not have much to do with corporate discovery cases yet but stay with me on this one. It got me thinking about how a lawyer might look to new locations for discovery requests when companies are being sued by their employees. For example let’s say that a disgruntled employee is fabricating a harassment claim against the company they work for from a recent company party they attended. If employees at those companies posted photos of the event in question on their social networking sites then those photos could be misinterpreted in favor of the employee filing the complaint. The actions of a few at the party might be captured in digital form showing that the party was either out of control or unsafe.

Although I do not know of any cases yet where discovery requests were sent to Facebook or MySpace for personal photos (I am sure that it has happened), I am sure as these sites grow the risk will become greater. Personally I have gone to my fellow employees’ blogs or profiles to see pictures from events that I could not attend. I have also learned just by doing this blog that my company has policies that serve to protect it from liability and protect me from posting anything that could get me into trouble (see “The Fine Print” to the right).

As an employee for a public company your actions in public are not only a reflection on you but a reflection on the type of employees the company you work for hires. The types of things you post on your social networking site reflect upon your public life and since most post what company they work for on those sites it could also reflect upon the company you work for as well. I heard a lawyer once say that there is a “grandmother test” to find out if you are doing something that could be deemed inappropriate or illegal. If you can’t describe to your grandmother what you were doing in that photo you posted of yourself at that party then it probably should not be posted. :)

Article Source: http://www.dispatch.com/live/content/national_world/stories/2008/07/19/Facebook_evidence.ART_ART_07-19-08_A1_PVAPRIU.html?sid=101


Friday, July 18, 2008

Archiving concern grows...

Network World posted an article today by Michael Osterman. They recently conducted a survey and found that 36% of decision makers are more concerned about email archiving today than they were 1 year ago. 30% were more concerned about e-discovery. Only 12% to 13% were less concerned about email archiving and e-discovery.

This shows that most companies have yet to sufficiently solve their email archiving and e-discovery needs. This market and the archiving and discovery needs of companies are still being defined. As each new ruling is made and companies struggle to understand the FRCP their concern increases. I believe that this is a sign that the archiving and discovery market is still not mature and will see substantial growth.

Source: http://www.networkworld.com/newsletters/gwm/2008/071408msg2.html


Wednesday, July 16, 2008

Terminated!

This article really pumped me up…

It looks like our friend Arnold is having some trouble producing documents for an eDiscovery request. So who is the Terminator’s worthy adversary? A group of juvenile parolees. The parolees filed suit because they claim that their 14th Amendment rights were violated. According to the parolees, the response to their discovery request was significantly delayed and caused detrimental harm to their case. They also claim that Arnold failed to produce all documents that were requested, including databases and logs.

Although Arnold was able to produce over 55,000 documents the court determined it was not enough to comply with the order. Arnold’s office also claimed that there was only one person performing the discovery search, and that she had been overwhelmed with requests and had family issues during the time the requests came through. The judge was less than sympathetic, stating that it was not appropriate to assign just one person to the request and that the defendant failed to solve problems with discovery conferences.

Arnold also took it on the chin for the type of documents produced. The documents were converted to .pdf before being sent to the plaintiffs. In the documents’ original form they were searchable. Once converted to .pdf they were not. This created additional burden for the plaintiffs in the case. The judge ruled that if the documents in their original format were searchable then there is no reason why they can’t be given to the plaintiffs as searchable documents.

I guess the lesson learned from this is that you should take all discovery requests seriously. Follow the EDRM process and meet with the requesting party to make sure that you understand the request and resolve any differences. If the judge in this case had seen a good faith effort, he would not have mentioned the problems with the discovery conferences. Also, produce documents in either their original format or in a format that is useful to the requesting party. You might be seen as being uncooperative if you produce documents that cause the requesting party to have an additional burden during the review process.

Hasta la vista, baby! :)


Source Link: http://www.rcalaw.com/component/option,com_myblog/show,judge-sanctions-state-of-california-with-ediscovery-violations.html/Itemid,0/

EDRM Process: http://www.edrm.net/




Tuesday, July 15, 2008

You want my voicemail too?

I was recently asked by a colleague what I thought the most interesting challenge was going to be for archiving over the next couple of years. My response was voicemail archiving. With the last release of Exchange Server Microsoft has made it easier for archiving solutions to grab voicemail and store it in their archives. Since voicemail can now be part of an existing email as an attached .wav file there is no reason why an archiving solution should not be able to store this data. Storing the data is the easy part and probably not the most interesting evolution in archiving but indexing and searching that voice data is.

An article from Wisconsin Technology News that has been sitting in my inbox for a while now recently recaptured my attention. It discussed how the evolution of VOIP technologies can raise eDiscovery concerns for companies. Since voicemail is stored as a file, and unified communications solutions store this data in the email system, courts are now looking at this information as discoverable.

In pure electronic form the voicemails are not very useful. Solutions on the market today that index data do so by using the text contained in files and emails. The data in an audio file would need to be converted to text first and then indexed to be easily recalled in a discovery request. The article describes three ways of grabbing these keywords for indexing:

· Phonetic – Looking at patterns in the speech to determine words (Not accurate or likely to happen in my opinion)
· Manual transcription – Sending the voice file away to be manually transcribed (Prone to human error and expensive)
· Automated transcription – Using a speech to text conversion process (Most probable solution)

None of the three methods for gathering text from voicemails has been perfected, and all still seem to have deficiencies. It is my opinion that it will be very expensive for companies to produce voicemail data in a discovery request until someone creates a speech-to-text filter that is accurate and can be integrated into existing indexing engines.
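To show why text is the prerequisite for search, here is a minimal Python sketch of the indexing step. It assumes the speech-to-text conversion has already happened and simply produced plain-text transcripts. The voicemail IDs, transcript text, and function names are all made up for illustration, and this is not how any particular archiving product builds its index.

```python
import re
from collections import defaultdict


def build_index(transcripts):
    """Map each lowercase word to the set of voicemail IDs whose transcript contains it."""
    index = defaultdict(set)
    for voicemail_id, text in transcripts.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(voicemail_id)
    return index


def search(index, query):
    """Return the IDs of voicemails whose transcripts contain every word in the query."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical transcripts, standing in for the output of a speech-to-text engine.
transcripts = {
    "vm-001": "Hi it's Dana, please call me back about the merger contract",
    "vm-002": "Reminder that the contract review is moved to Friday",
    "vm-003": "Lunch on Friday?",
}

index = build_index(transcripts)
print(sorted(search(index, "contract")))         # ['vm-001', 'vm-002']
print(sorted(search(index, "contract friday")))  # ['vm-002']
```

Once the transcript exists, the voicemail is no harder to find than an email. The hard and expensive part is producing a transcript accurate enough to trust.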

Article Link: http://wistechnology.com/articles/4789/


Monday, July 14, 2008

Email archiving provides no excuse!

You have all heard the same old story from local to state governments wherever you live. Some major scandal takes place, an eDiscovery request is placed, and for some odd reason we find out that no email is being retained for the employees involved. Or worse yet it has been maliciously deleted and not recoverable. Some government agencies even have a policy to purge email on a routine basis.

An article from the AP was floating around today that discussed this situation. In three instances state employees were told to either routinely or deliberately delete email from their mail systems and backup tapes. The main problem is that these emails are being destroyed before they have the chance to be viewed as public record.

State governments are finding out what corporations found out long ago. Even though you might have a policy of routinely deleting email it may not protect you in all cases. Emails should be treated as public record and should be retained for a period of time. In some states guidelines are set to define what a public record is and employees are expected to determine what to save. However, a process that relies on manual classification of email lends itself to human errors and inconsistencies.

So what are government agencies to do, keep everything? In the article Joerling was worried that keeping too many records may not make it efficient for state agencies to recall email when requested. On this point I have to agree with the government activists. It is better to keep everything and store it efficiently in an email archive where it can be indexed and easily searched. With modern archiving technology there is really no excuse not to produce public data when requested.

AP article link: http://hosted.ap.org/dynamic/stories/T/TEC_SAVING_E_MAIL?SITE=AP&SECTION=HOME&TEMPLATE=DEFAULT&CTIME=2008-07-14-04-35-52


Even Google says it’s complex…

So what is it that the search giant Google finds complex? Answer: eDiscovery. In an article posted on NewsFactor (link below) Google states that they are relying on partners to assist with the complex nature of eDiscovery. In the first paragraph of the article I found this quote interesting:

"We see archiving as the foundation for later stages of the process, and are working with partners in the e-discovery ecosystem to ensure that data can be imported into other technologies."

Even Google recognizes that a good eDiscovery plan begins with archiving. In order to produce information in days and not weeks when that eDiscovery request comes through it is important to have a centralized archive that can contain multiple content types. The fewer places a search engine or discovery product has to access in order to find data, the quicker the result.

The article also provides what Google and other search vendors find important for a good eDiscovery plan. They are:

· Broad file format – Support indexing multiple file types
· Text analytics – Searching for words and meaning
· Reach – Searching all systems
· Testing – Test the execution of your plan

I think that the last bullet might be overlooked more often than not but it is great advice. Test out the plan you have in place for eDiscovery by submitting a request to your team and see how well you can execute the request. Measure performance in not only the time it takes but also in the accuracy and completeness of the search. It might be surprising what gets overlooked.
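One way to put numbers on "accuracy and completeness" is to score a practice request with precision and recall. The Python sketch below is only an illustration. It assumes someone on the team already knows which documents should have come back for the test request, and the document IDs and the function name are hypothetical.

```python
def score_search(returned_ids, relevant_ids):
    """Return (precision, recall) for one test search.

    precision: share of returned documents that were actually relevant (accuracy).
    recall: share of relevant documents that the search returned (completeness).
    """
    returned = set(returned_ids)
    relevant = set(relevant_ids)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical drill: the team returned four documents; a reviewer knows five are relevant.
returned = ["doc1", "doc2", "doc3", "doc7"]
relevant = ["doc1", "doc2", "doc4", "doc5", "doc7"]

precision, recall = score_search(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.60
```

In this made-up drill a recall of 0.60 means two of the five relevant documents were missed, which is exactly the kind of gap a test run is meant to surface before a real request arrives.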

http://www.newsfactor.com/story.xhtml?story_id=0320011LUI68




Monday, July 7, 2008

Larger Mailboxes in Exchange

A white paper was posted this month at the Exchange Server TechCenter by Tom Di Nardo (Senior Technical Writer, Microsoft Exchange Server) on the subject of planning for larger mailboxes in Exchange 2007. In the white paper Tom makes reference to stubbing by third party archiving solutions. Here is what he says about the administrative problems with using stub files in Exchange:

Server performance Removing the message bodies and attachments from Exchange reduces the mailbox size, but it does not significantly change the server performance for users accessing Exchange via Outlook in online mode and Outlook Web Access. Item counts are the primary performance driver for the Exchange store, and not aggregate size. For example, server performance with a folder containing 100 KB of full e-mail message items is similar to a folder containing 100 KB of stub files.

Client complexity Because the use of stub files with a third-party archiving solution requires the deployment and use of Outlook add-ins, a significant amount of time must be spent by administrators to deploy and manage these add-ins. Administrator time is also required to assist end users with technical difficulties using the add-ins. Not deploying stub files removes all of this additional administrative work that must be performed, thereby allowing more time to administrators and end users.

I could not let this go by without some comment. First I am not sure I understand why he states that server performance is a problem with stubbing. While I might be in agreement that stubbing does not show a drastic improvement in server performance it does not adversely affect server performance. If I can increase the number of items in my mailbox that I have immediate access to without going to the archive and not degrade performance is that not an improvement? Having the 100KB of stubbed items might be similar in performance to 100KB of full data but it allows users easier access to more items without going to an archive.

The other thing he mentions as a negative is the use of a third-party add-in for Outlook. I can’t cover how all archive vendors handle stubbing in the Outlook client so I will only speak to what Quest does (Full disclosure: I work for Quest and am probably biased). Archive Manager deploys a form through Exchange to the Outlook client. No need for an add-in or a client installed at the desktop. While I can agree with Tom that this might be slightly more to set up, the benefits outweigh the administrative burden.

However I do believe that the archiving market is headed away from storage management for email. As storage becomes cheaper and easier to manage for messaging systems (Exchange is a good example) the need is shifting towards longer term storage of data for compliance purposes. As users adopt the Exchange 2007 platform they might have less of a need for storage management. Not everyone is on Exchange 2007 though. There are still some customers on Exchange 5.5. Until Exchange 2007 (and future versions) is adopted in 80-90% of the market, administrators will need a solution and storage management will still be a feature that Archiving vendors will need to deliver.

White Paper link: http://technet.microsoft.com/en-us/library/cc671168(EXCHG.80).aspx


Wednesday, July 2, 2008

eDiscovery, Quotas, and Hosting, oh my....

Three articles caught my eye today that I really wanted to comment on...

eDiscovery

First, there is an article from Baseline Magazine on how to prepare for an eDiscovery showdown. There were a couple of things that I paid attention to in the article. The first is that CIOs are not sure of how much a legal discovery request could really cost them. They might have a great storage plan but the data might be stored in legacy systems or stored in off site locations that could be expensive to locate and retrieve. Imagine if that legacy system that was used for the email storage no longer existed as a standard and could not be recreated to restore the data. Information might not only be costly to retrieve but in some cases impossible. This is why a centralized archive is so important. Everything is stored in one place in non-proprietary formats that allow fast retrieval.

The other interesting item in this article is where it mentions that CIOs might be responsible for storing VOIP data or voicemails that result from VOIP calls. This is very interesting. Imagine the conversations you have using VOIP being recorded, archived, and (through voice recognition) searched.

Here is the link: http://www.baselinemag.com/c/a/Storage/Preparing-for-an-EDiscovery-Showdown/

Quotas

IDM Online posted results of a survey that polled corporate email users and showed that 65% of the respondents had quotas in place that limited the size of their mailbox, forcing them to manage their email storage. Of those, 66% managed their mailbox in a way to retain mail using means outside the email system, including home email accounts or pst files.

Imagine the risk that companies are exposing themselves to because of mailbox quotas. I really do believe that the need for archiving is shifting more towards compliance in the future and away from storage management. This survey shows me, though, that the move has not happened yet and that the need to control mailbox sizes seamlessly, without having the users resort to pst files, is still great.

Here is the link: http://www.idm.net.au/story.asp?id=9724

Hosting

Last but not least Byte and Switch posted an article describing how the hosted archiving and discovery market is growing. It is interesting to see how many vendors there are in the hosted archive market. It is not as many as in-house vendors but there are still a lot of vendors for that market segment. I only have one observation about this.

If email is moving toward SaaS and is hosted offsite and the archive for that email is hosted offsite as well then no email data is stored in-house at all. Aren't companies exposing themselves to risk by not maintaining a copy of their data in-house? What happens if they pick a vendor that is not in it for the long haul and closes up shop? What happens if they want to migrate to another hosted email service and still need access to the legacy data? It is my opinion that an in-house archiving solution provides companies with a safe place to keep data that is platform agnostic. If they ever decide to change email services then access to their legacy data is maintained.

Here is the link: http://www.byteandswitch.com/document.asp?doc_id=158102&WT.svl=news1_1


Monday, June 30, 2008

Journal Envy?

I was going through some old email today and I came across an email discussion with a colleague on journaling in Exchange 2007. As a result of that discussion I was able to clear up some confusion on what type of journaling is included with Exchange 2007 according to the way you have it licensed.

As you may or may not know, journaling in Exchange is the ability to record all communications that take place within Exchange. Archiving vendors use journaling as a means to capture all communication from an Exchange server and store it in an Archive. If Envelope Journaling is enabled, it also allows archive vendors to capture the transport envelope, including the P1 message headers, which carry information about BCC and distribution group recipients.

In Exchange 2003 Journaling is enabled for each mail store. You could enable message-only journaling, BCC journaling, or the aforementioned Envelope journaling. Having journaling at the store level is slightly limiting in that you must apply it to all users for that store. What if you only wanted to journal a few select mailboxes that needed special attention? This brings me to Exchange 2007.

Exchange 2007 has two flavors of journaling, standard and premium. Standard journaling, otherwise known as per-mailbox database journaling, allows you to enable journaling for all users in a specific mailbox database. Premium journaling, otherwise known as per-recipient journaling, allows you to enable journaling for individual mailbox recipients. Since journaling is located on the hub transport I have much more flexibility in Exchange 2007 for journaling. So where is the confusion?

What my friend was confused about is that premium journaling is only available if you have an Exchange Enterprise CAL license. No worries though. While you might envy those super Exchange administrators out there fortunate enough to have an enterprise CAL, standard journaling works just as well. You are not losing functionality but only losing flexibility.

For more on journaling you can visit the Exchange Server TechCenter on Microsoft’s website. I read through it and if I can understand it anyone can. :)

Here’s the link: http://technet.microsoft.com/en-us/library/bb124382(EXCHG.80).aspx




Sunday, June 29, 2008

I’m a SWAGaholic… (LegalTech Day 2)

On the Friday of LegalTech I was able to spend some time walking through the vendor displays in the trade show. I was disappointed to see from my trade show book that there was no category for Archiving. I think archiving is a critical factor to success for eDiscovery (see previous post) so not having a category for it was surprising. By my count there were 4 companies there that had a major focus on email archiving. Most were focused on the eDiscovery process or at least a part of the eDiscovery process. This isn’t surprising since it is a show for legal professionals.

I had a conversation with a person at the eCopy booth that was worth mentioning. They make software that sits on top of a Canon network copier/scanner. The software allows users to scan physical documents into electronic documents and helps to automate the process when you are working with a large number of documents. Some of the features they offer are:

Bates Numbering – Assigns each document a unique number as it is scanned into electronic format.

OCR – Recognizes scanned text so that it can be indexed or categorized.

Annotations – Allows you to make notes on the electronic document as well as redact any text that should not be seen.
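For anyone who has not seen Bates numbering before, the idea can be sketched in a few lines of Python. This is only an illustration of sequential, zero-padded identifiers, not how eCopy implements the feature. The prefix, the number width, and the file names are assumptions I made up, and in practice the numbering is often applied per page rather than per document.

```python
def bates_numbers(documents, prefix="ABC", start=1, width=6):
    """Pair each document with a sequential, zero-padded identifier."""
    return [
        (f"{prefix}{number:0{width}d}", document)
        for number, document in enumerate(documents, start=start)
    ]


# Hypothetical scanned files, in the order they came off the scanner.
scanned = ["contract.tif", "invoice.tif", "memo.tif"]

for label, name in bates_numbers(scanned):
    print(label, name)
# ABC000001 contract.tif
# ABC000002 invoice.tif
# ABC000003 memo.tif
```

Because every identifier is unique and sequential, both sides in a case can refer to the same document without ambiguity.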

I know that document scanning solutions have been around for a while but what impressed me about eCopy was the high number of connectors they offered in their solution. They have the capability to integrate with email applications, document management solutions, and HR systems just to name a few. The salesperson I was speaking to also mentioned the possibility of archiving being a need.

There are still some companies out there that have yet to convert the majority of business that they used to do on paper over to digital. They might be storing data in a physical archive that involves paper document storage. Imagine the amount of storage space saved in going from a physical archive to a digital archive. Not to mention the information is now indexed and can easily be searched for later recall. I would think that in order to ensure that you are protected and can recall data quickly you would want to move to a digital archive as quickly as possible.

After making it around the show floor I spent some time at our booth. I was surprised over two days at how many people asked me what eDiscovery was. I guess I never thought that some legal professionals still might be new to this industry. So for those of you that are new to eDiscovery let me lend you a hand. The simplest definition that the EDRM website provides is this:

The process of finding, identifying, locating, retrieving, and reviewing potentially relevant data in designated computer systems.

Oh and by the way. EDRM stands for the “Electronic Discovery Reference Model” and they (the group is made up of over 70 participating companies) spend the majority of their time setting standards and guidelines for the eDiscovery industry. It is a great place to get started. Here’s the link: http://www.edrm.net/

Last but not least… One of my favorite things to do at trade shows is get some SWAG (It’s for the kids, honest). As I walked by the booths I collected anything that looked like it might keep me (I mean the kids) entertained. The winner this show (or the item that kept the kids entertained best) was what I can only describe as a wall crawler. Remember the sticky rubber animals that you can throw against the walls and they would slowly crawl down? The kids loved these and immediately started throwing them all over the house. Apparently when I was a kid I never realized that they leave marks on the walls and windows. Now I do…


Friday, June 27, 2008

A Vespa? Really? (LegalTech Day 1)

Today and tomorrow I am at LegalTech. LegalTech is a show for legal professionals and IT workers covering such subjects as eDiscovery, archiving, and other technology issues of interest to legal professionals. I wanted to relay two things to you from the show today.

The first comes from the sessions I attended today. The first session I attended today was delivered by Bill Tolsen from Mimosa Systems. The main theme I gained from his presentation was how important proactive eDiscovery was. He stated that he thought the market was headed in that direction.

I also attended a presentation given by Keri Farrell from Quest Software (Full Disclosure: Keri and I work together at Quest). A portion of Keri’s presentation talked about the differences between Proactive and Reactive eDiscovery. I thought she did an excellent job explaining it and thought the definitions she used were easy to understand. For those that do not know here is what we mean when we talk about proactive and reactive eDiscovery.

From her presentation she says Reactive eDiscovery is a company saying “We don’t know what’s out there, or where it all is, but need to produce evidence for x within x days” or “I lost my keys… I need them now!”

She says that Proactive eDiscovery is a company saying “We need to set up our Exchange environment in order to facilitate discovery” or “I’m going to put my keys in the right place so I can find them easily!”

Proactive eDiscovery involves Archiving. You can’t be proactive without it. You will not know where your keys are if you do not have a place to keep them. Having everything (that needs to be saved) contained in a single repository in an Archive allows you to store data in a location where it can be easily found. There are also some other advantages I thought about with the proactive approach. They are:

* You can do journaling to capture envelope information, whereas reactive eDiscovery does not involve journaling.

* Indexing is done as you store data making the search for data faster since there is no need for the index to be built first.

* Retention policies can be managed better if everything is centrally stored.

* Centralize and therefore control .pst files in the archive instead of at the desktop.

* Legal hold is easier to apply to a centralized repository.

Those are just a few advantages but I am sure there are more. The point is that in order to successfully remain compliant a proactive approach is best. There is a place for the reactive approach but in order to minimize your risk you should be proactive.

That brings me to the second item. I learned something today. If you want to attract people to your booth at a tradeshow a Vespa is actually better than an Xbox. :)




Tuesday, June 24, 2008

Can your company read your email?

Below is a link to an interesting article in the Los Angeles Times this week. It has to do with a ruling by the 9th Circuit Court of Appeals on an employer’s ability to access and read electronic messages (text messages or email) when they are stored with an outside provider. There are several interesting things to note about this ruling.

Arch Wireless was defined in the ruling as being an “electronic communication service” (ECS). This is important because if they were defined as a “remote computing service” (RCS) then they would be off the hook. The Stored Communications Act allows an RCS to release stored private electronic communication with consent of either the user or the subscriber (the city in this case) whereas if they are an ECS they can only release private communication with consent of the addressee or recipient.

At this point all of you who are using a hosted email or hosted archiving solution are wondering if your service is considered an RCS or ECS. As luck would have it the court made the distinction between what defines an RCS and an ECS. An ECS provides users with the ability to send or receive electronic communication. An ECS might also store those electronic communications but only temporarily for the purpose of transmission of the content or backup protection. On the other hand an RCS is defined as the provisioning of public computer storage or processing by means of an electronic communications system.

Arch Wireless was defined as an ECS because when it archived the text messages it was not clear who it was doing that for. If they were clearly providing a storage service for their provider then they would have been classified as an RCS.

It is clear to me from this ruling that corporations using a third party for email service (e.g. MSN Hotmail) would need consent from the addressee or recipient in order to search or look at the email stored in that service. It is not clear to me whether or not this might extend to hosting companies hosting an email system (e.g. Corporate Exchange) for the public. In those cases they might be looked on as provisioning storage space so that they can host the Exchange organization.

However it is very clear to me that the way to avoid the risk associated with this ruling is to Archive. Whether hosted or onsite an Archive is not the storage of email for the purposes of backup protection or a temporary holding position for transmission. An Archive is providing a permanent copy of the email for the purposes of compliance or eDiscovery. If a company uses a hosted solution for archiving they are safe since that solution clearly is provisioning storage to the public and storing data permanently. Safer yet is the company that Archives all data and stores it in house. This ruling would not apply to them.

Listed below are links to the article and the ruling. Enjoy! :)

http://www.latimes.com/technology/la-me-text19-2008jun19,0,1023202.story

http://www.ca9.uscourts.gov/ca9/newopinions.nsf/D2CDDB4098D7AFB28825746C0048ED24/$file/0755282.pdf?openelement


Monday, June 23, 2008

How do I start this thing?

This officially marks my entry into the world of blogging. As you will be able to tell from this first post I am a newbie. I have been holding back on this for a long time now and am finally pulling the trigger. My hope is that this will be an informative site where you will find useful information. I guess you might be asking yourself at this point what this blog might be about....

That's WHAC will be about What's Happening with Archiving and Compliance. Basically I will be covering any interesting information in the world of Archiving and issues facing the IT industry as it struggles with regulatory compliance. It took me a while to find a name but I am happy with what I settled on. I thought it might be best to use language popular with "the kids" today so I could better relate to the "young people". (This all comes from the fact that my 6-year-old son thinks I am old and told me so in no uncertain terms.)

I am starting this blog for a few different reasons. First, the archiving market is huge. There are too many Archiving vendors in the market to count and each does things slightly differently. I hope I can bring clarity to some of the confusion.

Second, it seems to me that IT's challenge of meeting regulatory compliance is constantly evolving. I seem to learn something new in this area every day. As I find new things I will post so that you might be informed as well.

Third, my boss is making me :). Actually it was suggested by many but this has been on my mind for a while now.

Well there it is, my first post. If you have gotten this far then I haven't scared you off yet. I hope that this will be both fun and informative. Feel free to leave any comments. I will look forward to hearing from those who find this blog useful. Talk to you soon!