Thursday, July 31, 2008

This Cloud has a Silver Lining!

A while ago I did a webcast with an analyst firm on storage in the context of archiving. One of the topics of that webcast was how storage standards and technology change over time.

If you frequently take photos with a digital camera you have no doubt thought about this topic. Think about it: you have probably either filled hard drives or burned CDs or DVDs to store the hundreds of photos that you take and want to preserve forever. How do you know that the storage mechanism you use for those photos will be around for your lifetime?

In my lifetime I can remember several different types of storage mechanisms. When I was much younger my father had a mainframe computer in the basement of our house. I would spend a lot of time watching the reel to reel tapes spin to load data to and from the machine. When we were old enough to use the TRS-80 that he had we would load programs into memory from a cassette tape (you had to listen for breaks in the noise on the tape to find the programs). The 8 inch floppy disks we moved to next made it much easier and faster. I also remember our first 20MB hard drive. Wow, 20MB, how could we ever fill that?

I think you see my point. I can’t go down to the local Best Buy and get a reel to reel reader for a computer tape. With the changes in interfaces I am not even sure that old 20MB hard drive can be attached to a modern PC, let alone that the OS could read the cluster size on the drive. As we store more and more data on removable media this issue is something that needs to be kept in mind.

That brings me to archiving. How can you be sure that the storage mechanism you use today will be in use 5 to 10 years from now? I noticed that an archiving vendor announced this week that it now supports Blu-ray disc storage. How do we even know that Blu-ray discs will last 10 years? I know that they must perform some type of accelerated aging on the media to test its longevity but the technology has only been around a couple of years for real world tests.

I believe that one answer to this problem will be “cloud storage” (I was hoping to use this buzz word soon!). Storing data in “the cloud” will make keeping up with evolving technology the problem of the company providing the storage. If companies providing “cloud storage” want to compete for archiving storage they will need to address this problem and state it in their messaging. “Store data with us because we will ensure it can be accessed when you need it regardless of the changes to technology over the years.” Archiving vendors will also need to plan to partner or integrate with these solutions.




Lemmatization and Stemming

A colleague brought up the term Lemmatization to me today and I thought it would make a great blog topic. After realizing that he wasn’t insulting me I looked further into the meaning of the word.

My first stop was the online dictionary. Lemmatization apparently means “the act or process of lemmatizing” (thanks a lot online dictionary). Not being happy with that definition I looked at Wikipedia, the most truthful and valuable source of information on the web (sarcasm intended). They offered additional assistance.

Lemmatization is the process by which a word is taken and reduced to its Lemma or its canonical form. For example if you take the word “walking” the lemma or base form of that word would be “walk”. Or if you take the word “better” the lemma of that word would be “good”. Lemmatization takes into account the context and meaning of the word when determining the base form of the word.

Stemming is the process of reducing the word to its root form without necessarily taking into account the context or meaning of the word. For example the words “walking”, “walks”, and “walker” would all have the root of “walk” but the word “ran” would not have a root of “run” when using a suffix stripping stemming algorithm. Lemmatization algorithms are more accurate in that the meaning and context of the word is considered.
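To make the difference concrete, here is a minimal sketch in Python. It is a toy, not a real stemmer or lemmatizer: the suffix list and the lemma lookup table are assumptions made up for this illustration, and the lookup ignores context (such as part of speech), which a real lemmatizer would take into account.

```python
# Toy illustration only: real stemmers and lemmatizers are far more involved.

SUFFIXES = ("ing", "er", "s")  # assumed, minimal suffix list


def stem(word):
    """Strip the first matching suffix, with no knowledge of meaning."""
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical lookup table standing in for a real lemma dictionary.
LEMMAS = {"walking": "walk", "walks": "walk", "ran": "run", "better": "good"}


def lemmatize(word):
    """Look the word up; fall back to the word itself if it is unknown."""
    return LEMMAS.get(word, word)


for w in ("walking", "walks", "walker", "ran", "better"):
    print(w, "| stem:", stem(w), "| lemma:", lemmatize(w))
```

The suffix stripper handles "walking", "walks", and "walker" but leaves "ran" alone and turns "better" into "bett", while the lookup maps "ran" to "run" and "better" to "good".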

Why is this important? Using lemmatization in searching for data in your archive during a discovery request provides more accuracy in that you receive results based on meaning and context which is much more valuable than a straight keyword search.

The next time you are at a major social function please remember this blog. I am sure explaining the meaning of the word lemmatization will not only impress your many friends but also make you the life of the party (or you might be assaulted, it could go either way…). :)

Wikipedia source: http://en.wikipedia.org/wiki/Stemming and http://en.wikipedia.org/wiki/Lemmatisation



Wednesday, July 30, 2008

Archiving IMs

Ben Worthen posted a blog entry at The Wall Street Journal yesterday on a problem with Wall Street traders using instant messaging. They were going to have to cease using their Thomson Reuters IM service without a compliance system in place.

The SEC requires that IM messages be monitored and retained in order to maintain compliance. Apparently the software that Thomson Reuters was using could no longer be used due to a dispute over the licensing. A new system is in place now and the crisis was averted.

It is interesting to think about the branching out of compliance for other systems outside of email. Instant Messaging used to be considered a very risky tool and many have delayed or avoided deploying it in their organization. Since most public instant messaging systems are not controlled or governed it is easier for insider information to be exposed to the public without the company’s knowledge.

Instant Messaging system adoption is increasing rapidly. College students today rely on texting and IM technologies for things like communicating with their friends and collaborating on school projects. As they move into the corporate world they want to continue to use the communication tools that are familiar to them. It only seems natural to them to use instant messaging instead of the phone to collaborate on a project they have to complete as a team.

Microsoft is also pushing the adoption of IM. With Microsoft’s Office Communications Server, Instant Messaging is becoming more integrated into phone and email communication. More companies are seeing corporate instant messaging as a business tool rather than a way to communicate with your wife that you will be late coming home from work (Sorry honey, I will be late tonight). As this adoption continues companies struggle with what that means for compliance.

The SEC is clear on what it means for financial institutions as Ben states in his blog. Financial companies need to monitor and save instant messaging discussions. This is where I believe that archiving can help. Instant messaging archiving is a growing need for companies deploying these instant messaging solutions. An effective archiving solution should store these discussions and index them so they can be monitored and discovery requests can be fulfilled. As the demand increases archiving vendors will be called upon to meet this need.

Ben’s blog post: http://blogs.wsj.com/biztech/2008/07/29/did-traders-almost-have-to-stop-sending-instant-messages/




Archiving Provides Efficient DLM

SearchCIO.com published a well written article on Data Lifecycle Management today. The article included archiving as a means to a successful DLM plan. HSM (Hierarchical Storage Management) is only part of the picture in DLM. Archiving must play a key role.

Archiving not only provides the management of the data but also indexes and stores the data in an efficient manner that HSM systems can benefit from. Typically archiving systems can store a single instance of data that is duplicated across email systems, file systems, SharePoint and other systems from which data is archived. By storing only a single copy of the document the storage needs for the HSM system are greatly reduced.
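One common way to get single-instance storage is to key each item by a hash of its content. The sketch below assumes that approach; the class name, the source labels, and the item ids are all hypothetical and are not taken from any particular archiving product.

```python
import hashlib


class SingleInstanceStore:
    """Toy archive that keeps one copy of each distinct piece of content."""

    def __init__(self):
        self.blobs = {}       # content digest -> bytes, stored once
        self.references = {}  # (source, item_id) -> content digest

    def archive(self, source, item_id, content):
        digest = hashlib.sha256(content).hexdigest()
        # Only the first arrival of a given content is actually stored.
        self.blobs.setdefault(digest, content)
        self.references[(source, item_id)] = digest
        return digest

    def retrieve(self, source, item_id):
        return self.blobs[self.references[(source, item_id)]]


store = SingleInstanceStore()
report = b"Q2 sales report"
store.archive("email", "msg-1", report)
store.archive("file-share", "reports/q2.doc", report)
store.archive("sharepoint", "doc-17", report)
print(len(store.references), "references,", len(store.blobs), "stored copy")
```

Three systems point at the same document, but the archive holds the bytes only once.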

How does this fit into DLM? Storing the data efficiently is only half of the story. Managing the data by applying retention across all data sets uniformly ensures that data that is at the end of the lifecycle is removed. It also ensures that data that needs to be kept for legal discovery purposes gets kept.
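As a rough sketch of that retention logic, assuming a single 7-year retention period and a simple legal-hold flag per item (both are assumptions for illustration, as are the item names and dates):

```python
from dataclasses import dataclass
from datetime import date, timedelta

RETENTION = timedelta(days=7 * 365)  # assumed policy: keep items about 7 years


@dataclass
class ArchivedItem:
    item_id: str
    archived_on: date
    legal_hold: bool = False


def expired_items(items, today):
    """Items past retention that are not under legal hold."""
    return [
        item
        for item in items
        if not item.legal_hold and today - item.archived_on > RETENTION
    ]


items = [
    ArchivedItem("old-email", date(2000, 1, 15)),
    ArchivedItem("old-email-on-hold", date(2000, 1, 15), legal_hold=True),
    ArchivedItem("recent-file", date(2008, 6, 1)),
]
to_delete = expired_items(items, today=date(2008, 7, 30))
print([item.item_id for item in to_delete])
```

The same rule is applied to every item regardless of where it came from, and the legal-hold flag wins over the retention period.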

The article is a great read and includes information about the cost savings realized from storing data more efficiently thanks in part to archiving solutions.

Article Source: http://searchcio.techtarget.com/news/article/0,289142,sid182_gci1323186,00.html#


Thursday, July 24, 2008

Need a new career?

I recently came across a career profile on About.com for an e-discovery professional. In the profile they state how the industry has grown to be a $2 billion industry and will be $21.8 billion by 2011. Since the industry is growing so rapidly the need for companies to hire professionals that specialize in e-discovery is also growing.

The profile of an e-discovery professional starts with a person who has a background in law or in information technology. Usually paralegals are interested but with rising salaries more lawyers are getting into the mix. It is interesting to note that some come from backgrounds in information technology. Since half of the problem of e-discovery is the gathering and storing of electronic data it is not surprising that an IT professional might choose this as a career path. I would suspect that legal professionals with a background in IT (rare) would be the best candidates.

So what are the responsibilities for your new found career in e-discovery? The profile states the following responsibilities:

  • Assessing a client’s ESI.
  • Helping to create ESI preservation policies.
  • Serving on e-discovery teams.
  • Ensuring compliance with the new federal rules regarding ESI.
  • Educating clients on e-discovery policies.
  • Drafting and communicating litigation hold procedures.
  • Using technology to facilitate discovery.
  • Assisting in the collection, processing, review, analysis and production of ESI.
  • Serving as a liaison between the legal team, IT personnel, vendors and records management personnel.

I know all this sounds glamorous but what is the pay like? I am sure that this industry is so exciting that most will want to do the job just for the prestige and fame it brings. Although if you do expect to get paid then you can look for a salary in the range of $125,000 to $250,000 annually.

So it's time to brush up that resume, get a new tie, and apply at your favorite corporate legal department. Your new career in e-discovery awaits. Don't be late for the interview!

(Oh, and don't wear the tie into the office of your current job after interviewing. Everyone will know what you are up to.)

Source: http://legalcareers.about.com/od/careerprofiles/p/e-discovery.htm




Asian CIOs get it.

Recent survey results posted by ZDNet Asia showed that regulatory compliance sits high on the priority list for Asian CIOs. The survey also stated that their C-level peers did not share the concern. Although I believe it is typical for CIOs to see what is happening in the IT market before other C-level executives, it is interesting that other executives in Asia have not caught on yet.

It seems to me that regulations regarding email compliance are more defined in the United States since the Federal Rules of Civil Procedure were put in place. When I talk to companies overseas the regulations are less clear. Privacy concerns and contractual obligations seem to be the main driver. For example, in some regions of Europe making sure that administrators can’t read the personal email of their employees is a key requirement. When the archiving system exports email from the messaging server there needs to be a means to allow employees to keep their personal emails from being archived. Implementing something like this for US employees might expose companies to risk.

It’s nice to see that regulatory compliance is finally becoming an important need in Asia. Hopefully they will be motivated to define procedures without a case like Enron.

Article Source: http://www.zdnetasia.com/news/business/0,39044229,62044095,00.htm?scid=rss_z_nw




Tuesday, July 22, 2008

E-Discovery survey confirms archiving investment.

A survey sponsored by FIOS that interviewed legal professionals from Fortune 500 companies confirms an investment in email archiving tools. The survey respondents consisted of 26 lawyers and 2 non-lawyers responsible for legal discovery.

79% of the respondents stated that they had made an investment in technology in the last 12 months to assist in legal discovery. 39% of those made an investment in email archiving, second only to legal hold management (50%). I believe that this is a trend that will continue as companies realize the best way to protect themselves or assist themselves with e-discovery requests is to proactively implement an archiving solution.

Other interesting technology trends from the survey:

· Automate legal hold – Over 30% purchased litigation hold solutions.
· Centralize email – Email archiving systems are being implemented to give companies a single location for search and management.
· Increase insourcing – 79% of respondents invested in e-discovery for in-house solutions.

When I read this article it was clear to me that a good email archiving solution can effectively address the technology trends that resulted from the survey. Legal hold can be implemented more easily by having all email within an archive. Email archive solutions can centralize email from multiple messaging platforms. Email archives also provide an efficient storage mechanism for keeping email data in-house.


Article Source: http://www.law.com/jsp/legaltechnology/pubArticleLT.jsp?id=1202423099886




Saturday, July 19, 2008

Do social networking sites expose companies to risk?

I was sitting in my kitchen this morning going through my normal Saturday morning routine. Having a homemade Amish doughnut made by the Amish restaurant in town and a large cup of coffee. (I know the doughnuts are not made by real Amish people but just by the people they hire. They are still good though.) As I was reading the paper (also a part of the routine) I came across an article that got me thinking…

The story was about how web profiles on social networking sites such as Facebook and MySpace can negatively affect clients defending themselves against cases such as drunken driving accidents. In a couple of cases where the person driving was found to be drunk and did bodily harm to another, their social networking page hurt their case. The photos stored on their site showed them to either be at parties with alcohol, drinking alcohol, or wearing clothes that advertised alcohol. Showing a defendant on a drunken driving charge having a good time with drink in hand right before sentencing damaged their case and they received stiffer penalties.

This does not have much to do with corporate discovery cases yet but stay with me on this one. It got me thinking about how a lawyer might look to new locations for discovery requests when companies are being sued by their employees. For example let’s say that a disgruntled employee is fabricating a harassment claim against the company they work for from a recent company party they attended. If employees at those companies posted photos of the event in question on their social networking sites then those photos could be misinterpreted in favor of the employee filing the complaint. The actions of a few at the party might be captured in digital form showing that the party was either out of control or unsafe.

Although I do not know of any cases yet where discovery requests were sent to Facebook or MySpace for personal photos (I am sure that it has happened), I am sure as these sites grow the risk will become greater. Personally I have gone to my fellow employees’ blogs or profiles to see pictures from events that I could not attend. I have also learned just by doing this blog that my company has policies that serve to protect them from liability and protect me from posting anything that could get me into trouble (see “The Fine Print” to the right).

As an employee for a public company your actions in public are not only a reflection on you but a reflection on the type of employees the company you work for hires. The types of things you post on your social networking site reflect upon your public life and since most post what company they work for on those sites it could also reflect upon the company you work for as well. I heard a lawyer once say that there is a “grandmother test” to find out if you are doing something that could be deemed inappropriate or illegal. If you can’t describe to your grandmother what you were doing in that photo you posted of yourself at that party then it probably should not be posted. :)

Article Source: http://www.dispatch.com/live/content/national_world/stories/2008/07/19/Facebook_evidence.ART_ART_07-19-08_A1_PVAPRIU.html?sid=101


Friday, July 18, 2008

Archiving concern grows...

Network World posted an article today by Michael Osterman. They recently conducted a survey and found that 36% of decision makers are more concerned about email archiving today than they were 1 year ago. 30% were more concerned about e-discovery. Only 12% to 13% were less concerned about email archiving and e-discovery.

This shows that most companies have yet to sufficiently solve their email archiving and e-discovery needs. This market and the archiving and discovery needs of companies are still being defined. As each new ruling is made and companies struggle to understand the FRCP their concern increases. I believe that this is a sign that the archiving and discovery market is still not mature and will see substantial growth.

Source: http://www.networkworld.com/newsletters/gwm/2008/071408msg2.html


Wednesday, July 16, 2008

Terminated!

This article really pumped me up…

It looks like our friend Arnold is having some trouble producing documents for an eDiscovery request. So who is the Terminator’s worthy adversary? A group of juvenile parolees. The parolees filed suit because they claim that their 14th Amendment rights were violated. According to the parolees the discovery request they made was significantly delayed and caused detrimental harm to their case. They also claim that Arnold failed to produce all documents that were requested including databases and logs.

Although Arnold was able to produce over 55,000 documents the court determined it was not enough to comply with the order. Arnold’s office also claimed that there was only one person performing the discovery search, that she had been overwhelmed with requests, and that she had family issues during the time the requests came through. The judge was less than sympathetic, stating that it was not appropriate to assign just one person to the request and that the defendant failed to solve problems with discovery conferences.

Arnold also took it on the chin for the type of documents produced. The documents were converted to .pdf before being sent to the plaintiff. In the documents’ original form they were searchable. Once converted to .pdf they were not. This created additional burden for the plaintiffs in the case. The judge ruled that if the documents in their original format were searchable then there is no reason why they can’t be given to the plaintiff as searchable documents.

I guess the lessons learned from this are that you should take all discovery requests seriously. Follow the EDRM process and meet with the requesting party to make sure that you understand the request and resolve any differences. If a good faith effort was seen by the judge in this case then he would not have mentioned the problems with the discovery conferences. Also, produce documents in either their original format or in a format that is useful to the requesting party. You might be seen as being uncooperative if you produce documents that cause the requesting party to have an additional burden during the review process.

Hasta la vista, baby! :)


Source Link: http://www.rcalaw.com/component/option,com_myblog/show,judge-sanctions-state-of-california-with-ediscovery-violations.html/Itemid,0/

EDRM Process: http://www.edrm.net/




Tuesday, July 15, 2008

You want my voicemail too?

I was recently asked by a colleague what I thought the most interesting challenge was going to be for archiving over the next couple of years. My response was voicemail archiving. With the last release of Exchange Server Microsoft has made it easier for archiving solutions to grab voicemail and store it in their archives. Since voicemail can now be part of an existing email as an attached .wav file there is no reason why an archiving solution should not be able to store this data. Storing the data is the easy part and probably not the most interesting evolution in archiving but indexing and searching that voice data is.

An article from Wisconsin Technology News that has been sitting in my inbox for a while now recently recaptured my attention. It discussed how the evolution of VOIP technologies can raise eDiscovery concerns for companies. Since voicemail is stored as a file and unified communications solutions store this data in the email system, courts are now looking at this information as discoverable.

In pure electronic form the voicemails are not very useful. Solutions on the market today that index data do so by using the text contained in files and emails. The data in an audio file would need to be converted to text first and then indexed to be easily recalled in a discovery request. The article describes three ways of grabbing these keywords for indexing:

· Phonetic – Looking at patterns in the speech to determine words (Not accurate or likely to happen in my opinion)
· Manual transcription – Sending the voice file away to be manually transcribed (Prone to human error and expensive)
· Automated transcription – Using a speech to text conversion process (Most probable solution)

All three methods for gathering text from voicemails have yet to be perfected and all still seem to have slight deficiencies. It is my opinion that it will be very expensive for companies to produce voicemail data in a discovery request until a solution that contains a suitable filter (that can be integrated into existing indexing engines) for converting speech to text (effectively and without error) can be created.
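Once a voicemail has been turned into text, indexing it looks like indexing any other document. Here is a minimal sketch of a keyword index over transcripts. The speech to text step itself is not shown; the transcripts and message ids below are made up, and a real indexing engine would do far more (ranking, phrase search, and so on).

```python
import re
from collections import defaultdict


def build_index(transcripts):
    """Map each word to the set of voicemail ids whose transcript contains it."""
    index = defaultdict(set)
    for message_id, text in transcripts.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(message_id)
    return index


def search(index, *keywords):
    """Return ids of voicemails whose transcript contains every keyword."""
    matches = [index.get(keyword.lower(), set()) for keyword in keywords]
    return set.intersection(*matches) if matches else set()


# Hypothetical output of an automated transcription step.
transcripts = {
    "vm-001": "Hi, please call me back about the merger contract.",
    "vm-002": "Reminder: the contract review is on Friday.",
}
index = build_index(transcripts)
print(sorted(search(index, "contract")))
print(sorted(search(index, "merger", "contract")))
```

Any transcription error flows straight into the index, which is why the accuracy of the conversion step matters so much for discovery.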

Article Link: http://wistechnology.com/articles/4789/


Monday, July 14, 2008

Email archiving provides no excuse!

You have all heard the same old story from local to state governments wherever you live. Some major scandal takes place, an eDiscovery request is placed, and for some odd reason we find out that no email is being retained for the employees involved. Or worse yet it has been maliciously deleted and not recoverable. Some government agencies even have a policy to purge email on a routine basis.

An article from the AP was floating around today that discussed this situation. In three instances state employees were told to either routinely or deliberately delete email from their mail systems and backup tapes. The main problem is that these emails are being destroyed before they have the chance to be viewed as public record.

State governments are finding out what corporations found out long ago. Even though you might have a policy of routinely deleting email it may not protect you in all cases. Emails should be treated as public record and should be retained for a period of time. In some states guidelines are set to define what a public record is and employees are expected to determine what to save. However, a process that relies on manual classification of email lends itself to human errors and inconsistencies.

So what are government agencies to do, keep everything? In the article Joerling was worried that keeping too many records may not make it efficient for state agencies to recall email when requested. On this point I have to agree with the government activists. It is better to keep everything and store it efficiently in an email archive where it can be indexed and easily searched. With modern archiving technology there is really no excuse not to produce public data when requested.

AP article link: http://hosted.ap.org/dynamic/stories/T/TEC_SAVING_E_MAIL?SITE=AP&SECTION=HOME&TEMPLATE=DEFAULT&CTIME=2008-07-14-04-35-52


Even Google says it’s complex…

So what is it that the search giant Google finds complex? Answer: eDiscovery. In an article posted on NewsFactor (link below) Google states that they are relying on partners to assist with the complex nature of eDiscovery. In the first paragraph of the article I found this quote interesting:

"We see archiving as the foundation for later stages of the process, and are working with partners in the e-discovery ecosystem to ensure that data can be imported into other technologies."

Even Google recognizes that a good eDiscovery plan begins with archiving. In order to produce information in days and not weeks when that eDiscovery request comes through it is important to have a centralized archive that can contain multiple content types. The fewer places a search engine or discovery product has to access in order to find data, the quicker the result.

The article also provides what Google and other search vendors find important for a good eDiscovery plan. They are:

· Broad file format – Support indexing multiple file types
· Text analytics – Searching for words and meaning
· Reach – Searching all systems
· Testing – Test an execution of your plan

I think that the last bullet might be overlooked more often than not but it is great advice. Test out the plan you have in place for eDiscovery by submitting a request to your team and see how well you can execute the request. Measure performance in not only the time it takes but also in the accuracy and completeness of the search. It might be surprising what gets overlooked.
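One way to put numbers on accuracy and completeness is to seed the archive with documents you already know are relevant and compare them to what the test search returns. The sketch below uses the standard precision and recall measures; treating them as stand-ins for "accuracy" and "completeness" is my assumption, and the document ids are made up.

```python
def precision_recall(returned, relevant):
    """Precision: share of returned items that are relevant.
    Recall: share of relevant items that were returned."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical test request: four documents are known to be relevant,
# and the team's search came back with three documents.
relevant = {"doc-1", "doc-2", "doc-3", "doc-4"}
returned = {"doc-1", "doc-2", "doc-9"}
precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A low recall figure is the "what gets overlooked" number: relevant documents that exist but never made it into the production.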

http://www.newsfactor.com/story.xhtml?story_id=0320011LUI68




Monday, July 7, 2008

Larger Mailboxes in Exchange

A whitepaper was posted this month at the Exchange Server TechCenter by Tom Di Nardo (Senior Technical Writer, Microsoft Exchange Server) on the subject of planning for larger mailboxes in Exchange 2007. In the white paper Tom makes reference to stubbing by third party archiving solutions. Here is what he says about the administrative problems with using stub files in Exchange:

Server performance: Removing the message bodies and attachments from Exchange reduces the mailbox size, but it does not significantly change the server performance for users accessing Exchange via Outlook in online mode and Outlook Web Access. Item counts are the primary performance driver for the Exchange store, and not aggregate size. For example, server performance with a folder containing 100 KB of full e-mail message items is similar to a folder containing 100 KB of stub files.

Client complexity: Because the use of stub files with a third-party archiving solution requires the deployment and use of Outlook add-ins, a significant amount of time must be spent by administrators to deploy and manage these add-ins. Administrator time is also required to assist end users with technical difficulties using the add-ins. Not deploying stub files removes all of this additional administrative work that must be performed, thereby allowing more time to administrators and end users.

I could not let this go by without some comment. First I am not sure I understand why he states that server performance is a problem with stubbing. While I might be in agreement that stubbing does not show a drastic improvement in server performance it does not adversely affect server performance. If I can increase the number of items in my mailbox that I have immediate access to without going to the archive and not degrade performance is that not an improvement? Having the 100KB of stubbed items might be similar in performance to 100KB of full data but it allows users easier access to more items without going to an archive.

The other thing he mentions as a negative is the use of a third-party add-in for Outlook. I can’t cover how all archive vendors handle stubbing in the Outlook client so I will only speak to what Quest does (Full disclosure: I work for Quest and am probably biased). Archive Manager deploys a form through Exchange to the Outlook Client. No need for an add-in or a client installed at the desktop. While I can agree with Tom that this might be slightly more to set up, the benefits outweigh the administrative burden.

However I do believe that the archiving market is headed away from storage management for email. As storage becomes cheaper and easier to manage for messaging systems (Exchange is a good example) the need is shifting towards longer term storage of data for compliance purposes. As users adopt the Exchange 2007 platform they might have less of a need for storage management. Not everyone is on Exchange 2007 though. There are still some customers on Exchange 5.5. Until Exchange 2007 (and future versions) is adopted in 80-90% of the market administrators will need a solution and storage management will still be a feature that archiving vendors will need to deliver.

White Paper link: http://technet.microsoft.com/en-us/library/cc671168(EXCHG.80).aspx


Wednesday, July 2, 2008

eDiscovery, Quotas, and Hosting, oh my....

Three articles caught my eye today that I really wanted to comment on...

eDiscovery

First, there is an article from Baseline Magazine on how to prepare for an eDiscovery showdown. There were a couple of things that I paid attention to in the article. The first is that CIOs are not sure of how much a legal discovery request could really cost them. They might have a great storage plan but the data might be stored in legacy systems or stored in off site locations that could be expensive to locate and retrieve. Imagine if that legacy system that was used for the email storage no longer existed as a standard and could not be recreated to restore the data. Information might not only be costly to retrieve but in some cases impossible. This is why a centralized archive is so important. Everything is stored in one place in non-proprietary formats that allow fast retrieval.

The other interesting item in this article is where it mentions that CIOs might be responsible for storing VOIP data or voicemails that result from VOIP calls. This is very interesting. Imagine the conversations you have using VOIP being recorded, archived, and (through voice recognition) searched.

Here is the link: http://www.baselinemag.com/c/a/Storage/Preparing-for-an-EDiscovery-Showdown/

Quotas

IDM Online posted results of a survey that polled corporate email users and showed that 65% of the respondents had quotas in place that limited the size of their mailbox, forcing them to manage their email storage. Of those, 66% managed their mailbox in a way to retain mail using means outside the email system including home email accounts or pst files.

Imagine the risk that companies are exposing themselves to because of mailbox quotas. I really do believe that the need for archiving is shifting more towards compliance in the future and away from storage management. This survey shows me, though, that the move has not happened yet and the need to control mailbox sizes seamlessly without having the users resort to pst files is still great.
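A quick back-of-the-envelope calculation shows what those two percentages mean together. This assumes the 66% applies only to the respondents with quotas, which is how I read the survey summary.

```python
with_quota = 0.65         # share of respondents with a mailbox quota
keeps_mail_outside = 0.66  # share of those who retain mail outside the email system

share_of_all_respondents = with_quota * keeps_mail_outside
print(f"{share_of_all_respondents:.1%} of all respondents")
```

That works out to roughly 43% of everyone surveyed keeping mail somewhere the company cannot easily search.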

Here is the link: http://www.idm.net.au/story.asp?id=9724

Hosting

Last but not least Byte and Switch posted an article describing how the hosted archiving and discovery market is growing. It is interesting to see how many vendors there are in the hosted archive market. It is not as many as in-house vendors but there are still a lot of vendors for that market segment. I only have one observation about this.

If email is moving toward SaaS and is hosted offsite and the archive for that email is hosted offsite as well then no email data is stored in-house at all. Aren't companies exposing themselves to risk by not maintaining a copy of their data in-house? What happens if they pick a vendor that is not in it for the long haul and closes up shop? What happens if they want to migrate to another hosted email service and still need access to the legacy data? It is my opinion that an in-house archiving solution provides companies with a safe place to keep data that is platform agnostic. If they ever decide to change email services then access to their legacy data is maintained.

Here is the link: http://www.byteandswitch.com/document.asp?doc_id=158102&WT.svl=news1_1
