
Wednesday, March 21, 2012

multiple wildcards in Foreach Loop

Hi, I am using a Foreach Loop to loop through the files in a directory.
I would like to use more than one wildcard (e.g. *.txt and *.log), but the container does not seem to work that way. It only takes one wildcard.

Is there any way I can pass in multiple file extensions?
Thanks

Hi mf915,

I do not know if this will help.

If you have only .txt and .log files in the directory, then you can use filename.* or *.*. If there are files in the directory that you do not want the Foreach Loop to pick up, you will have to move them to a separate folder. This will only work if your .txt and .log files are in the same format, for example comma separated, with the same number of columns and the same headings.

Kind Regards,

Joos Nieuwoudt
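
Another way to think about the *.* suggestion is to enumerate everything and then filter by pattern inside the loop. The snippet below is only a sketch of that filtering logic in Python, not SSIS code; the function name `files_matching` and the example patterns are made up for illustration. In a package, the same test would have to live in a script or expression inside the loop.

```python
from fnmatch import fnmatch
from pathlib import Path


def files_matching(directory, patterns):
    """Return the files in `directory` whose names match any of `patterns`.

    Hypothetical helper: it enumerates every entry (like a *.* loop)
    and keeps only the files that match at least one wildcard.
    """
    return sorted(
        p
        for p in Path(directory).iterdir()
        if p.is_file()
        and any(fnmatch(p.name.lower(), pat.lower()) for pat in patterns)
    )
```

Calling `files_matching(folder, ["*.txt", "*.log"])` would return the .txt and .log files and skip everything else, so the unwanted files would not need to be moved to another folder.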

Monday, March 12, 2012

multiple text files mining

Hey everybody,

I'm absolutely new to any sort of data management.

Here it goes: suppose we store 100 .txt or .doc files in SQL Server and we want no two files' contents to match by more than 60%. The questions that arise are:

1. How do we store the files in MS SQL (binary format or normal text)?

2. How do we match the files?

3. What code do we write in C# for this purpose?

4. Does this have anything to do with pattern recognition?

I'd ask all users, new and experienced, to join in. Please help!

1. I would use an SSIS solution with the Import Column transformation to read the files and store them in a column of the varbinary(max) data type; if you want to store only text, you can use the varchar(max) data type (either holds up to 2 GB).

2. In that SSIS solution there is a way to match files (use the Foreach Loop container).

3. In my opinion you don't need to do that; you can run the SSIS package periodically as a job, depending on your business logic.

4. Study the tutorial from here if you want to create a text mining project. The idea applies if you want to know, let's say, the frequency of terms/phrases, or to extract and cluster terms/concepts from the docs.
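
To make the "frequency of terms" idea in point 4 concrete, here is a minimal Python sketch. It is not what SSIS or any text mining tutorial does internally; the function name `term_frequencies` and the simple word-splitting rule are assumptions chosen for illustration.

```python
import re
from collections import Counter


def term_frequencies(text, top=5):
    """Count how often each word occurs and return the `top` most common.

    Assumption: a "term" is just a run of letters or apostrophes,
    compared case-insensitively. Real text mining tools do much more
    (stemming, stop words, phrases).
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top)
```

Two essays whose most frequent terms largely overlap are candidates for a closer comparison, but term counts alone do not give a match percentage.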

|||

Does SSIS have a ready-made package to compare multiple files? If yes, how does one go about it? Thanks, man!

|||

What do you mean by "compare multiple files"?

Using the Foreach Loop container you can select *.txt or *.doc files, or files with a specific name.

If you mean comparing the contents of the files, I think you can use a Script Task, where you can customize the comparison using .NET (as you would if you did it in plain .NET).
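
As a rough idea of what such a custom comparison could compute, here is a sketch in Python rather than .NET, using the standard library's `difflib.SequenceMatcher` on word lists. The function name `similarity_percent` is hypothetical, and this is only one of many possible definitions of "percent match"; the same logic would need to be ported to run inside a Script Task.

```python
from difflib import SequenceMatcher


def similarity_percent(text_a, text_b):
    """Return a 0-100 score for how much two texts overlap, word by word.

    Assumption: texts are compared as lower-cased, whitespace-separated
    words. The score is SequenceMatcher's ratio, 2*M/T, where M is the
    number of matching words and T the total number of words in both.
    """
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return 100.0 * matcher.ratio()
```

With this definition, two essays would fail the check when `similarity_percent(a, b) > 60`.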

|||

I mean that, for example, there are two text files (.txt or .doc) stored in SQL Server, each containing an essay on American independence.

I want to check that the essays do not match by more than 60%. How do I do this? Help appreciated!

|||

vickwal wrote:

I mean that, for example, there are two text files (.txt or .doc) stored in SQL Server, each containing an essay on American independence.

I want to check that the essays do not match by more than 60%. How do I do this? Help appreciated!

You can use SQL Server Integration Services (SSIS).

I think you have to convert the (presumably MS Word) .doc files into plain-text .txt files first.

Then you can load the .txt files into a table like ESSAYS(AUTHOR varchar(255), FILENAME varchar(255), [Content] TEXT) using the Foreach Loop container.

Then use a Fuzzy Lookup that compares on the Content field, using the same ESSAYS table as both the base table and the lookup table.

You can play with the Similarity Threshold there.

The Fuzzy Lookup will produce an output row for each row of the base table, with Similarity and Confidence columns added. Just spool that into another table.

good luck,

Mark
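
The self-lookup described above (every essay compared against every other essay, with a similarity threshold) can be sketched outside SSIS as well. The Python below uses Jaccard similarity over three-word shingles as a stand-in; it is not the algorithm Fuzzy Lookup uses, and the names `shingles`, `jaccard`, and `pairs_over_threshold` are invented for this example.

```python
from itertools import combinations


def shingles(text, size=3):
    """Split a text into the set of its overlapping `size`-word sequences."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairs_over_threshold(essays, threshold=0.6):
    """Return (name_a, name_b, score) for every pair scoring above `threshold`.

    `essays` is assumed to be a dict mapping a file name to its plain text.
    """
    sets = {name: shingles(text) for name, text in essays.items()}
    result = []
    for x, y in combinations(sorted(sets), 2):
        score = jaccard(sets[x], sets[y])
        if score > threshold:
            result.append((x, y, round(score, 2)))
    return result
```

Comparing every pair grows quadratically with the number of essays, which is fine for 100 files but worth keeping in mind for larger collections.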

multiple text docs evaluation

Hey everybody,

I'm absolutely new to any sort of data management.

Here it goes: suppose we store 100 .txt or .doc files in SQL Server and we want no two files' contents to match by more than 60%. The questions that arise are:

1. How do we store the files in MS SQL (binary format or normal text)?

2. How do we match the files?

3. What code do we write in C# for this purpose?

4. Does this have anything to do with pattern recognition?

I'd ask all users, new and experienced, to join in. Please help!

What is the purpose of this? If you want to screen out files, that should be done before saving them. If you need a difference analyzer, I suggest you look at how Subversion, CVS, or any other source control system handles and stores differences.

|||

The purpose is:

Say I store two text documents uploaded to SQL Server from a web portal. Now I want SQL Server to determine what percentage of the content of these two files matches. Say I want the files not to match by more than 60%, and if they do, they should be discarded. Thanks, man, help appreciated!

|||

That sounds like something a trigger might be able to do during a table load/update. I will move this post to the DB forum for advice.

|||

Thanks, man! I was wondering whether SQL Server 2005 has some ready-made SSIS analytic service to do that?
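
Whichever layer ends up doing the check (a trigger, a package, or the web portal itself), the rule from this thread is "discard an upload if it matches a stored document by more than 60%". Below is a minimal in-memory Python sketch of that rule, assuming the check runs before the document is saved; the class name `EssayStore` is hypothetical and nothing here talks to SQL Server.

```python
from difflib import SequenceMatcher


class EssayStore:
    """Keeps documents only if they are not too similar to stored ones."""

    def __init__(self, max_similarity=0.6):
        self.max_similarity = max_similarity
        self.essays = {}  # name -> list of lower-cased words

    def add(self, name, text):
        """Store `text` and return True, or return False if it is rejected.

        A new document is rejected when its word-level similarity to any
        stored document is above `max_similarity`.
        """
        words = text.lower().split()
        for stored in self.essays.values():
            matcher = SequenceMatcher(None, stored, words, autojunk=False)
            if matcher.ratio() > self.max_similarity:
                return False
        self.essays[name] = words
        return True
```

Doing the check at upload time means each new document is compared only against what is already stored, instead of re-comparing the whole table afterwards.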
