
cleaning data (array related)

Posted: Fri Nov 08, 2013 10:34 pm
by tester
I can't find it, so I'm not sure whether I posted this question earlier, or just started to write the post, and closed the browser in the middle. Mixed memories about arrays confuse me even more.

I guess this is a Ruby question again: cleaning data of non-numeric characters, spaces and so on. Anything after the numbers is not a big deal (standard green prims do this by default), but I don't know how to remove anything that comes before the data.

Let's say that I have an array written like this:

11asd,12bcv3,crd13, 14,15

A real mess, but it should end like this:

11,12,13,14,15

So all non-numeric characters (except the comma; yep, this time the comma stays) before the first numeric substring at each index, and after it, are removed.

The cleaning is needed because the data is entered manually by users. I used a harsh example; usually it will be all sorts of typos.

Re: cleaning data (array related)

Posted: Fri Nov 08, 2013 11:28 pm
by Nubeat7
it is string related in your example and you can use .gsub for it

Code: Select all

output @in.gsub(/[A-Za-z]/, '')  # remove ASCII letters only

The problem here is that special signs like "!", "?", ... are not included and will still be there; only the alphabetic characters are deleted. It would need some searching in regex.
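
A quick plain-Ruby check of that point, as a rough sketch only. It runs outside FlowStone, so `@in` and `output` are replaced by an ordinary argument and `puts`; the helper name `remove_letters` and the sample string are made up for illustration. It also shows why the range `A-z` is best avoided: it covers the ASCII characters between `Z` and `a` as well (such as `_` and `^`).

```ruby
# `remove_letters` is a hypothetical helper name, not a FlowStone or Ruby built-in.
def remove_letters(text)
  # Delete ASCII letters only; digits, commas and punctuation are kept.
  text.gsub(/[A-Za-z]/, '')
end

# Made-up sample input for illustration.
puts remove_letters("11asd,12!b?,crd13")  # 11,12!?,13

# The range A-z spans ASCII codes 65..122, so it also removes _ and ^.
puts "a_b^c1".gsub(/[A-z]/, '')     # 1
puts "a_b^c1".gsub(/[A-Za-z]/, '')  # _^1
```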

Re: cleaning data (array related)

Posted: Fri Nov 08, 2013 11:40 pm
by tester
Thanks, but there is one small thing. What about non-letter, non-numeric characters, like /.:'\ and so on? From what I see, adding items to the [] part uses them as a filter (so it works for some signs), but there are some limitations, like slashes /\ and a few others. Is there a pattern definition that removes everything except numbers (and optionally commas; this part can eventually be remultiplexed via green prims)?

Or a different way: "pass through only numbers and commas" instead of "remove all except numbers and commas"?

//edit: it looks like it can go through a parser.

Re: cleaning data (array related)

Posted: Fri Nov 08, 2013 11:48 pm
by Nubeat7
Oh, got it: instead of matching every character in the brackets, you can put a ^ first so that it matches every character NOT in the brackets.

Code: Select all

output @in.gsub(/[^0-9,]/, '')


http://www.tutorialspoint.com/ruby/ruby ... ssions.htm
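
One thing worth checking against the example from the first post: the whitelist pattern keeps every digit, so the field `12bcv3` comes out as `123` rather than `12`. If only the first numeric run of each field should survive, something along these lines might do it. This is a minimal sketch in plain Ruby, not FlowStone-specific; `clean_fields` is a hypothetical helper name, and dropping fields that contain no digits at all is an assumption about the desired behaviour.

```ruby
# Keep only the first run of digits in each comma-separated field.
# `clean_fields` is a hypothetical helper name, not part of FlowStone or Ruby.
def clean_fields(text)
  text.split(',')
      .map { |field| field[/\d+/] }  # first digit run, or nil if the field has none
      .compact                       # assumption: drop fields without any digits
      .join(',')
end

raw = "11asd,12bcv3,crd13, 14,15"

puts raw.gsub(/[^0-9,]/, '')  # 11,123,13,14,15
puts clean_fields(raw)        # 11,12,13,14,15
```

Note that dropping empty fields shifts the later indexes; if positions matter, replacing the `nil` with a placeholder such as `"0"` instead of calling `compact` may be the safer choice.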

Re: cleaning data (array related)

Posted: Fri Nov 08, 2013 11:59 pm
by tester
I see, thanks, will use yours.
Meanwhile I did this one (but it seems not to work correctly). :-)

Re: cleaning data (array related)

Posted: Sat Nov 09, 2013 12:21 am
by billv
tester wrote:What about non-letter non-numeric characters? Like /.:'\ and so on?

Delete special characters with...

Code: Select all

output @str.gsub("\/","")

or

Code: Select all

output @str.gsub("\"","")
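
If several special characters need to go at once, they can probably be collected in one character class instead of one `.gsub` call per character. Slash and backslash do work inside `[]`, but they have to be escaped in a `/.../` regex literal. A small sketch in plain Ruby, assuming the string arrives as an ordinary argument rather than `@str`; `strip_specials` is a hypothetical helper name.

```ruby
# `strip_specials` is a hypothetical helper name used only for this sketch.
def strip_specials(text)
  # One character class covering slash, backslash and double quote.
  # Inside a /.../ literal the slash is written \/ and the backslash \\.
  text.gsub(/[\/\\"]/, '')
end

# In single quotes, \\ is one literal backslash: the input is 1/2\3"4
puts strip_specials('1/2\\3"4')  # 1234
```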

Re: cleaning data (array related)

Posted: Sat Nov 09, 2013 12:26 am
by tester
I think what Nubeat7 proposed would be better here, because it limits the choice to showing numbers and selected characters instead of removing the "unknown party".

Re: cleaning data (array related)

Posted: Sat Nov 09, 2013 12:33 am
by billv
Yes...I'm just addressing the special character removal.
I imagine you would probably map them all together next.

Re: cleaning data (array related)

Posted: Sat Nov 09, 2013 12:51 am
by tester
Sure, I understand. Just pointing it out for others who read this.