Author Topic: Endian on SSD Issues  (Read 83161 times)
shcc
Jr. Member

Posts: 6


« on: Tuesday 27 March 2012, 07:25:29 am »

Hello All,

Got a bit of an issue here: we have deployed multiple Endian firewalls. Has anyone noticed a massive failure rate on SSDs? We have been noticing that they fail about 3 months after deployment. The boxes with spindle drives just keep rolling on with no major issues, while the SSD boxes die between 3 and 4 months in. We have tried multiple brands of SSDs and multiple capacities as well, from 8 GB to 64 GB. I'm stumped. I am a major advocate for SSDs; my boss, on the other hand, is starting to hate them with a passion. Any help or input would be much appreciated.
Logged
mrkroket
Hero Member

Posts: 495


« Reply #1 on: Wednesday 28 March 2012, 03:53:40 am »

I haven't worked with SSDs yet, but they are famous for their limited write cycles.
You need to configure your OS correctly to avoid wearing them out early:
https://wiki.archlinux.org/index.php/Solid_State_Drives#Tips_for_Minimizing_SSD_Read.2FWrites
I don't think an SSD is the best choice for every server. The write rate on a firewall can be very high (think about logs and caches).

For high-speed firewalls I would use ramdisks instead of SSDs. On a firewall, the main disk usage is logs, proxy cache and lookup tables (like dansguardian). I'd make a big ramdisk (4-6 GB), use it for these write-intensive operations, and have a cron job flush the useful data (mainly the logs) out to a spinning HDD.
I know it's a complex environment and hard to set up, but I put reliability first.
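
Something along these lines, as a minimal sketch (the paths, sizes and flush schedule are assumptions, not EFW defaults):

Code:
# /etc/fstab: keep the hot, write-heavy paths in RAM instead of on the SSD
tmpfs   /var/log           tmpfs   defaults,noatime,size=4g   0 0
tmpfs   /var/spool/squid   tmpfs   defaults,noatime,size=2g   0 0

# /etc/crontab: flush the in-RAM logs to a spinning disk every 15 minutes
*/15 * * * * root rsync -a /var/log/ /mnt/hdd/log-archive/

The trade-off is that anything still sitting in the tmpfs is lost on a crash or power cut, so the flush interval balances disk wear against how much log history you can afford to lose.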

A system with a 160 GB classic disk and 16 GB of RAM can be cheaper than its 160 GB SSD counterpart with 4 GB, and you can achieve almost the same thing (OK, the SSD would boot much faster). But unreliable servers are the worst thing ever, no matter how fast they go.
As IT staff, your main priority should be stability, not speed. People may complain if the system is slow, but they will cry out if the system doesn't work at all.
Logged
rosch
Full Member
Posts: 20



« Reply #2 on: Wednesday 09 May 2012, 10:32:06 am »

For me it was more than 4 months, if I remember correctly, but my Endian 2.4.1 started to mess up not long ago.
I replaced both SSDs with old spinning hard disks. There is not much writing going on anyway, at least for me, but I guess that depends a lot on which services you have enabled.

Anyway, what I could save from the logs is this:

Apr 30 01:50:35 efw kernel: [6532505.296049] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 30 01:50:35 efw kernel: [6532505.296055] ata2.00: failed command: FLUSH CACHE
Apr 30 01:50:35 efw kernel: [6532505.296064] ata2.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Apr 30 01:50:35 efw kernel: [6532505.296066]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 30 01:50:35 efw kernel: [6532505.296070] ata2.00: status: { DRDY }
Apr 30 01:50:40 efw kernel: [6532510.345013] ata2: link is slow to respond, please be patient (ready=0)
Apr 30 01:50:45 efw kernel: [6532515.344015] ata2: device not ready (errno=-16), forcing hardreset
Apr 30 01:50:45 efw kernel: [6532515.344025] ata2: soft resetting link
Apr 30 01:50:50 efw kernel: [6532520.544012] ata2: link is slow to respond, please be patient (ready=0)
Apr 30 01:50:55 efw kernel: [6532525.390013] ata2: SRST failed (errno=-16)
Apr 30 01:50:55 efw kernel: [6532525.390023] ata2: soft resetting link
Apr 30 01:51:00 efw kernel: [6532530.590014] ata2: link is slow to respond, please be patient (ready=0)
Apr 30 01:51:05 efw kernel: [6532535.435013] ata2: SRST failed (errno=-16)
Apr 30 01:51:05 efw kernel: [6532535.435023] ata2: soft resetting link
Apr 30 01:51:11 efw kernel: [6532540.635013] ata2: link is slow to respond, please be patient (ready=0)
Apr 30 01:51:40 efw kernel: [6532570.470013] ata2: SRST failed (errno=-16)
Apr 30 01:51:40 efw kernel: [6532570.470023] ata2: soft resetting link
Apr 30 01:51:45 efw kernel: [6532575.517013] ata2: SRST failed (errno=-16)
Apr 30 01:51:45 efw kernel: [6532575.517018] ata2: reset failed, giving up
Apr 30 01:51:45 efw kernel: [6532575.517023] ata2.00: disabled
Apr 30 01:51:45 efw kernel: [6532575.517030] ata2.00: device reported invalid CHS sector 0
Apr 30 01:51:45 efw kernel: [6532575.517047] ata2: EH complete
Apr 30 01:51:45 efw kernel: [6532575.517096] sd 1:0:0:0: [sdb] Unhandled error code
Apr 30 01:51:45 efw kernel: [6532575.517099] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 30 01:51:45 efw kernel: [6532575.517104] sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 01 28 1e 01 00 00 08 00
Apr 30 01:51:45 efw kernel: [6532575.517117] end_request: I/O error, dev sdb, sector 19406337


Now, whether it is really failing I don't know; these SSDs are only about 1 year old.
By the way, the SSDs are Crucial M4 64GB:
crucial.com/store/partspecs.aspx?imodule=CT064M4SSD2
Logged
mrkroket
Hero Member

Posts: 495


« Reply #3 on: Thursday 10 May 2012, 12:05:29 am »

EFW is probably not SSD-aware, so it can wear out some sectors fairly fast.
Logged
rosch
Full Member
Posts: 20



« Reply #4 on: Thursday 10 May 2012, 01:09:36 am »

EFW is probably not SSD-aware, so it can wear out some sectors fairly fast.
Yep, I agree, and I guess that's what happened. Unfortunately the sectors are not simply marked as bad, the way it is usually done with spinning disks.
I tried to find official information about SSD support but couldn't find anything.
My 6-year-old disks do the job just fine. At the same time you can almost switch off your heating system. :D
Logged
SharpieTM
Jr. Member

Posts: 4


« Reply #5 on: Friday 11 May 2012, 04:22:41 am »

EFW is probably not SSD-aware, so it can wear out some sectors fairly fast.

SSDs do not write data the same way conventional HDDs do. They have wear-leveling, so even if you rewrite the same file, the writes are spread over the whole NAND rather than hitting the same physical spot. This is all done in the SSD's controller and is completely hidden from the OS and the SATA controller.



Logged
SharpieTM
Jr. Member

Posts: 4


« Reply #6 on: Friday 11 May 2012, 04:38:47 am »

Hello All,

Got a bit of an issue here: we have deployed multiple Endian firewalls. Has anyone noticed a massive failure rate on SSDs? We have been noticing that they fail about 3 months after deployment. The boxes with spindle drives just keep rolling on with no major issues, while the SSD boxes die between 3 and 4 months in. We have tried multiple brands of SSDs and multiple capacities as well, from 8 GB to 64 GB. I'm stumped. I am a major advocate for SSDs; my boss, on the other hand, is starting to hate them with a passion. Any help or input would be much appreciated.

Are you saying that they are completely dead after you remove them from the machine? They can't be read on another computer?

This is very unfortunate, since I thought using an SSD would bring a nice boost when using the HTTP proxy caching.

I wonder if the failures are more related to the 2.6.32 kernel used in EFW, or to the SATA controller? From what I read (for Ubuntu at least), 2.6.33 was the first kernel that reliably provided TRIM for SSDs. I did notice that my current EFW already uses 'noatime' in the fstab for mounting the drive. I have had good success with Linux and SSDs, so this was a surprising thread to read.
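
A quick way to check both points on a live box, assuming the SSD is /dev/sda and that hdparm is available on the EFW image (both are assumptions):

Code:
# Kernel version: reliable TRIM support arrived in 2.6.33
uname -r

# Does the drive itself advertise TRIM?
hdparm -I /dev/sda | grep -i trim

# Are the filesystems mounted with noatime (and, on newer kernels, discard)?
grep -E 'noatime|discard' /etc/fstab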
Logged
SharpieTM
Jr. Member

Posts: 4


« Reply #7 on: Friday 11 May 2012, 04:48:05 am »

Some googling seems to point at 2.6.32 not being the best kernel for SSDs:

archivum.info/linux-ide@vger.kernel.org/2010-02/00243/bad-performance-with-SSD-since-kernel-version-2.6.32.html

In fact, a few people have complained about their SSDs not working right with that kernel, and once they moved to a newer one their issues went away.

So the question now is: how do we get a newer kernel onto EFW?
Logged
rosch
Full Member
Posts: 20



« Reply #8 on: Friday 11 May 2012, 05:12:51 am »

Maybe EFW 2.5.1's kernel is new enough. I don't know because I don't have such a machine running at this time.

As for SSD on Ubuntu, my laptop has been running fine for over a year with Lucid.
Logged
hde
Jr. Member

Posts: 1


« Reply #9 on: Saturday 19 May 2012, 01:33:09 am »

I can confirm this issue: we've lost 20+ systems running SSD disks. We tried different SSD manufacturers, different services running on Endian, etc., but the systems kept on crashing.
The only solution, if we wanted to stay with SSD drives, was to configure RAID1 (using 2 SSDs), and that seems to solve the problem.

We've seen two kinds of error codes:

ata1.01: status: { DRDY ERR }
ata1.01: error: { UNC }

and an ICRC error.

Both errors result in a complete system failure, so the Endian software is unable to boot up.
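
For reference, a minimal sketch of that RAID1 mirror built with mdadm (the device names and partitioning are assumptions; how EFW's installer handles RAID is another question):

Code:
# Mirror two SSDs into a single md device
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# Watch the initial sync, then check the array state
cat /proc/mdstat
mdadm --detail /dev/md0

Note that RAID1 masks a single drive failure but does not reduce write wear: both SSDs receive every write, so it buys time to swap a failed drive rather than preventing the failures.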
Logged
rosch
Full Member
Posts: 20



« Reply #10 on: Saturday 19 May 2012, 01:37:35 am »

I can confirm this issue: we've lost 20+ systems running SSD disks.

I assume you also ran EFW 2.4.1 on the SSD machines?
Logged
mrkroket
Hero Member

Posts: 495


« Reply #11 on: Saturday 19 May 2012, 04:54:50 am »

I can confirm this issue: we've lost 20+ systems running SSD disks. We tried different SSD manufacturers, different services running on Endian, etc., but the systems kept on crashing.
The only solution, if we wanted to stay with SSD drives, was to configure RAID1 (using 2 SSDs), and that seems to solve the problem.

We've seen two kinds of error codes:

ata1.01: status: { DRDY ERR }
ata1.01: error: { UNC }

and an ICRC error.

Both errors result in a complete system failure, so the Endian software is unable to boot up.
40 SSDs, wow. That isn't a casual issue.
If you swim in money it's an option, but I prefer a reliable system over a buggy fast one.
If every 1 or 2 years you must change the SSDs in each firewall, and keep track of when to change them, who changes them, the downtime, etc., it isn't worth the effort.
A firewall's disk usage is totally different from a desktop machine's; maybe SSDs just aren't suitable for logging workloads.

Server reliability must be measured in years, not months. I have had an EFW firewall running on a simple desktop machine since 2009, without problems.
Any IT system should prioritize, in this order:
-Stability and reliability
-Failover capability
-Features
-Speed
-Heat
-Noise

If you improve speed, heat and noise, but add tons of stability problems, what are you achieving?
Logged
pwinterf
Full Member

Posts: 14


« Reply #12 on: Wednesday 30 May 2012, 12:14:40 am »

Can you guys tell me if yours are the same as what I'm getting here?

I'm using a small 30 GB SSD.

Regards, Peter

Posted in general support:

issue with 2.4.1
« on: Yesterday at 11:23:11 pm »

I've been having an issue with 2.4.1 for quite a while now that initially looked like a HW issue.

After about 48-72 hours the disk light on the box is hard on.

There is no response from the network, and although the menu is still displayed on the console, selecting any of the options (e.g. reboot) gives /sbin/reboot not found.

Now I've replaced the system completely, with the exception of the plug-in network cards.

I had also run with an alternative disk for a few days, just in case that one was faulty, and it did the same thing.

After several of these 48-72 hour periods of locked-up/frozen disk IO, the disk becomes corrupt beyond recovery, and a re-install and re-apply of the rules restores operation.

The system is now running on a newer version of the Jetway Atom board than it originally ran on, but it still does it.

I can't move to 2.5.1 until I've resolved the issue with interzone DNS, which gets broken as soon as I upgrade, with the same rule set.

Any ideas?

Regards, Peter
Logged
rosch
Full Member
Posts: 20



« Reply #13 on: Wednesday 30 May 2012, 12:46:39 am »

Does your log give you any hints about what is going on?
If it's close to the posts above, then I'd say your SSDs were also failing.

As for your spinning disks, you can verify them with smartctl: http://smartmontools.sourceforge.net/man/smartctl.8.html
Or use the palimpsest disk utility if you're running GNOME.
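
A minimal smartctl health check, assuming smartmontools is installed and the disk is /dev/sda (the device name is an assumption):

Code:
# Overall health, error log and wear attributes in one report
smartctl -a /dev/sda

# Kick off the drive's built-in short self-test, then read the result
smartctl -t short /dev/sda
smartctl -l selftest /dev/sda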

So far 2.4.1 has not locked up with spinning disks for me, only on (initially new) SSDs, and that was after 6 months of operation.
Logged
pwinterf
Full Member

Posts: 14


« Reply #14 on: Tuesday 03 July 2012, 04:32:22 pm »

Finally caught my system in the act. Usually it's so locked up that I can't review the logs, and as the disk is offline it can't write out the errors, so I could never fathom what was causing it.

This time I SSHed in and checked dmesg and the logs that were still in cache waiting to be written out: loads of ATA errors, retries etc., just like one of the logs in this thread.
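
For anyone wanting to catch it the same way, a sketch of pulling the evidence off the box while it's still reachable (the hostname is a placeholder):

Code:
# Grab the kernel ring buffer over SSH before the box dies completely
ssh root@efw-box 'dmesg' > efw-dmesg.txt

# Look for the ATA error/reset pattern seen earlier in this thread
grep -iE 'ata[0-9]|SRST|frozen' efw-dmesg.txt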

I'll get a magnetic disk to replace the SSD and see how that goes, but it looks like there's an issue with SSDs and Endian.

Regards, Peter
Logged