Discussion:
[PIC] SDHC read problem
IVP
2015-02-11 23:58:09 UTC
Permalink
Hi all,

I'm having a wonderfully frustrating time with a variety of SDHC cards

The same code which works very well with one fails to varying degrees
with others. I have two samples of each. All are readable in a PC card
slot except the Silicon Power, which I think might be crap, and I wonder
whether they support SPI at all. A couple of the 4GB seem OK in a camera

Strontium 8GB, everything works
Silicon Power 8GB, abject failure (one won't even format)
Sandisk 4GB, doesn't complete initialisation (endless ACMD41 loop)
Panasonic 4GB, ditto
Unknown brand 4GB, initialises but won't read more than 1 block

The last is the particular problem I'm tackling and wonder if anyone
has any ideas. Initialisation is at 250kHz, switching to 11MHz for
normal operation
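
For the cards that sit in the ACMD41 loop, one thing worth ruling out is the HCS bit. The SD specification has the host send CMD8 first and then set HCS (bit 30) in the ACMD41 argument, and a high-capacity card may keep answering "idle" (R1 = 0x01) if it is missing. The loop also wants an upper bound so that a stubborn card gets reported rather than polled forever. A minimal C sketch of such a polling loop follows. The `sd_cmd_fn` transport callback and the `fake_card` test double are hypothetical names introduced for illustration, not part of the posted code:

```c
#include <stdint.h>

#define SD_CMD55   55u          /* APP_CMD prefix */
#define SD_ACMD41  41u          /* SD_SEND_OP_COND */
#define SD_HCS     0x40000000u  /* host supports high-capacity cards */
#define R1_IDLE    0x01u

/* Hypothetical transport: send one command frame, return the card's R1 byte. */
typedef uint8_t (*sd_cmd_fn)(void *ctx, uint8_t cmd, uint32_t arg);

/* Poll ACMD41 until the card leaves idle.
 * Returns the number of polls used, -1 on timeout, -2 if R1 shows an error. */
int sd_acmd41_poll(sd_cmd_fn send, void *ctx, int max_polls)
{
    for (int i = 1; i <= max_polls; i++) {
        uint8_t r1 = send(ctx, SD_CMD55, 0);
        if (r1 & 0xFEu)
            return -2;                  /* error bit set on CMD55 */
        r1 = send(ctx, SD_ACMD41, SD_HCS);
        if (r1 == 0x00)
            return i;                   /* card is ready */
        if (r1 != R1_IDLE)
            return -2;                  /* error bit set on ACMD41 */
    }
    return -1;                          /* still idle: give up */
}

/* Test double (assumption): stays idle until it has seen `ready_after`
 * ACMD41 commands that had HCS set. */
struct fake_card { int ready_after; int seen; };

uint8_t fake_card_send(void *ctx, uint8_t cmd, uint32_t arg)
{
    struct fake_card *c = ctx;
    if (cmd == SD_CMD55)
        return R1_IDLE;
    if (cmd == SD_ACMD41 && (arg & SD_HCS) && ++c->seen >= c->ready_after)
        return 0x00;
    return R1_IDLE;
}
```

On real hardware the callback would wrap the existing SPI send routine. A timeout result then at least distinguishes "card never left idle" from "card rejected the command".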

[Images: two logic analyser screen captures, not available in this archive]

As you can see in the analyser screens, the PIC (dsPIC33) first reads
Sector 0, then works out the absolute sector at which the FAT partition
starts. In this case the hidden-sectors field puts that at Sector 0x2000.
From there, the s/w pulls out the directory and accesses files. Compare
with the Strontium reading 3 blocks with no problem. Directory read
is successful as I can see the file names on an LCD and retrieve them
as data. The Strontium is also in another application and will quite
happily run for hours and hours pulling data without a glitch. It's that
application which I'm modifying just to keep the problematic project
moving forward
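
The sector arithmetic described here (read Sector 0, pull the partition start out of bytes 0x1C6 to 0x1C9) can be sketched in C as follows. This is an illustration under assumptions rather than a translation of the posted routine: it checks the 0x55 0xAA signature and treats a first byte of 0xEB or 0xE9 as "Sector 0 is already a boot sector", which is a stricter test than "first byte is non-zero". The names `le32` and `find_boot_sector` are made up for the example:

```c
#include <stdint.h>

#define SECTOR_SIZE     512u
#define MBR_PART0_LBA   0x1C6u  /* start LBA field of the first partition entry */
#define MBR_SIG_OFFSET  0x1FEu  /* 0x55, 0xAA */

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and stores the boot-sector LBA in *lba,
 * or returns 0 if Sector 0 carries no valid signature. */
int find_boot_sector(const uint8_t sec0[SECTOR_SIZE], uint32_t *lba)
{
    if (sec0[MBR_SIG_OFFSET] != 0x55 || sec0[MBR_SIG_OFFSET + 1] != 0xAA)
        return 0;
    if (sec0[0] == 0xEB || sec0[0] == 0xE9)
        *lba = 0;                           /* Sector 0 is itself a boot sector */
    else
        *lba = le32(&sec0[MBR_PART0_LBA]);  /* MBR: first partition start */
    return 1;
}
```

With bytes 00 20 00 00 at 0x1C6 this yields 0x2000, the sector number seen in the failing CMD17. On an SDHC card that number is passed to the card directly as a block address.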

For some reason, the Unknown brand doesn't send an 0xFE token on
the second CMD17, which is trying to read Sector 0x2000. Below is
the relevant section of the s/w.

I've tried repeating the CMD, flicking CS, delays, dummy data sends
with no difference. Some of those stuff up operation of the only working
cards, the Strontiums

Vcc is a good steady 3.29V, plenty of ceramic and bulk caps, pull-ups
on Dout, Din, Data2 and Data3. No noise apparent

The sensible me says buy more Strontium. The stubborn me wants to
find out why some cards don't work properly (which may pay dividends
in the future even though 4GB cards will probably become scarce). At
the moment I feel it's wasted effort to get the 4GB working when 8GB
are readily available (Silicon Power notwithstanding) but I'd still like
to get to the bottom of it, if there is a bottom

Help

TIA

Joe

-------------------------------------

read_fat: clr sec_adr_lo ;read sector 0
clr sec_adr_hi
mov #ram1,w8 ;into PIC RAM
call getblock

;find Boot Sector / Root Directory

clr offset ;default, no offset
mov ram1,w0 ;read first byte of sector 0
bra nz,boot ;<> 0 (E9 or EB)

;if 0 at sector 0 then look for Boot Sector
;MBR + partition, read 0x1c6 - 0x1c9

partition: mov ram1+#0x1c6,w0 ;fetch 0x1c6 and 0x1c7, LSW
mov w0,sec_adr_lo
mov w0,offset ;offset <> 0
mov ram1+#0x1c8,w0 ;fetch 0x1c8 and 0x1c9, MSW
mov w0,sec_adr_hi

mov #ram1,w8
call getblock ;read sector pointed to by 0x1c6-0x1c9

;--------------------

getblock: mov #0xff,w0
call send_w0_f

mov #cmd17,w0 ;read block
call send_w0_f
mov sec_adr_hi,w0 ;block number <31:16>
swap w0
call send_w0_f
swap w0
call send_w0_f
mov sec_adr_lo,w0 ;block number <15:0>
swap w0
call send_w0_f
swap w0
call send_w0_f
mov #0xff,w0 ;dummy CRC
call send_w0_f

gettoken: mov #0xff,w0
call send_w0_f
bclr SPI1STAT,#SPIROV
btss SPI1STAT,#SPIRBF
bra $-2
mov SPI1BUF,w0
mov #0xfe,w1
xor w1,w0,w0
bra nz,gettoken

send_w0_f: mov w0,SPI1BUF ;load data into SPI buffer
call usec ;wait at high speed (some dsPIC SPI flags are broken)
return
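
The gettoken loop in the excerpt waits only for 0xFE, so it cannot tell a rejected command from a slow card, and it never gives up. In SPI mode the card answers CMD17 with an R1 byte first (top bit clear) and then either the 0xFE start token or a data error token of the form 0000xxxx. The C sketch below builds the same frame as getblock and classifies a captured response. It is written as pure functions so it can be tried on a PC against bytes copied from the analyser; the function and enum names are invented for the example:

```c
#include <stddef.h>
#include <stdint.h>

enum sd_read_status {
    SD_READ_TOKEN,      /* 0xFE seen: 512 data bytes follow */
    SD_READ_R1_ERROR,   /* command rejected: R1 was not 0x00 */
    SD_READ_DATA_ERROR, /* card sent something other than 0xFF or 0xFE */
    SD_READ_TIMEOUT     /* nothing conclusive within the bytes examined */
};

/* Build the 6-byte CMD17 frame for an SDHC card (argument = block number). */
void sd_build_cmd17(uint32_t block, uint8_t frame[6])
{
    frame[0] = 0x40u | 17u;             /* start bits + command index */
    frame[1] = (uint8_t)(block >> 24);
    frame[2] = (uint8_t)(block >> 16);
    frame[3] = (uint8_t)(block >> 8);
    frame[4] = (uint8_t)block;
    frame[5] = 0xFFu;                   /* dummy CRC, as in the posted code */
}

/* Classify the bytes clocked in after the command frame. */
enum sd_read_status sd_classify_read(const uint8_t *rx, size_t n)
{
    size_t i = 0;
    while (i < n && (rx[i] & 0x80u))    /* skip fill bytes before R1 */
        i++;
    if (i == n)
        return SD_READ_TIMEOUT;         /* no R1 at all */
    if (rx[i] != 0x00)
        return SD_READ_R1_ERROR;
    for (i++; i < n; i++) {
        if (rx[i] == 0xFE)
            return SD_READ_TOKEN;
        if (rx[i] != 0xFF)
            return SD_READ_DATA_ERROR;  /* spec's error token is 0000xxxx */
    }
    return SD_READ_TIMEOUT;
}
```

If the unknown-brand card returns a non-zero R1 or an error token for block 0x2000, that points at the command or address rather than at timing. A plain timeout points the other way.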



-----
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2015.0.5646 / Virus Database: 4284/9097 - Release Date: 02/11/15
--
http://www.piclist.com/techref/piclist PIC/SX FAQ & list archive
View/change your membership options at
http://mailman.mit.edu/mailman/listinfo/piclist
James Cameron
2015-02-12 00:41:16 UTC
Permalink
One of those nasty interoperability problems.

You're dealing with firmware in the card, which runs the host
communications and the flash translation layer (FTL).

The compliance against standards varies considerably across the
market. My guess is you've found a bug in the firmware, a corner case
that is exercised by your pattern of access, but is not tested by the
device manufacturer or firmware vendor.

Some ideas:

- switch to a pattern of access that is known to work,

- randomly experiment with patterns of access until one might be found
to work,

- raise issue with the device manufacturer, who will escalate to the
firmware vendor,

- explore the various documented instances of firmware upload
processes, in case you can find one that gets you more options.

I'm more familiar with using SD host controllers with these things.

Perhaps the card wants a supply voltage change; some SDHC or SDXC
cards require 1.8 V at higher transfer speeds.
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-02-12 01:46:32 UTC
Permalink
Post by James Cameron
The compliance against standards varies considerably across
the market. My guess is you've found a bug in the firmware, a
corner case that is exercised by your pattern of access, but is
not tested by the device manufacturer or firmware vendor.
I probably shouldn't expect them to, as SPI seems more of a
courtesy vs the licensed 4-bit i/f encountered commercially
Post by James Cameron
- switch to a pattern of access that is known to work
That's partly the problem. As mentioned, I have a scheme that's
very reliable with one type of card which doesn't work with others.
Post by James Cameron
Perhaps the card wants a supply voltage change; some SDHC
or SDXC cards require 1.8 V at higher transfer speeds.
I have tinkered with the s/w flow quite a bit, but not tried changing
to 1.8V. In the PC card reader which I use to transfer files to the
cards Vdd is 3.4V

Joe



James Cameron
2015-02-12 02:49:26 UTC
Permalink
Post by IVP
Post by James Cameron
The compliance against standards varies considerably across
the market. My guess is you've found a bug in the firmware, a
corner case that is exercised by your pattern of access, but is
not tested by the device manufacturer or firmware vendor.
I probably shouldn't expect them to, as SPI seems more of a
courtesy vs the licensed 4-bit i/f encountered commercially
Post by James Cameron
- switch to a pattern of access that is known to work
That's partly the problem. As mentioned, I have a scheme that's
very reliable with one type of card which doesn't work with others.
Not unusual. Many different implementations of firmware in those
cards. Can't even rely on them staying the same over a vendor's
product lifetime.
Post by IVP
Post by James Cameron
Perhaps the card wants a supply voltage change; some SDHC
or SDXC cards require 1.8 V at higher transfer speeds.
I have tinkered with the s/w flow quite a bit, but not tried changing
to 1.8V. In the PC card reader which I use to transfer files to the
cards Vdd is 3.4V
Does that reader use the higher transfer rate? Might be time to look
at the conversation between the reader and the card.
--
James Cameron
http://quozl.linux.org.au/
John Gardner
2015-02-12 03:01:57 UTC
Permalink
... look at the conversation between the reader and the card...

That's a good idea - Thanks, James...
IVP
2015-02-12 09:06:33 UTC
Permalink
Post by James Cameron
Does that reader use the higher transfer rate? Might be time to
look at the conversation between the reader and the card.
On the scope it appears to be around 19MHz per data pin at 3.4V

I went out this evening for a look at what's available. No 4GB in
sight and a Transcend 8GB was on sale so I bought one. Worked
first time to my great relief so I'll get some more

Maybe it's just a peculiarity with 4GB. Spent too many fruitless days
on it already (and on the initial project to no avail, which is why I
ended up using the Strontium 8GB in the first place), will have to put
them on the backburner and get on with the project using 8GB. No
doubt the 8GB will be gone soon as well

The 4GB work OK in other devices so they'll get used

At some point I'd like to back-engineer the 4-bit protocol, in as
simple a form as I expect to need it

At some point

Joe


Christopher Head
2015-02-12 18:18:21 UTC
Permalink
Post by IVP
At some point I'd like to back-engineer the 4-bit protocol, in as
simple a form as I expect to need it
Hi,
If you grab something like the STM32F4, it has a built in SD host controller that uses the 1-bit/4-bit interface instead of SPI. The host controller only handles the lowest level interfacing stuff, but that stuff is precisely what’s redacted from the publicly available SD specification. So, between them, with only a tiny bit of intuition and reading between lines, you can get a system going.
--
Christopher Head
IVP
2015-02-12 19:28:41 UTC
Permalink
Post by Christopher Head
If you grab something like the STM32F4
Thanks, I'm looking it up, sounds like a research project for the
next spare weekend

The card reader I have here (which has sockets for 5 types
of card + USB) uses a Realtek RTS5161 and a scattering of
R and C, that's about it

Joe


James Holland
2015-02-12 19:02:56 UTC
Permalink
Date: Thu, 12 Feb 2015 12:58:09 +1300
Subject: [PIC] SDHC read problem
Hi all,
I'm having a wonderfully frustrating time with a variety of SDHC cards
The same code which works very well with one fails to varying degrees
with others. I have two samples of each. All are readable in a PC card
slot except the Silicon Power, which I think might be crap and wonder
if they support SPI anyway. A couple of the 4GB seem OK in a camera
Your problem sounds very familiar. I recall, from a few years back, many devices that would work with a 2GB card but not with many 4GB cards. Sandisk has this to say:

http://kb.sandisk.com/app/answers/detail/a_id/54/kw/SD%20SDHC/r_id/101834

Given that 4GB is an SDHC card and SPI is an SD format, I would think that unless a card specifically states that it is SD compatible it's going to be a pig in a poke.
James
IVP
2015-02-12 19:41:54 UTC
Permalink
Post by James Holland
Sandisk has this to say;
Thanks.

Things do seem to have settled down starting at 8GB. Which
is fortunate because that looks to be entry level in retail stores
now

I'm also thinking of them for archive storage, rather than DVD,
particularly for priority data like product projects, important
photos and documents etc

Joe


RussellMc
2015-02-12 21:41:42 UTC
Permalink
Post by IVP
I'm also thinking of them for archive storage, rather than DVD,
particularly for priority data like product projects, important
photos and documents etc
Hopefully multiple copies.


When they die they can "just die". Depending on the controller and system
used this can be essentially unrecoverable even if the actual data is still
there.



Russell.
Harrison Cooper
2015-02-17 17:39:49 UTC
Permalink
Just adding my nickel's worth... if you're planning on putting the SD card away in a drawer, holding archived information, then you'd better get it out at least once a month, plug it into something that will read it, and let the controller refresh the charge. Otherwise you will end up with corrupted data or, worse, nothing at all. Flash may be non-volatile but it doesn't retain the charge forever. Obviously the same goes for USB thumb drives and such.


________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
James Cameron
2015-02-17 19:45:48 UTC
Permalink
Post by Harrison Cooper
Just adding my nickel's worth... if you're planning on putting the SD
card away in a drawer, holding archived information, then you'd better
get it out at least once a month, plug it into something that
will read it, and let the controller refresh the charge. Otherwise
you will end up with corrupted data or, worse, nothing at all. Flash
may be non-volatile but it doesn't retain the charge forever.
Obviously the same goes for USB thumb drives and such.
I agree in principle, but I'd change that to once a year.

And don't forget devices with embedded Flash or eMMC.

The Flash translation layer controllers can also do ECC across the
data, and are capable of recovering from a number of discharged cells.
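
The card's internal ECC is invisible to the host, so a checksum kept alongside each archived file gives an independent check whenever the card is taken out for its periodic read. As a rough illustration (a sketch only, not something anyone in the thread described using), a bitwise CRC-32 in C with the same polynomial that zip and PNG use:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320).
 * Slow but table-free, which suits a small microcontroller. */
uint32_t crc32_buf(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and apply the polynomial. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Storing the value when the archive is written and recomputing it at each check turns "the card still mounts" into "the data is still what was written", at least to the strength of a 32-bit check.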
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-02-17 21:44:58 UTC
Permalink
Post by Harrison Cooper
let the controller refresh the charge
Thanks, I didn't know that

When you say "refresh the charge" do you mean just power it
up or actually re-write all the data ?

That would be a pain and more hassle than copying a DVD
every few years

Joe


Harrison Cooper
2015-02-17 22:18:51 UTC
Permalink
There are several things that occur when it gets powered up (garbage collection, etc.), so it's not that you need to completely rewrite the data; simply accessing the device should suffice to restore the charge. As these get bigger in capacity, the technology shrinks and thus the actual charge becomes smaller, and a smaller charge does not last as long because of leakage. All I am saying is that all of these technologies (tape, DVD, NAND) degrade over time. They do make some DVDs that use a technology claimed to last a very long time, but it takes a special writer to burn those.

For me, I use an ISP that has 5-nines redundancy and ftp my archives to their servers and cross my fingers...

IVP
2015-02-17 23:03:44 UTC
Permalink
I too use some on-line storage, and of course schematics and
code can be printed out. Hard copy still has a very relevant
place

A few months ago I retrieved data from zip drive disks which are
several years old. My own zip drive died from the 'click of death'
a long time ago. They didn't really become the "next big thing in
storage" did they

Coincidentally I happened to read an article this morning which
mentions bit rot

http://www.nzherald.co.nz/opinion/news/article.cfm?c_id=466&objectid=11403558

I don't think it will become so bad that a format like jpg or bmp
can't be read. A photo on good stock should last a very long time

Joe



RussellMc
2015-02-17 23:30:34 UTC
Permalink
Based (perhaps) on

http://www.bbc.com/news/science-environment-31450389
Post by IVP
Coincidentally I happened to read an article this morning which
mentions bit rot
http://www.nzherald.co.nz/opinion/news/article.cfm?c_id=466&objectid=11403558
I don't think it will become so bad that a format like jpg or bmp
can't be read. A photo on good stock should last a very long time
Joe
Jesse Lackey
2015-02-17 23:34:59 UTC
Permalink
My 2c on this topic...

After a near-miss 8 years ago that would have lost all development data
I got a regular backup system in place, and another near-miss 5 years
ago when my shop got burgled I added online backups as well.

I have a main monster work computer and a low-end laptop, and I want to
have backups and sync between them. Windows 7 on both.

I have 5 directories I keep backed up:
Development
Email
Media (pics)
Random Documents
Zips of software/drivers I commonly use (to make setting up a new laptop
faster, for example)

Anything not in those directories I know is not going to be backed up
often. Total size: 30K files taking 47gb. "Treesize" is a good utility
for looking thru directory structures for unneeded stuff.

I use "CopyTo synchronizer" to maintain sync between the lappy and the
desktop. I tell it which way to write, and off it goes. Handy. So I
tell it to write from desktop to the lappy, go work out of the house
when possible or @ client, and sync back at night.

So my laptop and desktop are typically in sync within a few days. So
all data is now on 2 harddrives.

I use "IDrive" to backup those directories online. I do this every few
days at most - daily when developing fast. So now it is all online as
well, and possible to retrieve earlier versions to some degree.

Finally every several weeks, Norton Ghost does a backup of the main
computer's whole harddrive to another drive on the same machine. It
reminds me so that I don't forget to do it.

So all the data is on 2 harddrives on main computer (main drive is 512gb
flash, actually), laptop harddrive, and online.

This works pretty well... takes maybe 8 min to do the file sync between
computers, most of the time is scanning 30K files rather than the copy.
Of old work stuff that had many directories of many small files I made
a single .zip to speed it up.

Not sure how much data the O.P. has, but the above would scale fine to
at least 5X the size since of course most of the data doesn't change.
The main thing is to not have zillions of small files (Eclipse IDE I'm
looking at you), if you can keep that under control you're set.

Ok back to debugging I2C state machines
J
John Ferrell
2015-02-18 17:31:14 UTC
Permalink
What do you do to keep the deletes in sync on the two computers?
--
John Ferrell W8CCW
Julian NC 27283
It is better to walk alone,
than with a crowd going the wrong direction.
--Diane Grant
Jesse Lackey
2015-02-18 23:25:00 UTC
Permalink
Hi - CopyTo synchronizer has this as an option, so my deletes/renames
are replicated. For my use this is better, but one does have to be
careful. If you choose to sync the wrong way you'll overwrite your new
work. BUT- I have two "projects" (saved settings) in CopyTo clearly
labeled, and I have it show me the optional preview/confirm of what it
is going to do before it does it. I haven't lost anything ever.

J
John Ferrell
2015-02-19 19:42:01 UTC
Permalink
I have a similar need. I intend to put a batch file in place that will
save the deleted file(and its original location) to a "Dumpster" file
and another batch file that finds the backup match and deletes them both.
--
John Ferrell W8CCW
Julian NC 27283
It is better to walk alone,
than with a crowd going the wrong direction.
--Diane Grant
Allen Mulvey
2015-02-19 20:52:08 UTC
Permalink
You might want to look at Robocopy. That is what I use. It
has parameters to cover just about everything including
mirror. Mirror carries over deletes as well as additions and
changes. I don't use mirror because I want to retain all
original files.

Allen
John Ferrell
2015-02-19 21:16:17 UTC
Permalink
As I recall,
ROBOCOPY only considers folders. Also, the sync example will lead you to
delete unmatched copies.

To meet my needs any deletes must be very explicit.
My data is not that important, I am just studying the process.
--
John Ferrell W8CCW
Julian NC 27283
It is better to walk alone,
than with a crowd going the wrong direction.
--Diane Grant
IVP
2015-03-17 04:04:29 UTC
Permalink
Hi all,

I wonder if I've found another, interesting if frustrating, failure

I sorted out the previous s/w read problem by paring the
re-initialisation routine back to the barest necessities, so the
read delay no longer impacts downstream processing

So I expectantly put the card and application s/w on test. You
may recall this is an 8GB Strontium SDHC and a dsPIC

There are about 60 files, 300MB in total, containing various
sets of data that are retrieved as and when needed

For the purposes of the test I chose two and accessed them
continuously and alternately about 1s apart

Sometime during the night, after probably 12 hours running,
something went wrong. Clk was just going and going when I
got up. "Oh", I thought. What should happen (and was for
several hours before I went to bed) is that the dsPIC holds a
Busy line up whilst reading and processing the file data. It
drops the Busy line, telling the 18F it's OK to try and access
another file. So, when I looked at the system without powering
down, Busy was high and Clk was running. For some reason
it apparently hadn't finished reading the file. It's only 58kB and
is normally read in a fraction of a second. It was probably in
the state I found it in for quite some time

On subsequent re-boots reading that particular file causes the
same incompletion problem. But it fails to complete anywhere
from 1s to 15 minutes, which is from 1 to 900 reads at ~1s
between reads. ie after a re-boot it could fail at any time.

With all the testing taken into consideration, that file has been
accessed maybe 40,000 times over a couple of weeks, the
majority of them on that one day it failed, perhaps 30,000. As
the FAT is copied into RAM I'm sure it's the file that's broken.
If it wasn't in RAM then the FAT would have been read twice
as often as the file

I checked the board but couldn't see any intermittent electrical
problem. An analyser grab of a failure didn't produce anything
useful, except to show what I already knew. Clk simply ran out
of data to get and didn't turn off as the byte count wasn't finished

What seems to have fixed the problem is re-formatting and re-
loading the card. It's now been running for a day without incident
and I'll leave it going to see if it fails again in the same way

If reading a file is going to cause it to slowly deteriorate then it
seems I'll have to put some sort of block refresh regime in,
otherwise the application is going to fail in months rather than
decades.

Has anyone had experience or some knowledge of what I've
seen and how it appears to have been corrected ?

TIA

Joe


James Cameron
2015-03-17 04:22:38 UTC
Permalink
Sounds like a buggy FTL (flash translation layer) that doesn't
properly handle corruption of flash cells. You're meant to be given
an error response.

Replace the FTL.
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-17 07:57:36 UTC
Permalink
Post by James Cameron
Sounds like a buggy FTL (flash translation layer) that doesn't
properly handle corruption of flash cells. You're meant to be
given an error response.
Well, I'll have to wait and see if the card fails again in the next
day or so using the same rate of reads. If it does then perhaps I
can look for something like a Data Error Token. Regrettably I
didn't save the analyser trace which I believed showed the fail
so I can't say whether there was a DET

Due to the nature of the data, a corrupt bit or two isn't that
important. Imagine one pixel of a jpeg being not quite right. I
expected to be able to read a block completely, regardless of
whether its contents are correct or not, but it looks like the
card might not actually be able to read the cell(s), whatever
value is there and sends a one-byte DET instead of a block
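
For reference, a minimal C sketch of how a read loop might tell a start token, a Data Error Token and a plain time-out apart, so that Clk cannot run on indefinitely. The `spi_xfer_fn` hook, the `mock_*` helpers and the poll budget are assumptions made for illustration and are not the code discussed in this thread. The 0xFE start token is the one mentioned earlier in the thread; treating any other non-0xFF byte as an error token is a simplification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SPI hook: clock one byte out, return the byte clocked in. */
typedef uint8_t (*spi_xfer_fn)(uint8_t out);

enum sd_wait_result {
    SD_DATA_START = 0, /* 0xFE seen: a data block follows            */
    SD_DATA_ERROR = 1, /* card sent a one-byte token instead of data */
    SD_TIMEOUT    = 2  /* nothing but 0xFF within the poll budget    */
};

/* Poll for the token that precedes a data block after a read command. */
enum sd_wait_result sd_wait_data_token(spi_xfer_fn xfer, uint32_t max_polls,
                                       uint8_t *token_out)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        uint8_t b = xfer(0xFF);

        if (b == 0xFF)
            continue; /* card not ready yet, keep clocking */
        if (token_out != NULL)
            *token_out = b;
        /* Assumption: any other byte is treated as an error token. A
           stricter check would also require the upper bits to be zero. */
        return (b == 0xFE) ? SD_DATA_START : SD_DATA_ERROR;
    }
    return SD_TIMEOUT; /* give up instead of clocking forever */
}

/* Scripted stand-in for the card, for bench testing only. */
static const uint8_t *mock_script;
static size_t mock_len;
static size_t mock_pos;

void mock_load(const uint8_t *script, size_t len)
{
    mock_script = script;
    mock_len = len;
    mock_pos = 0;
}

uint8_t mock_xfer(uint8_t out)
{
    (void)out;
    /* Once the script is exhausted the line idles high (0xFF). */
    return (mock_pos < mock_len) ? mock_script[mock_pos++] : 0xFF;
}
```

Logging the token byte whenever the result is `SD_DATA_ERROR` would show whether the card is reporting a fault or simply never answering.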

At this stage it's hard to say what the failure mechanism is. After
a re-format and re-load the card seems to be behaving, so it
"can't" be something physically broken. If it fails again soon then
I'll have something to work on. If it doesn't then I'll be haunted
by what might happen when it's out the door

Joe


Isaac Marino Bavaresco
2015-03-17 10:48:48 UTC
Permalink
Did you check your signals with a good oscilloscope? Could what you are
experiencing be noise, transmission line effects, etc.?

Isaac
IVP
2015-03-17 11:56:50 UTC
Permalink
Post by Isaac Marino Bavaresco
Did you check your signals with a good oscilloscope? Could what you
are experiencing be noise, transmission line effects, etc.?
That was my first thought. Perhaps some mains noise or interruption
mid-read. Hard to tell

If that's what had happened I expected it to run normally after being
powered up. That's when I discovered it would run intermittently. I
must have re-booted it at least a couple of dozen times the next
morning and each time it would stop after a short while and always
on the same one of the two files.

I've had the analyser on all comms and power lines but not found
any disturbances before or during read failure

It's most perplexing. I've looked around the web but not yet found
any mention of reads causing data degradation. If it doesn't then it's
hard to explain how the file can be read successfully thousands of
times and then start giving trouble.

The seemingly random time to failure after each power-up is also
odd. You'd think that it would not work instantly rather than at any
time up to a few minutes.

All I can do for the time being is just let it run until it fails. After the
format/reload the file has been accessed almost 20,000 times so
perhaps it'll fail again tomorrow.

If/when it does I hope I'll learn something useful.

Joe



Christopher Head
2015-03-17 17:08:21 UTC
Permalink
Post by IVP
It's most perplexing. I've looked around the web but not yet found
any mention of reads causing data degradation. If it doesn't then it's
hard to explain how the file can be read successfully thousands of
times and then start giving trouble.
Maybe this is a silly question, but after the failure started happening, rather than reformatting the card, did you try plugging the card into a computer and reading the suspect file? If it also fails, there’s your answer: the card is broken or data is degrading, and possibly the error message will tell you why (the proper, non-SPI protocol has lots of error codes, though whether your OS will plumb through and show them to the user is another question). If it succeeds, your code or circuit is at fault instead.
--
Christopher Head
John Gardner
2015-03-17 18:02:35 UTC
Permalink
Hi Joe -

Perhaps relevant - Certainly interesting...

http://www.bunniestudios.com/blog/?page_id=1022

The author: http://en.wikipedia.org/wiki/Bunnie_Huang
John Ferrell
2015-03-17 19:43:28 UTC
Permalink
The product involved also sounds very interesting...
Post by John Gardner
Hi Joe -
Perhaps relevant - Certainly interesting...
http://www.bunniestudios.com/blog/?page_id=1022
The author: http://en.wikipedia.org/wiki/Bunnie_Huang
--
John Ferrell W8CCW
Julian NC 27283
It is better to walk alone,
than with a crowd going the wrong direction.
--Diane Grant
IVP
2015-03-17 23:11:56 UTC
Permalink
Post by John Gardner
Perhaps relevant - Certainly interesting...
http://www.bunniestudios.com/blog/?page_id=1022
Thanks

Cold comfort

Joe


John Gardner
2015-03-17 23:48:41 UTC
Permalink
...Cold comfort...

I've had excellent results with older Sandisk cards; 128 MB, for instance,
and enough trouble with SDHC & on to sour me on using them in any
serious capacity.

Perhaps coughing up $2.5K per annum (or whatever it is now) to the SDA
would rectify this; it's not an option for me...
IVP
2015-03-17 22:40:54 UTC
Permalink
Post by Christopher Head
did you try plugging the card into a computer and reading the
suspect file?
The card has stopped working again, somewhere between 21000
and 32000 reads of that file. It was OK before I went to bed, 11
hours after refreshing it. It's also intermittently failing to read after
re-powering, just as it was doing yesterday. I think that rules out
external influence (most likely)

Disappointing. The project has had so much other work put in

The suspect file does download to a PC. It's reported to be the
same size as the original but that information is probably coming
from the header, not a received byte count

I'll have to compare the actual contents of the original vs the failed
to see if it's really corrupted, and also see if I can find a Data Error
response from the card

A project I did in Nov 2011 with a 4GB card has a similar read
pattern so I'm getting it sent back for testing. AFAIK it's been
powered up since installation, although the user isn't aware of a
problem. Have to admit that user isn't usually very observant in
matters like this. To them it could look to be working, as the
card operation doesn't affect the visual functions.

If it too has failed then I'll have learned a valuable lesson and
need to devise a Plan B

The two smallest files are generally the most often read so it
might be as easy as copying them to a 128kB RAM and reading
from there. And keep a back-up CD for re-installing (but how
long is a CD going to last ?)

Joe


James Cameron
2015-03-17 22:59:48 UTC
Permalink
I'd also consider ESD to the card ... not as a solution, but as a
source of problems.
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-18 00:33:18 UTC
Permalink
Post by James Cameron
I'd also consider ESD to the card ... not as a solution, but
as a source of problems.
I think the card is fairly well connected in situ. All data and
comms lines have the recommended pull-ups, nothing is floating,
good power supply etc

It seems to me that the card failing twice after about the same
number of reads and then similar behaviour after that first
failure is a big indicator as to the probable quality of the card.

To be honest, my application may be a lot more demanding than
typical but performance does seem to fall short of expectations

Joe


IVP
2015-03-18 21:41:41 UTC
Permalink
Post by Christopher Head
did you try plugging the card into a computer and reading the
suspect file? If it also fails, there’s your answer: the card is
broken or data is degrading
I did compare the originals with the card copies, using command
line fc. No differences reported. That might indicate the card
controller is the problem. The fault always does occur with that
particular file, only after many many reads, and normal operation
is restored after a re-format.

I could try moving the file to other blocks on the card, for
example by adding a dummy file to the directory.

Before that, with the card in this faulty condition, I'm going to
twiddle about with Frhed to see if pointing the controller at the
other file makes any difference. Not sure if that helps overall
but I'd like to know

As I mentioned, a card is the simplest and most convenient storage
medium and so of course I would prefer to use one, if I can get
to the bottom of this issue and find a fix. Maybe if I can get it going
in native 4-bit mode that could be more reliable than SPI

Joe



James Cameron
2015-03-18 22:01:03 UTC
Permalink
Post by IVP
Post by Christopher Head
did you try plugging the card into a computer and reading the
suspect file? If it also fails, there’s your answer: the card is
broken or data is degrading
I did compare the originals with the card copies, using command
line fc. No differences reported. That might indicate the card
controller is the problem. The fault always does occur with that
particular file, only after many many reads, and normal operation
is restored after a re-format.
I could try moving the file to other blocks on the card, for
example by adding a dummy file to the directory.
No, that won't make a difference.

The mapping from block numbers to physical cells is a decision made by
the FTL firmware, to support automatic wear levelling. There's no way
you can reliably control that decision.
Post by IVP
Before that, with the card in this faulty condition, I'm going to
twiddle about with Frhed to see if pointing the controller at the
other file makes any difference. Not sure if that helps overall
but I'd like to know
As I mentioned, a card is the simplest and most convenient storage
medium and so of course I would prefer to use one, if I can get
to the bottom of this issue and find a fix. Maybe if I can get it going
in native 4-bit mode that could be more reliable than SPI
It ain't that simple inside the card. ;-)
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-18 23:38:46 UTC
Permalink
Post by James Cameron
Post by IVP
I could try moving the file to other blocks on the card, for
example by adding a dummy file to the directory.
No, that won't make a difference.
What I was thinking was along these lines -

File1 starts at 0x00008518, length 0xE44A
File2 starts at 0x00008527, length 0xEA66

File2 is the one that's causing problems

Copy File1's start/length to the directory entry for File2, see what happens

Copy File2's start/length to the directory entry for File1, see what happens

I thought I could write a sector to the card using Frhed but it
seems it can't. I've another utility around here somewhere
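
As a quick sanity check on those numbers, a small C sketch. It assumes (this is not confirmed anywhere in the thread) that the start values are FAT cluster numbers, that the card was formatted with 4096-byte clusters, and that File1 is stored contiguously. `clusters_needed` and `next_cluster_after` are hypothetical helpers. Under those assumptions File1 occupies 15 clusters, which puts the next free cluster exactly at File2's start.

```c
#include <stdint.h>

/* Whole clusters needed to hold length_bytes (rounds up). */
uint32_t clusters_needed(uint32_t length_bytes, uint32_t cluster_bytes)
{
    return (length_bytes + cluster_bytes - 1u) / cluster_bytes;
}

/* First cluster after a contiguous (unfragmented) file. */
uint32_t next_cluster_after(uint32_t start_cluster, uint32_t length_bytes,
                            uint32_t cluster_bytes)
{
    return start_cluster + clusters_needed(length_bytes, cluster_bytes);
}
```

With File1's values, `next_cluster_after(0x8518, 0xE44A, 4096)` gives 0x8527, so the two files would sit back to back in logical space.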

Joe
John Gardner
2015-03-18 23:57:35 UTC
Permalink
James -

http://lists.laptop.org/pipermail/devel/2010-August/029684.html

Mr. Huang mentions this - Nearly 5 years ago, though. Any relevance
to what Joe is doing?
James Cameron
2015-03-19 00:19:22 UTC
Permalink
Post by John Gardner
James -
http://lists.laptop.org/pipermail/devel/2010-August/029684.html
Mr. Huang mentions this - Nearly 5 years ago, though. Any relevance
to what Joe is doing?
A little bit relevant, yes.

Since that post, we switched to eMMC soldered to the boards, and
that's what we use now.

We still use wear levelling and product lifetime tests, where samples
are tested to destruction by reading and writing continually.

During test, we look at latency, and success of operation.

That reminds me to check the results for the most recent second source
eMMC we brought into the BOM. ;-)

I think John's comment about the industry rate of change still applies.
Devices are going end-of-life quicker than before.
--
James Cameron
http://quozl.linux.org.au/
John Gardner
2015-03-19 00:40:34 UTC
Permalink
Very good - Thanks.
IVP
2015-03-19 00:40:49 UTC
Permalink
Post by John Gardner
http://lists.laptop.org/pipermail/devel/2010-August/029684.html
Mr. Huang mentions this - Nearly 5 years ago, though. Any
relevance
I guess with shrinking cells and charges, thresholds and noise
become more significant

It's been reported on this more than once that PICs aren't quite
the same as they used to be, EEPROM issues for instance.

That's why I would prefer to get hold of 2GB and 4GB rather
than 16 - 64GB. They just seemed to be more robust. For what
I want I'd even go back to low-density MMC

Joe


James Cameron
2015-03-19 00:00:09 UTC
Permalink
Post by IVP
Post by James Cameron
Post by IVP
I could try moving the file to other blocks on the card, for
example by adding a dummy file to the directory.
No, that won't make a difference.
What I was thinking was along these lines -
File1 starts at 0x00008518, length 0xE44A
File2 starts at 0x00008527, length 0xEA66
File2 is the one that's causing problems
Copy File1's start/length to the directory entry for File2, see what happens
Copy File2's start/length to the directory entry for File1, see what happens
I thought I could write a sector to the card using Frhed but it
seems it can't. I've another utility around here somewhere
Yes, I understand what you mean.

You can certainly do what you like with reads and writes using
commands sent to the card. Various utilities can be used. But that's
not my point.

My point is that the firmware in the card, the FTL, maintains a
dynamic mapping table between block numbers and physical flash cells.

block number #1 --> flash cell #4 in page #22921,
block number #2 --> flash cell #8 in page #9193,
block number #3 --> flash cell #2 in page #19944,
etc

There's also a read aging table:

flash page #1 --> read count 233,
...
flash page #9193 --> read count 2289,
...
flash page #19944 --> read count 72,
...

So by manipulating your reads and writes, the only thing you achieve
is to exercise the FTL's mapping and wear levelling algorithm in
slightly different ways.

It means if you get a result, you can't trust it.
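
To make the two tables above concrete, here is a toy C model. It is only a sketch: real FTL firmware is proprietary and far more elaborate, and the table sizes, the `TOY_READ_LIMIT` threshold and the "first free page" policy are all invented for illustration. It shows the one point that matters here: repeated reads of the same block number can make the controller move the data to another page without the host issuing any write.

```c
#include <stdint.h>

#define TOY_LOGICAL_BLOCKS 4u
#define TOY_PHYSICAL_PAGES 8u
#define TOY_READ_LIMIT     1000u /* hypothetical relocation threshold */

struct toy_ftl {
    uint16_t map[TOY_LOGICAL_BLOCKS];    /* block number -> flash page */
    uint32_t reads[TOY_PHYSICAL_PAGES];  /* flash page -> read count   */
    uint8_t  in_use[TOY_PHYSICAL_PAGES]; /* 1 if the page holds data   */
};

void toy_ftl_init(struct toy_ftl *f)
{
    for (uint16_t p = 0; p < TOY_PHYSICAL_PAGES; p++) {
        f->reads[p] = 0;
        f->in_use[p] = 0;
    }
    /* Start with an identity mapping; the spare pages stay free. */
    for (uint16_t b = 0; b < TOY_LOGICAL_BLOCKS; b++) {
        f->map[b] = b;
        f->in_use[b] = 1;
    }
}

/* Read one logical block; returns the physical page that served it. */
uint16_t toy_ftl_read(struct toy_ftl *f, uint16_t block)
{
    uint16_t page = f->map[block];

    f->reads[page]++;
    if (f->reads[page] >= TOY_READ_LIMIT) {
        /* Move the data to the first free page, unseen by the host. */
        for (uint16_t p = 0; p < TOY_PHYSICAL_PAGES; p++) {
            if (!f->in_use[p]) {
                f->in_use[p] = 1;
                f->in_use[page] = 0;
                f->reads[page] = 0; /* old page is treated as erased */
                f->map[block] = p;
                break;
            }
        }
    }
    return page;
}
```

In this model the host only ever asks for "block 1"; which page answers is entirely the model's decision, which is why host-side experiments with block numbers are hard to interpret.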
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-19 00:35:20 UTC
Permalink
Post by James Cameron
It means if you get a result, you can't trust it
OK, fair enough

My natural curiosity just wants me looking for evidence
and symptoms. I try not to give up until I find out what
the underlying cause of the problem is. Maybe the card
is just poor-quality which might have died in someone's
camera and been binned with a shrug and a sigh

I suppose you could liken this to a bad pixel on an LCD.
There is no external electronic access to that pixel to fix it
or replace it with another

Joe


John Gardner
2015-03-19 00:57:22 UTC
Permalink
...I'd even go back to low-density MMC

http://en.wikipedia.org/wiki/MultiMediaCard#eMMC

"Almost all mobile phones and tablets use this form of flash for main storage."

...
IVP
2015-03-18 23:48:08 UTC
Permalink
Post by James Cameron
It ain't that simple inside the card. ;-)
Understood. It's unfortunate and frustrating for a tinkerer to
be shut out of an embedded system

Thanks

Joe


Christopher Head
2015-03-19 16:54:00 UTC
Permalink
Post by James Cameron
Post by IVP
I could try moving the file to other blocks on the card, for
example by adding a dummy file to the directory.
No, that won't make a difference.
The mapping from block numbers to physical cells is a decision made by
the FTL firmware, to support automatic wear levelling. There's no way
you can reliably control that decision.
It may not make a difference, and it won’t reliably make a difference. It may, however, depending on availability of personal mileage, make a difference in this case.

I have heard that some (cheap) FTLs don’t remap all logical blocks across all available physical blocks, but rather have groups of logical and physical blocks with fixed mappings such that any logical block in a group can be mapped to any physical block in the same group, but not to a physical block in any other group. If you move the data far enough in logical space to put it in a new group, it will then be incapable of landing on the same physical block.

Of course, this only applies if your card happens to have an FTL that works that way. Good luck finding out if that’s the case :/
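
A rough C sketch of that idea, for illustration only. The group size is unknown for any given card, so the 4 MiB figure used in the usage note is purely an assumption, as are the helper names `ftl_group_of` and `mib_to_sectors`. The starting sector 0x2000 is the partition start mentioned at the top of the thread.

```c
#include <stdint.h>

#define SECTOR_BYTES 512u

/* Hypothetical: index of the fixed-mapping group a logical sector is in. */
uint32_t ftl_group_of(uint32_t lba, uint32_t group_sectors)
{
    return lba / group_sectors;
}

/* Sector count corresponding to an offset given in mebibytes. */
uint32_t mib_to_sectors(uint32_t offset_mib)
{
    return offset_mib * (1024u * 1024u / SECTOR_BYTES);
}
```

With an assumed 4 MiB group, sector 0x2000 falls in group 1, while the same sector shifted 2048 MiB further into the card falls in group 513. On a card whose FTL really works this way, the data could then no longer land on the original physical blocks.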
--
Christopher Head
IVP
2015-03-19 19:29:29 UTC
Permalink
Post by Christopher Head
If you move the data far enough in logical space to put it
in a new group, it will then be incapable of landing on the
same physical block
Perhaps this is all "re-arranging the deck chairs on The Titanic"

but

Would / could a partition do that ?

I need less than 1GB on an 8GB or bigger card

Joe



Christopher Head
2015-03-19 20:08:19 UTC
Permalink
Post by IVP
Perhaps this is all "re-arranging the deck chairs on The Titanic"
It probably is.
Post by IVP
but
Would / could a partition do that ?
I need less than 1GB on an 8GB or bigger card
Yes it would, assuming your card has such an FTL, since it would move all the logical block numbers. Just create the partition a couple gigs in from the start of the card.
--
Christopher Head
James Cameron
2015-03-19 20:14:19 UTC
Permalink
Post by IVP
Post by Christopher Head
If you move the data far enough in logical space to put it
in a new group, it will then be incapable of landing on the
same physical block
Perhaps this is all "re-arranging the deck chairs on The Titanic"
but
Would / could a partition do that ?
A partition would only change the logical block numbers used by the
operating system to write the file, exercising a specific value space
for the algorithm.

If the FTL is doing its job, writes to a smaller partition will still
be spread across the physical flash pages. That includes autonomous
writes done to handle read disturb.
Post by IVP
I need less than 1GB on an 8GB or bigger card
This is also a technique to increase longevity in an application,
because the writes are spread over a larger space, and the lifetime
is extended.

If you've already got lifetime issues, I'm not so sure.
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-19 21:05:34 UTC
Permalink
Christopher / James

I'll try a partition, just to see if it does make a difference to what
seems to be a fairly predictable card read error. Maybe I just
got unlucky with a dodgy pathway at the bottom end of the card

Also, I'm making up a 512kB SRAM board. A TC554001, 4040
and 18F2520. Originally for files but coincidentally the size of a
card data block. I will try some experiments to see if occasional
re-writes, rather than a format/reload, resets whatever is happening
in the controller.

If it doesn't, well I can still use it for the files and now I have a
board layout for a datalogger or whatever

Thanks

Joe


James Cameron
2015-03-19 21:19:35 UTC
Permalink
Post by James Cameron
This is also a technique to increase longevity in an application,
because the writes are spread over a larger space, and the lifetime
is extended.
That reminds me.

How, and are you, powering down the card after an operation?

We found in our XO-1.5 model that the FTL would remain active for some
time after a command. The FTL is probably preparing erase blocks for
the next write, or moved recently read data to erased blocks. It
probably has power fail detection.

If we pulled the supply from it too soon, lifetime was affected. Some
cards would permanently stop responding. Became a yield issue.

If we pulled the supply too slowly, lifetime was affected.

Fixes were to add a few second delay on power down, and add a
discharge circuit ...

http://wiki.laptop.org/images/a/ad/XO_4_Schematics.pdf (page 34, PQ18,
PQ41, PQ61).

... and I'm sure I've mentioned my enjoyment of that issue here
before. ;-)
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-19 22:53:20 UTC
Permalink
Post by James Cameron
How, and are you, powering down the card after an operation?
Power is continuous. The only non-data activity is when the PIC
sets CS for ~700us during the " > 74 clock cycles " re-init after
a data time-out.
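
For anyone following along, the "more than 74 clocks with CS high" step can be sketched in C roughly as below. The `idle_xfer_fn` and `idle_cs_fn` hooks and the `count_*` stand-ins are hypothetical and exist only so the sketch is self-contained; this is not the dsPIC code described here. Since SPI moves 8 clocks per byte, 74 clocks rounds up to 10 bytes of 0xFF, i.e. 80 clocks.

```c
#include <stdint.h>

typedef uint8_t (*idle_xfer_fn)(uint8_t out); /* hypothetical SPI byte hook */
typedef void (*idle_cs_fn)(int high);         /* hypothetical CS pin hook   */

/* Bytes of 0xFF needed to give the card at least min_clocks clocks. */
uint32_t idle_bytes_for_clocks(uint32_t min_clocks)
{
    return (min_clocks + 7u) / 8u;
}

/* Raise CS, clock out idle bytes, leave CS high; returns clocks sent. */
uint32_t sd_send_idle_clocks(idle_xfer_fn xfer, idle_cs_fn set_cs,
                             uint32_t min_clocks)
{
    uint32_t n = idle_bytes_for_clocks(min_clocks);

    set_cs(1);
    for (uint32_t i = 0; i < n; i++)
        (void)xfer(0xFF); /* data line held high while clocking */
    return n * 8u;
}

/* Counting stand-ins for the SPI peripheral and CS pin (testing only). */
static uint32_t count_bytes_sent;
static int count_cs_level = -1;

uint8_t count_xfer(uint8_t out)
{
    (void)out;
    count_bytes_sent++;
    return 0xFF;
}

void count_set_cs(int high)
{
    count_cs_level = high;
}
```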

Roughly the timing per second is 300ms to read/process data,
700ms until next read request, all with CS low apart from the
700us high, as above

The only power-downs have been when I see that the read has
failed sometime during the night. Which, of course, is the actual
problem. "something" happens in the controller after thousands
of reads to make further reads flakey, but it's not a fatal physical
"something" as it can be cured. I hope today to find out what
the minimum cure is, preferably less than a re-format / reload

I don't recall any instruction or recommendation to have CS high
when data isn't needed. In the pdfs I have there's very little about
CS.

I'll re-read them to see if I missed anything in the SD mode
section that applies to but isn't included in the SPI mode section.
Perhaps there's a counter or register that needs periodic resetting
or refreshing

Joe


James Cameron
2015-03-20 00:59:40 UTC
Permalink
Post by IVP
Post by James Cameron
How, and are you, powering down the card after an operation?
Power is continuous.
Thanks.
Post by IVP
The only non-data activity is when the PIC sets CS for ~700us
during the " > 74 clock cycles " re-init after a data time-out.
Do you gate the clock at all?
Post by IVP
Roughly the timing per second is 300ms to read/process data,
700ms until next read request, all with CS low apart from the
700us high, as above
The only power-downs have been when I see that the read has
failed sometime during the night. Which, of course, is the actual
problem. "something" happens in the controller after thousands
of reads to make further reads flaky, but it's not a fatal physical
"something" as it can be cured. I hope today to find out what
the minimum cure is, preferably less than a re-format / reload
I don't recall any instruction or recommendation to have CS high
when data isn't needed. In the pdfs I have there's very little about
CS.
Yes, most customers use SD, so the datasheets have leaned toward it.
Post by IVP
I'll re-read them to see if I missed anything in the SD mode
section that applies to but isn't included in the SPI mode section.
Perhaps there's a counter or register that needs periodic resetting
or refreshing
Or perhaps the card needs an external clock source for executing the
code in the firmware, and if you are starving it of a clock it is
never getting a chance to do necessary garbage collection after a
command.

Perhaps you'd be able to prove that by scoping the current drawn by
the card after a read command with extra clocks provided. You might
need to scope it for thousands of reads before it needed to cleanup
though.
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-20 01:30:19 UTC
Permalink
Post by James Cameron
Post by IVP
The only non-data activity is when the PIC sets CS for ~700us
during the " > 74 clock cycles " re-init after a data time-out.
Do you gate the clock at all?
10 bytes of 0xFF are sent, with clock, from the PIC's SPI module
Post by James Cameron
Or perhaps the card needs an external clock source for executing
the code in the firmware, and if you are starving it of a clock it is
never getting a chance to do necessary garbage collection after a
command.
Hmmm, there's a thought. As SPI is in h/w it takes only a couple
of instructions to send 0xFF. After the block has been read I can
send plenty of dummy bytes + clock without impacting on the s/w
which follows

I'll do that now and see how it's doing in the morning. Currently
there's just one dummy byte, sent before each CMD17. ISTR
that is suggested in a pdf
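
For anyone following along, here is a minimal C sketch of the "extra dummy bytes + clock" idea. The names spi_xfer_fn, sd_send_dummy_clocks and the counting stand-in are hypothetical and are not taken from the dsPIC source in this thread. On real hardware the callback would write 0xFF to the SPI module and return the byte received.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SPI hook: shifts one byte out on DI and returns the byte
 * clocked in on DO. On the dsPIC this would wrap the hardware SPI module. */
typedef uint8_t (*spi_xfer_fn)(void *ctx, uint8_t out);

/* Send n dummy bytes of 0xFF so the card keeps receiving clock pulses.
 * Returns the number of clock cycles generated (8 per byte). */
static unsigned sd_send_dummy_clocks(spi_xfer_fn xfer, void *ctx, size_t n)
{
    unsigned clocks = 0;
    for (size_t i = 0; i < n; i++) {
        (void)xfer(ctx, 0xFF);   /* DI stays high for the whole byte */
        clocks += 8;
    }
    return clocks;
}

/* Host-side stand-in for the SPI module (an assumption, only so the
 * sketch can be tried on a PC): counts bytes and checks they are 0xFF. */
typedef struct {
    size_t bytes_sent;
    int    all_ff;
} spi_counter_t;

static uint8_t counting_xfer(void *ctx, uint8_t out)
{
    spi_counter_t *c = (spi_counter_t *)ctx;
    c->bytes_sent++;
    if (out != 0xFF)
        c->all_ff = 0;
    return 0xFF;                 /* an idle card leaves DO high */
}
```

With n = 10 this gives 80 clocks, which covers the "> 74 clock cycles" used for the re-init. After a block read, n can be whatever the code that follows can spare.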
Post by James Cameron
Perhaps you'd be able to prove that by scoping the current drawn
by the card after a read command with extra clocks provided. You
might need to scope it for thousands of reads before it needed to
cleanup though.
So you think a current change should be detectable when it goes
into a proper idle state ?

Joe


James Cameron
2015-03-20 02:01:08 UTC
Permalink
Post by IVP
Post by James Cameron
Post by IVP
The only non-data activity is when the PIC sets CS for ~700us
during the " > 74 clock cycles " re-init after a data time-out.
Do you gate the clock at all?
10 bytes of 0xFF are sent, with clock, from the PIC's SPI module
Post by James Cameron
Or perhaps the card needs an external clock source for executing
the code in the firmware, and if you are starving it of a clock it is
never getting a chance to do necessary garbage collection after a
command.
Hmmm, there's a thought. As SPI is in h/w it takes only a couple
of instructions to send 0xFF. After the block has been read I can
send plenty of dummy bytes + clock without impacting on the s/w
which follows
I'll do that now and see how it's doing in the morning. Currently
there's just one dummy byte, sent before each CMD17. ISTR
that is suggested in a pdf
Good, let us know how it goes.
Post by IVP
Post by James Cameron
Perhaps you'd be able to prove that by scoping the current drawn
by the card after a read command with extra clocks provided. You
might need to scope it for thousands of reads before it needed to
cleanup though.
So you think a current change should be detectable when it goes
into a proper idle state ?
Yes. And depending on your decoupling capacitors nearby, you may see
the noise induced on the supply by the FTL CPU.

I've had a quick look at the SD Specifications, Part 1, Physical Layer
Simplified Specification Version 3.01, May 2010.

In this document, "programming" means the erasing or writing to flash
pages.

In the "SPI" section, it says if the card is busy programming, and you
reset CS, it will release DataOut line (float it) and continue with
programming. Do you check for that?

The specification goes on to say that if you force a reset, it will
terminate programming, and may destroy data. As you know your card
has destroyed data, it may be worth going back to make sure you don't
force a reset while it is programming.

Also, in the "clock control" section, it points out that a card is
meant to complete programming even if the host stops the clock, so
that goes against my earlier theory.

Looking at the state diagrams, CMD17 (read block) isn't meant to go
into programming state. This implies that read disturb damage would
accumulate and be handled on the next write. If your application
never does a write, damage would never be handled.

So you might also fix it by doing a write now and then. ;-)
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-20 03:47:15 UTC
Permalink
Post by James Cameron
So you might also fix it by doing a write now and then. ;-)
My application is intended to be read-only, but I can add
non-user writing. Having a block-size SRAM may be more
helpful than I thought

On my to-do list was to write the file back to the card, as I
suspected it may have been corrupted. After comparing the
"unplayable" with the original though it seems they are the same

As I mentioned before, I was going to look for the simplest
solution, so have just written the file to the card and replacing
what was there, with your suggestion of some sort of garbage
collection or unfinished business in mind

That may have been successful

When I powered the card up before this, it would run for
just one or two reads. After writing the file back to the disk
it does appear to be working again, but I don't know for how
long. The test would be to let it run overnight and see if the
reads stop again.

I have one little glitch to correct in the dsPIC s/w. I didn't take
into account the high word of the cluster number in the directory
entry. Re-saving the files has changed their starts from ~ 0x8500
to ~ 0x001d0060, so my s/w retrieves data from an incorrect
0x0060 pointer. It does fetch the correct amount though. At the
correct sector number is indeed the RIFF header for the file

So I'll fix that and do the test
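
The high-word glitch described above can be sketched in C as follows. The function names are hypothetical; the offsets are those of a standard 32-byte FAT32 directory entry (high word of the first cluster at offset 20, low word at offset 26, both little-endian).

```c
#include <stdint.h>

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* First cluster of a file from a 32-byte FAT32 directory entry.
 * Offset 20 holds the high word, offset 26 the low word. Using only
 * the low word silently truncates any value above 0xFFFF. */
static uint32_t fat32_first_cluster(const uint8_t entry[32])
{
    uint32_t hi = rd16le(&entry[20]);
    uint32_t lo = rd16le(&entry[26]);
    return (hi << 16) | lo;
}
```

A start value of 0x001d0060 has a high word of 0x001d and a low word of 0x0060, so code that reads only the low word ends up at 0x0060, which matches the symptom.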

Joe


James Cameron
2015-03-20 05:43:56 UTC
Permalink
Post by IVP
Post by James Cameron
So you might also fix it by doing a write now and then. ;-)
My application is intended to be read-only, but I can add
non-user writing. Having a block-size SRAM may be more
helpful than I thought
On my to-do list was to write the file back to the card, as I
suspected it may have been corrupted. After comparing the
"unplayable" with the original though it seems they are the same
Imagine this;

0. normal operation, with read counters for flash pages incrementing
each time you read the blocks, until;

1. the read counter for a flash page hits the read disturb limit,
then;

2. the FTL knows that reading the block again will begin to degrade it,
and so refuses the read command.

If the unplayability via SPI is because the flash pages are due for
relocation to avoid read disturb, when you mount the card on a system
via SDIO the first write for filesystem metadata will give the FTL
the right to do that relocation.

So another test is to wait for the problem to happen again, then load
the card into a computer and create a tiny file unrelated to the
application, then see if that fixes the problem for the moment.
Post by IVP
As I mentioned before, I was going to look for the simplest
solution, so have just written the file to the card and replacing
what was there, with your suggestion of some sort of garbage
collection or unfinished business in mind
That may have been successful
When I powered the card up before this, it would run for
just one or two reads. After writing the file back to the disk
it does appear to be working again, but I don't know for how
long. The test would be to let it run overnight and see if the
reads stop again.
I have one little glitch to correct in the dsPIC s/w. I didn't take
into account the high word of the cluster number in the directory
entry. Re-saving the files has changed their starts from ~ 0x8500
to ~ 0x001d0060, so my s/w retrieves data from an incorrect
0x0060 pointer. It does fetch the correct amount though. At the
correct sector number is indeed the RIFF header for the file
So I'll fix that and do the test
Joe
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-20 07:04:09 UTC
Permalink
Post by James Cameron
So another test is to wait for the problem to happen again, then
load the card into a computer and create a tiny file unrelated to
the application, then see if that fixes the problem for the moment
Forcing the FTL to do some housekeeping ? Good idea

I'll get back to you in a day or two, will take some time to test

Thanks

Joe


IVP
2015-03-21 23:20:00 UTC
Permalink
Post by James Cameron
So another test is to wait for the problem to happen again,
then load the card into a computer and create a tiny file
unrelated to the application, then see if that fixes the problem
for the moment.
I started a test at 9:15pm and expected the card to fail about
18 hours later, as it had done twice before.

18 hours came and went. So did 24. And 30.

I've never been so disappointed to see something working
properly !!!

Then finally ......

I re-powered the circuit a few times and it stopped reading after
a short while each time, so that's still the same

However, I did notice an occasional click in the sound and on
the scope saw that the dsPIC DAC seems to do a full excursion
to make the click, as though there's a word of 0xFFFF data. The
proper signal is ~ 0.1Vp-p (centred around mid-point data of
0x8000), the click is ~0.8Vp-p, or 0.4V above and below

Uploading the card file to the PC and comparing with the original
showed no difference (I verified that fc works by comparing to
another file. Mismatches galore). The card controller must be the
source of the bad data. The click doesn't occur every time that file
is played, and sometimes it is noticeably longer (although still a
fraction of a second), maybe a few 0xFFFF words

Unfortunately because the card stopped working during the night
I didn't get to hear whether the click appeared before it failed. That
was partly the point of trying to get the failure to happen when I
was at the desk mid-afternoon. If I actually witnessed the failure
it might have been possible to determine if it was a 32768 count
or something like that. The card going for nearly twice as long
has blown that theory

So then I wrote another small file to the card. This doesn't seem
to have made a difference. The card still fails shortly after power-up
and the sound still occasionally has a click

Back to the drawing board

Joe


Christopher Head
2015-03-26 04:30:02 UTC
Permalink
On Sun, 22 Mar 2015 12:20:00 +1300
Post by IVP
However, I did notice an occasional click in the sound and on
the scope saw that the dsPIC DAC seems to do a full excursion
to make the click, as though there's a word of 0xFFFF data. The
proper signal is ~ 0.1Vp-p (centred around mid-point data of
0x8000), the click is ~0.8Vp-p, or 0.4V above and below
Uploading the card file to the PC and comparing with the original
showed no difference (I verified that fc works by comparing to
another file. Mismatches galore). The card controller must be the
source of the bad data. The click doesn't occur every time that file
is played, and sometimes it is noticeably longer (although still a
fraction of a second), maybe a few 0xFFFF words
A few more questions to ponder.

After issuing a read block command, are you parsing the R1 response and
checking that every bit in the response is as expected?

After issuing a read block command, you are checking for the Start
Block Token first byte marker 0xFE, right? Not just expecting the data
to start immediately? I believe it’s possible for there to be some
empty space (which appears as 0xFF) between the R1 and the first byte of
the Start Block Token (which appears as 0xFE).

Are you checking for the difference between a Start Block Token and a
Data Error Token? If you ever see the latter, what error flags is it
showing?
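
A minimal C sketch of telling the tokens apart is given below. The enum and function names are hypothetical. The error-flag layout (upper four bits zero, low four bits as flags) is my reading of the simplified SD specification and should be checked against your own copy before relying on it.

```c
#include <stdint.h>

typedef enum {
    SD_TOKEN_WAITING,   /* 0xFF: card has not answered yet */
    SD_TOKEN_START,     /* 0xFE: a data block follows */
    SD_TOKEN_ERROR,     /* 0000xxxx: data error token */
    SD_TOKEN_UNKNOWN    /* anything else, worth logging */
} sd_token_t;

/* Flag bits of the data error token (assumed layout, see above). */
#define SD_ERR_GENERAL      0x01
#define SD_ERR_CC           0x02
#define SD_ERR_ECC_FAILED   0x04
#define SD_ERR_OUT_OF_RANGE 0x08

/* Classify one byte received while waiting for read data. */
static sd_token_t sd_classify_token(uint8_t b)
{
    if (b == 0xFF)
        return SD_TOKEN_WAITING;
    if (b == 0xFE)
        return SD_TOKEN_START;
    if ((b & 0xF0) == 0 && b != 0)
        return SD_TOKEN_ERROR;
    return SD_TOKEN_UNKNOWN;
}
```

If SD_TOKEN_ERROR ever turns up, masking the byte with the flag constants shows which error the card is reporting.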

After the data part, are you reading the CRC16 at the end of the token?
If so, are you checking it? If so, when you get the bad data out of the
card, is the CRC16 correct? If it’s correct, then the card must be
sending bad data to start with. If it’s wrong, then the data is being
corrupted on the SPI bus, and the card is (probably) innocent.
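
The CRC check can be sketched in C like this. The function names are hypothetical. The parameters are the CCITT ones used for SD data blocks: polynomial x^16 + x^12 + x^5 + 1 (0x1021), initial value 0, MSB first, no final XOR. The bitwise loop is chosen for clarity, not speed.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT over a buffer: polynomial 0x1021, initial value 0,
 * MSB first, no final XOR. */
static uint16_t sd_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Check a 514-byte read: 512 data bytes, then CRC high byte, CRC low
 * byte. Returns 1 if the received CRC matches the data, 0 otherwise. */
static int sd_block_crc_ok(const uint8_t block[514])
{
    uint16_t received =
        (uint16_t)(((uint16_t)block[512] << 8) | block[513]);
    return sd_crc16(block, 512) == received;
}
```

Because the CRC arrives after the data, a streaming player has already used the block by the time the check can run. Even so, logging a mismatch would separate "card sent bad data" from "data corrupted on the bus".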
--
Christopher Head
IVP
2015-03-26 06:22:22 UTC
Permalink
Post by Christopher Head
After issuing a read block command, are you parsing the R1
response and checking that every bit in the response is as expected?
After issuing a read block command, you are checking for the Start
Block Token first byte marker 0xFE, right?
Yes. Typically, this may take somewhere under 1ms. So the CMD17
is sent and then dummy 0xFFs (in my s/w about 1.5us apart) until the
0xFE is seen on DO. Then 514 bytes are picked up
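
The read sequence just described can be sketched in C with a bounded wait, so that a missing 0xFE shows up as an error rather than an endless loop. All names are hypothetical, and the replay stand-in exists only so the sketch can be tried on a PC. It assumes CMD17 has already been sent and its R1 response accepted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SPI hook: sends one byte, returns the byte received. */
typedef uint8_t (*spi_xfer_fn)(void *ctx, uint8_t out);

/* Clock out 0xFF until something other than 0xFF arrives, giving up
 * after max_polls bytes. Returns the token, or 0xFF on timeout. */
static uint8_t sd_wait_token(spi_xfer_fn xfer, void *ctx, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        uint8_t b = xfer(ctx, 0xFF);
        if (b != 0xFF)
            return b;
    }
    return 0xFF;
}

/* Pick up 512 data bytes plus 2 CRC bytes once the token is seen.
 * Returns 0 on success, -1 on timeout, -2 if the token was not 0xFE. */
static int sd_read_block_payload(spi_xfer_fn xfer, void *ctx,
                                 uint32_t max_polls, uint8_t out[514])
{
    uint8_t token = sd_wait_token(xfer, ctx, max_polls);
    if (token == 0xFF)
        return -1;
    if (token != 0xFE)
        return -2;
    for (size_t i = 0; i < 514; i++)
        out[i] = xfer(ctx, 0xFF);
    return 0;
}

/* Host-side stand-in: replays a canned DO stream, then idles at 0xFF. */
typedef struct {
    const uint8_t *bytes;
    size_t len;
    size_t pos;
} spi_replay_t;

static uint8_t replay_xfer(void *ctx, uint8_t out)
{
    spi_replay_t *r = (spi_replay_t *)ctx;
    (void)out;
    return (r->pos < r->len) ? r->bytes[r->pos++] : 0xFF;
}
```

With dummy bytes about 1.5us apart and the token normally arriving in under 1ms, a healthy read needs fewer than roughly 700 polls, so max_polls can be set comfortably above that.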
Post by Christopher Head
(which appears as 0xFF) between the R1 and the first byte of the
Start Block Token (which appears as 0xFE)
That's what I generally see on DO. A constant high (FF) until 0xFE
Post by Christopher Head
Are you checking for the difference between a Start Block Token
and a Data Error Token? If you ever see the latter, what error flags
is it showing?
Because of the unpredictable nature of the failure I'm going to have
to write some s/w to trigger the analyser at the right time. It needs to
be capturing at 20-50MHz so I can clearly see bit values. At those
speeds only 25-10ms can be viewed. Fortunately I can make it a
retrospective capture, ie the trigger point in the display is way over
on the right of the screen and the display to the left shows what
happened before the trigger. This probably means writing some sort
of time-out detector which is reset on 0xFE. I'd like to see what
happens immediately before it or the dsPIC gets lost. It might not
be relevant but at least I'll know
Post by Christopher Head
After the data part, are you reading the CRC16 at the end of the token?
No, I just pick it up as the last two bytes. Apart from during the
initialisation I don't use the CRC at all. I know the polynomial and
could, but don't. As CRC is at the very end of the read, of course
the preceding 512 data bytes have already been processed by the
dsPIC. If the sudden appearance of an intermittent click hadn't
happened in the last test I'd have said the correctness of the CRC
was a matter of interest rather than importance. Now I'll have to
consider it, perhaps not as a cause but a symptom
Post by Christopher Head
If so, are you checking it? If so, when you get the bad data out of
the card, is the CRC16 correct? If it’s correct, then the card must
be sending bad data to start with. If it’s wrong, then the data is
being corrupted on the SPI bus, and the card is (probably) innocent.
It's not impossible that the card file or FTL is not the source of the
problem, but I just get the feeling that one of them is. For example,
with the dsPIC powered down for some hours you'd think that any
f/w problem in the dsPIC would have resolved itself on power-up,
but the problem persists

As you can imagine, this is an unexpected hiccup and there are a
lot of commands, timings and data to examine and fiddle with.

I'd put it aside for a couple of days so I don't get behind with other
parts of the project. I've also posted the question to an SD forum
which has industry representatives. No replies yet

Hopefully today or tomorrow I'll be able to capture the moment

Thanks

Joe



Christopher Head
2015-03-26 16:18:18 UTC
Permalink
Post by IVP
It's not impossible that the card file or FTL is not the source of the
problem, but I just get the feeling that one of them is. For example,
with the dsPIC powered down for some hours you'd think that any
f/w problem in the dsPIC would have resolved itself on power-up,
but the problem persists
I just can’t get over the fact that the computer can read the file
intact after the problem happens. That means that the data itself is
still in perfect condition on the card; there’s simply no possible
argument against that fact.

To me, it seems more likely that perhaps the card’s behaviour (maybe
the timing of some things) is changing in a way that is still within
spec but perhaps violates a brittle assumption in your code somewhere.
I can’t see a card outright *failing* to deliver data simply because
it’s speaking SPI, when the data itself is OK. Maybe, but to me it
seems improbable.
--
Christopher Head
IVP
2015-03-26 19:34:26 UTC
Permalink
Post by Christopher Head
I just can’t get over the fact that the computer can read the file
intact after the problem happens. That means that the data itself
is still in perfect condition on the card—there’s simply no
possible argument against that fact.
I know. There's also the fact that when it's in "the condition" you
can cycle the power and it'll behave for up to several minutes
Post by Christopher Head
To me, it seems more likely that perhaps the card’s behaviour -
maybe the timing of some things - is changing in a way that is
still within spec but perhaps violates a brittle assumption in your
code somewhere. I can’t see a card outright *failing* to deliver
data simply because it’s speaking SPI, when the data itself is OK.
Maybe, but to me it seems improbable.
Reading the card is a pretty simple affair and I can't imagine what
could drift, after such a long time and then for a subsequently shorter
time. Almost certain it's not the PIC because formatting the card
resets the problem without touching anything else. The PIC does
supply the timing via the SPI clock, which I think is about all the
influence I can have from the outside.

I'm running a test now after some changes, and waiting for it to
break down so I can start gathering evidence. I'm not blinkered
to the PIC being at fault - had my share of time-wasting silicon
issues, including dodgy SPI flags on this very device - but it
does seem to be the bystander in this case

Won't give up on it because this affects the viability of quite a few
projects. Maybe this card is symptomatic of a general problem
for large cards. As the pdf I posted a couple of days ago says,
as more and more billions of transistors are being put onto a
smaller and smaller piece of silicon the performance and reliability
is going to suffer

Does the average consumer *really* need a 256GB card ? That's
an awful lot of lost selfies and mp3

Joe



Richard Prosser
2015-03-26 20:02:35 UTC
Permalink
Silly question Joe,

Have you tried heating / cooling the card and changing the supply voltage
to the limits?

RP
IVP
2015-03-26 20:36:16 UTC
Permalink
Post by Richard Prosser
Have you tried heating / cooling the card and changing the supply
voltage to the limits?
No, I haven't. I can try things like that when it fails. Changing the
card supply is going to mean adding an interface as I think 1.8V
won't register as a High even if PIC Vcc is as low as 3.0V. d/s
says 0.8 * Vcc

I believe I'm within speed spec @ 3V3 (which does tend to be
for the lower speed range) but I'll be double-checking the card's
mode registers to confirm that.

As I mentioned to Christopher though, you'd think a power cycle
would restore the system to another 18+ hours of operation, not
just a few minutes, if this was a signal or synch problem

Joe


IVP
2015-03-22 23:00:42 UTC
Permalink
I came across this at the w/e

http://www.sandisk.com/assets/docs/WP001_Flash_Management_Final_FINAL.pdf

Not very encouraging

And then I started wondering about the PIC's Flash longevity

There's even mention of read disturb in SRAMs

Is anything in my project going to last as long as I thought
it would ?

Do I need to be digging out the EPROMs ?

Joe


IVP
2015-03-29 02:42:43 UTC
Permalink
Hi all,

Well, the little dickens kept me waiting for 54 hours this time, but
fail it did

This time I was set up to catch it fail, repeatedly

Here are a couple of screen shots

http://home.clear.net.nz/pages/joecolquitt/sdhc.html

The analyser labels are

busy - dsPIC has received file number (fn data, fn clk), and holds
this line up until the file has finished, as a signal to the 18F

clk - SPI clock (picture resolution too low to see bunches of 8)

di / do - data I/O. To the right, at the start of each of the three blocks,
you'll see the CMD17 go out, a short response from the card, a wait
(high with 0xFF) for the 0xFE token and then card data coming out

fe det - shows the detection of the 0xFE token of the first block
of the file

block swap - shows the swapping of two RAM blocks, one is full
of data to be processed, the other is being loaded

cs - card /CS

The first picture is of a good read. It shows the file number being
received, '>74 clocks with CS high', a few preparatory CMD and
then the block reading. fe det is the analyser trigger (red line).

The second picture is of a fail. The file number is received, busy
goes high, > 74 clocks, then a whole lot of nothing except clk

I used the time from fn clk high to fe det as a 16ms timeout period
for another PIC to trigger the analyser. IOW if there was no
activity for 16ms then the analyser would be triggered. Red line is
out of shot to the right

The first thing I tried was a couple of other dsPICs, but that made
no difference, the card repeatedly failed again within a minute, and
with that bad data (?) 'click' noise too, so I *think* I can rule out
the dsPIC's SPI module. I also re-programmed the original dsPIC
but same result

Hmmm. Now what ? ................

The time to failure has me stumped now. It was at first about 18
hours (60,000+ accesses), then 30 hours (100,000+) and this
latest one, 54 hours (190,000+)
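
As a quick check of those figures, here is a small C sketch. It assumes one file access per second, which is how I read the 300ms + 700ms cycle described earlier in the thread; the function name is hypothetical.

```c
#include <stdint.h>

/* Number of accesses made in a run of the given length, assuming one
 * access every period_ms milliseconds (1000 ms is an assumption taken
 * from the 300 ms + 700 ms cycle mentioned in the thread). */
static uint32_t accesses_in(uint32_t hours, uint32_t period_ms)
{
    return (uint32_t)(((uint64_t)hours * 3600u * 1000u) / period_ms);
}
```

For 18, 30 and 54 hours at one access per second this gives 64,800, 108,000 and 194,400, in line with the 60,000+, 100,000+ and 190,000+ quoted above.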

I'm going to have to get some more cards and circuits running
and resign myself to a lot of waiting

Joe


James Cameron
2015-03-29 04:35:42 UTC
Permalink
Post by IVP
Hi all,
Well, the little dickens kept me waiting for 54 hours this time, but
fail it did
This time I was set up to catch it fail, repeatedly
Here are a couple of screen shots
http://home.clear.net.nz/pages/joecolquitt/sdhc.html
That's very clear. In the failure case, why didn't "do" go low during
"cs" high? Are you properly detecting and handling that possibility?
--
James Cameron
http://quozl.linux.org.au/
James Cameron
2015-03-29 05:14:44 UTC
Permalink
Post by James Cameron
Post by IVP
http://home.clear.net.nz/pages/joecolquitt/sdhc.html
That's very clear. In the failure case, why didn't "do" go low
during "cs" high? Are you properly detecting and handling that
possibility?
Also, as I wrote on 20th March, when DO floats when you assert CS, it
means the card is busy programming.

The card may intend to begin programming because of a read disturb
counter reaching zero.

The specification doesn't tell us exactly what to do then, but my
guess is to keep clocking and try to assert CS a few tens of
milliseconds later, or whatever the programming time is.

Conjecture; eventually programming will complete and DO should be
pulled down in response to CS.

You might also measure the programming time for future reference.
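
A minimal C sketch of the "keep clocking and measure" idea follows. It assumes the usual SPI-mode convention that a selected, busy card holds DO low and lets it return high when finished. That is an assumption, not something established for the card in this thread, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SPI hook: sends one byte, returns the byte received. */
typedef uint8_t (*spi_xfer_fn)(void *ctx, uint8_t out);

/* With CS already asserted, keep clocking 0xFF until the card stops
 * holding DO low. Returns the number of busy bytes seen before the
 * card read back 0xFF, or -1 if still busy after max_polls bytes. */
static long sd_wait_not_busy(spi_xfer_fn xfer, void *ctx, long max_polls)
{
    for (long polls = 0; polls < max_polls; polls++) {
        if (xfer(ctx, 0xFF) == 0xFF)
            return polls;
    }
    return -1;
}

/* Host-side stand-in: replays a canned DO stream, then idles at 0xFF. */
typedef struct {
    const uint8_t *bytes;
    size_t len;
    size_t pos;
} spi_replay_t;

static uint8_t replay_xfer(void *ctx, uint8_t out)
{
    spi_replay_t *r = (spi_replay_t *)ctx;
    (void)out;
    return (r->pos < r->len) ? r->bytes[r->pos++] : 0xFF;
}
```

Multiplying the returned count by the byte time gives a rough figure for how long the card stayed busy. Note that with a pull-up on DO, a card that has released the line reads as 0xFF and cannot be told apart from a ready card this way.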

Conjecture; since you aren't trying CS again, you aren't facilitating
the read disturb programming, and it never completes, which is why the
card seems to become "infected".

Conjecture; the delay before failure is unpredictable, because it will
depend on mapping from virtual block to flash page, and the read
disturb counters on that page and adjacent pages.

--

And a different observation; in the failure case you are raising CS,
then a delay before DI and CLK rise. The delay is about the size of a
clock period.

Try changing your dsPIC code to raise DI and CLK at the same time as
CS.

e.g. are you handling a time consuming interrupt at this point;
perhaps you could mask interrupts?
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-29 06:37:50 UTC
Permalink
Other comments read and being thought about
Post by James Cameron
And a different observation; in the failure case you are raising
CS, then a delay before DI and CLK rise. The delay is about
the size of a clock period
Time from CS = 1 to DI = 1 is 100us. That is, the delay between
CS raised and the first write of 0xFF to the SPI module

Time from DI = 1 to first clock L-H is 140ns. Clock period is 90ns
Post by James Cameron
Try changing your dsPIC code to raise DI and CLK at the same
time as CS.
Yes, I can try that. The past couple of posts concern a simple
routine that can be twiddled with. It does at the moment include
a deliberate 100us delay. When the circuit is running properly the
next routine would issue the CMD0. As the card is currently in the
fault condition and fails in a few seconds to a few minutes, it
hopefully will not be too time-consuming to see what, if anything,
has an effect
Post by James Cameron
e.g. are you handling a time consuming interrupt at this point;
perhaps you could mask interrupts?
Interrupts (TMR1) are off - actually not needed - whilst comms to
the card/file are being established. The timer is used to extract data
from the buffer RAM and deliver it to the DAC at the required audio
sample rate

Joe

PS congratulations on winning the Cricket World Cup. OK, NZ
hasn't even finished batting yet but, c'mon, 167-7 ?


IVP
2015-03-29 06:08:33 UTC
Permalink
Post by James Cameron
Post by IVP
http://home.clear.net.nz/pages/joecolquitt/sdhc.html
That's very clear. In the failure case, why didn't "do" go low
during "cs" high? Are you properly detecting and handling that
possibility?
No. In my defence, M'lud, you see something that apparently
works robustly so you aren't thinking of failure detection

The attached is a zoom-in of when it goes right

"when DO floats when you assert CS, it means the card is busy
programming"

If DO is actually floating as hi-Z, and the dsPIC input is also
floating (as an input), it sounds like I should at least add a
pull-down, if only to eliminate a possibility

Thanks

Joe
James Cameron
2015-03-29 06:34:06 UTC
Permalink
Post by IVP
Post by James Cameron
Post by IVP
http://home.clear.net.nz/pages/joecolquitt/sdhc.html
That's very clear. In the failure case, why didn't "do" go low
during "cs" high? Are you properly detecting and handling that
possibility?
No. In my defence, M'lud, you see something that apparently
works robustly so you aren't thinking of failure detection
The attached is a zoom-in of when it goes right
"when DO floats when you assert CS, it means the card is busy
programming"
If DO is actually floating as hi-Z, and the dsPIC input is also
floating (as an input), it sounds like I should at least add a
pull-down, if only to eliminate a possibility
You need a way to detect when DO is floating, rather than hide it.

When floating, the signal will stay at the same level for some time.
It will swing if it is driven or has a resistor. Does your dsPIC have
software switchable pull-down resistors?

Detecting a missing fall of DO is probably easier. You can see in
your zoom-in when it should be.

When your code detects this, delay, then go back and try CS again.

If this delay will become critical for your application, you might
move it to after playback finishes; e.g. select the card but don't
give it a command.

That way, a read disturb reprogramming can be done at a time that
isn't critical.
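
The "detect, delay, try CS again" idea above could be sketched roughly as follows. This is only an illustration under assumptions: the `sim_*` functions are made-up stand-ins for the real pin and SPI access (they are not dsPIC library calls), and the poll count and delay are arbitrary numbers rather than values from any datasheet.

```c
#include <stdint.h>

/* Simulated card for bench illustration only; swap these four for real pin/SPI access. */
unsigned sim_busy_selects = 0;    /* how many CS assertions the fake card ignores */
unsigned sim_delay_total_ms = 0;  /* total time spent in retry delays */

void sim_cs_assert(void)  { }
void sim_cs_release(void) { if (sim_busy_selects > 0) sim_busy_selects--; }
void sim_delay_ms(unsigned ms) { sim_delay_total_ms += ms; }

/* With the pull-up on DO an undriven line reads 0xFF; a responding card pulls it low. */
uint8_t sim_spi_xfer(uint8_t out)
{
    (void)out;
    return sim_busy_selects ? 0xFF : 0x00;
}

/*
 * Assert CS and clock 0xFF until a byte with at least one low bit arrives.
 * If none shows up within poll_bytes, release CS, wait, and try again.
 * Returns the 1-based attempt that succeeded (CS left asserted), or 0 on failure.
 */
unsigned sd_select_with_retry(unsigned poll_bytes, unsigned max_attempts,
                              unsigned retry_delay_ms)
{
    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        sim_cs_assert();
        for (unsigned i = 0; i < poll_bytes; i++) {
            if (sim_spi_xfer(0xFF) != 0xFF)
                return attempt;
        }
        sim_cs_release();
        sim_delay_ms(retry_delay_ms);
    }
    return 0;
}
```

A return of 0 would be the point to log the event or fall back to a full re-initialisation; the return value also gives a crude measure of how long the card stayed unresponsive.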
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-29 07:38:28 UTC
Permalink
Post by James Cameron
You need a way to detect when DO is floating, rather than
hide it.
When floating, the signal will stay at the same level for some
time. It will swing if it is driven or has a resistor. Does your
dsPIC have software switchable pull-down resistors?
Having another look at the PCB to refresh my memory, DO
has, and always has had, a 47k pull-up to Vcard. ISTR the
actual recommended value is 50k

Joe


James Cameron
2015-03-29 09:29:11 UTC
Permalink
Post by IVP
Post by James Cameron
You need a way to detect when DO is floating, rather than
hide it.
When floating, the signal will stay at the same level for some
time. It will swing if it is driven or has a resistor. Does your
dsPIC have software switchable pull-down resistors?
Having another look at the PCB to refresh my memory, DO
has, and always has had, a 47k pull-up to Vcard. ISTR the
actual recommended value is 50k
Okay, so code for missing DO low at the expected time.
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-29 11:31:51 UTC
Permalink
Post by James Cameron
Post by IVP
Having another look at the PCB to refresh my memory, DO
has, and always has had, a 47k pull-up to Vcard. ISTR the
actual recommended value is 50k
Okay, so code for missing DO low at the expected time.
Hopefully I can put tomorrow aside to pull this apart and put it
back together with tests like that. At least I've got something to
look at now

I'm hoping the mysterious occasional click will get fixed too

Joe


James Cameron
2015-03-29 19:54:07 UTC
Permalink
Post by IVP
Post by James Cameron
Post by IVP
Having another look at the PCB to refresh my memory, DO
has, and always has had, a 47k pull-up to Vcard. ISTR the
actual recommended value is 50k
Okay, so code for missing DO low at the expected time.
Hopefully I can put tomorrow aside to pull this apart and put it
back together with tests like that. At least I've got something to
look at now
Aye, there's nothing like a good scope trace to apply skeptical
physicist brain to. It can bring up questions that are rarely thought
of when coding.
Post by IVP
I'm hoping the mysterious occasional click will get fixed too
I hope not. Can't stand not knowing the cause of problems. ;-)
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-30 03:42:56 UTC
Permalink
I had a closer look at the DO low. At the first byte of the CMD0
(0x40+0000) the card appears to send back "Illegal command".

I might be wrong and it might be irrelevant
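
For reading captures like this, a small decoder for the SPI-mode R1 status byte may help. The bit meanings below follow the R1 format in the SD specification (bit 7 is always 0 in a real R1); the function name and the text labels are made up for this sketch. Note that R1 is normally expected only after the complete 6-byte command frame has been sent, so a low on DO during the first command byte may not be a response to that command at all.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a '|'-separated list of the flags set in an R1 byte into out.
 * Returns the number of characters written (excluding the terminator).
 */
size_t r1_describe(uint8_t r1, char *out, size_t out_len)
{
    static const char *const names[7] = {
        "idle", "erase-reset", "illegal-command", "crc-error",
        "erase-sequence-error", "address-error", "parameter-error"
    };
    size_t used = 0;

    if (out_len == 0)
        return 0;
    out[0] = '\0';

    if (r1 & 0x80) {  /* bit 7 set: not a valid R1 (0xFF is just an idle line) */
        snprintf(out, out_len, "not-an-R1-response");
        return strlen(out);
    }
    for (int bit = 0; bit < 7; bit++) {
        if (r1 & (1u << bit)) {
            int n = snprintf(out + used, out_len - used, "%s%s",
                             used ? "|" : "", names[bit]);
            if (n < 0 || (size_t)n >= out_len - used)
                break;  /* buffer full; keep what fits */
            used += (size_t)n;
        }
    }
    return used;
}
```

A healthy reply to CMD0 is 0x01 (idle only); 0x05 would be idle plus illegal command.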

Working through the numerous datasheets looking for references,
explanations and nuances

The power-up sequence says to keep "74 clocks" under 1ms, so
I shortened that time, which was nudging 1ms. I wonder though
if that makes a difference when the card is already powered up
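
As a sanity check on the timing, the length of the dummy-clock train can be worked out from the clock rate. A rough sketch, assuming back-to-back bytes with no gaps between SPI writes (the helper names are invented here):

```c
#include <stdint.h>

/* Duration in microseconds of `clocks` contiguous SPI clock cycles at `hz`, rounded up. */
uint32_t clock_train_us(uint32_t clocks, uint32_t hz)
{
    return (uint32_t)(((uint64_t)clocks * 1000000u + hz - 1) / hz);
}

/* Number of 0xFF bytes needed to supply at least `min_clocks` clocks (8 per byte). */
uint32_t dummy_bytes_for(uint32_t min_clocks)
{
    return (min_clocks + 7) / 8;
}
```

At the 250kHz initialisation rate, 10 bytes (80 clocks) come to 320us if sent without gaps, so a train approaching 1ms would mostly be time spent between the byte writes rather than the clocks themselves.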

http://home.clear.net.nz/pages/joecolquitt/r1_04.gif

Also, whilst searching for sdhc 74 clocks, just to see if there were
any useful hints, Google and Firefox returned a result for the pics I
posted yesterday, and quite near the top. The really odd / creepy
thing is that the result includes a snippet of my dsPIC code, which
is definitely not in the simple HTML I wrote for that page

http://home.clear.net.nz/pages/joecolquitt/google.gif
Post by James Cameron
Post by IVP
I'm hoping the mysterious occasional click will get fixed too
I hope not. Can't stand not knowing the cause of problems. ;-)
Well, truth be told, me neither

Joe


James Cameron
2015-03-30 04:11:39 UTC
Permalink
Post by IVP
I had a closer look at the DO low. At the first byte of the CMD0
(0x40+0000) the card appears to send back "Illegal command".
I might be wrong and it might be irrelevant
Working through the numerous datasheets looking for references,
explanations and nuances
The power-up sequence says to keep "74 clocks" under 1ms, so
I shortened that time, which was nudging 1ms. I wonder though
if that makes a difference when the card is already powered up
http://home.clear.net.nz/pages/joecolquitt/r1_04.gif
Oh, I thought I saw DO not going low in the failure case. Has this
changed?
Post by IVP
Also, whilst searching for sdhc 74 clocks, just to see if there were
any useful hints, Google and Firefox returned a result for the pics I
posted yesterday, and quite near the top. The really odd / creepy
thing is that the result includes a snippet of my dsPIC code, which
is definitely not in the simple HTML I wrote for that page
http://home.clear.net.nz/pages/joecolquitt/google.gif
Google relates the URL of the image to copies of mail that Gmail users
have received from PIClist, which have included snippets of your code.

The correlation is reasonable.

Your search results can also be biased by past searches, using cookies
kept by your browser. To get around that, do your search in Firefox
using File, New Private Window.
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-30 04:54:57 UTC
Permalink
Post by James Cameron
Post by IVP
http://home.clear.net.nz/pages/joecolquitt/r1_04.gif
Oh, I thought I saw DO not going low in the failure case. Has
this changed?
No. That capture is of a good read. I was going through various
ones to compare with the datasheet and noticed what seemed to
be a response
Post by James Cameron
Google relates the URL of the image to copies of mail
Your search results can also be biased by past searches
Ah

Joe


John Ferrell
2015-03-30 18:36:06 UTC
Permalink
Nice picture Joe. How about telling us what you used to get it?

--
John Ferrell W8CCW
Julian NC 27283
It is better to walk alone,
than with a crowd going the wrong direction.
--Diane Grant
IVP
2015-03-30 20:14:56 UTC
Permalink
Post by John Ferrell
Nice picture Joe. How about telling us what you used to get it?
It's two screen grabs of an Acute analyser. One med-res (20MHz
sampling) for background, with a section of a high-res (200MHz
sampling) pasted on top

An analyser sorts out no end of problems and insecurities

Joe


IVP
2015-03-31 03:24:26 UTC
Permalink
Post by James Cameron
Okay, so code for missing DO low at the expected time.
Frustratingly, DO going low isn't as predictable as expected. Often/
usually it will go low after the 11th SPI write (ie the 0xFF with CS
low after the 10x8 clocks with CS high), but now I'm seeing it stay
high during that time and go low in the middle of the CMD0 bytes
following, and the file will play successfully. A file fail always happens
before this point so it's not looking possible to use DO as a reliable
indicator. If DO was reliable, then I could get the s/w to go back and
keep doing the 74 clocks again

Only a time-out trigger reliably detects inactivity
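
A time-out of that sort might look something like the sketch below: poll for the response byte a bounded number of times and report a distinct code if nothing arrives, so the caller can go back and redo the 74 clocks. The SD specification puts the R1 response within 8 byte times of the command, which is why a limit of 8 is used in the example; `fake_card_xfer` is only a stand-in for the real SPI exchange, and all the names are invented for illustration.

```c
#include <stdint.h>

#define SD_TIMEOUT 0xFFu  /* never a valid R1, since a real R1 has bit 7 clear */

typedef uint8_t (*spi_xfer_fn)(uint8_t out);

/* Clock out 0xFF up to max_bytes times; return the first byte with bit 7 clear. */
uint8_t sd_wait_r1(spi_xfer_fn xfer, unsigned max_bytes)
{
    for (unsigned i = 0; i < max_bytes; i++) {
        uint8_t b = xfer(0xFF);
        if ((b & 0x80u) == 0)
            return b;
    }
    return SD_TIMEOUT;
}

/* Stand-in for the real SPI exchange: answers 0x01 after fake_idle_bytes of 0xFF. */
unsigned fake_idle_bytes = 0;

uint8_t fake_card_xfer(uint8_t out)
{
    (void)out;
    if (fake_idle_bytes > 0) {
        fake_idle_bytes--;
        return 0xFF;
    }
    return 0x01;
}
```

Counting bytes rather than using a hardware timer keeps the check independent of TMR1, which is otherwise busy with the audio sample rate.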
Post by James Cameron
Post by IVP
I'm hoping the mysterious occasional click will get fixed too
I hope not. Can't stand not knowing the cause of problems. ;-)
The click is getting worse, noticeably louder and noisier and more
frequent. 3 out of 4 plays are now bad. And it plays badly in 3
distinct ways too. There's a pop, a crunch and a knock. Not in a
regular order and mixed up with good plays

I'm thinking of capturing what's coming into the SPI from the card
and comparing it with the original file. I'll have to get that 512kB
SRAM board running first though

Since the first fail after 54 hours, the card is definitely getting
worse. After just a few minutes of running time since then, which
includes a lot of fails a few seconds apart that needed power
cycling, the fails are getting closer together. Several times on the
first access to the file.

The card always does the power-up initialisation though, and eg
file names are bit-bang transferred from its FAT to the 18F for
display

What I will try next is a block write after reading the FAT and
before accessing any files. Also read the card's registers so I've
something to compare with after the next format

Joe


James Cameron
2015-03-31 05:39:42 UTC
Permalink
Post by IVP
Post by James Cameron
Okay, so code for missing DO low at the expected time.
Frustratingly, DO going low isn't as predictable as expected. Often/
usually it will go low after the 11th SPI write (ie the 0xFF with CS
low after the 10x8 clocks with CS high), but now I'm seeing it stay
high during that time and go low in the middle of the CMD0 bytes
following, and the file will play successfully. A file fail always happens
before this point so it's not looking possible to use DO as a reliable
indicator. If DO was reliable, then I could get the s/w to go back and
keep doing the 74 clocks again
Only a time-out trigger reliably detects inactivity
Post by James Cameron
Post by IVP
I'm hoping the mysterious occasional click will get fixed too
I hope not. Can't stand not knowing the cause of problems. ;-)
The click is getting worse, noticeably louder and noisier and more
frequent. 3 out of 4 plays are now bad. And it plays badly in 3
distinct ways too. There's a pop, a crunch and a knock. Not in a
regular order and mixed up with good plays
I'm thinking of capturing what's coming into the SPI from the card
and comparing it with the original file. I'll have to get that 512kB
SRAM board running first though
Since the first fail after 54 hours, the card is definitely getting
worse. After just a few minutes of running time since then, which
includes a lot of fails a few seconds apart that needed power
cycling, the fails are getting closer together. Several times on the
first access to the file.
An increasing failure rate is a symptom of electrostatic discharge
damage. If I recall correctly, the original ESD event can make a hole
in a gate that later changes behaviour as a result of thermal cycling.
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-04-01 03:17:20 UTC
Permalink
Post by James Cameron
An increasing failure rate is a symptom of electrostatic discharge
damage. If I recall correctly, the original ESD event can make a
hole in a gate that later changes behaviour as a result of thermal
cycling.
To re-cap

Card has two files, File1 and File2 which are played alternately
about a second apart. File1 plays perfectly, despite the failures
mentioned previously. File2 is deteriorating (not necessarily in
content but in what is presented to the dsPIC's DAC)

In today's episode of CSI:New Lynn

I hope you can follow this -

Instead of alternating I first tried continuous File1. Sounds OK
every time. Then continuous File2. Terribly noisy every time

Made a copy of Files 1 & 2, which are still on the card, and
accessed them as File3 and File4. This test was done before
modifying the PIC code to read the full 32-bit cluster address.

This meant that File3 and File4 were actually played as
snippets of another file, because their true addresses are way
up at the 0x001d0000 cluster. Previously they had been at
0x00008500. The snippets played are from a file which also
has a 0x0000 high word. This file does not have any unwanted
noise when played.

The snippet played as File3 sounds OK, the snippet played as
File4 has a click in it. Not nearly as bad as the deteriorating
one but audible, and also consistent. Tried the continuous thing
as above, File3 always good, File4 always clicky

Modified the PIC code to read the full cluster address. File3 and
File4 are now played as they should be, which is copies of the
originals, from another part of the card.

Basically, original File2 played from 0x8500 is bad. The copy,
File4, played from 0x001d0000 seems OK
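
For anyone hitting the same thing: in a FAT32 directory entry the first-cluster number is split into a high word at byte offset 20 and a low word at byte offset 26, both little-endian. A minimal sketch of the difference between reading only the low word and reading both (function names are invented; the values in the usage note are the ones from this post):

```c
#include <stdint.h>

/* Read a little-endian 16-bit field from a 32-byte directory entry. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Full 32-bit first cluster: high word at offset 20, low word at offset 26. */
uint32_t fat32_first_cluster(const uint8_t entry[32])
{
    uint32_t hi = rd16(entry + 20);
    uint32_t lo = rd16(entry + 26);
    return (hi << 16) | lo;
}

/* What the earlier code effectively did: the low word only. */
uint32_t low_word_only_cluster(const uint8_t entry[32])
{
    return rd16(entry + 26);
}
```

With a high word of 0x001d and a low word of 0x0000, the low-word-only version returns 0 instead of 0x001d0000, which is consistent with the copies being played from the wrong place. A file at 0x00008500 reads the same either way.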

So far, no card failure (3 hours in), no noise, nothing unexpected

I wonder then if I just got "lucky" and File2 happened to be
sitting in a weak part of the card

I'll have to leave this running for a while, maybe a couple of days,
to see if anything does happen. Which I really really hope doesn't

Joe


James Cameron
2015-04-01 06:15:00 UTC
Permalink
In the dsPIC, read the file and send it by serial port at 115200 baud
or so. Capture this on a bigger CPU. Compare the result byte by
byte, using tools for the job. (e.g. hexdump, diff).

This will at least prove whether the problem is bad data?
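
If the capture ends up in a buffer rather than in files, the comparison itself is only a few lines. A sketch in C, with invented names, that reports how many bytes differ and where the first difference is:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t mismatches;    /* number of differing bytes */
    size_t first_offset;  /* offset of the first difference, or len if none */
} capture_diff;

/* Compare a captured stream against the reference copy of the file. */
capture_diff compare_capture(const uint8_t *reference, const uint8_t *captured,
                             size_t len)
{
    capture_diff d = { 0, len };
    for (size_t i = 0; i < len; i++) {
        if (reference[i] != captured[i]) {
            if (d.mismatches == 0)
                d.first_offset = i;
            d.mismatches++;
        }
    }
    return d;
}
```

Dividing first_offset by 512 gives the block within the file where the data first goes wrong, which can then be checked against where the noise is heard.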
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-04-01 07:37:42 UTC
Permalink
Post by James Cameron
This will at least prove whether the problem is bad data?
Yeah, there's still work to do. Getting it going with files in a
different location is encouraging. Will be interesting to see
how long it lasts. In the meantime it gives me a breather to
get back to the PIC(s) side of the project(s)

Thanks for your help so far, very much appreciated

Joe


James Cameron
2015-04-01 10:34:24 UTC
Permalink
Post by IVP
Post by James Cameron
This will at least prove whether the problem is bad data?
Yeah, there's still work to do. Getting it going with files in a
different location is encouraging. [...]
I guess I don't understand how a different location can have any
effect, given that the FTL maps that location to some other location
after every write, or read disturb re-write.
--
James Cameron
http://quozl.linux.org.au/
Bob Ammerman
2015-04-01 10:52:34 UTC
Permalink
Post by IVP
Yeah, there's still work to do. Getting it going with files in a
different location is encouraging. [...]
Could it have something to do with the data's alignment within a write block
on the flash?

~ Bob Ammerman
RAm Systems
James Cameron
2015-04-01 11:15:15 UTC
Permalink
Post by Bob Ammerman
Post by IVP
Yeah, there's still work to do. Getting it going with files in a
different location is encouraging. [...]
Could it have something to do with the data's alignment within a
write block on the flash?
Yes, good point.

Although alignment is almost entirely uncontrollable.

Erasing the device may be useful. In theory that may force all
alignments back to zero, because all blocks will have been released.
But it is difficult to be sure unless you have the source code for the
FTL, or have probes between the FTL and the array.

Erasing the device means sending the ERASE command sequence for the
whole block range of the device. CMD32 with the first block number.
CMD33 with the last block number. Then a CMD38 with wait for busy.

Erasing the device does NOT mean writing zero blocks, or reformatting
the filesystem. All that does is allocate blocks from pages in the
flash array.
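
For an SPI host, the three commands can be built as ordinary 6-byte frames. The sketch below is an illustration, not the Open Firmware code referenced next: the function names are invented, and it assumes an SDHC card, where the argument is a block number rather than a byte address. The CRC7 is included for completeness; in SPI mode most commands are accepted without a valid CRC unless CRC checking has been turned on.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC7 as used in SD command frames (polynomial x^7 + x^3 + 1). */
static uint8_t crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t d = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc <<= 1;
            if ((d ^ crc) & 0x80)
                crc ^= 0x09;
            d <<= 1;
        }
    }
    return crc & 0x7F;
}

/* Build one command frame: start bits + index, 32-bit argument MSB first, CRC7 + stop bit. */
void sd_build_cmd(uint8_t index, uint32_t arg, uint8_t frame[6])
{
    frame[0] = (uint8_t)(0x40 | (index & 0x3F));
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((crc7(frame, 5) << 1) | 1);
}

/* Frames for erasing blocks 0..last_block: CMD32 (start), CMD33 (end), CMD38 (erase). */
void sd_build_erase_all(uint32_t last_block, uint8_t frames[3][6])
{
    sd_build_cmd(32, 0, frames[0]);
    sd_build_cmd(33, last_block, frames[1]);
    sd_build_cmd(38, 0, frames[2]);
}
```

Each frame is followed by the usual R1 response, and after CMD38 the host keeps clocking until DO is released from busy. The last block number would come from the card's CSD; it is left as a parameter here.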

Implementation references for Open Firmware:

http://dev.laptop.org/git/users/quozl/openfirmware.git/.git/tree/dev/mmc/sdhci/sdhci.fth#n1139

User guidance:

http://wiki.laptop.org/go/Firmware/Storage#How_to_quickly_erase_everything
--
James Cameron
http://quozl.linux.org.au/
James Cameron
2015-03-17 19:01:46 UTC
Permalink
Post by IVP
It's most perplexing. I've looked around the web but not yet found
any mention of reads causing data degradation. If it doesn't then
it's hard to explain how the file can be read successfully thousands
of times and then start giving trouble.
On the other side of the FTL, in the managed flash cells, reads do
cause degradation, but the FTL is supposed to handle that as normal
operation of the firmware, and automatically recharge the cells.
Post by IVP
The seemingly random time to failure after each power-up is also
odd. You'd think that it would not work instantly rather than at any
time up to a few minutes.
All I can do for the time being is just let it run until it
fails. After the format/reload the file has been accessed almost
20,000 times so perhaps it'll fail again tomorrow.
Since there's an FTL, and the FTL CPU has some non-flash memory, it is
possible for the number of accesses by your firmware to be much
greater than the number of accesses to the flash cells. It may depend
also on the pattern of access.
Post by IVP
If/when it does I hope I'll learn something useful.
Do let us know. I agree with another post that reading the card with
a computer operating system may give more detail as to the stored
failure state.
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-18 00:21:43 UTC
Permalink
Post by James Cameron
On the other side of the FTL, in the managed flash cells, reads
do cause degradation, but the FTL is supposed to handle that
as normal operation of the firmware, and automatically recharge
the cells.
I found this reference to read errors

http://en.wikipedia.org/wiki/Flash_memory#Read_disturb

This card's controller doesn't seem to be doing its job if
errors are happening and not being corrected at a fraction
of the expected number of cycles

With bunniestudios expose in mind, can I trust any card ?

I don't want this problem to be apparently fixed by trying
other "reputable" cards only to re-appear several months
from now

Plan B I think it'll have to be. Copy some files to SRAM.

Fortunately I have a part-tray of 512k x 8 TSOP

Joe


John Gardner
2015-03-18 00:43:28 UTC
Permalink
...With bunniestudios expose in mind, can I trust any card ?

My very thought.

My 128MB Sandisk cards seem to work flawlessly - haven't bought
one in years, though. New ones, who knows...

My current design uses a 128K Serial RAM array.
IVP
2015-03-18 01:14:59 UTC
Permalink
Post by John Gardner
...With bunniestudios expose in mind, can I trust any card ?
My very thought.
It's not hard to find tales of woe concerning modern cards in
phones and cameras. I'm sure those devices have much better
procedures for handling errors than I have

If error detection and correction is behind the controller and
the controller isn't working properly, can I even do anything ?
Post by John Gardner
From the wiki link
"To avoid the read disturb problem the flash controller will
typically count the total number of reads to a block since the
last erase. When the count exceeds a target limit the affected
block is copied over to a new block".........

So, if the error occurs before the target limit, then what ? How
does the controller know that the block is "affected" ?

If the block read returns a Data Error Token, then surely that
means the data is corrupted, hasn't been fixed and is therefore
lost ?
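
To make the quoted mechanism concrete, here is a toy model of the read-count scheme. It is entirely hypothetical: real controller firmware is not published, the threshold is a made-up number, and the names are invented. It only shows the counting-and-relocating idea from the wiki text.

```c
#include <stdint.h>

#define READ_LIMIT 1000u  /* made-up threshold; real values are not published */

typedef struct {
    uint32_t physical_block;     /* where the logical block currently lives */
    uint32_t reads_since_erase;  /* reads counted since the last relocation */
    uint32_t relocations;        /* how many times the data has been moved */
} toy_block;

/* Count one read; returns 1 if it triggered a copy to a fresh block, else 0. */
int toy_read(toy_block *b, uint32_t *next_free_block)
{
    b->reads_since_erase++;
    if (b->reads_since_erase < READ_LIMIT)
        return 0;
    b->physical_block = (*next_free_block)++;  /* copy to a pre-erased block */
    b->reads_since_erase = 0;
    b->relocations++;
    return 1;
}
```

A pure counter like this has no way to notice a block that degrades before the limit is reached, which is exactly the gap the questions above point at; catching that would need some form of error detection on the data itself.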

Talk about chasing rabbits down holes

Joe


James Cameron
2015-02-17 22:43:43 UTC
Permalink
Post by IVP
Post by Harrison Cooper
let the controller refresh the charge
Thanks, I didn't know that
When you say "refresh the charge" do you mean just power it
up or actually re-write all the data ?
The flash cells store a charge.

Powering the device won't necessarily refresh the charge; it depends
on the firmware. Without seeing the firmware, you can't be sure,
unless you probe the interface between the controller and the flash
chip.

However, reading the data must refresh the charge, because reading
without refreshing is destructive to the charge.

Writing will erase cells and set the charge anew. The controller will
not necessarily use the same cells that held the same disk blocks;
there's a performance advantage in using a queue of pre-erased flash
blocks.

(My comments above are mainly for high density flash, in the several
gigabit range. Very low density, as used in SPI flash chips attached
to microcontrollers or microprocessors for firmware or configuration
state, have somewhat different problems. ;-) With much clearer
datasheets.)
Post by IVP
That would be a pain and more hassle than copying a DVD every few
years
You get other problems with DVDs.

My first worry is whether the drives will continue to work, and if
they'll continue to be available for purchase. I'm already unable to
use particularly old CD-R media, against new drives, as the standards
and capabilities have changed.

For the moment, I use hard drives, and replace them every year or two.

I don't expect them to last, either.

The other trick is to divide things between archive and ephemeral, and
don't let the archive data set grow too much.

The next Carrington Event will be interesting.
--
James Cameron
http://quozl.linux.org.au/
James Cameron
2015-03-18 02:21:50 UTC
Permalink
Summary: how to reduce risk of failure, how to assess ESD consequence.
Post by IVP
I found this reference to read errors
http://en.wikipedia.org/wiki/Flash_memory#Read_disturb
This card's controller doesn't seem to be doing its job if
errors are happening and not being corrected at a fraction
of the expected number of cycles
Some FTLs may use ECC or other forms of redundant data, but as the
FTLs are closely guarded secret code, we don't know.
Post by IVP
With bunniestudios expose in mind, can I trust any card ?
Yes, after testing.

Batching, yield measurement, warranty agreements with manufacturer;
all these things are useful to some extent, but eventually it boils
down to either you test them or they test them.
Post by IVP
Post by James Cameron
I'd also consider ESD to the card ... not as a solution, but
as a source of problems.
I think the card is fairly well connected in situ. All data and
comms lines have the recommended pull-ups, nothing is floating,
good power supply etc
Consider ESD in handling, between the time the card was fabricated,
and the time you inserted it.

ESD isn't always an instantaneous failure; it can do damage that
develops over time, such that a part will work fine for months before
failing.

But you're dealing with a single part, which is kinda outside my
experience; what matters are the statistics for the batch, and you can
never tell where a single part sits on the distribution.
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-18 03:17:50 UTC
Permalink
Post by James Cameron
Post by IVP
With bunniestudios expose in mind, can I trust any card ?
Yes, after testing.
Thanks for the observations and comments

As far as this particular card (brand and size) goes, I guess
I can say how much I trust it and whether it's suitable for the
application

You can understand why I'd be reluctant to start what
could probably be weeks and weeks of testing only to
find that other cards may be better but still ultimately
unsuitable.

At this point I believe a re-think about the memory to use
is going to be more productive. A card is convenient due
to its capacity of course but I'll look at something else to
achieve most of what I wanted

Joe


James Cameron
2015-03-18 03:41:54 UTC
Permalink
Post by IVP
Post by James Cameron
Post by IVP
With bunniestudios expose in mind, can I trust any card ?
Yes, after testing.
Thanks for the observations and comments
As far as this particular card (brand and size) goes, I guess
I can say how much I trust it and whether it's suitable for the
application
Heh.
Post by IVP
You can understand why I'd be reluctant to start what
could probably be weeks and weeks of testing only to
find that other cards may be better but still ultimately
unsuitable.
Yes, of course. You have to balance the effort of testing against the
benefit. For me, testing a small batch of cards that are a
representative sample of a very large batch, for the purposes of
qualifying for mass production ... was both worth it, and a far cry
from your application. ;-)
Post by IVP
At this point I believe a re-think about the memory to use
is going to be more productive. A card is convenient due
to its capacity of course but I'll look at something else to
achieve most of what I wanted
No worries. Not trying to constrain your design choices, just sharing
my limited knowledge.
--
James Cameron
http://quozl.linux.org.au/
IVP
2015-03-18 04:31:12 UTC
Permalink
Post by James Cameron
Not trying to constrain your design choices, just sharing
my limited knowledge.
Thanks again

I hadn't realised there were so many issues, technical and
commercial. I suppose I should be grateful that I wasn't
embarrassed by product coming back

SDHC seems to have at least a whiff of the dodgy battery
market

Joe

