Symptoms
Application logs showing
double register of key xxx
ERROR: invalid page in block x of relation pg_tblspc/x/x/x/x
ERROR: missing chunk number * for toast value * in pg_toast_*
checksum mismatch: disk has 0, should be xxx
DETAIL could not open file “$psql/pg_clog/####”: No such file or directory
cp: cannot stat `pg_xlog/
Error “MultiXactId has not been created yet — apparent wraparound”
Purpose
Help engineers resolve corruption cases. We’d like if engineers could attach the SR’s to these cases and also record the errors messages in a SAP for the case.
Cause
Storage issue. APD. Disk full.
Resolution
Double register of key
This indicates that a primary key violation has been inserted into a table.
export pbin=`ps -ea -o cmd | grep ‘[p]ostgres -D’ | awk ‘{ gsub (“postgres$”,””,$1); print $1}’`
export pdata=`ps -ea -o cmd | grep ‘[p]ostgres -D’ | awk ‘{print $3}’`
$pbin/reindexdb -d VCDB -U postgres
- If you get permission problems running the re-index see, permission problems on executing postgres binaries(2147283)
- For windows open the command prompt to where postgres binaries are.For example:C:\Program Files\VMware\vCenter Server\vPostgres\bin>reindexdb -U postgres -d VCDBNote: This will error out on a table if duplicate entries exist for a Primary Key.
- Login to the database../psql -U secureall -d secureall
./psql -U vc -d VCDB
C:\Program Files\VMware\vCenter Server\vPostgres\bin>psql -U postgres -d VCDB - Try to re index the suspect table to confirm:REINDEX TABLE vpx_resource_pool; error rebuilding pk_resources
- Find the duplicate entry.select Column1, Column2, count(*)
from yourTable
group by Column1, Column2
HAVING count(*) > 1Note: Column1,Column2 — are the columns in the primary key. - To find columns in the primary key.\dt PK_NAME shows the column(s) in the PKFor example:VCDB=> \d “pk_vpx_resource_pool”
Index “vc.pk_vpx_resource_pool”
Column | Type | Definition
——–+———+————
id | integer | id
primary key, btree, for table “vc.vpx_resource_pool”
select id, count(*)
from vpx_resource_pool
group by idHAVING count(*) > 1;
- The next step is get the unique ID of the 2 same rowsselect ctid, id
from vpx_resource_pool
where id = ‘id returned above’; - Now delete the most recent entry, The largest ctid.delete from vpx_resource_pool
where ctid in (select max(ctid) from
vpx_resource_pool where id=’id returned above’); - Once this is removed you should be able to reindex the table.reindex table vpx_resource_pool;
ERROR: invalid page in block x of relation pg_tblspc/x/x/x/x
It’s very important to take a snapshot of your appliance before executing this step.
./psql -U secureall -d secureall — see KB 2147249 for help logging in
./psql -U vc -d VCDB
SET zero_damaged_pages = on;
vacuum full verbose;
ERROR: invalid page in block x of index xxxx
For example: ERROR: invalid page in block 0 of index vpx_text_array_m2
./psql -U vc -d VCDB — see KB 2147249 for help logging in.
reindex index vpx_text_array_m2;
ERROR: missing chunk number * for toast value * in pg_toast_* or ERROR: unexpected chunk number 0 (expected 1) for toast
The log may have indicated which toast value is corrupt. or toast value 92347756 in pg_toast_29042;
You need to remove the record with the corrupt toast.
- Login to the database../psql -U secureall -d secureall
./psql -U vc -d VCDB - To identify the table with the corrupt rows.select 29042::regclass; –29042 is the appendix after the pg_toast
- Another way to identify the corruption is by doing a vacuum fullsecureall=# vacuum full verbose; — This will throw out any corruption.
- Once you find hte table you have to do selects on the table to find which row is corrupt and remove it.
- To find the row do some selects.For Example:The corruption is on vpx_vm,VCDB=> select count(*) from vpx_vm;
count
——-
37
VCDB=> select * from vpx_vm where id < 10; No problem rows returned. VCDB=> select * from vpx_vm where id < 20; Error – unexpected chunk number — corruption between 10 and 20 VCDB=> select * from vpx_vm where id < 15; Error – unexpected chunk number — corruption between 10 and 15 VCDB=> select * from vpx_vm where id =12;
Error – unexpected chunk number — This is the corrupt vm. - It’s not safe just to delete one row as there will be dependencies in the application.
- Look for other KB’s to see if this is safe.
- In the example above, see [Internal] Manually removing a virtual machine from the vCenter Server SQL database (2128217).
Checksum mismatch: disk has 0, should be xxx
- This will require you to shutdown the application. For more information on the values on variables inside <> see the ‘How to login’ sectionexport pbin=`ps -ea -o cmd | grep ‘[p]ostgres -D’ | awk ‘{ gsub (“postgres$”,””,$1); print $1}’`
export pdata=`ps -ea -o cmd | grep ‘[p]ostgres -D’ | awk ‘{print $3}’`$pbin/pg_ctl stop -m smart -D $pdataNote: If you get permission p problems running the pg_ctlsu – vpostgres -c “$pbin/pg_ctl stop -m smart -D $pdata”
- For further information, see permission problems executing postgres binaries (2147283)pg_ctl stop -m smart -D $pdata
- Then fix the corruption.postgres –single -D $pdata -c block_path“For example:su – postgres –c “$pbin/postgres –single -D $pdata -c fix_block_checksum=”1663/1636/1694/0/978″”
- Start up the database againpg_ctl start -D $pdataFor example:su – secureall –c “/opt/vmware/vpostgres/9.1/bin/pg_ctl start -D /common/db/vshield”
su – vpostgres –c “$pbin/pg_ctl start -D $pdata”For further information on permissions, see permission problems executing postgres binaries (2147283).
cp: cannot stat `pg_xlog/
export pbin=`ps -ea -o cmd | grep ‘[p]ostgres -D’ | awk ‘{ gsub (“postgres$”,””,$1); print $1}’`
export pdata=`ps -ea -o cmd | grep ‘[p]ostgres -D’ | awk ‘{print $3}’`
Shutdown postgres.
su –secureall -c “$pbin/pg_resetxlog $pdata”
Windows
Find the postgres executables, search for psql.
For Virtual Center this C:\Program Files\VMware\vCenter Server\vPostgres\bin>
Open a command prompt to this directory
pg_resetxlog.exe c:\programdata\vmware\vCenterServer\data\vpostgres\
For further information on permissions, see permission problems executing postgres binaries (2147283).
Error “MultiXactId has not been created yet — apparent wraparound”
- Login to the database, see Instructions for logging into postgres when its embedded in applications or apppliances. Internal (2147249)./psql -U secureall -d secureall
./psql -U vc -d VCDB - For the table the application is complaining about.SELECT count(*) from table for update;”
followed by “VACUUM FREEZE table;”For example:SET LOCAL lock_timeout = ‘4s’;
SELECT count(*) from vpx_host for update;”
“VACUUM FREEZE vpx_host;”